Hands-On Concurrency with Rust
上QQ阅读APP看书,第一时间看更新

The CPU

The CPU is a device that interprets a stream of instructions, manipulating storage and other devices connected to it in the process. The simplest model of a CPU, and the one that is often introduced first in undergrad computer science, is that a CPU receives an instruction from some nebulous place, performs the interpretation, receives the next instruction, interprets that, and so forth. The CPU maintains an internal pace via an oscillator circuit, and all instructions take a defined number of oscillator pulses—or clock cycles—to execute. In some CPU models, every instruction will take the same number of clock cycles to execute, and in others the cycle count of instructions will vary. Some CPU instructions modify registers, have very low latency with specialized but exceedingly finite memory locations built into the CPU. Other CPU instructions modify the main memory, RAM. Other instructions move or copy information between registers and RAM or vice versa. RAM—whose read/write latency is much higher than that of registers but is much more plentiful—and other storage devices, such as SSDs, are connected to the CPU by specialized buses.

The exact nature of these buses, their bandwidth and transmission latency, varies between machine architectures. On some systems, every location in the RAM is addressable—meaning that it can be read or written to—in constant time from the available CPUs. In other systems, this is not the case—some RAM is CPU-local and some is CPU-remote. Some instructions control special hardware interrupts that cause memory ranges to be written to other bus-connected storage devices. Mostly, these devices are exceedingly slow, compared to the RAM, which is itself slow compared to the registers.

All of this is to explain that in the simplest model of a CPU, where instructions are executed serially, instructions may well end up stalling for many clock cycles while waiting for reads or writes to be executed. To that end, it's important to understand that almost all CPUs—and especially the CPUs we'll concern ourselves with in this book—perform out-of-order executions of their instructions. Just so long as a CPU can prove that two sequences of instructions access the memory distinctly—that is, they do not interfere with each other—then the CPU is free and will probably reorder instructions. This is one of the things that makes C's undefined behavior concerning uninitialized memory so interesting. Perhaps your program's future has already filled in the memory, or perhaps not. Out-of-order execution makes reasoning about a processor's behavior difficult, but its benefit is that CPUs can execute programs much faster by deferring a sequence of instructions that is stalled on some kind of memory access.

In the same spirit, most modern CPUs—and especially the CPUs we'll concern ourselves with in this book—perform branch prediction. Say that branches in our programs tend to branch the same way at execution time—for example, say we have a feature-flag test that is configured to be enabled for the lifetime of a program. CPUs that perform branch prediction will speculatively execute one side of a branch that tends to branch in a certain way while they wait on other stalled instruction pipelines' instructions. When the branch instruction sequence catches up to its branch test, and if the test goes the predicted way, there's already been a great deal of work done and the instruction sequence can skip well ahead. Unfortunately, if the branch was mispredicted, then all this prior work must be torn down and thrown away, and the correct branch will have to be computed, which is quite expensive. It's for this reason that you'll find that programmers who worry about the nitty-gritty performance characteristics of their programs will tend to fret about branches and try to remove them.

All of this is quite power-hungry, reordering instructions to avoid stalls or racing ahead to perform computations that may be thrown away. Power-hungry implies hot, which implies cooling, which implies more power expenditure. All of which is not necessarily great for the sustainability of technological civilization, depending on how the electricity for all this is generated and where the waste heat is dumped. To that end, many modern CPUs integrate some kind of power-scaling features. Some CPUs will lengthen the time between their clock pulses, meaning they execute fewer instructions in a certain span of time than they might otherwise have. Other CPUs race ahead as fast as they normally would and then shut themselves off for a spell, drawing minimal electricity and cooling in the meantime. The exact method by which to build and run a power-efficient CPU is well beyond the scope of this book. What's important to understand is that, as a result of all this, your program's execution speed will vary from run to run, all other things being equal, as the CPU decides whether or not it's time to save power. We'll see this in the next chapter when we manually set power-saving settings.