3.1. Motivation for Caches

Modern processors run at clock speeds of several GHz and can execute several instructions per clock cycle. A processor may therefore have a peak execution speed of several instructions per nanosecond (ns). For example, a 3 GHz processor capable of executing 3 instructions per cycle has a peak execution speed of 9 instructions per ns.

Modern RAM, on the other hand, is quite slow. An access to RAM takes 50 ns or more, causing the processor to stall while it waits for the data to arrive. This makes RAM accesses one of the slowest operations a processor can perform. For example, a processor capable of executing 9 instructions per ns could have executed up to 450 instructions in the time it takes to perform a single RAM access with a latency of 50 ns.
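
The arithmetic in the two examples above can be checked with a short sketch. The function and parameter names are illustrative only, and the model assumes that the processor sustains its peak rate, which real programs rarely do.

```python
def peak_instructions_per_ns(clock_ghz, instructions_per_cycle):
    """Peak execution speed in instructions per nanosecond.

    A clock of 1 GHz corresponds to one cycle per nanosecond, so the
    clock speed in GHz is also the number of cycles per ns.
    """
    return clock_ghz * instructions_per_cycle


def instructions_lost_per_access(clock_ghz, instructions_per_cycle, latency_ns):
    """Upper bound on the instructions that could have been executed
    while the processor is stalled for one memory access."""
    return peak_instructions_per_ns(clock_ghz, instructions_per_cycle) * latency_ns


# The example from the text: 3 GHz, 3 instructions per cycle, 50 ns RAM latency.
peak = peak_instructions_per_ns(3.0, 3)
lost = instructions_lost_per_access(3.0, 3, 50.0)

print("peak speed:", peak, "instructions per ns")
print("instructions lost per RAM access:", lost)
```

With the values from the text this gives the 9 instructions per ns and 450 lost instructions quoted above.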

The time it takes to load the data from memory is called the latency of the memory operation. It is usually measured in processor clock cycles or ns.

Since memory accesses are very common in programs and can account for more than 25% of the instructions, memory access latencies would have a devastating impact on processor execution speed if they could not be avoided in some way.
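
To get a feeling for the size of this impact, the following sketch estimates the slowdown if every memory access had to go to RAM. It is a deliberately crude model, not a measurement. It assumes a processor with a peak speed of 9 instructions per ns and a RAM latency of 50 ns, takes 25% as the fraction of instructions that access memory, and assumes that the processor stalls fully on each access with no overlap between accesses. All names are illustrative.

```python
PEAK_INSTR_PER_NS = 9.0   # assumed: 3 GHz, 3 instructions per cycle
RAM_LATENCY_NS = 50.0     # assumed RAM access latency
MEMORY_FRACTION = 0.25    # assumed fraction of instructions that access memory


def average_ns_per_instruction(memory_fraction, latency_ns, peak_instr_per_ns):
    """Average time per instruction when every memory access stalls
    the processor for the full latency (no overlap, no caches)."""
    issue_time_ns = 1.0 / peak_instr_per_ns       # time at peak speed
    stall_time_ns = memory_fraction * latency_ns  # average stall per instruction
    return issue_time_ns + stall_time_ns


with_ram = average_ns_per_instruction(MEMORY_FRACTION, RAM_LATENCY_NS, PEAK_INSTR_PER_NS)
without_stalls = average_ns_per_instruction(0.0, RAM_LATENCY_NS, PEAK_INSTR_PER_NS)

# How many times slower the processor runs compared to its peak speed.
slowdown = with_ram / without_stalls
print("slowdown factor:", round(slowdown, 1))
```

Under these assumptions the processor would run more than 100 times slower than its peak speed.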

To solve this problem, computer designers have introduced cache memories, which are small but extremely fast memories placed between the processor and the slow main memory. Frequently used data is automatically copied to the cache memories. This allows well-written programs to make most of their memory accesses to the fast cache memory and only rarely access the slow main memory.

Often a computer does not use just a single cache, but a hierarchy of caches of increasing size and decreasing speed. For example, it may have a 64 kilobyte cache with a latency of 3 cycles for the most frequently accessed data, and a 1 megabyte cache with a latency of 15 cycles for less frequently accessed data. The caches in such a configuration are called the first level cache (the 64 kilobyte cache) and the second level cache (the 1 megabyte cache), or the L1 and L2 caches for short. Some computers also have an additional third cache level, the L3 cache.
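
Cache latencies are usually given in cycles and RAM latency in ns, so comparing them requires a conversion. The sketch below does this for the example hierarchy. The 3 GHz clock speed and the 50 ns RAM latency are assumptions carried over from the earlier examples, and the function names are illustrative.

```python
CLOCK_GHZ = 3.0  # assumed clock speed; 1 GHz is one cycle per ns


def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz


def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to clock cycles."""
    return latency_ns * clock_ghz


l1_ns = cycles_to_ns(3, CLOCK_GHZ)         # L1 cache: 3 cycles
l2_ns = cycles_to_ns(15, CLOCK_GHZ)        # L2 cache: 15 cycles
ram_cycles = ns_to_cycles(50, CLOCK_GHZ)   # RAM: 50 ns

print("L1 latency:", l1_ns, "ns")
print("L2 latency:", l2_ns, "ns")
print("RAM latency:", ram_cycles, "cycles")
```

At the assumed 3 GHz, the L1 latency of 3 cycles corresponds to 1 ns and the L2 latency of 15 cycles to 5 ns, while a 50 ns RAM access corresponds to 150 cycles.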