The Memory Hierarchy in Modern Processors

All the figures in this section are typical of
commercially available high-performance processors in September, 1996.
There is a distinct possibility that they will be somewhat out of date
by the time you are reading this (even if it is only October, 1996!).
Even in 1996, some very large machines were larger or faster
than the figures below would indicate.
One of the most important considerations in understanding the performance capabilities of a modern processor is the memory hierarchy. We can classify memory by its "distance" from the processor, where distance is measured by the number of machine cycles required to access it. As memory becomes further away from the main processor (i.e. slower to access), the number of words of it in a typical system increases. Some indicative numbers for 1996 processors would be:
Name            Access time (cycles)   Number of words
Register        1                      32
Level 1 cache   2                      16 x 10^3
Level 2 cache   5                      0.25 x 10^6
Main memory     30                     10^8
Disc            10^6                   10^9

In 1996, high performance processors had clock frequencies of 200-400 MHz, or cycle times of 2.5-5.0 nanoseconds.
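The relation between clock frequency and cycle time can be checked with a short calculation. The following is a minimal sketch in Python; the function name `cycle_time_ns` is chosen for illustration and does not come from the original text.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency given in MHz."""
    # 1 MHz is 10^6 cycles per second, so one cycle lasts 1000 / MHz nanoseconds.
    return 1000.0 / clock_mhz


# The two ends of the 1996 frequency range quoted in the text.
for mhz in (200, 400):
    print(f"{mhz} MHz -> {cycle_time_ns(mhz)} ns per cycle")
```

This reproduces the quoted range: 5.0 ns at 200 MHz and 2.5 ns at 400 MHz.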

Registers

Registers are a core part of the processor itself. A RISC processor performs all operations except loads and stores on operands stored in the registers. In a typical RISC processor, there will be 32 32-bit integer registers and 32 64-bit floating point registers. (True 64-bit machines with 64-bit registers are starting to appear.)

 

Cache

Cache memory sits between the processor and main memory. It stores recently accessed memory words "closer" to the processor than the main memory. Cache is transparent to a programmer (or compiler writer!): cache hardware will intercept all memory requests from the processor and satisfy the request from the cache if possible; otherwise the request is forwarded to the main memory. Many high performance systems will have a number of levels of cache: a small level 1 cache "close" to the processor (typically needing 2 cycles to access) and as many as 2 more levels of successively slower and larger caches built from high performance (but expensive!) memory chips.
For state-of-the-art (2 cycle access) performance, a processor needs to have the level 1 cache on the same die as the processor itself. Sizes of 16 Kwords (64 Kbytes, often separated into instruction and data cache) were common.
The bus between the cache and main memory is a significant bottleneck, so system designers usually organise the cache as a set of "lines" of, typically, 8 words. A whole line of 8 contiguous memory locations will be fetched into the cache each time it is updated. (8 words will be fetched in a 4-cycle burst - 2 words in each cycle on a 64-bit bus.) This means that when one memory word is fetched into the cache, 7 of its neighbours will also be fetched. A program which is able to exploit this (by, for instance, keeping closely related data in contiguous memory locations) will make more effective use of the cache and see a reduced effective memory access time.
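The benefit of contiguous data can be illustrated with a toy model. The sketch below is not a description of any real cache: it assumes an initially empty cache that never evicts a line, charges a flat 2 cycles for a hit and 30 cycles for a miss (the level 1 and main memory figures from the table), and ignores the level 2 cache and burst timing. All names in it are hypothetical.

```python
WORDS_PER_LINE = 8   # line size quoted in the text
HIT_CYCLES = 2       # level 1 access time from the table
MISS_CYCLES = 30     # main memory access time from the table (simplification)


def line_fetches(addresses):
    """Count line fetches for a sequence of word addresses.

    Assumes an initially empty cache that never evicts a line.
    """
    resident = set()
    fetches = 0
    for addr in addresses:
        line = addr // WORDS_PER_LINE   # which 8-word line holds this word
        if line not in resident:
            resident.add(line)          # the whole line is brought in
            fetches += 1
    return fetches


def mean_access_cycles(addresses):
    """Average cycles per access under the simplified hit/miss costs above."""
    addresses = list(addresses)
    misses = line_fetches(addresses)
    hits = len(addresses) - misses
    return (hits * HIT_CYCLES + misses * MISS_CYCLES) / len(addresses)


# 64 accesses to neighbouring words versus 64 accesses one line apart.
contiguous = range(64)
scattered = range(0, 64 * WORDS_PER_LINE, WORDS_PER_LINE)

print(line_fetches(contiguous), mean_access_cycles(contiguous))
print(line_fetches(scattered), mean_access_cycles(scattered))
```

Under these assumptions the contiguous pattern needs 8 line fetches and averages 5.5 cycles per access, while the scattered pattern needs 64 line fetches and averages 30 cycles per access.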
At least one processor (DEC's Alpha) has a larger level 2 cache on the processor die. Other systems place the level 2 cache on a separate die within the same physical package (such packages are sometimes referred to as multi-chip modules).

 

Level x Cache

Systems with more than 1 Mbyte of Level 2 cache, built from fast (but expensive and less dense) static RAM (SRAM) memory devices, are becoming common. The SRAM memories have access times of 10 ns (or slightly less), but the total access time in a system would be of the order of 5 cycles or more.

 

Main memory

High density dynamic RAM (DRAM) technology provides the cheapest and densest semiconductor memory available. Chip access times are about 60 ns, but system access times will be 25-30 cycles (and perhaps more for high clock frequency systems). DRAM manufacturers have tended to concentrate on increasing capacity rather than increasing speed, so access times measured in processor cycles are increasing: processor clock speeds are rising faster than DRAM access times are falling. However, DRAM capacities are increasing at similar rates to processor clock speeds.
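To see why a fixed chip access time costs more cycles as clocks speed up, the 60 ns figure can be converted to cycles at each end of the 1996 frequency range. This is a rough sketch only: `access_cycles` is a hypothetical helper, and it covers the chip access time alone, not the extra system overhead that brings the total to the 25-30 cycles quoted above.

```python
def access_cycles(access_ns: float, clock_mhz: float) -> float:
    """Convert an access time in nanoseconds to processor cycles."""
    # cycles = time / cycle time, and the cycle time is 1000 / MHz nanoseconds.
    return access_ns * clock_mhz / 1000.0


DRAM_CHIP_NS = 60  # chip access time quoted in the text

for mhz in (200, 400):
    print(f"{mhz} MHz: {access_cycles(DRAM_CHIP_NS, mhz)} cycles")
```

The same 60 ns access takes 12 cycles at 200 MHz and 24 cycles at 400 MHz: doubling the clock doubles the cost in cycles unless the DRAM gets faster too.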

 

Disc

For some years now, some forecasters have been suggesting that magnetic discs will soon be obsolete. Although the cost and density gap between DRAM and disc memory has been narrowing, some heroic efforts by disc manufacturers have seen disc capacities increase and prices drop, so that the point at which magnetic disc becomes obsolete is still a way off. Discs with 4 GByte (10^9 words) are commonplace and access times are of the order of 10 ms or 10^6 processor cycles. The large gap between access times (a factor of 10^4) for the last two levels of the hierarchy is probably one of the factors driving DRAM research and development towards higher density rather than higher speed. However, work on cache-DRAMs and synchronous DRAM is pushing DRAM access time down.
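The size of the gap between adjacent levels can be read off the indicative 1996 figures in the table at the start of this section. The sketch below simply divides each level's access time by that of the level above it; the list `ACCESS_CYCLES` and the function `gaps` are illustrative names, not part of the original text.

```python
# Indicative 1996 access times in cycles, taken from the table in this section.
ACCESS_CYCLES = [
    ("Register", 1),
    ("Level 1 cache", 2),
    ("Level 2 cache", 5),
    ("Main memory", 30),
    ("Disc", 10**6),
]


def gaps(levels):
    """Return (label, ratio) pairs for each pair of adjacent levels."""
    return [
        (f"{near} -> {far}", far_cycles / near_cycles)
        for (near, near_cycles), (far, far_cycles) in zip(levels, levels[1:])
    ]


for label, ratio in gaps(ACCESS_CYCLES):
    print(f"{label}: x{ratio:.1f}")
```

Every step up to main memory costs a single-digit factor, while the step from main memory to disc is a factor of roughly 3 x 10^4, which is the order-of-magnitude gap referred to above.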