Memory Bandwidth Explained
Author: Peter Rundberg
Posted: 2001-03-07
Category: Hardware
Cache memory
Before we dive into memory bandwidth, it is essential that you understand the basics of caches.
A modern microprocessor uses caches to reduce the effective latency of memory operations. Keeping frequently used data and instructions in small, fast memories close to the processor core, the cache memory, drastically reduces the effective latency.
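One common way to put a number on this effect is the average memory access time: the hit time plus the miss rate times the miss penalty. The sketch below is in Python. The cycle counts (1 cycle for a cache hit, 100 extra cycles for a trip to main memory) are assumptions chosen for illustration, not measurements from any particular processor.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles.

    hit_time:     cycles for an access that hits in the cache
    miss_rate:    fraction of accesses that miss (0.0 to 1.0)
    miss_penalty: extra cycles needed to fetch the block from main memory
    """
    return hit_time + miss_rate * miss_penalty


# Illustrative, assumed numbers: 1-cycle hit, 100-cycle miss penalty.
for miss_rate in (1.0, 0.10, 0.02):
    cycles = average_access_time(1, miss_rate, 100)
    print(f"miss rate {miss_rate:.0%}: {cycles:.1f} cycles per access")
```

With these assumed numbers, lowering the miss rate from 10% to 2% brings the average from about 11 cycles to about 3 cycles per access.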
Caches work on the principle of locality. Two types of locality exist: temporal and spatial. Temporal locality means that if a program has used a piece of data or an instruction, it is likely to use the same data or instruction again soon. This is clear for instructions if you consider program structures such as loops. Spatial locality means that if a program uses a piece of data or an instruction, it is likely to soon use data or instructions located close to the previous access. For data, think of consecutive accesses to the elements of an array; for instructions, it follows from the sequential nature of programs.
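Both kinds of locality can be made visible with a toy cache model that only counts misses. The following Python sketch assumes a direct-mapped cache with 16 lines of 32 bytes and a 64x64 array of 4-byte elements stored row by row. All of these sizes, and the names count_misses and address, are hypothetical choices for the example.

```python
def count_misses(addresses, num_lines=16, line_size=32):
    """Count misses in a toy direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_lines          # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # which memory block the byte lives in
        index = block % num_lines      # the one line that block may occupy
        if tags[index] != block:       # block not present: a miss
            misses += 1
            tags[index] = block        # load the block, replacing the old one
    return misses


ROWS, COLS, ELEM = 64, 64, 4           # 64x64 array of 4-byte elements, row-major


def address(row, col):
    """Byte address of element (row, col), with the array starting at address 0."""
    return (row * COLS + col) * ELEM


# Good spatial locality: walk memory in order, so neighbours share a block.
row_major = [address(r, c) for r in range(ROWS) for c in range(COLS)]
# Poor spatial locality: each access jumps a whole row (256 bytes) ahead.
col_major = [address(r, c) for c in range(COLS) for r in range(ROWS)]
# Temporal locality: sweep the same 256-byte working set ten times.
repeated = [i * ELEM for _ in range(10) for i in range(64)]

print(count_misses(row_major))   # 512: one miss per 32-byte block
print(count_misses(col_major))   # 4096: every single access misses
print(count_misses(repeated))    # 8: only the first sweep misses
```

In this model the same 4096 element accesses cost eight times as many misses when the array is walked column by column, because each fetched block is replaced before its neighbouring elements are used.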
When the data that is needed is not present in the cache, a cache miss occurs. The processor has to go to the slow main memory, fetch the data, and load it into the cache. During this time the processor must sit idle and wait for the new data to arrive. This is not entirely true for modern microprocessors with out-of-order execution, but in principle that is how it works. The new data arrives in a chunk called a cache block, or cache line. This block contains the data that was accessed as well as the data close to it, so that we can benefit from spatial locality.
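To make the notion of a cache block concrete, the Python sketch below splits a byte address into the parts a cache uses to locate it, and computes which range of bytes is brought in on a miss. It assumes a direct-mapped cache with 64 lines of 64 bytes; these sizes and the function names are hypothetical, and real caches differ in both size and organization.

```python
def split_address(addr, line_size=64, num_lines=64):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr % line_size          # byte position inside the cache block
    block = addr // line_size          # block number in main memory
    index = block % num_lines          # cache line the block maps to
    tag = block // num_lines           # tells apart blocks that share a line
    return tag, index, offset


def block_range(addr, line_size=64):
    """First and last byte address loaded into the cache when addr misses."""
    start = addr - addr % line_size
    return start, start + line_size - 1


tag, index, offset = split_address(0x1234)
first, last = block_range(0x1234)
print(tag, index, offset)              # 1 8 52
print(hex(first), hex(last))           # 0x1200 0x123f
```

Under these assumptions a miss on address 0x1234 loads bytes 0x1200 through 0x123F, so a following access to 0x1238 finds its data already in the cache.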
Another important aspect of cache memories is the write policy. Most modern microprocessors use write-back, write-allocate caches. Write-back means that when the processor performs a write, the write takes place only in the cache and not in main memory. When the written block is later evicted from the cache because some other block needs the space, the modified block must first be written to memory. Write-allocate means that if a write misses in the cache, the block is first loaded from memory into the cache, and then the write is performed in the cache. As a result, a simple memory copy does not take two main memory transfers, as one might imagine: one from the old location to the cache and one from the cache to the new location. In fact, a memory copy results in three main memory transfers unless it is done in some clever way.
First you read the data from memory into the cache, then you write the data to a new location. With write-allocate, this write first reads the old contents of the destination into the cache, only to have them completely overwritten by the copy. The new data is then written to main memory when the modified cache block is evicted from the cache. Thus a memory copy involves three main memory operations.
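The three transfers can be counted with a small model of a write-back, write-allocate cache. The Python sketch below is a simplification under stated assumptions: the cache is direct-mapped with 16 lines of 32 bytes, the copy moves 4-byte words, and the destination address is chosen so that source and destination blocks never compete for the same cache line. The class and function names are hypothetical.

```python
class WriteBackCache:
    """Toy direct-mapped, write-back, write-allocate cache counting block transfers."""

    def __init__(self, num_lines=16, line_size=32):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines     # block held by each line
        self.dirty = [False] * num_lines   # True if the line was modified
        self.mem_reads = 0                 # blocks fetched from main memory
        self.mem_writes = 0                # blocks written back to main memory

    def _access(self, addr, is_write):
        block = addr // self.line_size
        index = block % self.num_lines
        if self.tags[index] != block:
            # Miss: a modified occupant must be written back before replacement.
            if self.tags[index] is not None and self.dirty[index]:
                self.mem_writes += 1
            # Write-allocate: the block is fetched even when the miss is a write.
            self.mem_reads += 1
            self.tags[index] = block
            self.dirty[index] = False
        if is_write:
            self.dirty[index] = True       # write-back: only the cache is updated

    def read(self, addr):
        self._access(addr, False)

    def write(self, addr):
        self._access(addr, True)

    def flush(self):
        """Write back all modified lines, as eventual eviction would."""
        for i in range(self.num_lines):
            if self.tags[i] is not None and self.dirty[i]:
                self.mem_writes += 1
                self.dirty[i] = False


def copy_traffic(src, dst, nbytes, word=4):
    """Return (block reads, block writes) caused by a word-by-word copy."""
    cache = WriteBackCache()
    for i in range(0, nbytes, word):
        cache.read(src + i)                # load a word from the source
        cache.write(dst + i)               # store it to the destination
    cache.flush()
    return cache.mem_reads, cache.mem_writes


NBYTES = 4096
blocks = NBYTES // 32                      # 128 blocks are copied
reads, writes = copy_traffic(src=0, dst=NBYTES + 256, nbytes=NBYTES)
print(reads / blocks, writes / blocks)     # 2.0 1.0: three transfers per block
```

In this model each copied block costs two reads from main memory (the source block and the old contents of the destination block) and one write (the modified destination block), which matches the count of three given above.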
This was not meant to be a course in cache memories, only a quick introduction to a very important part of the memory system. Now we are ready to look at the memory access.

