Cache Memory
Abstract:- This paper investigates cache memory, its organization, and its optimization. The opening part of the paper introduces the term cache. The paper then covers the importance of cache optimization in microprocessors and cache organization in the Intel Nehalem architecture.
Introduction:- The use of cache memories is so pervasive in today's computer systems that it is difficult to imagine processors without them. If the active portions of the program and data are placed in a fast, small memory, the average memory access time can be reduced, thus reducing the total execution time of the program. Such a fast, small memory is referred to as a cache memory. The CPU uses cache memory to store instructions that are repeatedly required to run programs, improving overall system speed. The cache is the fastest component in the memory hierarchy and approaches the speed of the CPU components. Cache built into the CPU itself is referred to as Level 1 (L1) cache. Cache that resides on a separate chip next to the CPU is called Level 2 (L2) cache. Some CPUs have both L1 and L2 cache built in and designate the separate cache chip as Level 3 (L3) cache.
When an application starts, or data is to be read or written, or any other operation is to be performed, the data and commands associated with that operation are moved from a slow storage device (a magnetic device such as a hard disk, or an optical device such as a CD drive) to a faster device. This faster device is RAM (Random Access Memory), specifically a type of DRAM (Dynamic Random Access Memory). RAM sits at this level because it is faster: whenever data, commands, or instructions are needed by the processor, RAM provides them at a higher rate than the slow storage devices. In effect, RAM serves as a cache for the storage devices. But although RAM is much faster than slow storage, the processor runs at a much faster pace still, and RAM cannot supply the needed data and instructions at that rate. So there is a need for a device faster than RAM, one that can keep up with the speed the processor needs. Therefore the required data is transmitted to the next level of fast memory, which is known as cache memory. Cache is also a type of RAM, but it is static RAM (SRAM). SRAM is faster and costlier than DRAM because it uses a flip-flop (six transistors) to store each bit, unlike DRAM, which uses one transistor and a capacitor to store data in the form of charge. Moreover, SRAM need not be refreshed periodically (because of its bistable latching circuitry), unlike DRAM, which also makes it faster.
Organization:- Cache row entries usually have the following structure:

tag | data block | flag bits

The data block (cache line) contains the actual data fetched from the main memory. The tag contains (part of) the address of the actual data fetched from the main memory. The flag bits are discussed below.
The "size" of the cache is the amount of main memory data it can hold. This size can be calculated as the number of bytes stored in each data block times the number of blocks stored in the cache. (The number of tag and flag bits is irrelevant to this calculation, although it does affect the physical area of a cache.)
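As a small worked example of this calculation (the block size and block count below are illustrative figures, not taken from any particular processor):

```python
# Cache capacity = bytes per data block * number of blocks.
# The figures are illustrative, not from any specific CPU.
bytes_per_block = 64      # a common cache-line size
num_blocks = 512          # number of cache rows/blocks

cache_size_bytes = bytes_per_block * num_blocks
print(cache_size_bytes)          # 32768 bytes
print(cache_size_bytes // 1024)  # 32 KiB
```

As the parenthetical above notes, the tag and flag bits add to the physical area of the cache but do not count toward this capacity.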
An effective memory address is split (MSB to LSB) into the tag, the index and the block offset.

tag | index | block offset

The index describes which cache row (which cache line) the data has been put in. The index length is log2(r) bits for r cache rows. The block offset specifies the desired data within the stored data block within the cache row. Typically the effective address is in bytes, so the block offset length is log2(b) bits, where b is the number of bytes per data block. The tag contains the most significant bits of the address, which are checked against the current row (the row has been retrieved by index) to see if it is the one we need or some other, irrelevant memory location that happened to have the same index bits as the one we want. The tag length in bits is address_length - index_length - block_offset_length. Some authors refer to the block offset as simply the "offset" or the "displacement".
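The address split described above can be sketched in a few lines of code. This is a minimal illustration under assumed parameters (32-bit addresses, 512 rows, 64-byte blocks, both powers of two); the function name is our own:

```python
import math

def split_address(addr, address_length, num_rows, bytes_per_block):
    """Split an effective address (MSB to LSB) into (tag, index, block offset).

    Assumes num_rows and bytes_per_block are powers of two, so the
    field widths are exact log2 values and masking works directly.
    """
    offset_bits = int(math.log2(bytes_per_block))   # log2(b) bits
    index_bits = int(math.log2(num_rows))           # log2(r) bits
    tag_bits = address_length - index_bits - offset_bits

    block_offset = addr & (bytes_per_block - 1)     # lowest offset_bits bits
    index = (addr >> offset_bits) & (num_rows - 1)  # next index_bits bits
    tag = addr >> (offset_bits + index_bits)        # remaining high bits
    return tag, index, block_offset, tag_bits

# Illustrative parameters: 32-bit address, 512 rows, 64-byte blocks,
# so the tag is 32 - 9 - 6 = 17 bits.
tag, index, offset, tag_bits = split_address(0x1234ABCD, 32, 512, 64)
```

Two addresses that differ only in their tag bits map to the same row, which is exactly why the stored tag must be compared on every lookup.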
Nehalem Architecture:-
The predecessor to Nehalem, Intel's Core architecture, made use of multiple cores on a single die to improve performance over traditional single-core architectures. But as more cores and processors were added to a high-performance system, some serious weaknesses and bandwidth bottlenecks began to appear. After the initial generation of dual-core Core processors, Intel began the Core 2 series, whose larger parts were not much more than two or more dual-core dies packaged together. The cores communicated via system memory, which caused large delays due to limited bandwidth on the processor bus. Adding more cores increased the burden on the processor and memory buses, which diminished the performance gains that could otherwise be possible with more cores.
Optimization:- Data access optimizations are code transformations which change the order in which the iterations in a loop nest are executed. The goal of these transformations is mainly to improve temporal locality. Moreover, they can also expose parallelism and make loop iterations vectorizable. Note that the data access optimizations presented in this section maintain all data dependencies and do not change the results of the numerical computations. Usually, it is difficult to decide which combination of transformations must be applied in order to achieve the maximum performance gain.
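To make the idea concrete, here is a minimal sketch of one such transformation, loop interchange, on a hypothetical matrix-sum loop (this example is ours, not from the paper). Swapping the two loops makes a row-major array be traversed in its storage order, improving spatial locality in the cache, while the numerical result is unchanged:

```python
def sum_column_major(a):
    """Walks a row-major 2D list column by column: poor spatial locality,
    since consecutive accesses jump between rows."""
    n, m = len(a), len(a[0])
    total = 0
    for j in range(m):
        for i in range(n):
            total += a[i][j]
    return total

def sum_row_major(a):
    """Same computation after loop interchange: each row is walked in
    storage order, so consecutive accesses hit the same cache lines."""
    n, m = len(a), len(a[0])
    total = 0
    for i in range(n):
        for j in range(m):
            total += a[i][j]
    return total

matrix = [[i * 10 + j for j in range(10)] for i in range(10)]
assert sum_column_major(matrix) == sum_row_major(matrix)  # same result
```

Because every iteration is executed exactly once in both versions and addition here is order-insensitive, the interchange preserves the result, which is the data-dependence property the section insists on.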
A number of cache optimization techniques that were implemented in single-core processors were successfully carried over to multi-core processors. A multi-level cache, in its current two-level structure, has been implemented since the very first multi-core processor visualized in (Fig. 1). In this configuration, the first-level cache is private to each core, and coherence is maintained between the caches with the MESI or MOESI protocols (Villa, F.J., et al., 2005). The second-level cache has been implemented with different design options in various architectures. In general, the second-level cache is shared among all cores, with a number of optimizations to be discussed in this section.
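To illustrate how per-line coherence states evolve under MESI, the following is a simplified sketch of the state transitions for a single line in one core's private cache. The event names and the transition table are our own simplification: writebacks, bus signalling, and the snoop that distinguishes Exclusive from Shared on a miss are omitted.

```python
# Simplified MESI transitions for one cache line in one core's private L1.
# States: M(odified), E(xclusive), S(hared), I(nvalid).
# Events: local_read / local_write (this core); remote_read / remote_write
# (another core's access observed on the bus). Writebacks are omitted.
MESI = {
    ("I", "local_read"):   "S",  # simplification: assume another copy exists
    ("I", "local_write"):  "M",
    ("S", "local_read"):   "S",
    ("S", "local_write"):  "M",  # other copies are invalidated
    ("E", "local_read"):   "E",
    ("E", "local_write"):  "M",  # silent upgrade, no bus traffic needed
    ("M", "local_read"):   "M",
    ("M", "local_write"):  "M",
    ("S", "remote_read"):  "S",
    ("E", "remote_read"):  "S",
    ("M", "remote_read"):  "S",  # dirty data would be written back/forwarded
    ("S", "remote_write"): "I",
    ("E", "remote_write"): "I",
    ("M", "remote_write"): "I",
    ("I", "remote_read"):  "I",
    ("I", "remote_write"): "I",
}

def next_state(state, event):
    return MESI[(state, event)]

# A line written locally becomes Modified; a remote read then demotes
# it to Shared, so both cores see a coherent value afterwards.
state = "I"
for event in ("local_write", "remote_read"):
    state = next_state(state, event)
```

MOESI extends this table with an Owned state so a dirty line can be shared without an immediate writeback to memory.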
Conclusion:- In this paper we have seen that cache memories play a significant role in improving the performance of today's computer systems. Numerous techniques for the use of caches and for maintaining coherent data have been proposed and implemented in commercial SMP systems. We then gave a glance at the organization of cache memory in the Nehalem architecture. Although several of the basic optimization techniques can be introduced automatically by optimizing compilers, most of the tuning effort is left to the programmer. Research continues on how to use caches effectively, and the field remains very active. Furthermore, research is currently under way on the use of cache memories in fully distributed systems, including web-based computing.
References:-