ECE 4100/6100
Advanced Computer Architecture
Lecture 9 Memory Hierarchy Design (I)
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
Why Care About Memory Hierarchy?
[Figure: processor vs. DRAM performance (log scale, 1 to 1000) over time, 1980 to 2000]
• Processor (“Moore’s Law”): 60%/year (2X/1.5 years)
• DRAM: 9%/year (2X/10 years)
• Processor-DRAM performance gap grows 50%/year
An Unbalanced System
Source: Bob Colwell keynote ISCA’29 2002
Memory Issues
• Latency
– Time to move through the longest circuit path
(from the start of request to the response)
• Bandwidth
– Number of bits transported per unit time
• Capacity
– Size of memory
• Energy
– Cost of accessing memory (to read and write)
Model of Memory Hierarchy
[Figure: Reg File → L1 Data cache and L1 Inst cache → L2 Cache → Main Memory → DISK. SRAM: caches; DRAM: main memory]
Levels of the Memory Hierarchy
Capacity, access time, and cost per level (upper level first):
• CPU Registers: 100s Bytes, <10 ns
• Cache: K Bytes, 10-100 ns, 1-0.1 cents/bit
• Main Memory: M Bytes, 200ns-500ns, $.0001-.00001 cents/bit
• Disk: G Bytes, 10 ms (10,000,000 ns), 10^-5 - 10^-6 cents/bit
• Tape: infinite, sec-min, 10^-8 cents/bit
Staging and transfer unit between adjacent levels:
• Registers ↔ Cache: instr. operands, staged by the compiler, 1-8 bytes
• Cache ↔ Memory: cache lines, staged by the cache controller, 8-128 bytes
• Memory ↔ Disk: pages, staged by the operating system, 512-4K bytes
• Disk ↔ Tape: files, staged by the user, Mbytes
Upper levels are faster; lower levels are larger.
This Lecture
Topics covered
• Why do caches work
– Principle of program locality
• Cache hierarchy
– Average memory access time (AMAT)
• Types of caches
– Direct mapped
– Set-associative
– Fully associative
• Cache policies
– Write back vs. write through
– Write allocate vs. No write allocate
Principle of Locality
• Programs access a relatively small portion of
address space at any instant of time.
• Two Types of Locality:
– Temporal Locality (Locality in Time): If an address is
referenced, it tends to be referenced again
• e.g., loops, reuse
– Spatial Locality (Locality in Space): If an address is
referenced, neighboring addresses tend to be referenced
• e.g., straightline code, array access
• Traditionally, HW has relied on locality for speed
Locality is a program property that is exploited in machine design.
Example of Locality
int A[100], B[100], C[100], D;
for (i=0; i<100; i++) {
C[i] = A[i] * B[i] + D;
}
[Figure: memory layout; each row is a cache line (one fetch)]
A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7]
. . .
A[96] A[97] A[98] A[99] B[0] B[1] B[2] B[3]
B[4] B[5] B[6] B[7] B[8] B[9] B[10] B[11]
. . .
C[0] C[1] C[2] C[3] C[4] C[5] C[6] C[7]
. . .
C[96] C[97] C[98] C[99] D
Modern Memory Hierarchy
• By taking advantage of the principle of locality:
– Present the user with as much memory as is available in
the cheapest technology.
– Provide access at the speed offered by the fastest
technology.
[Figure: Processor (Control, Datapath, Registers, L1I Cache, L1D Cache) → Second Level Cache (SRAM) → Third Level Cache (SRAM) → Main Memory (DRAM) → Secondary Storage (Disk) → Tertiary Storage (Disk/Tape)]
Example: Intel Core2 Duo
• L1: 32 KB, 8-Way, 64 Byte/Line, LRU, WB, 3 Cycle Latency
• L2: 4.0 MB, 16-Way, 64 Byte/Line, LRU, WB, 14 Cycle Latency
[Figure: Core0 and Core1, each with its own IL1 and DL1, on top of one L2 Cache]
Source: http://www.sandpile.org
Example : Intel Itanium 2
• 3MB Version: 180nm, 421 mm2
• 6MB Version: 130nm, 374 mm2
Intel Nehalem
[Figure: die photo showing the cores and a 24MB L3 made of eight 3MB blocks]
Example : STI Cell Processor
SPE = 21M transistors (14M array; 7M logic)
Local Storage
Cell Synergistic Processing Element
Each SPE contains 128 x128 bit registers,
256KB, 1-port, ECC-protected local SRAM (Not cache)
Cache Terminology
• Hit: data appears in some block in the upper level (e.g., Block X)
– Hit Rate: the fraction of memory accesses found in the level
– Hit Time: Time to access the level (consists of RAM access time +
Time to determine hit)
• Miss: data needs to be retrieved from a block in the lower level (e.g.,
Block Y)
– Miss Rate = 1 - (Hit Rate)
– Miss Penalty: Time to replace a block in the upper level +
Time to deliver the block to the processor
• Hit Time << Miss Penalty
[Figure: the processor exchanges data with the Upper Level Memory (holding Blk X), which exchanges blocks with the Lower Level Memory (holding Blk Y)]
Average Memory Access Time
• Average memory-access time
= Hit time + Miss rate x Miss penalty
• Miss penalty: time to fetch a block from lower
memory level
– access time: function of latency
– transfer time: function of bandwidth b/w levels
•Transfer one “cache line/block” at a time
•Transfer at the size of the memory-bus width
Memory Hierarchy Performance
• Average Memory Access Time (AMAT)
= Hit Time + Miss rate * Miss Penalty
= Thit(L1) + Miss%(L1) * T(memory)
• Example:
– Cache Hit = 1 cycle
– Miss rate = 10% = 0.1
– Miss penalty = 300 cycles
– AMAT = 1 + 0.1 * 300 = 31 cycles
• Can we improve it?
[Figure: First-level Cache (Hit Time, 1 clk) backed by Main Memory (DRAM) (Miss % * Miss penalty, 300 clks)]
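The single-level example above can be checked with a few lines of C. This is a minimal sketch; the function name `amat` and the use of `double` for cycle counts are assumptions made for illustration, not something from the slides.

```c
/* Average memory access time for one cache level.
 * All times are in cycles; miss_rate is a fraction (0.10 means 10%). */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```

With the slide's numbers, `amat(1.0, 0.10, 300.0)` evaluates 1 + 0.1 * 300, the 31 cycles quoted above.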
Reducing Penalty: Multi-Level Cache
Average Memory Access Time (AMAT)
= Thit(L1) + Miss%(L1) * (Thit(L2) + Miss%(L2) * (Thit(L3) + Miss%(L3) * T(memory)))
[Figure: First-level Cache (1 clk), Second Level Cache (10 clks), and Third Level Cache (20 clks), all on-die (L1, L2, L3), backed by Main Memory (DRAM) (300 clks)]
AMAT of multi-level memory
= Thit(L1) + Miss%(L1) * Tmiss(L1)
= Thit(L1) + Miss%(L1) * { Thit(L2) + Miss%(L2) * Tmiss(L2) }
= Thit(L1) + Miss%(L1) * { Thit(L2) + Miss%(L2) * [ Thit(L3) + Miss%(L3) * T(memory) ] }
AMAT Example
= Thit(L1) + Miss%(L1)* (Thit(L2) + Miss%(L2)* (Thit(L3)
+ Miss%(L3)*T(memory) ) )
• Example:
– Miss rate L1=10%, Thit(L1) = 1 cycle
– Miss rate L2=5%, Thit(L2) = 10 cycles
– Miss rate L3=1%, Thit(L3) = 20 cycles
– T(memory) = 300 cycles
• AMAT = ?
– 2.115 (compare to 31 with no multi-levels)
14.7x speed-up!
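The nested formula can be evaluated from the bottom of the hierarchy upward: the miss penalty of each level is the access time of everything below it. The sketch below assumes a hypothetical `struct cache_level` and function name; only the formula and the example numbers come from the slides.

```c
#include <stddef.h>

/* One cache level: hit time in cycles and local miss rate (fraction). */
struct cache_level {
    double hit_time;
    double miss_rate;
};

/* AMAT of an n-level cache hierarchy in front of main memory.
 * levels[0] is L1. Start with T(memory) as the penalty of the last
 * level and fold upward: penalty = Thit + Miss% * penalty. */
double amat_multilevel(const struct cache_level *levels, size_t n,
                       double t_memory)
{
    double penalty = t_memory;
    for (size_t i = n; i-- > 0; )
        penalty = levels[i].hit_time + levels[i].miss_rate * penalty;
    return penalty;
}
```

For the example above (L1: 1 cycle, 10%; L2: 10 cycles, 5%; L3: 20 cycles, 1%; memory: 300 cycles) the fold gives 23, then 11.15, then 2.115 cycles. Passing only the L1 entry reproduces the 31-cycle single-level result.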
Types of Caches
• Direct mapped (DM)
– Mapping of data from memory to cache: a memory value can be placed at a single corresponding location in the cache
– Complexity of searching the cache: fast indexing mechanism
• Set-associative (SA)
– Mapping of data from memory to cache: a memory value can be placed in any of a set of locations in the cache
– Complexity of searching the cache: slightly more involved search mechanism
• Fully-associative (FA)
– Mapping of data from memory to cache: a memory value can be placed in any location in the cache
– Complexity of searching the cache: extensive hardware resources required to search (CAM)
• DM and FA can be thought of as special cases of SA
– DM → 1-way SA
– FA → All-way SA
Direct Mapping
[Figure: a direct-mapped cache with Tag, Index, and Data fields, holding example data 0x55, 0x0F, 0xAA, and 0xF0 under tags 00000 and 11111]
Direct mapping:
A memory value can only be placed at a single corresponding location in the cache
Set Associative Mapping (2-Way)
[Figure: a 2-way set-associative cache (Way 0, Way 1) with Tag, Index, and Data fields, holding example data 0x55, 0x0F, 0xAA, and 0xF0]
Set-associative mapping:
A memory value can be placed in any location of a set in the cache
Fully Associative Mapping
[Figure: a fully associative cache with Tag and Data fields only (no index), holding example data 0x55, 0x0F, 0xAA, and 0xF0]
Fully-associative mapping:
A memory value can be placed anywhere in the cache
Direct Mapped Cache
[Figure: Memory with addresses 0 to F mapped onto a DM Cache with Cache Index 0 to 3]
• Cache location 0 is occupied by data from:
– Memory locations 0, 4, 8, and C
• Which one should we place in the cache?
• How can we tell which one is in the cache?
A Cache Line (or Block)
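The mapping in this picture can be sketched as a modulo operation. This is an illustrative sketch under the figure's assumptions (16 memory locations, 4 cache entries); the function names are hypothetical. The quotient left over after taking the index is what a cache usually stores as the tag, which is one way to tell which of the competing locations is currently resident.

```c
/* Cache index of a memory location in a direct-mapped cache
 * with num_lines entries: locations 0, 4, 8, C all give index 0
 * when num_lines is 4. */
unsigned dm_index(unsigned address, unsigned num_lines)
{
    return address % num_lines;
}

/* The rest of the address. Storing it next to the data (as a tag)
 * distinguishes the locations that share one index. */
unsigned dm_tag(unsigned address, unsigned num_lines)
{
    return address / num_lines;
}
```

For example, `dm_index(0xC, 4)` is 0 and `dm_tag(0xC, 4)` is 3, while location 8 has the same index but tag 2.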
Three (or Four) Cs (Cache Miss Terms)
• Compulsory Misses:
– Cold start misses (caches do not have valid data at the start of the program)
• Capacity Misses:
– The cache is too small to hold all the data the program is using
– Remedy: increase cache size
• Conflict Misses:
– Several lines compete for the same cache location
– Remedy: increase cache size and/or associativity
– Associative caches reduce conflict misses
• Coherence Misses:
– In multiprocessor systems (later lectures…)
[Figure: a Processor and a Cache holding 0x1234, 0x5678, 0x91B1, and 0x1111]
Example: 1KB DM Cache, 32-byte Lines
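• The lowest M bits are the Offset (Line Size = 2^M)
• Index bits = log2 (# of sets)
[Figure: a 32-bit Address split into Cache Tag (bits 31 to 10), Index (bits 9 to 5), and Offset (bits 4 to 0), with example field values 0x01 and 0x00. The Index selects one of 32 cache lines, each with a Valid Bit, a Tag, and 32 bytes of Cache Data: line 0 holds Byte 0 to Byte 31, line 1 holds Byte 32 to Byte 63, ..., line 31 holds Byte 992 to Byte 1023]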
Example of Caches
• Given a 2MB, direct-mapped physical cache, line size = 64 bytes
• Support up to 52-bit physical address
• Tag size?
• Now change it to 16-way, Tag size?
• How about if it’s fully associative, Tag size?
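One way to work through these three questions is tag bits = address bits - index bits - offset bits, with index bits = log2 (# of sets) and offset bits = log2 (line size). The sketch below assumes power-of-two sizes; the function names are hypothetical.

```c
/* log2 of a power of two. */
static unsigned log2u(unsigned long long x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Tag width in bits. ways = 1 is direct mapped; ways = number of
 * lines is fully associative (one set, so zero index bits). */
unsigned tag_bits(unsigned addr_bits, unsigned long long cache_bytes,
                  unsigned line_bytes, unsigned long long ways)
{
    unsigned long long lines = cache_bytes / line_bytes;
    unsigned long long sets = lines / ways;
    return addr_bits - log2u(sets) - log2u(line_bytes);
}
```

Under these assumptions, the 2MB cache with 64-byte lines and 52-bit addresses has 2^15 lines and 6 offset bits, so the tag is 52 - 15 - 6 = 31 bits when direct mapped, 52 - 11 - 6 = 35 bits when 16-way, and 52 - 6 = 46 bits when fully associative.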
Example: 1KB DM Cache, 32-byte Lines
• lw from 0x77FF1C68
77FF1C68 = 0111 0111 1111 1111 0001 1100 0110 1000
[Figure: DM Cache with a Tag array and a Data array; the address is split into Tag, Index, and Offset]
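The field split for this 1KB, 32-byte-line direct-mapped cache can be sketched with shifts and masks. The constants follow from the cache parameters (32 lines, so 5 index bits; 32-byte lines, so 5 offset bits); the struct and function names are assumptions for illustration.

```c
#include <stdint.h>

enum {
    LINE_BYTES  = 32,  /* bytes per cache line             */
    NUM_LINES   = 32,  /* 1KB / 32 bytes                   */
    OFFSET_BITS = 5,   /* log2(LINE_BYTES)                 */
    INDEX_BITS  = 5    /* log2(NUM_LINES)                  */
};

struct addr_fields {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};

/* Split a 32-bit address into tag, index, and offset. */
struct addr_fields split_address(uint32_t addr)
{
    struct addr_fields f;
    f.offset = addr & (LINE_BYTES - 1);
    f.index  = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
    f.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return f;
}
```

For 0x77FF1C68 the low five bits are 01000 (offset 8), the next five are 00011 (index 3), and the remaining 22 bits form the tag 0x1DFFC7.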
DM Cache Speed Advantage
• Tag and data access happen in parallel
– Faster cache access!
[Figure: the Index field of the address (Tag, Index, Offset) reads the Tag array and the Data array at the same time]
Associative Caches Reduce Conflict Misses
• Set associative (SA) cache
– multiple possible locations in a set
• Fully associative (FA) cache
– any location in the cache
• Hardware and speed overhead
– Comparators
– Multiplexors
– Data selection only after Hit/Miss
determination (i.e., after tag comparison)
Set Associative Cache (2-way)
• Cache index selects a “set” from the cache
• The two tags in the set are compared in parallel
• Data is selected based on the tag result
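[Figure: the Cache Index selects one set; each way has Valid, Cache Tag, and Cache Data arrays. The address tag (Adr Tag) is compared against both stored tags; the two Compare outputs are ORed to produce Hit and drive the Mux selects (Sel0, Sel1) that pick the Cache Line]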
•Additional circuitry as compared to DM caches
•Makes SA caches slower to access than DM of
comparable size
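A software model of the 2-way lookup might look like the sketch below. The geometry (16 sets, 32-byte lines) and all names are assumptions chosen for illustration; hardware compares both tags in parallel, whereas this model loops over the ways.

```c
#include <stdbool.h>
#include <stdint.h>

enum { NUM_SETS = 16, NUM_WAYS = 2, LINE_BYTES = 32 };

struct line {
    bool valid;
    uint32_t tag;
};

struct cache {
    struct line sets[NUM_SETS][NUM_WAYS];
};

/* Install the line containing addr into the given way of its set. */
void sa_fill(struct cache *c, uint32_t addr, int way)
{
    uint32_t set = (addr / LINE_BYTES) % NUM_SETS;
    c->sets[set][way].valid = true;
    c->sets[set][way].tag = addr / LINE_BYTES / NUM_SETS;
}

/* The index selects a set, then every way's tag is compared.
 * Returns the hitting way, or -1 on a miss. */
int sa_lookup(const struct cache *c, uint32_t addr)
{
    uint32_t set = (addr / LINE_BYTES) % NUM_SETS;
    uint32_t tag = addr / LINE_BYTES / NUM_SETS;
    for (int way = 0; way < NUM_WAYS; way++)
        if (c->sets[set][way].valid && c->sets[set][way].tag == tag)
            return way;
    return -1;
}
```

Two addresses that map to the same set (for instance 0x000 and 0x200 with this geometry) can be resident at once, one per way, which is exactly the conflict a direct-mapped cache cannot avoid.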
Set-Associative Cache (2-way)
• 32 bit address
• lw from 0x77FF1C78
[Figure: the address is split into Tag, Index, and offset; the Index reads Tag array0/Data array0 and Tag array1/Data array1]
Fully Associative Cache
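[Figure: the address is split into tag and offset only. An Associative Search compares the tag against every stored Tag (one "=" comparator per line), a Multiplexor selects the matching line's Data, and Rotate and Mask logic extracts the requested bytes]
Fully Associative Cache
[Figure: four Tag/Data entries, each with its own compare unit, connected to Address (Tag, offset), Write Data, and Read Data]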
Additional circuitry as compared to DM caches
More extensive than SA caches
Makes FA caches slower to access than either DM
or SA of comparable size
Cache Write Policy
• Write through -The value is written to both the cache
line and to the lower-level memory.
• Write back - The value is written only to the cache
line. The modified cache line is written to main
memory only when it has to be replaced.
– Is the cache line clean (holds the same value as
memory) or dirty (holds a different value than
memory)?
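The difference between the two policies can be sketched by counting how many writes reach lower-level memory. This is a toy model of a single cache line; the struct layout and function names are assumptions for illustration.

```c
#include <stdbool.h>

/* One cache line, its copy in lower-level memory, and a write counter. */
struct wline {
    unsigned value;       /* data held in the cache line            */
    bool dirty;           /* write back: cache differs from memory  */
    unsigned memory;      /* value held in lower-level memory       */
    unsigned mem_writes;  /* writes that reached memory             */
};

/* Write through: update the cache line and memory together. */
void store_write_through(struct wline *l, unsigned v)
{
    l->value = v;
    l->memory = v;
    l->mem_writes++;
}

/* Write back: update only the cache line and mark it dirty. */
void store_write_back(struct wline *l, unsigned v)
{
    l->value = v;
    l->dirty = true;
}

/* On replacement, a dirty line is written to memory once. */
void evict(struct wline *l)
{
    if (l->dirty) {
        l->memory = l->value;
        l->mem_writes++;
        l->dirty = false;
    }
}
```

In this model, repeated stores to the same line cost one memory write each under write through, but only a single memory write (at eviction) under write back.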
Write-through Policy
[Figure: Processor, Cache, Memory. The cache and memory both hold 0x1234; when the processor writes 0x5678, both the cache line and memory are updated to 0x5678]
Write Buffer
– Processor: writes data into the cache and the write buffer
– Memory controller: writes contents of the buffer to memory
• Write buffer is a FIFO structure:
– Typically 4 to 8 entries
– Desirable: Occurrence of Writes << DRAM write cycles
• Memory system designer’s nightmare:
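– Write buffer saturation (i.e., the rate of writes approaches the rate of DRAM write cycles)
[Figure: the Processor writes into the Cache and the Write Buffer; the Write Buffer drains into DRAM]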
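Writeback Policy
[Figure: Processor, Cache, Memory. The cache and memory both hold 0x1234; a write of 0x5678 updates only the cache line, so memory still holds 0x1234. A later write miss on 0x9ABC (marked "?????") raises the question of what happens to the modified line]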
On Write Miss
• Write allocate
– The line is allocated on a write miss, followed by
the write hit actions above.
– Write misses first act like read misses
• No write allocate
– Write misses do not interfere with the cache
– The line is only modified in the lower-level memory
– Mostly used with write-through caches
Quick recap
• Processor-memory performance gap
• Memory hierarchy exploits program locality to
reduce AMAT
• Types of Caches
– Direct mapped
– Set associative
– Fully associative
• Cache policies
– Write through vs. Write back
– Write allocate vs. No write allocate
Cache Replacement Policy
• Random
– Replace a randomly chosen line
• FIFO
– Replace the oldest line
• LRU (Least Recently Used)
– Replace the least recently used line
• NRU (Not Recently Used)
– Replace one of the lines that is not recently used
– In Itanium2 L1 Dcache, L2 and L3 caches
LRU Policy
Lines are listed in the order MRU, MRU-1, LRU+1, LRU:
Initial state: A B C D
Access C: C A B D
Access D: D C A B
Access E: E D C A (MISS, replacement needed)
Access C: C E D A
Access G: G C E D (MISS, replacement needed)
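The trace above can be reproduced with a small recency list. This is a sketch of true LRU for one 4-way set, with lines named by single characters as on the slide; the function name and the array representation are assumptions for illustration, not a description of real hardware.

```c
#include <string.h>

enum { WAYS = 4 };

/* order[0] is the MRU line and order[WAYS-1] is the LRU line.
 * Returns 1 on a hit and 0 on a miss; on a miss the LRU line
 * is the one that gets replaced. */
int lru_access(char order[WAYS], char line)
{
    int pos = WAYS - 1;  /* default: victim is the LRU slot */
    int hit = 0;
    for (int i = 0; i < WAYS; i++) {
        if (order[i] == line) {
            pos = i;
            hit = 1;
            break;
        }
    }
    /* Shift everything more recent than pos down by one place,
     * then put the accessed line at the MRU position. */
    memmove(&order[1], &order[0], (size_t)pos);
    order[0] = line;
    return hit;
}
```

Starting from A B C D and accessing C, D, E, C, G in turn yields the same five states as the trace, with misses on E and G.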
LRU From Hardware Perspective
[Figure: a 4-way set (Way0 to Way3 holding A, B, C, D) and a state machine holding the LRU state; each access (e.g., Access D) triggers an LRU update]
LRU policy increases cache access times
Additional hardware bits needed for LRU state machine
LRU Algorithms
• True LRU
– Expensive in terms of speed and hardware
– Need to remember the order in which all N lines
were last accessed
– N! scenarios – O(log N!) ≈ O(N log N) LRU bits
• 2-ways → AB BA = 2 = 2!
• 3-ways → ABC ACB BAC BCA CAB CBA = 6 = 3!
• Pseudo LRU: O(N)
– Approximates LRU policy with a binary tree
Pseudo LRU Algorithm (4-way SA)
[Figure: binary tree over Way0 to Way3 (holding A, B, C, D). Root: AB/CD bit (L0). Children: A/B bit (L1) above Way A and Way B; C/D bit (L2) above Way C and Way D]
• Tree-based
• O(N): 3 bits for 4-way
• Cache ways are the
leaves of the tree
• Combine ways as we
proceed towards the root
of the tree
Pseudo LRU Algorithm
LRU update algorithm
Way hit   L2   L1   L0
Way A     ---  1    1
Way B     ---  0    1
Way C     1    ---  0
Way D     0    ---  0
Replacement Decision
L2   L1   L0   Way to replace
X    0    0    Way A
X    1    0    Way B
0    X    1    Way C
1    X    1    Way D
[Figure: the same tree as before: AB/CD bit (L0) at the root, A/B bit (L1) above Way A and Way B, C/D bit (L2) above Way C and Way D]
• Less hardware than LRU
• Faster than LRU
• L2L1L0 = 000, there is a hit in Way B: what is the new updated L2L1L0?
• L2L1L0 = 001, a way needs to be replaced: which way would be chosen?
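The two tables translate directly into code. The sketch below follows the bit meanings given on the slide (L0 = AB/CD, L1 = A/B, L2 = C/D); the type and function names are assumptions for illustration.

```c
/* Tree pseudo-LRU state for one 4-way set. */
struct plru {
    unsigned l0;  /* AB/CD bit */
    unsigned l1;  /* A/B bit   */
    unsigned l2;  /* C/D bit   */
};

enum way { WAY_A, WAY_B, WAY_C, WAY_D };

/* LRU update table: a hit points the bits away from the way just
 * used. The bit of the other subtree is left unchanged ("---"). */
void plru_update(struct plru *s, enum way hit)
{
    switch (hit) {
    case WAY_A: s->l1 = 1; s->l0 = 1; break;
    case WAY_B: s->l1 = 0; s->l0 = 1; break;
    case WAY_C: s->l2 = 1; s->l0 = 0; break;
    case WAY_D: s->l2 = 0; s->l0 = 0; break;
    }
}

/* Replacement decision table: L0 picks the subtree, then L1 or L2
 * picks the way inside it. */
enum way plru_victim(const struct plru *s)
{
    if (s->l0 == 0)
        return s->l1 == 0 ? WAY_A : WAY_B;
    return s->l2 == 0 ? WAY_C : WAY_D;
}
```

Applied to the two questions above: starting from L2L1L0 = 000, a hit in Way B sets L1 = 0 and L0 = 1, giving 001; with L2L1L0 = 001, L0 = 1 selects the C/D side and L2 = 0 selects Way C.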
Not Recently Used (NRU)
• Use R(eferenced) and M(odified) bits
– 0 (not referenced or not modified)
– 1 (referenced or modified)
• Classify lines into
– C0: R=0, M=0
– C1: R=0, M=1
– C2: R=1, M=0
– C3: R=1, M=1
• Choose the victim from the lowest class
– (C3 > C2 > C1 > C0)
• Periodically clear R and M bits
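Victim selection under NRU can be sketched as a search for the line in the lowest class, where the class number is 2*R + M. The names below are hypothetical, and breaking ties by taking the lowest-numbered line is an assumption made here; the slide does not say how ties are resolved.

```c
/* R(eferenced) and M(odified) bits of one cache line. */
struct nru_line {
    unsigned r;
    unsigned m;
};

/* Class of a line: C0 (R=0,M=0) ... C3 (R=1,M=1). */
static unsigned nru_class(const struct nru_line *l)
{
    return 2u * l->r + l->m;
}

/* Index of the victim: a line from the lowest class present.
 * Ties go to the lowest index (an assumption of this sketch). */
int nru_victim(const struct nru_line *lines, int n)
{
    int victim = 0;
    for (int i = 1; i < n; i++)
        if (nru_class(&lines[i]) < nru_class(&lines[victim]))
            victim = i;
    return victim;
}
```

For four lines in classes C3, C2, C1, C2, the victim is the third line, the only one in C1.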
Reducing Miss Rate
• Enlarge Cache
• If cache size is fixed
– Increase associativity
– Increase line size
[Figure: miss rate (0% to 40%) versus block size (bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB]
• Does this always work?
– Increasing cache pollution
Reduce Miss Rate/Penalty: Way Prediction
• Best of both worlds: Speed as that of a DM cache
and reduced conflict misses as that of a SA cache
• Extra bits predict the way of the next access
• Alpha 21264 Way Prediction (next line predictor)
– If correct, 1-cycle I-cache latency
– If incorrect, 2-cycle latency from I-cache
fetch/branch predictor
– Branch predictor can override the decision of the
way predictor
Alpha 21264 Way Prediction
[Figure: Alpha 21264 2-way I-cache with way prediction; the (offset) field of the address is marked]
Note: Alpha recommends aligning branch targets on octaword (16-byte) boundaries
Reduce Miss Rate: Code Optimization
• Misses occur if sequentially accessed array
elements come from different cache lines
• Code optimizations → No hardware change
– Rely on programmers or compilers
• Examples:
– Loop interchange
• In nested loops: outer loop becomes inner loop and vice versa
– Loop blocking
• partition large array into smaller blocks, thus fitting the accessed
array elements into cache size
• enhances cache reuse
Loop Interchange
/* Before */
for (j=0; j<100; j++)
    for (i=0; i<5000; i++)
        x[i][j] = 2*x[i][j];
/* After */
for (i=0; i<5000; i++)
    for (j=0; j<100; j++)
        x[i][j] = 2*x[i][j];
[Figure: traversal order of x before and after the interchange, each starting from i=0, j=0]
Improved cache efficiency
Row-major ordering
Is this always a safe transformation?
Does this always lead to higher efficiency?
What is the worst that could happen?
Hint: DM cache
Loop Blocking
/* Before */
for (i=0; i<N; i++)
for (j=0; j<N; j++) {
r=0;
for (k=0; k<N; k++)
r += y[i][k]*z[k][j];
x[i][j] = r;
}
[Figure: access patterns of y[i][k], z[k][j], and x[i][j] over the i, j, and k axes]
Does not exploit locality
Loop Blocking
[Figure: blocked access patterns of y[i][k], z[k][j], and x[i][j] over the i, j, and k axes]
•Partition the loop’s iteration space into many smaller chunks
•Ensure that the data stays in the cache until it is reused
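A blocked version of the earlier matrix multiply might look like the sketch below. The slides do not show the blocked code, so this follows the commonly used textbook form; the sizes N and B, the element type, and the function name are assumptions. Note that x must start out zeroed, because each (jj, kk) block adds a partial sum.

```c
enum { N = 6, B = 4 };  /* assumed matrix size and blocking factor */

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* x += y * z, computed block by block so that a B-wide strip of z
 * and a B-long piece of each y row are reused while still cached.
 * The caller must zero-initialize x. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N])
{
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + B, N); j++) {
                    double r = 0;
                    for (int k = kk; k < min_int(kk + B, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;  /* accumulate this block's part */
                }
}
```

The blocking factor would normally be chosen so that the blocks being reused fit in the cache together; the value 4 here only keeps the example small, and N is deliberately not a multiple of B to exercise the boundary handling.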
Other Miss Penalty Reduction Techniques
• Critical value first and Restart early
– Send requested data in the leading edge transfer
– Trailing edge transfer continues in the background
• Give priority to read misses over writes
– Use write buffer (WT) and writeback buffer (WB)
• Combining writes
– combining write buffer
– Intel’s WC (write-combining) memory type
• Victim caches
• Assist caches
• Non-blocking caches
• Data Prefetch mechanism
Write Combining Buffer
For WC buffer, combine neighbor addresses
[Figure: two write buffers, each with four entries of a write address (Wr. addr) followed by four data slots with valid (V) bits]
• Without combining: writes to addresses 100, 108, 116, and 124 occupy four separate entries, each holding one valid word (Mem[100], Mem[108], Mem[116], Mem[124])
– Need to initiate 4 separate writes back to lower level memory
• With combining: one entry with write address 100 holds Mem[100], Mem[108], Mem[116], and Mem[124], all four valid bits set; the other entries stay empty
– One single write back to lower level memory
WC memory type
• Intel 32 (starting in P6) supports USWC (or WC) memory
type
– Uncacheable, speculative Write Combining
– Expensive (in terms of time) for individual write
– Combine several individual writes into a bursty write
– Effective for video memory data
• Algorithm writing 1 byte at a time
• Combine 32 of 1-byte data into one 32-byte write
• Ordering is not important
 
Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...
Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...
Lec6 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Can...
 
Lec5 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Boo...
Lec5 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Boo...Lec5 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Boo...
Lec5 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Boo...
 
Lec4 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMOS
Lec4 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMOSLec4 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMOS
Lec4 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMOS
 
Lec3 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMO...
Lec3 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMO...Lec3 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMO...
Lec3 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- CMO...
 
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
 
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Intro
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- IntroLec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Intro
Lec1 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Intro
 


Lec9 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part 1

  • 1. ECE 4100/6100 Advanced Computer Architecture Lecture 9 Memory Hierarchy Design (I) Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia Institute of Technology
  • 2. Why Care About Memory Hierarchy? [Figure: performance (log scale, 1 to 1000) vs. time (1980 to 2000) for CPU and DRAM.] Processor (“Moore’s Law”): 60%/year (2X/1.5 years). DRAM: 9%/year (2X/10 years). The processor-DRAM performance gap grows 50%/year.
  • 3. An Unbalanced System Source: Bob Colwell keynote, ISCA-29, 2002
  • 4. Memory Issues • Latency – Time to move through the longest circuit path (from the start of request to the response) • Bandwidth – Number of bits transported at one time • Capacity – Size of memory • Energy – Cost of accessing memory (to read and write)
  • 5. Model of Memory Hierarchy [Figure: Reg File, L1 Data cache and L1 Inst cache, L2 Cache, Main Memory, DISK; technology labels: SRAM, DRAM.]
  • 6. Levels of the Memory Hierarchy (capacity, access time, cost)
      – CPU Registers: 100s Bytes, <10 ns
      – Cache: K Bytes, 10-100 ns, 1-0.1 cents/bit
      – Main Memory: M Bytes, 200ns-500ns, $.0001-.00001 cents/bit
      – Disk: G Bytes, 10 ms (10,000,000 ns), 10^-5 - 10^-6 cents/bit
      – Tape: infinite, sec-min, 10^-8
    Staging and transfer units between levels:
      – Registers / Cache: Instr. Operands, managed by the compiler, 1-8 bytes
      – Cache / Memory: Cache Lines, managed by the cache controller, 8-128 bytes
      – Memory / Disk: Pages, managed by the operating system, 512-4K bytes
      – Disk / Tape: Files, managed by the user, Mbytes
    Upper levels are faster; lower levels are larger. [Figure label: This Lecture]
  • 7. Topics covered • Why do caches work – Principle of program locality • Cache hierarchy – Average memory access time (AMAT) • Types of caches – Direct mapped – Set-associative – Fully associative • Cache policies – Write back vs. write through – Write allocate vs. No write allocate
  • 8. Principle of Locality • Programs access a relatively small portion of address space at any instant of time. • Two Types of Locality: – Temporal Locality (Locality in Time): If an address is referenced, it tends to be referenced again • e.g., loops, reuse – Spatial Locality (Locality in Space): If an address is referenced, neighboring addresses tend to be referenced • e.g., straightline code, array access • Traditionally, HW has relied on locality for speed. Locality is a program property that is exploited in machine design.
  • 9. Example of Locality
      int A[100], B[100], C[100], D;
      for (int i = 0; i < 100; i++) {
          C[i] = A[i] * B[i] + D;
      }
    [Figure: A, B, C, and D laid out in memory, grouped four array elements per cache line (A[0..3], A[4..7], ..., A[96..99], B[0..3], ..., C[96..99], D); a cache line is brought in by one fetch.]
  • 10. Modern Memory Hierarchy • By taking advantage of the principle of locality: – Present the user with as much memory as is available in the cheapest technology. – Provide access at the speed offered by the fastest technology. [Figure: Processor (Control, Datapath, Registers), L1I Cache and L1D Cache, Second Level Cache (SRAM), Third Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk), Tertiary Storage (Disk/Tape).]
  • 11. Example: Intel Core2 Duo • L1: 32 KB, 8-Way, 64 Byte/Line, LRU, WB, 3 Cycle Latency • L2: 4.0 MB, 16-Way, 64 Byte/Line, LRU, WB, 14 Cycle Latency [Figure: Core0 and Core1, each with an IL1 and a DL1, and the L2 Cache.] Source: http://www.sandpile.org
  • 12. Example: Intel Itanium 2 • 3MB version: 180nm, 421 mm² • 6MB version: 130nm, 374 mm²
  • 13. Intel Nehalem [Figure: eight 3MB blocks forming a 24MB L3; core labels as extracted: Core 0, Core 1, Core 0.]
  • 14. Example : STI Cell Processor SPE = 21M transistors (14M array; 7M logic) Local Storage
  • 15. Cell Synergistic Processing Element Each SPE contains 128 x 128-bit registers and 256KB of 1-port, ECC-protected local SRAM (not a cache)
  • 16. Cache Terminology • Hit: data appears in some block in the upper level (e.g., Block X) – Hit Rate: the fraction of memory accesses found in the level – Hit Time: Time to access the level (consists of RAM access time + Time to determine hit) • Miss: data needs to be retrieved from a block in the lower level (e.g., Block Y) – Miss Rate = 1 - (Hit Rate) – Miss Penalty: Time to replace a block in the upper level + Time to deliver the block to the processor • Hit Time << Miss Penalty [Figure: Upper Level Memory holding Blk X, connected to and from the processor, and Lower Level Memory holding Blk Y.]
  • 17. Average Memory Access Time • Average memory-access time = Hit time + Miss rate x Miss penalty • Miss penalty: time to fetch a block from lower memory level – access time: function of latency – transfer time: function of bandwidth between levels • Transfer one “cache line/block” at a time • Transfer at the size of the memory-bus width
  • 18. Memory Hierarchy Performance • Average Memory Access Time (AMAT) = Hit Time + Miss rate * Miss Penalty = Thit(L1) + Miss%(L1) * T(memory) • Example: – Cache Hit = 1 cycle – Miss rate = 10% = 0.1 – Miss penalty = 300 cycles – AMAT = 1 + 0.1 * 300 = 31 cycles • Can we improve it? [Figure: First-level Cache (hit time, 1 clk) backed by Main Memory (DRAM) (miss % * miss penalty, 300 clks).]
  • 19. Reducing Penalty: Multi-Level Cache Average Memory Access Time (AMAT) = Thit(L1) + Miss%(L1) * (Thit(L2) + Miss%(L2) * (Thit(L3) + Miss%(L3) * T(memory))) [Figure: on-die First-level Cache (L1, 1 clk), Second Level Cache (L2, 10 clks), Third Level Cache (L3, 20 clks), then Main Memory (DRAM, 300 clks).]
  • 20. AMAT of multi-level memory
      = Thit(L1) + Miss%(L1) * Tmiss(L1)
      = Thit(L1) + Miss%(L1) * { Thit(L2) + Miss%(L2) * Tmiss(L2) }
      = Thit(L1) + Miss%(L1) * { Thit(L2) + Miss%(L2) * [ Thit(L3) + Miss%(L3) * T(memory) ] }
  • 21. AMAT Example = Thit(L1) + Miss%(L1) * (Thit(L2) + Miss%(L2) * (Thit(L3) + Miss%(L3) * T(memory))) • Example: – Miss rate L1=10%, Thit(L1) = 1 cycle – Miss rate L2=5%, Thit(L2) = 10 cycles – Miss rate L3=1%, Thit(L3) = 20 cycles – T(memory) = 300 cycles • AMAT = 1 + 0.1 * (10 + 0.05 * (20 + 0.01 * 300)) = 1 + 0.1 * (10 + 0.05 * 23) = 1 + 0.1 * 11.15 = 2.115 cycles (compare to 31 with no multi-levels): a 14.7x speed-up!
  • 22. Types of Caches (type of cache: mapping of data from memory to cache; complexity of searching the cache)
      – Direct mapped (DM): a memory value can be placed at a single corresponding location in the cache; fast indexing mechanism
      – Set-associative (SA): a memory value can be placed in any of a set of locations in the cache; slightly more involved search mechanism
      – Fully-associative (FA): a memory value can be placed in any location in the cache; extensive hardware resources required to search (CAM)
    • DM and FA can be thought of as special cases of SA • DM is 1-way SA • FA is all-way SA
  • 23. Direct Mapping: A memory value can only be placed at a single corresponding location in the cache [Figure: memory values 0x55, 0x0F, 0xAA, and 0xF0 with binary addresses split into Tag and Index fields; the cache stores a Tag and Data per index (0, 1).]
  • 24. Set Associative Mapping (2-Way): A memory value can be placed in any location of a set in the cache [Figure: memory values 0x55, 0x0F, 0xAA, and 0xF0 with binary addresses split into Tag and Index fields; each set holds Tag and Data in Way 0 and Way 1.]
  • 25. Fully Associative Mapping: A memory value can be placed anywhere in the cache [Figure: memory values 0x55, 0x0F, 0xAA, and 0xF0 at binary addresses such as 000000, 000001, 111110, and 111111; the whole address serves as the Tag stored next to the Data.]
  • 26. Direct Mapped Cache [Figure: memory addresses 0 through F map onto a DM cache with cache indexes 0 through 3; each entry is a cache line (or block).] • Cache location 0 is occupied by data from: – Memory locations 0, 4, 8, and C • Which one should we place in the cache? • How can we tell which one is in the cache?
  • 27. Three (or Four) Cs (Cache Miss Terms) • Compulsory Misses: – cold start misses (Caches do not have valid data at the start of the program) • Capacity Misses: – Increase cache size • Conflict Misses: – Increase cache size and/or associativity. – Associative caches reduce conflict misses • Coherence Misses: – In multiprocessor systems (later lectures…) [Figure: processors with caches holding values such as 0x1234, 0x5678, 0x91B1, and 0x1111.]
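  • 28. Example: 1KB DM Cache, 32-byte Lines • The lowest M bits are the Offset (Line Size = 2^M) • Index bits = log2 (# of sets) [Figure: the 32-bit address is split into Cache Tag (bits 31-10), Index (bits 9-5), and Offset (bits 4-0), with example field values 0x01 and 0x00; each cache entry has a Valid Bit, a Tag, and Cache Data; index 0 holds Byte 0 to Byte 31, index 1 holds Byte 32 to Byte 63, and the last entry holds Byte 992 to Byte 1023.]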
  • 29. Example of Caches • Given a 2MB, direct-mapped physical cache, line size = 64 bytes • Support up to 52-bit physical address • Tag size? • Now change it to 16-way, Tag size? • How about if it’s fully associative, Tag size?
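  • 30. Example: 1KB DM Cache, 32-byte Lines • lw from 0x77FF1C68 • 0x77FF1C68 = 0111 0111 1111 1111 0001 1100 0110 1000 [Figure: the address is split into Tag, Index, and Offset; the Index selects one entry in both the Tag array and the Data array of the DM cache; row labels as extracted: 2, 24, 25, 26, 27.]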
  • 31. DM Cache Speed Advantage • Tag and data access happen in parallel – Faster cache access! [Figure: the address is split into Tag, Index, and Offset; the Index selects entries in both the Tag array and the Data array.]
  • 32. Associative Caches Reduce Conflict Misses • Set associative (SA) cache – multiple possible locations in a set • Fully associative (FA) cache – any location in the cache • Hardware and speed overhead – Comparators – Multiplexors – Data selection only after Hit/Miss determination (i.e., after tag comparison)
  • 33. Set Associative Cache (2-way) • Cache index selects a “set” from the cache • The two tags in the set are compared in parallel • Data is selected based on the tag result [Figure: two ways, each with Valid, Cache Tag, and Cache Data arrays selected by the Cache Index; each way's tag is compared with the address tag (Adr Tag); the compare results drive Sel0 and Sel1 of a Mux that picks the Cache Line and are ORed to form Hit.] • Additional circuitry as compared to DM caches • Makes SA caches slower to access than DM of comparable size
  • 34. Set-Associative Cache (2-way) • 32 bit address • lw from 0x77FF1C78 [Figure: the address is split into Tag, Index, and offset; Tag array0 with Data array0 and Tag array1 with Data array1.]
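  • 35. Fully Associative Cache [Figure: the address is split into tag and offset; an associative search compares the Tag against all entries (= = = =); a Multiplexor selects the Data, followed by Rotate and Mask.]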
  • 36. Fully Associative Cache [Figure: the Address is split into Tag and offset; every entry has a Tag, Data, and its own compare logic, with Write Data and Read Data paths.] • Additional circuitry as compared to DM caches • More extensive than SA caches • Makes FA caches slower to access than either DM or SA of comparable size
  • 37. Cache Write Policy • Write through - The value is written to both the cache line and to the lower-level memory. • Write back - The value is written only to the cache line. The modified cache line is written to main memory only when it has to be replaced. – Is the cache line clean (holds the same value as memory) or dirty (holds a different value than memory)?
  • 39. Write Buffer [Figure: Processor, Cache, Write Buffer, DRAM.] – Processor: writes data into the cache and the write buffer – Memory controller: writes contents of the buffer to memory • Write buffer is a FIFO structure: – Typically 4 to 8 entries – Desirable: Occurrence of Writes << DRAM write cycles • Memory system designer’s nightmare: – Write buffer saturation (i.e., Writes approach DRAM write cycles)
  • 41. On Write Miss • Write allocate – The line is allocated on a write miss, followed by the write hit actions above. – Write misses first act like read misses • No write allocate – Write misses do not interfere with the cache – Line is only modified in the lower level memory – Mostly used with write-through caches
  • 42. Quick recap • Processor-memory performance gap • Memory hierarchy exploits program locality to reduce AMAT • Types of Caches – Direct mapped – Set associative – Fully associative • Cache policies – Write through vs. Write back – Write allocate vs. No write allocate
  • 43. Cache Replacement Policy • Random – Replace a randomly chosen line • FIFO – Replace the oldest line • LRU (Least Recently Used) – Replace the least recently used line • NRU (Not Recently Used) – Replace one of the lines that is not recently used – In Itanium2 L1 Dcache, L2 and L3 caches
  • 44. LRU Policy (each row lists the lines in the order MRU, MRU-1, LRU+1, LRU)
      – Initial: A B C D
      – Access C: C A B D
      – Access D: D C A B
      – Access E: E D C A (MISS, replacement needed)
      – Access C: C E D A
      – Access G: G C E D (MISS, replacement needed)
  • 45. LRU From Hardware Perspective [Figure: Way0 to Way3 hold A, B, C, and D; on an access (e.g., Access D) an access update is sent to the LRU state machine.] • LRU policy increases cache access times • Additional hardware bits needed for LRU state machine
  • 46. LRU Algorithms • True LRU – Expensive in terms of speed and hardware – Need to remember the order in which all N lines were last accessed – N! scenarios – O(log N!) ≈ O(N log N) LRU bits • 2 ways: AB, BA = 2 = 2! • 3 ways: ABC, ACB, BAC, BCA, CAB, CBA = 6 = 3! • Pseudo LRU: O(N) – Approximates LRU policy with a binary tree
  • 47. Pseudo LRU Algorithm (4-way SA) • Tree-based • O(N): 3 bits for 4-way • Cache ways are the leaves of the tree • Combine ways as we proceed towards the root of the tree [Figure: binary tree with the AB/CD bit (L0) at the root, the A/B bit (L1) and the C/D bit (L2) below it, and Way A, Way B, Way C, Way D (Way0 to Way3, holding A, B, C, D) as the leaves.]
  • 48. Pseudo LRU Algorithm [Figure: the same tree as on the previous slide: AB/CD bit (L0), A/B bit (L1), C/D bit (L2), leaves Way A to Way D.]
    Replacement decision (L2 L1 L0: way to replace):
      – X 0 0: Way A
      – X 1 0: Way B
      – 0 X 1: Way C
      – 1 X 1: Way D
    LRU update algorithm (way hit: new L2 L1 L0, where --- means unchanged):
      – Way A: --- 1 1
      – Way B: --- 0 1
      – Way C: 1 --- 0
      – Way D: 0 --- 0
    • Less hardware than LRU • Faster than LRU
    • L2L1L0 = 000, there is a hit in Way B, what is the new updated L2L1L0?
    • L2L1L0 = 001, a way needs to be replaced, which way would be chosen?
  • 49. Not Recently Used (NRU) • Use R(eferenced) and M(odified) bits – 0 (not referenced or not modified) – 1 (referenced or modified) • Classify lines into – C0: R=0, M=0 – C1: R=0, M=1 – C2: R=1, M=0 – C3: R=1, M=1 • Choose the victim from the lowest class – (C3 > C2 > C1 > C0) • Periodically clear R and M bits
  • 50. Reducing Miss Rate • Enlarge Cache • If cache size is fixed – Increase associativity – Increase line size [Figure: miss rate (0% to 40%) vs. block size (bytes), one curve per cache size: 1 KB, 8 KB, 16 KB, 64 KB, 256 KB.] • Does this always work? Increasing cache pollution
  • 51. Reduce Miss Rate/Penalty: Way Prediction • Best of both worlds: Speed as that of a DM cache and reduced conflict misses as that of a SA cache • Extra bits predict the way of the next access • Alpha 21264 Way Prediction (next line predictor) – If correct, 1-cycle I-cache latency – If incorrect, 2-cycle latency from I-cache fetch/branch predictor – Branch predictor can override the decision of the way predictor
  • 52. Alpha 21264 Way Prediction (2-way) [Figure label: (offset)] Note: Alpha advocates aligning branch targets on octaword (16-byte) boundaries
  • 53. Reduce Miss Rate: Code Optimization • Misses occur if sequentially accessed array elements come from different cache lines • Code optimizations: no hardware change – Rely on programmers or compilers • Examples: – Loop interchange • In nested loops: outer loop becomes inner loop and vice versa – Loop blocking • partition large array into smaller blocks, thus fitting the accessed array elements into cache size • enhances cache reuse
  • 54. Loop Interchange (row-major ordering; improved cache efficiency)
      /* Before */
      for (j=0; j<100; j++)
          for (i=0; i<5000; i++)
              x[i][j] = 2*x[i][j];
      /* After */
      for (i=0; i<5000; i++)
          for (j=0; j<100; j++)
              x[i][j] = 2*x[i][j];
    [Figure: traversal order over the array for each version, starting from i=0, j=0.]
    • Is this always a safe transformation? • Does this always lead to higher efficiency? • What is the worst that could happen? Hint: DM cache
  • 55. Loop Blocking
      /* Before */
      for (i=0; i<N; i++)
          for (j=0; j<N; j++) {
              r=0;
              for (k=0; k<N; k++)
                  r += y[i][k]*z[k][j];
              x[i][j] = r;
          }
    [Figure: access patterns of y[i][k], z[k][j], and x[i][j].] Does not exploit locality
  • 56. Loop Blocking [Figure: blocked access patterns of y[i][k], z[k][j], and x[i][j].] • Partition the loop’s iteration space into many smaller chunks • Ensure that the data stays in the cache until it is reused
  • 57. Other Miss Penalty Reduction Techniques • Critical value first and Restart early – Send requested data in the leading edge transfer – Trailing edge transfer continues in the background • Give priority to read misses over writes – Use write buffer (WT) and writeback buffer (WB) • Combining writes – combining write buffer – Intel’s WC (write-combining) memory type • Victim caches • Assist caches • Non-blocking caches • Data Prefetch mechanism
  • 58. Write Combining Buffer • Without combining: writes to addresses 100, 108, 116, and 124 occupy four buffer entries, each with one valid (V) slot, holding Mem[100], Mem[108], Mem[116], and Mem[124] – Need to initiate 4 separate writes back to lower level memory • For WC buffer, combine neighbor addresses: a single entry with write address 100 holds Mem[100], Mem[108], Mem[116], and Mem[124], with all four valid bits set – One single write back to lower level memory
  • 59. WC memory type • Intel IA-32 (starting in P6) supports USWC (or WC) memory type – Uncacheable, speculative Write Combining – Expensive (in terms of time) for individual write – Combine several individual writes into a bursty write – Effective for video memory data • Algorithm writing 1 byte at a time • Combine 32 of 1-byte data into one 32-byte write • Ordering is not important