Multi-core architectures. Jernej Barbic, 15-213, Spring 2007, May 3, 2007
Single-core computer
Single-core CPU chip [Diagram: the single core]
Multi-core architectures [Diagram: multi-core CPU chip containing Core 1, Core 2, Core 3, Core 4]
Multi-core CPU chip [Diagram: core 1, core 2, core 3, core 4]
The cores run in parallel [Diagram: thread 1 on core 1, thread 2 on core 2, thread 3 on core 3, thread 4 on core 4]
Within each core, threads are time-sliced (just like on a uniprocessor) [Diagram: core 1 to core 4, each running several threads]
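The two slides above (cores run in parallel, threads are time-sliced within a core) can be pictured with a toy model. This is only a sketch, not how a real OS scheduler works: the round-robin placement, the fixed-length time slice, and the names `assign_round_robin` and `timeline` are assumptions made for the example.

```python
from collections import deque


def assign_round_robin(threads, num_cores):
    """Spread thread names over cores: thread i goes to core i % num_cores."""
    queues = [deque() for _ in range(num_cores)]
    for i, name in enumerate(threads):
        queues[i % num_cores].append(name)
    return queues


def timeline(queues, num_slices):
    """Return, for each time slice, the thread running on each core.

    All cores advance in the same slice (parallelism across cores), while
    the threads queued on one core take turns (time-slicing within a core).
    """
    queues = [deque(q) for q in queues]  # work on copies, keep the input intact
    schedule = []
    for _ in range(num_slices):
        running = []
        for q in queues:
            if q:
                running.append(q[0])
                q.rotate(-1)  # the thread that just ran goes to the back
            else:
                running.append(None)  # idle core
        schedule.append(running)
    return schedule


threads = [f"T{i}" for i in range(8)]
queues = assign_round_robin(threads, num_cores=4)
for t, running in enumerate(timeline(queues, num_slices=3)):
    print(f"slice {t}: {running}")
```

With eight threads on four cores, each slice shows four different threads running at once, and each core alternates between its two threads from one slice to the next.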
Interaction with the Operating System
Why multi-core?
Instruction-level parallelism
Thread-level parallelism (TLP)
General context: Multiprocessors [Photo: Lemieux cluster, Pittsburgh Supercomputing Center]
Multiprocessor memory types
A multi-core processor is a special kind of multiprocessor: all processors are on the same chip
What applications benefit from multi-core? Each can run on its own core
More examples
A technique complementary to multi-core: Simultaneous multithreading [Pipeline diagram, source: Intel. Blocks: BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer, Floating Point, L1 D-Cache, D-TLB, uCode ROM, BTB, L2 Cache and Control, Bus]
Simultaneous multithreading (SMT)
Without SMT, only a single thread can run at any given time [Pipeline diagram: Thread 1 (floating point) occupies the core]
Without SMT, only a single thread can run at any given time [Pipeline diagram: Thread 2 (integer operation) occupies the core]
SMT processor: both threads can run concurrently [Pipeline diagram: Thread 1 (floating point) and Thread 2 (integer operation) share the core]
But: Can’t simultaneously use the same functional unit [Pipeline diagram: Thread 1 and Thread 2 both use the same unit, marked IMPOSSIBLE] This scenario is impossible with SMT on a single core (assuming a single integer unit)
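The SMT slides above can be condensed into a toy cycle count. The sketch below is a simplification under stated assumptions: one integer unit and one floating-point unit per core, one operation per thread per cycle, and a hypothetical helper named `run_cycles`. Without SMT the model simply lets one thread issue per cycle, which is much finer-grained than real time-slicing but gives the same total.

```python
def run_cycles(thread_a, thread_b, smt):
    """Count the cycles needed to finish two instruction streams on one core.

    Each stream is a list of "int" or "fp" operations. The core is assumed
    to have exactly one integer unit and one floating-point unit.
    """
    a, b = list(thread_a), list(thread_b)
    cycles = 0
    turn = 0  # which thread issues first this cycle (alternates for fairness)
    while a or b:
        cycles += 1
        first, second = (a, b) if turn == 0 else (b, a)
        if not first:  # the preferred thread is finished, let the other go
            first, second = second, first
        op = first.pop(0)
        # With SMT the other thread may issue in the same cycle, but only
        # if it needs a different functional unit than the first thread.
        if smt and second and second[0] != op:
            second.pop(0)
        turn ^= 1
    return cycles


fp_thread = ["fp"] * 4
int_thread = ["int"] * 4
print(run_cycles(fp_thread, int_thread, smt=False))   # one thread at a time
print(run_cycles(fp_thread, int_thread, smt=True))    # different units overlap
print(run_cycles(int_thread, int_thread, smt=True))   # same unit: no overlap
```

In this model SMT halves the cycle count only when the two threads need different units; two integer-only threads gain nothing from it.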
SMT not a “true” parallel processor
Multi-core: threads can run on separate cores [Diagram: two cores, each with its own full pipeline; Thread 1 and Thread 2 run on separate cores]
Multi-core: threads can run on separate cores [Diagram: two cores, each with its own full pipeline; Thread 3 and Thread 4 run on separate cores]
Combining Multi-core and SMT
SMT Dual-core: all four threads can run concurrently [Diagram: two SMT cores running Thread 1, Thread 2, Thread 3 and Thread 4]
Comparison: multi-core vs SMT
Comparison: multi-core vs SMT (continued)
The memory hierarchy
“Fish” machines [Diagram: CORE 0 and CORE 1, each with hyper-threads and its own L1 cache, sharing one L2 cache above memory]
Designs with private L2 caches [Diagram: CORE 0 and CORE 1, each with its own L1 cache and its own L2 cache, above memory] Both L1 and L2 are private. Examples: AMD Opteron, AMD Athlon, Intel Pentium D. [Diagram: the same layout with L3 caches added] A design with L3 caches. Example: Intel Itanium 2
Private vs shared caches?
Private vs shared caches
The cache coherence problem
The cache coherence problem: Suppose variable x initially contains 15213 [Diagram: multi-core chip with Core 1 to Core 4, each with one or more levels of cache; main memory x=15213]
The cache coherence problem: Core 1 reads x [Diagram: Core 1 cache x=15213; main memory x=15213]
The cache coherence problem: Core 2 reads x [Diagram: Core 1 cache x=15213; Core 2 cache x=15213; main memory x=15213]
The cache coherence problem: Core 1 writes to x, setting it to 21660, assuming write-through caches [Diagram: Core 1 cache x=21660; Core 2 cache x=15213; main memory x=21660]
The cache coherence problem: Core 2 attempts to read x… gets a stale copy [Diagram: Core 1 cache x=21660; Core 2 cache x=15213; main memory x=21660]
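The sequence on the slides above can be replayed in a few lines. This is a minimal sketch, assuming private write-through caches with no coherence mechanism at all; the class name `Cache` and the dictionary standing in for main memory are made up for the illustration.

```python
class Cache:
    """A private write-through cache with no coherence mechanism."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory (a plain dict)
        self.lines = {}       # this core's private cached copies

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]               # hit: return the cached copy

    def write(self, addr, value):
        self.lines[addr] = value              # update this core's copy
        self.memory[addr] = value             # write through to main memory


memory = {"x": 15213}
core1, core2 = Cache(memory), Cache(memory)

core1.read("x")           # Core 1 reads x
core2.read("x")           # Core 2 reads x
core1.write("x", 21660)   # Core 1 writes to x
stale = core2.read("x")   # Core 2 reads x again and hits its old copy
print(stale, memory["x"])  # prints: 15213 21660
```

Main memory is up to date because of the write-through policy, yet Core 2 never looks there: its read hits in its own cache and returns the old value.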
Solutions for cache coherence
Inter-core bus [Diagram: Core 1 to Core 4 on the multi-core chip, each with one or more levels of cache, connected by an inter-core bus; main memory below]
Invalidation protocol with snooping
The cache coherence problem, revisited: Cores 1 and 2 have both read x [Diagram: Core 1 cache x=15213; Core 2 cache x=15213; main memory x=15213]
The cache coherence problem: Core 1 writes to x, setting it to 21660, assuming write-through caches [Diagram: Core 1 sends an invalidation request over the inter-core bus; Core 1 cache x=21660; Core 2 copy INVALIDATED; main memory x=21660]
The cache coherence problem: After invalidation [Diagram: Core 1 cache x=21660; Core 2 cache no longer holds x; main memory x=21660]
The cache coherence problem: Core 2 reads x. Cache misses, and loads the new copy. [Diagram: Core 1 cache x=21660; Core 2 cache x=21660; main memory x=21660]
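The invalidation steps above can be sketched as a small simulation. It assumes write-through caches and a bus on which every cache sees every write; the names `Bus`, `SnoopingCache` and `snoop_invalidate` are hypothetical, and real protocols track more per-line state than this.

```python
class Bus:
    """Inter-core bus: every attached cache sees (snoops) every broadcast."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class SnoopingCache:
    """Private write-through cache that drops its copy on a remote write."""

    def __init__(self, memory, bus):
        self.memory = memory
        self.lines = {}
        self.misses = 0
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: reload from main memory
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)  # tell the other caches
        self.lines[addr] = value
        self.memory[addr] = value                  # write-through

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)            # discard the now-stale copy


memory = {"x": 15213}
bus = Bus()
core1, core2 = SnoopingCache(memory, bus), SnoopingCache(memory, bus)

core1.read("x")
core2.read("x")                      # both cores now cache x=15213
core1.write("x", 21660)              # Core 2's copy is invalidated
invalidated = "x" not in core2.lines
fresh = core2.read("x")              # miss, then load of the new value
print(invalidated, fresh, core2.misses)  # prints: True 21660 2
```

The price of correctness here is the second miss on Core 2: it has to fetch x again after the invalidation.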
Alternative to invalidate protocol: update protocol. Core 1 writes x=21660, assuming write-through caches [Diagram: Core 1 broadcasts the updated value over the inter-core bus; Core 1 cache x=21660; Core 2 cache UPDATED to x=21660; main memory x=21660]
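For comparison, here is the same toy setting with an update protocol. Again a sketch under assumptions: write-through caches, a shared Python list standing in for the inter-core bus, and a made-up class name `UpdateCache`.

```python
class UpdateCache:
    """Private write-through cache that pushes new values to its peers."""

    def __init__(self, memory, bus):
        self.memory = memory
        self.lines = {}
        self.misses = 0
        self.bus = bus        # shared list of caches, stands in for the bus
        bus.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value              # write-through
        for peer in self.bus:                  # broadcast the updated value
            if peer is not self and addr in peer.lines:
                peer.lines[addr] = value       # peers refresh existing copies


memory = {"x": 15213}
bus = []
core1, core2 = UpdateCache(memory, bus), UpdateCache(memory, bus)

core1.read("x")
core2.read("x")
core1.write("x", 21660)
print(core2.read("x"), core2.misses)  # prints: 21660 1
```

Core 2 sees the new value without an extra miss, but every write now carries the data across the bus, whether or not another core ever reads it again.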
Which do you think is better? Invalidation or update?
Invalidation vs update
Invalidation protocols
Programming for multi-core
Thread safety very important
However: Need to use synchronization even if only time-slicing on a uniprocessor
Need to use synchronization even if only time-slicing on a uniprocessor [Two interleavings of the same code: one gives counter=2, the other gives counter=1]
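The two outcomes on the slide above (counter=2 versus counter=1) can be reproduced deterministically by modelling each increment as three separate steps and choosing where the context switches fall. This is a sketch, not the slide's own code: the generator-based model and the names `increment`, `run` and `locked_total` are assumptions for the illustration. The last function shows the usual fix with a lock, using real threads.

```python
import threading


def increment(shared):
    """One thread's counter++ split into its three machine-level steps."""
    reg = shared["counter"]   # step 1: load
    yield
    reg = reg + 1             # step 2: add, in a private register
    yield
    shared["counter"] = reg   # step 3: store
    yield


def run(schedule):
    """Run two increments; schedule lists which thread executes each step."""
    shared = {"counter": 0}
    threads = [increment(shared), increment(shared)]
    for tid in schedule:
        next(threads[tid])    # let that thread execute exactly one step
    return shared["counter"]


print(run([0, 0, 0, 1, 1, 1]))  # no switch mid-increment: prints 2
print(run([0, 1, 0, 1, 0, 1]))  # switch after every step: prints 1


def locked_total(num_threads=4, increments=1000):
    """Increment a shared counter from real threads, protected by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # load, add and store happen as one unit
                counter += 1

    workers = [threading.Thread(target=worker) for _ in range(num_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter


print(locked_total())  # prints 4000
```

The lost update in the second schedule needs only one processor: a context switch between the load and the store is enough, which is why synchronization matters even without multiple cores.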
Assigning threads to the cores
Affinity masks are bit vectors [Example mask, one bit per core: core 3 = 1, core 2 = 0, core 1 = 1, core 0 = 1]
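The example mask on the slide (1 0 1 1 for core 3 down to core 0) can be decoded with a few bit operations. A minimal sketch, assuming bit i of the mask corresponds to core i; the helper names `cores_in_mask` and `mask_from_cores` are made up for the example.

```python
def cores_in_mask(mask):
    """Return the core numbers whose bit is set (bit i corresponds to core i)."""
    cores = []
    i = 0
    while mask >> i:              # stop once no higher bits remain
        if (mask >> i) & 1:
            cores.append(i)
        i += 1
    return cores


def mask_from_cores(cores):
    """Build an affinity mask with one bit set per listed core."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


mask = 0b1011                           # core 3 = 1, core 2 = 0, core 1 = 1, core 0 = 1
print(cores_in_mask(mask))              # prints: [0, 1, 3]
print(bin(mask_from_cores([0, 1, 3])))  # prints: 0b1011
```

So a thread carrying this mask may run on cores 0, 1 and 3, but not on core 2.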
Affinity masks when multi-core and SMT combined [Example mask, one bit per hardware thread, listed as thread 1 then thread 0 for core 3, core 2, core 1, core 0: 1 1 0 0 1 0 1 1]
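With SMT, each core contributes one bit per hardware thread. The sketch below decodes such a mask under an assumed layout: two hardware threads per core, with bit 2*c for thread 0 of core c and bit 2*c + 1 for thread 1. That layout is read off the slide's figure and is an assumption; real systems may number their logical CPUs differently. The function name `hardware_threads_in_mask` is hypothetical.

```python
THREADS_PER_CORE = 2  # assumption: two hardware threads (hyper-threads) per core


def hardware_threads_in_mask(mask, threads_per_core=THREADS_PER_CORE):
    """Return (core, thread) pairs for every bit set in the mask.

    Assumed layout: bit number = core * threads_per_core + thread.
    """
    pairs = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            pairs.append(divmod(bit, threads_per_core))  # (core, thread)
        bit += 1
    return pairs


mask = 0b11001011
for core, thread in hardware_threads_in_mask(mask):
    print(f"core {core}, thread {thread}")
```

Under this layout the mask 11001011 allows both hardware threads of core 0, thread 1 of core 1, nothing on core 2, and both hardware threads of core 3.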
Default Affinities
Process migration is costly
Hard affinities
When to set your own affinities [Image source: Sensable.com]
Kernel scheduler API
Kernel scheduler API (continued)
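As a minimal sketch of querying and setting a hard affinity on Linux, the example below uses Python's `os.sched_getaffinity` and `os.sched_setaffinity`, which take a process id (0 for the calling process) and work with a set of CPU numbers rather than a raw bit mask. These functions are not available on every platform, so the hypothetical helper `pin_to_first_allowed_cpu` checks for them first and restores the original mask before returning.

```python
import os


def pin_to_first_allowed_cpu():
    """Pin the calling process to one CPU, then restore its original mask.

    Returns (before, after) as sets of CPU numbers, or None when the
    affinity calls are not available on this platform.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    before = os.sched_getaffinity(0)     # 0 means "the calling process"
    target = {min(before)}               # pick one CPU we are allowed to use
    os.sched_setaffinity(0, target)      # hard affinity: run only there
    after = os.sched_getaffinity(0)
    os.sched_setaffinity(0, before)      # put the original mask back
    return before, after


result = pin_to_first_allowed_cpu()
if result is None:
    print("CPU affinity calls are not available on this platform")
else:
    before, after = result
    print("allowed before:", sorted(before))
    print("allowed while pinned:", sorted(after))
```

Choosing a CPU from the current mask keeps the request within what the process is already permitted to use.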
Windows Task Manager [Screenshot: per-core usage graphs for core 1 and core 2]
Legal licensing issues
Conclusion

Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 


Multi-core architectures

  • 1. Multi-core architectures. Jernej Barbic, 15-213, Spring 2007. May 3, 2007.
  • 3. Single-core CPU chip. [Diagram label: the single core]
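  • 4. Multi-core architectures. [Diagram: multi-core CPU chip with Core 1, Core 2, Core 3, Core 4]
  • 5. Multi-core CPU chip. [Diagram: core 1, core 2, core 3, core 4]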
  • 6. The cores run in parallel. [Diagram: core 1 to core 4, with thread 1 to thread 4]
  • 7. Within each core, threads are time-sliced (just like on a uniprocessor). [Diagram: core 1 to core 4, each with several threads]
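  • 8. Interaction with the Operating System
  • 9. Why multi-core?
  • 10. Instruction-level parallelism
  • 11. Thread-level parallelism (TLP)
  • 12. General context: Multiprocessors. [Image: Lemieux cluster, Pittsburgh Supercomputing Center]
  • 13. Multiprocessor memory types
  • 14. Multi-core processor is a special kind of a multiprocessor: all processors are on the same chip
  • 15. What applications benefit from multi-core? Each can run on its own core.
  • 16. More examples
  • 17. A technique complementary to multi-core: simultaneous multithreading. [Diagram, source Intel: core pipeline with BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer, Floating Point, L1 D-Cache, D-TLB, uCode ROM, BTB, L2 Cache and Control, Bus]
  • 18. Simultaneous multithreading (SMT)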
  • 19. Without SMT, only a single thread can run at any given time. [Diagram: core pipeline with BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer, Floating Point, L1 D-Cache, D-TLB, uCode ROM, BTB, L2 Cache and Control, Bus. Thread 1: floating point]
  • 20. Without SMT, only a single thread can run at any given time. [Same pipeline diagram. Thread 2: integer operation]
  • 21. SMT processor: both threads can run concurrently. [Same pipeline diagram. Thread 1: floating point; Thread 2: integer operation]
  • 22. But: can’t simultaneously use the same functional unit. This scenario is impossible with SMT on a single core (assuming a single integer unit). [Same pipeline diagram, with Thread 1 and Thread 2 marked IMPOSSIBLE]
  • 24. Multi-core: threads can run on separate cores. [Two copies of the pipeline diagram. Thread 1, Thread 2]
  • 25. Multi-core: threads can run on separate cores. [Two copies of the pipeline diagram. Thread 3, Thread 4]
  • 27. SMT dual-core: all four threads can run concurrently. [Two copies of the pipeline diagram. Thread 1, Thread 3, Thread 2, Thread 4]
  • 32. Designs with private L2 caches. Both L1 and L2 are private. Examples: AMD Opteron, AMD Athlon, Intel Pentium D. A design with L3 caches. Example: Intel Itanium 2. [Diagram: two designs, each with core 0 and core 1, an L1 cache and an L2 cache per core, and memory; the second design also shows two L3 caches]
  • 36. The cache coherence problem. Suppose variable x initially contains 15213. [Diagram: multi-core chip with Core 1 to Core 4, each with one or more levels of cache, and main memory. Main memory: x=15213]
  • 37. The cache coherence problem. Core 1 reads x. [Core 1 cache: x=15213. Main memory: x=15213]
  • 38. The cache coherence problem. Core 2 reads x. [Core 1 cache: x=15213. Core 2 cache: x=15213. Main memory: x=15213]
  • 39. The cache coherence problem. Core 1 writes to x, setting it to 21660 (assuming write-through caches). [Core 1 cache: x=21660. Core 2 cache: x=15213. Main memory: x=21660]
  • 40. The cache coherence problem. Core 2 attempts to read x… gets a stale copy. [Core 1 cache: x=21660. Core 2 cache: x=15213. Main memory: x=21660]
  • 42. Inter-core bus. [Diagram: multi-core chip with Core 1 to Core 4, their caches, an inter-core bus, and main memory]
  • 44. The cache coherence problem, revisited: Cores 1 and 2 have both read x. [Core 1 cache: x=15213. Core 2 cache: x=15213. Main memory: x=15213]
  • 45. The cache coherence problem. Core 1 writes to x, setting it to 21660 (assuming write-through caches), and sends an invalidation request over the inter-core bus. [Core 1 cache: x=21660. Core 2 cache: x=15213, INVALIDATED. Main memory: x=21660]
  • 46. The cache coherence problem. After invalidation: [Core 1 cache: x=21660. Core 2 cache: no copy of x. Main memory: x=21660]
  • 47. The cache coherence problem. Core 2 reads x. Cache misses, and loads the new copy. [Core 1 cache: x=21660. Core 2 cache: x=21660. Main memory: x=21660]
  • 48. Alternative to invalidate protocol: update protocol. Core 1 writes x=21660 (assuming write-through caches) and broadcasts the updated value over the inter-core bus. [Core 1 cache: x=21660. Core 2 cache: x=21660, UPDATED. Main memory: x=21660]
  • 49. Which do you think is better? Invalidation or update?
  • 65. Windows Task Manager. [Screenshot labels: core 2, core 1]
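
The SMT slides (19 to 27) can be mimicked with a very small toy model. The sketch below is not how real hardware schedules instructions; it only assumes, as slide 22 does, a core with a single integer unit and a single floating-point unit, each able to execute one operation per cycle. The names `cycles_to_finish` and `dual_core_cycles` are hypothetical, introduced only for this illustration.

```python
from collections import deque


def cycles_to_finish(threads, smt):
    """Count cycles for ONE core to drain every thread's operation stream.

    threads: list of lists whose items are "int" or "fp".
    Assumption (toy model): one integer unit and one floating-point unit,
    each executing at most one operation per cycle.
    """
    queues = [deque(ops) for ops in threads]
    cycles = 0
    turn = 0  # round-robin start position, so no thread is starved
    while any(queues):
        cycles += 1
        busy_units = set()  # functional units already claimed this cycle
        n = len(queues)
        for k in range(n):
            q = queues[(turn + k) % n]
            if not q or q[0] in busy_units:
                continue  # nothing to issue, or the unit is taken
            busy_units.add(q.popleft())
            if not smt:
                break  # without SMT only one thread issues per cycle
        turn = (turn + 1) % n
    return cycles


def dual_core_cycles(thread_a, thread_b):
    """Two non-SMT cores, one thread each: the slower core sets the time."""
    return max(cycles_to_finish([thread_a], smt=False),
               cycles_to_finish([thread_b], smt=False))


fp_thread = ["fp"] * 4
int_thread = ["int"] * 4

print(cycles_to_finish([fp_thread, int_thread], smt=False))  # time-sliced
print(cycles_to_finish([fp_thread, int_thread], smt=True))   # different units overlap
print(cycles_to_finish([int_thread, int_thread], smt=True))  # same unit: no overlap
print(dual_core_cycles(int_thread, int_thread))              # separate cores
```

In this model a floating-point thread and an integer thread overlap fully under SMT, two integer threads on one SMT core do not overlap at all, and two cores run them side by side regardless of which unit each thread needs.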
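
The cache coherence walkthrough on slides 36 to 48 can be replayed in a few lines. This is a minimal sketch under the slides' own assumption of write-through caches; the `Bus` and `Cache` classes, the `demo` helper, and the `"none"` protocol (no snooping at all) are hypothetical names made up for this illustration, not part of any real library or of the original lecture.

```python
class Bus:
    """Inter-core bus plus main memory, shared by all caches."""

    def __init__(self, memory, protocol):
        self.memory = memory      # dict: address -> value
        self.protocol = protocol  # "none", "invalidate", or "update"
        self.caches = []


class Cache:
    """A private per-core cache (write-through, as the slides assume)."""

    def __init__(self, bus):
        self.lines = {}  # dict: address -> cached value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: load from main memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through to main memory
        for other in self.bus.caches:
            if other is self:
                continue
            if self.bus.protocol == "invalidate":
                other.lines.pop(addr, None)  # drop the stale copy
            elif self.bus.protocol == "update" and addr in other.lines:
                other.lines[addr] = value    # refresh the copy in place
            # protocol "none": other caches keep whatever they hold


def demo(protocol):
    """Replay the slides: both cores read x, core 1 writes, core 2 reads."""
    bus = Bus({"x": 15213}, protocol)
    core1, core2 = Cache(bus), Cache(bus)
    core1.read("x")
    core2.read("x")
    core1.write("x", 21660)
    cached_in_core2 = core2.lines.get("x")  # None means "no copy"
    return cached_in_core2, core2.read("x")


for protocol in ("none", "invalidate", "update"):
    print(protocol, demo(protocol))
```

With `"none"`, core 2 keeps and returns the stale 15213 (slide 40). With `"invalidate"`, core 2's copy disappears and the next read misses and reloads 21660 (slides 45 to 47). With `"update"`, core 2's copy is overwritten in place (slide 48).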