Introduction to Multi-Core

    A multi-core processor is a single integrated circuit that contains two or more
     processor cores.

    Leads to

           o   enhanced performance, reduced power consumption, and more efficient
               simultaneous processing of multiple tasks



    What changes are expected in software design:

           o   To achieve competitive application performance on these new processors, many
               applications must be written (or rewritten) as parallel, multithreaded
               applications.

           o   Multithreaded development can be difficult, expensive, time consuming, and
               error prone — and it requires new programming skill sets.



    Adding cores results in additional overheads and latencies

           o   Execution is serialized between cores, whether or not they communicate
               (e.g. hardware barriers, fences, resource contention)

           o   Various interdependent sources of latency and overhead

                      Architecture : cache coherency

                      System: processor scheduling

                      Application: synchronization

           o   Sensitive to real workloads (e.g. data dependencies)

           o   As the number of cores increases, the size of these overheads and latencies
               increases as well



    Is multiprocessor the same as multi-core?

           o   Multi-core: multiple CPU cores within a single processor chip

           o   Multi-processor: multiple processors (separate chips) within a single system
 From a software perspective, the two terms can usually be used interchangeably.




                   [Figure: multiprocessor vs. multi-core diagram]
Example of Multi-core Architecture:

    ARM MPCore




    Two basic models of multi-core:

           o   Each core acts independently - “multiple single cores”

           o   Cores cooperate with each other – “true multi-core”
What is Multiple single core?

    Each core acts independently

           o   Pros

                      Simplifies porting from single-core systems

                      Minimal interaction between cores – less overhead and a more
                       predictable system

                      No cache coherency issues between the cores

                      Tool support may remain the same as for single core

                      Good scalability – depending, however, on hardware support

           o   Cons

                      Load balancing issues – some cores may be idle while others are overloaded.

                      Hardware should support this mode of operation by providing I/O
                       queues for network interfaces.



   What is True Multi-core?

    Cores cooperate with each other

           o   Pros

                      Better possibilities for load balancing, meaning more effective use of
                       system resources

                      L1 instruction cache can be used more efficiently (cache affinity)

           o   Cons

                      Porting from single core is typically more complicated

                      Possible cache coherency issues between the cores

                      System becomes more complex, especially when dependencies exist
                       between tasks. As a result, hard real-time scheduling is harder to
                       achieve
 Examples of true multi-core designs: Master-Slave, SMP …



Different flavors of Multi-core

     SMP (Symmetric Multi-processor)

            o   Identical processor cores

            o   Dynamic task allocation (each task can run on any of the identical processors)

            o   Shared view of memory

                       Synchronization and communication via shared memory

            o   Normally homogeneous CPU arrangement



     AMP (Asymmetric Multi-processor)

            o   Static task allocation (each processor is assigned a particular kind of task)

            o   Distributed or common view of memory

                       Synchronization and communication via message passing mechanism

            o   Either homogeneous or heterogeneous CPU cores

                       Cache coherency requires special attention



     Master-Slave MP architecture



            o   The master core is responsible for all I/O operations and uses all the other
                cores as slaves. It decides which task each core performs

            o   Slave cores do not communicate with each other directly, only through the master core
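
The master-slave arrangement above can be illustrated with a minimal sketch. It uses Python threads and queues to stand in for cores and their communication channels, which is an assumption made purely for illustration; the function name run_master_slave and the "square a number" workload are hypothetical and not from the slides.

```python
import queue
import threading


def run_master_slave(tasks, num_slaves=3):
    """Master hands out tasks; slaves report results only to the master."""
    task_q = queue.Queue()    # channel master -> slaves
    result_q = queue.Queue()  # channel slaves -> master

    def slave(slave_id):
        while True:
            item = task_q.get()
            if item is None:  # sentinel: the master tells this slave to stop
                break
            index, value = item
            # Hypothetical workload: square the number.
            result_q.put((index, slave_id, value * value))

    slaves = [threading.Thread(target=slave, args=(i,)) for i in range(num_slaves)]
    for t in slaves:
        t.start()

    # The master decides what work exists and distributes it.
    for index, value in enumerate(tasks):
        task_q.put((index, value))
    for _ in slaves:
        task_q.put(None)

    # Only the master collects results; slaves never talk to each other.
    results = [None] * len(tasks)
    for _ in tasks:
        index, _slave_id, squared = result_q.get()
        results[index] = squared

    for t in slaves:
        t.join()
    return results
```

Every task and every result passes through the master, which is what keeps the design simple and also what can make the master a bottleneck as the number of slaves grows.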
OS + Multi-core Design:

Each CPU has its own OS

   •   Statically allocate physical memory to each CPU

   •   Each CPU runs its own independent OS

   •   Share peripherals

   •   Each CPU handles the system calls of its own processes

   •   Used in early multiprocessor systems

   •   Simple to implement

   •   Avoids concurrency issues by not sharing

   •   Issues:

         1. Each processor has its own scheduling queue, so one CPU can sit idle while
            another is overloaded.

         2. Each processor has its own memory partition, so unused memory cannot be lent
            to another CPU.

         3. Consistency is an issue with independent disk buffer caches and
            potentially shared files.




OS + Master-Slave Multiprocessors

   •   OS mostly runs on a single fixed CPU.

   •   User-level applications run on the other CPUs.

   •   All system calls are passed to the Master CPU for processing

   •   Very little synchronisation required

   •   Simple to implement

   •   Single centralised scheduler to keep all processors busy

   •   Memory can be allocated as needed to all CPUs.
   •   Issues: Master CPU becomes the bottleneck.




OS + SMP

    •   OS kernel runs on all processors, while load and resources are balanced between all
        processors.

    •   One alternative: a single mutex (mutual exclusion object) that makes the entire kernel one
        large critical section; only one CPU can be in the kernel at a time; only slightly better
        than master-slave

    •   Better alternative: identify independent parts of the kernel and make each of them its
        own critical section, which allows parallelism in the kernel

    •   Issues: a difficult task; the code is mostly similar to uniprocessor code; the hard part is
        identifying independent parts that don’t interfere with each other

    •   CPUs connected via shared bus to shared memory

    •   Each processor has L1 Cache

    •   Any task can run on any CPU; every CPU is equal from the system's point of view

    •   No master-slave configuration

    •   Each processor is able to access the entire memory map

    •   Each processor is non-unique and of equal power
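
The two kernel locking alternatives listed above (one big lock versus one lock per independent part) can be contrasted in a small sketch. This is a toy model in Python, not kernel code: the class names, the "fs" and "net" subsystems, and the hammer helper are all hypothetical, chosen only to show where the locks sit.

```python
import threading


class BigLockKernel:
    """One lock guards everything: only one thread is 'in the kernel' at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counters = {"fs": 0, "net": 0}

    def syscall(self, subsystem):
        with self._lock:
            self.counters[subsystem] += 1


class FineGrainedKernel:
    """One lock per independent subsystem: 'fs' and 'net' calls do not block each other."""

    def __init__(self):
        self._locks = {"fs": threading.Lock(), "net": threading.Lock()}
        self.counters = {"fs": 0, "net": 0}

    def syscall(self, subsystem):
        with self._locks[subsystem]:
            self.counters[subsystem] += 1


def hammer(kernel, calls_per_thread=1000):
    """Run two 'fs' threads and two 'net' threads against the given kernel model."""

    def worker(subsystem):
        for _ in range(calls_per_thread):
            kernel.syscall(subsystem)

    threads = [threading.Thread(target=worker, args=(name,))
               for name in ("fs", "net", "fs", "net")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return kernel.counters
```

Both models produce the same counts. The difference is that in the fine-grained model a thread working on "fs" never waits for one working on "net", which only stays correct as long as the two parts really are independent.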




Application porting on Multi-core:

     Identify the threads (tasks) that can be executed concurrently by different cores

     How to choose these tasks ?

            o   Minimize inter-task dependencies

            o   Each task should be schedulable with real-time characteristics on a single core

            o   Avoid tasks that are too short, because of the per-task overhead
            o   Leave room for tuning at the implementation stage

            o   Identify inter-task dependencies

            o   Inter-task dependencies may degrade performance, because one core has to
                wait for other cores, which can lead to missed deadlines.

            o   Inter-task dependencies may affect your scheduler decisions

            o   Define which management and I/O tasks you assign to a “master” core and what
                is shared between several cores



                       Memory management can be done either by the master core or by all cores

                       Ethernet and other I/O

                       DMA

            o   Define scheduling policies

                       Take cache behavior on multi-core into account:

                    1. For example, it may be more efficient to co-schedule two tasks that
                       use the same working set in the L2 cache

                    2. Running several “big” working sets on different cores at the same time
                       may be painful, because they thrash each other in L2

                    3. Data cache affinity – sometimes it is worth giving a task priority to
                       run on the same core again, to take advantage of the “hot” cache
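
One way an application can act on the cache-affinity point above is to pin itself to a core. The sketch below assumes a Linux system, where Python exposes os.sched_getaffinity and os.sched_setaffinity; the helper names pin_to_one_cpu and restore_affinity are hypothetical, and picking the lowest allowed CPU is an arbitrary choice for illustration.

```python
import os


def pin_to_one_cpu():
    """Pin the calling process to one CPU and return the previous affinity set.

    Returns None if the platform lacks os.sched_setaffinity (it is
    Linux-specific) or if the OS refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        previous = os.sched_getaffinity(0)  # 0 means "the calling process"
        target = min(previous)              # arbitrary choice: lowest allowed CPU
        os.sched_setaffinity(0, {target})
    except OSError:
        return None
    return previous


def restore_affinity(previous):
    """Undo pin_to_one_cpu by restoring the saved affinity set."""
    if previous is not None:
        os.sched_setaffinity(0, previous)
```

Pinning trades load-balancing freedom for a warm cache, so it tends to pay off only for tasks whose working set is reused between runs.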

What is Cache Coherency?

   Cache coherency is the property that every processor in a multiprocessor system sees the
    same, up-to-date value for a data item, whether it reads the item from its own cache or
    from system memory.
 Coherency is maintained transparently to the software, but it affects software performance

For Example:

•   Processor A and B both cache address x

•   A writes to x

        –   Updates its own cache

•   How does B find out?
There are many cache coherence protocols, such as:

           –   MESI

   MESI

    Modified

           o   This cache has modified the cached data and must write it back to memory

    Exclusive

           o   No other processor has the line cached; it can be modified

    Shared

           o   Not modified; other processors have cached it too. Before changing it, the
               processor has to tell the other processors to invalidate their copy of the cache line

    Invalid

           o   The cached line is no longer valid (for example, another processor has updated it)
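
To make the four states concrete, here is a toy simulation of a single cache line under MESI, replaying the earlier "A writes to x, how does B find out?" example. It is a simplified sketch, not a model of any real processor: the MesiLine class is hypothetical, the line holds one value, and bus transactions are reduced to direct state changes.

```python
M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"


class MesiLine:
    """Toy model of ONE cache line held by several caches plus main memory."""

    def __init__(self, num_caches, initial=0):
        self.memory = initial
        self.state = [I] * num_caches
        self.data = [None] * num_caches

    def _write_back_if_modified(self, cpu):
        if self.state[cpu] == M:
            self.memory = self.data[cpu]

    def read(self, cpu):
        if self.state[cpu] == I:  # read miss: ask the other caches
            others = [c for c in range(len(self.state))
                      if c != cpu and self.state[c] != I]
            for c in others:
                self._write_back_if_modified(c)  # owner publishes the latest data
                self.state[c] = S                # M/E/S all become Shared
            self.data[cpu] = self.memory
            self.state[cpu] = S if others else E
        return self.data[cpu]

    def write(self, cpu, value):
        for c in range(len(self.state)):
            if c != cpu and self.state[c] != I:
                self._write_back_if_modified(c)
                self.state[c] = I                # invalidate every other copy
                self.data[c] = None
        self.data[cpu] = value
        self.state[cpu] = M                      # memory is now stale


# Processor A is cache 0, processor B is cache 1, x starts as 7.
line = MesiLine(2, initial=7)
line.read(0)       # A: Exclusive
line.read(1)       # A and B: Shared
line.write(0, 42)  # A: Modified, B: Invalid
line.read(1)       # A writes back; A and B: Shared, B sees 42
```

B finds out because A's write invalidates B's copy, so B's next read misses and is served with the written-back value.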

Specifics required to work with an MP core:

    Identification

           o   CPUID to uniquely identify CPU to software

           o   Ability to indicate that memory must be kept coherent

    Can maintain memory coherency

           o   Caches can participate in MESI protocol

    Provides consistent view of memory

           o   With a defined memory ordering

           o   Atomic and synchronization primitives

    Communication with peers

           o   IPI (inter-processor interrupts)

           o   Message passing

    Interrupt distribution
           o   Interrupt distribution unit that controls each processor's individual
               interrupt controller unit

Multi-core/Multi-Processor design issues:

    Cache coherency

    Design of multi-threaded applications for multi-core

           o   Functional decomposition

           o   Domain decomposition ( independent data sets )

    Snooping (Cache/Memory snooping)

    Interrupt distribution

    Processor affinity

    Inter-processor Interrupts

    Memory access

    Concurrency

           o   Interrupt

           o   Instruction/data

           o   Memory/peripherals

    Memory consistency/memory ordering model (affected by both the hardware and
     compiler optimization)

    SMP protection by OS/HW.

           o   Spinlocks

           o   Atomic operations form the basis of all protection tools (e.g. ARM LL/SC operations)

    Debugging tools

    Performance

    Profiling
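
The domain decomposition item in the list above (splitting the data into independent sets) can be sketched as follows. The function names and the sum-of-squares workload are hypothetical. Threads are used here only to keep the example self-contained; in CPython they illustrate the structure but do not speed up CPU-bound work, for which a process pool would be the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Split data into num_chunks nearly equal, independent slices."""
    size, extra = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done on one data set; it needs nothing from the other chunks."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_workers=4):
    """Apply the same function to each chunk, then combine the partial results."""
    chunks = split_into_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Functional decomposition would instead give each worker a different function (for example parse, compute, write), which usually introduces more inter-task dependencies than the data split shown here.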

Linux SMP Design:

 Process affinity

       o     Each processor has its own runqueue

       o     The runqueue is the list of all active processes waiting to be scheduled

 Load Balancing

       o     Shifts processes from an overloaded processor to another, less loaded processor

       o     Part of the scheduler

       o     Should maintain processor affinity for cache efficiency

 Interrupt Affinity

       o     Requires help from the hardware interrupt distribution system (APIC)

       o     The APIC can direct an interrupt to only one of the cores

       o     The Linux interrupt code provides the cpu_set function to change APIC behavior

 smp_processor_id

       o     Returns the identifier of the CPU on which the current code is executing

 Per-CPU variable

       o     Define per-cpu memory region at the start of the kernel where per-cpu
             variables will be placed

       o     Variable associated with a single core

       o     Variable defined as per-cpu creates an array of variables, one per CPU instance.

 Spinlock

       o     Disabling preemption and interrupts alone does not help in an MP environment,
             because code on the other processors still runs concurrently

 Big-lock

       o     Introduced in 2.2 kernel to serialize access across the system

 What about bottom halves (BH)?

       o     A tasklet is executed on the processor that schedules it
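
The per-CPU variable idea from the list above (an array with one slot per CPU, each slot touched only by its own CPU) can be illustrated outside the kernel. This is a user-space analogy in Python, not the kernel's per-CPU API: NUM_CPUS, per_cpu_packets, and the helper functions are hypothetical, and each thread simply plays the role of one CPU.

```python
import threading

NUM_CPUS = 4  # hypothetical core count for this sketch

# One slot per CPU, analogous to the array created for a per-CPU variable.
per_cpu_packets = [0] * NUM_CPUS


def count_packets(cpu_id, n):
    """Code 'running on' cpu_id updates only its own slot, so it takes no lock."""
    for _ in range(n):
        per_cpu_packets[cpu_id] += 1


def total_packets():
    """A reader that wants the global value sums every CPU's slot."""
    return sum(per_cpu_packets)


def demo(n=1000):
    """Reset the slots, let each simulated CPU count n packets, return the total."""
    for cpu in range(NUM_CPUS):
        per_cpu_packets[cpu] = 0
    threads = [threading.Thread(target=count_packets, args=(cpu, n))
               for cpu in range(NUM_CPUS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total_packets()
```

Because no slot is shared between writers, the hot path needs no spinlock; the cost moves to the reader, which has to visit every slot.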
Linux SMP booting:
Introduction to multi core

More Related Content

What's hot

Multivector and multiprocessor
Multivector and multiprocessorMultivector and multiprocessor
Multivector and multiprocessorKishan Panara
 
Multi_Core_Processor_2015_(Download it!)
Multi_Core_Processor_2015_(Download it!)Multi_Core_Processor_2015_(Download it!)
Multi_Core_Processor_2015_(Download it!)Sudip Roy
 
Hardware Multi-Threading
Hardware Multi-ThreadingHardware Multi-Threading
Hardware Multi-Threadingbabuece
 
Unit7 & 8 performance analysis and optimization
Unit7 & 8 performance analysis and optimizationUnit7 & 8 performance analysis and optimization
Unit7 & 8 performance analysis and optimizationleenachandra
 
Design of embedded systems
Design of embedded systemsDesign of embedded systems
Design of embedded systemsPradeep Kumar TS
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architecturesnextlib
 
Multicore processors and its advantages
Multicore processors and its advantagesMulticore processors and its advantages
Multicore processors and its advantagesNitesh Tudu
 
Chapter 08
Chapter 08Chapter 08
Chapter 08 Google
 
Dual Core Processor
Dual Core ProcessorDual Core Processor
Dual Core Processorfaiza nahin
 
Computer architecture multi core processor
Computer architecture multi core processorComputer architecture multi core processor
Computer architecture multi core processorMazin Alwaaly
 
CAN (Controller Area Network) Bus Protocol
CAN (Controller Area Network) Bus ProtocolCAN (Controller Area Network) Bus Protocol
CAN (Controller Area Network) Bus ProtocolAbhinaw Tiwari
 
Microcontroller presentation
Microcontroller presentationMicrocontroller presentation
Microcontroller presentationxavierpaulino
 
INTERRUPT LATENCY AND RESPONSE OF THE TASK
INTERRUPT LATENCY AND RESPONSE OF THE TASKINTERRUPT LATENCY AND RESPONSE OF THE TASK
INTERRUPT LATENCY AND RESPONSE OF THE TASKJOLLUSUDARSHANREDDY
 

What's hot (20)

Multi core processors
Multi core processorsMulti core processors
Multi core processors
 
Multivector and multiprocessor
Multivector and multiprocessorMultivector and multiprocessor
Multivector and multiprocessor
 
Multi_Core_Processor_2015_(Download it!)
Multi_Core_Processor_2015_(Download it!)Multi_Core_Processor_2015_(Download it!)
Multi_Core_Processor_2015_(Download it!)
 
Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
 
Hardware Multi-Threading
Hardware Multi-ThreadingHardware Multi-Threading
Hardware Multi-Threading
 
Unit7 & 8 performance analysis and optimization
Unit7 & 8 performance analysis and optimizationUnit7 & 8 performance analysis and optimization
Unit7 & 8 performance analysis and optimization
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
Design of embedded systems
Design of embedded systemsDesign of embedded systems
Design of embedded systems
 
Distributed system
Distributed systemDistributed system
Distributed system
 
Bus
BusBus
Bus
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architectures
 
Multicore processors and its advantages
Multicore processors and its advantagesMulticore processors and its advantages
Multicore processors and its advantages
 
Chapter 08
Chapter 08Chapter 08
Chapter 08
 
Processors selection
Processors selectionProcessors selection
Processors selection
 
Memory Organization
Memory OrganizationMemory Organization
Memory Organization
 
Dual Core Processor
Dual Core ProcessorDual Core Processor
Dual Core Processor
 
Computer architecture multi core processor
Computer architecture multi core processorComputer architecture multi core processor
Computer architecture multi core processor
 
CAN (Controller Area Network) Bus Protocol
CAN (Controller Area Network) Bus ProtocolCAN (Controller Area Network) Bus Protocol
CAN (Controller Area Network) Bus Protocol
 
Microcontroller presentation
Microcontroller presentationMicrocontroller presentation
Microcontroller presentation
 
INTERRUPT LATENCY AND RESPONSE OF THE TASK
INTERRUPT LATENCY AND RESPONSE OF THE TASKINTERRUPT LATENCY AND RESPONSE OF THE TASK
INTERRUPT LATENCY AND RESPONSE OF THE TASK
 

Viewers also liked

Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiAnkit Raj
 
Multi core-architecture
Multi core-architectureMulti core-architecture
Multi core-architecturePiyush Mittal
 
Multicore processor technology advantages and challenges
Multicore processor technology  advantages and challengesMulticore processor technology  advantages and challenges
Multicore processor technology advantages and challengeseSAT Journals
 
(Paper) Task scheduling algorithm for multicore processor system for minimiz...
 (Paper) Task scheduling algorithm for multicore processor system for minimiz... (Paper) Task scheduling algorithm for multicore processor system for minimiz...
(Paper) Task scheduling algorithm for multicore processor system for minimiz...Naoki Shibata
 
Overview of Nios II Embedded Processor
Overview of Nios II Embedded ProcessorOverview of Nios II Embedded Processor
Overview of Nios II Embedded ProcessorAltera Corporation
 
Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Peter Tröger
 
الديسلكسيا العسر القرائي
الديسلكسيا العسر القرائيالديسلكسيا العسر القرائي
الديسلكسيا العسر القرائيLAILAF_M
 
parallelization strategy
parallelization strategyparallelization strategy
parallelization strategyR. M.
 
Multithreading
MultithreadingMultithreading
Multithreadingsagsharma
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overviewRajiv Kumar
 
Core I3 Vs Core I5
Core I3 Vs Core I5Core I3 Vs Core I5
Core I3 Vs Core I5Ayeshasidhu
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors pptSiddhartha Anand
 
Arm processor architecture awareness session pi technologies
Arm processor architecture awareness session pi technologiesArm processor architecture awareness session pi technologies
Arm processor architecture awareness session pi technologiesPiTechnologies
 
29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technologySindhu Nathan
 
Embedded Systems - Training ppt
Embedded Systems - Training pptEmbedded Systems - Training ppt
Embedded Systems - Training pptNishant Kayal
 
Multi-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architectureMulti-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architectureUmair Amjad
 

Viewers also liked (20)

Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash Prajapati
 
Multi core-architecture
Multi core-architectureMulti core-architecture
Multi core-architecture
 
Multi core processor
Multi core processorMulti core processor
Multi core processor
 
Multicore processor technology advantages and challenges
Multicore processor technology  advantages and challengesMulticore processor technology  advantages and challenges
Multicore processor technology advantages and challenges
 
(Paper) Task scheduling algorithm for multicore processor system for minimiz...
 (Paper) Task scheduling algorithm for multicore processor system for minimiz... (Paper) Task scheduling algorithm for multicore processor system for minimiz...
(Paper) Task scheduling algorithm for multicore processor system for minimiz...
 
Overview of Nios II Embedded Processor
Overview of Nios II Embedded ProcessorOverview of Nios II Embedded Processor
Overview of Nios II Embedded Processor
 
Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)Operating Systems 1 (5/12) - Architectures (Unix)
Operating Systems 1 (5/12) - Architectures (Unix)
 
Multicore
MulticoreMulticore
Multicore
 
الديسلكسيا العسر القرائي
الديسلكسيا العسر القرائيالديسلكسيا العسر القرائي
الديسلكسيا العسر القرائي
 
parallelization strategy
parallelization strategyparallelization strategy
parallelization strategy
 
Multithreading
MultithreadingMultithreading
Multithreading
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overview
 
Core I3 Vs Core I5
Core I3 Vs Core I5Core I3 Vs Core I5
Core I3 Vs Core I5
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
 
Arm processor architecture awareness session pi technologies
Arm processor architecture awareness session pi technologiesArm processor architecture awareness session pi technologies
Arm processor architecture awareness session pi technologies
 
29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology29092013042656 multicore-processor-technology
29092013042656 multicore-processor-technology
 
Embedded Systems - Training ppt
Embedded Systems - Training pptEmbedded Systems - Training ppt
Embedded Systems - Training ppt
 
scheduling
schedulingscheduling
scheduling
 
Multi-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architectureMulti-core processor and Multi-channel memory architecture
Multi-core processor and Multi-channel memory architecture
 
Arm Processor
Arm ProcessorArm Processor
Arm Processor
 

Similar to Introduction to multi core

Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.pptAberaZeleke1
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3Shah Zaib
 
Memory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfMemory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfrajaratna4
 
Symmetric multiprocessing and Microkernel
Symmetric multiprocessing and MicrokernelSymmetric multiprocessing and Microkernel
Symmetric multiprocessing and MicrokernelManoraj Pannerselum
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Sarwan ali
 
Lecture 6
Lecture  6Lecture  6
Lecture 6Mr SMAK
 
Lecture 6
Lecture  6Lecture  6
Lecture 6Mr SMAK
 
Lecture 6
Lecture  6Lecture  6
Lecture 6Mr SMAK
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computingNiranjana Ambadi
 
Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)Subhajit Sahu
 
Multicore processor.pdf
Multicore processor.pdfMulticore processor.pdf
Multicore processor.pdfrajaratna4
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsTony Nguyen
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsYoung Alista
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsJames Wong
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsFraboni Ec
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsHoang Nguyen
 

Similar to Introduction to multi core (20)

Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.ppt
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3
 
Memory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdfMemory and Cache Coherence in Multiprocessor System.pdf
Memory and Cache Coherence in Multiprocessor System.pdf
 
Factored operating systems
Factored operating systemsFactored operating systems
Factored operating systems
 
Symmetric multiprocessing and Microkernel
Symmetric multiprocessing and MicrokernelSymmetric multiprocessing and Microkernel
Symmetric multiprocessing and Microkernel
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler
 
Lecture 6
Lecture  6Lecture  6
Lecture 6
 
Lecture 6
Lecture  6Lecture  6
Lecture 6
 
Lecture 6
Lecture  6Lecture  6
Lecture 6
 
Multi-Core on Chip Architecture *doc - IK
Multi-Core on Chip Architecture *doc - IKMulti-Core on Chip Architecture *doc - IK
Multi-Core on Chip Architecture *doc - IK
 
Wiki 2
Wiki 2Wiki 2
Wiki 2
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
 
Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)Shared memory Parallelism (NOTES)
Shared memory Parallelism (NOTES)
 
Multicore processor.pdf
Multicore processor.pdfMulticore processor.pdf
Multicore processor.pdf
 
6.distributed shared memory
6.distributed shared memory6.distributed shared memory
6.distributed shared memory
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 

Introduction to multi core

  • 1. Introduction to Multi-Core  A multi-core processor is an integrated circuit to which two or more processors have been attached.  Leads to o enhanced performance, reduced power consumption, and more efficient simultaneous processing of multiple tasks  What changes expected in software design: o To achieve competitive application performance on these new processors, many applications must be written (or rewritten) as parallel, multithreaded applications. o Multithreaded development can be difficult, expensive, time consuming, and error prone — and it requires new programming skill sets.  Adding cores results in additional overheads and latencies o Serializes execution between communicating and non-communicating cores (e.g. hardware barriers, fences, resource contention) o Various interdependent sources of latency and overhead  Architecture : cache coherency  System: processor scheduling  Application: synchronization o Sensitive to real workloads (e.g. data dependencies) o As the number of cores increase, the size of the overheads and latencies increases  Is Multiprocessor same as Multicore? o Multi-core, multiple cpu cores within a single processor o Multi-processor, multiple processor within a single chip
  • 2.  For software perspective, we can use either one of the term. Multiprocessor MultiCore Diagram
  • 3. Example of Multi-core Architecture:  ARMMPCORE  Two basic models of multi-core: o Each core acts independently - “multiple single cores” o Cores cooperate each other – “true multi-core”
  • 4. What is Multiple single core?  Each core acts independently o Pros  Simplifying a porting from Single core systems  The minimum of interaction between cores – less overhead and more predictable system  No cache coherency issues between the cores  Tools support may remain the same as it was for single core  Good scalability – however depends on hardware support o Cons  Load balancing issues – some cores maybe idle and some overloaded.  Hardware should support this mode of operations by providing I/O Queues for network interfaces. What is True Multi-core?  Cores cooperate each other o Pros  Better possibilities for balance loading meaning more effective usage of system resources  L1 instruction cache can be used more efficiently (cache affinity) o Cons  Porting from single core is typically more complicated  Possible cache coherency issues between the cores  System becomes more complex especially when dependencies exist between tasks. As a result, hard-real time scheduling is harder to achieve
  • 5.  ??Example of True-Multi-Core designs: Master-Slave, SMP … Different flavors of Multi-core  SMP ( Symmetric Multi-processor) o Identical processor cores o Dynamic Task allocation ( each task can run on any identical processor) o Shared view of Memory  Synhcronization and communication via shared memory o Normally homogeneous CPU arrangement  AMP (Asymmetric Multi-processor) o Static Task allocation ( each processor is assigned a particular kind of task ) o Distributed or common view of memory  Synchronization and communication via message passing mechanism o Either homogeneous or heterogeneous CPU cores  Cache coherency requires special attention  Master-Slave MP architecture o Master core is responsible for all I/O operations and uses all the cores as the slaves. It decides what task each core performs o Slave core do not communicate each other but only thru a master core
  • 6. OS + Multi-core Design: Each CPU has its own OS • Statically allocate physical memory to each CPU • Each CPU runs its own independents OS • Share peripherals • Each CPU handles its processes system calls • Used in early multiprocessor systems • Simple to implement • Avoids concurrency issues by not sharing • Issues: 1. Each processor has its own scheduling queue. 2. Each processor has its own memory partition. 3. Consistency is an issue with independent disk buffer caches and potentially shared files. OS + Master-Slave Multiprocessors • OS mostly runs on a single fixed CPU. • User-level applications run on the other CPUs. • All system calls are passed to the Master CPU for processing • Very little synchronisation required • Single to implement • Single centralised scheduler to keep all processors busy • Memory can be allocated as needed to all CPUs.
  • 7. Issues: Master CPU becomes the bottleneck. OS + SMP • OS kernel runs on all processors, while load and resources are balanced between all processors. • One alternative: A single mutex (mutual exclusion object) that make the entire kernel a large critical section; Only one CPU can be in the kernel at a time; Only slight better than master-slave • Better alternative: Identify independent parts of the kernel and make each of them their own critical section, which allows parallelism in the kernel • Issues: A difficult task; Code is mostly similar to uniprocessor code; hard part is identifying independent parts that don’t interfere with each other • CPUs connected via shared bus to shared memory • Each processor has L1 Cache • Any task can be running on any CPU, every CPU is equal for system • No master-slave configuration • Each processor able to access the entire memory map • Each processor is non-unique and equal power Application porting on Multi-core:  Identify the threads (tasks) that can be executed concurrently by different cores  How to choose these tasks ? o Minimize inter-task dependencies o Each task should have schedulable real-time characteristics for single core o Avoid too short tasks because of overhead
  • 8.
      o Leave room for tuning at the implementation stage
      o Identify inter-task dependencies
      o Inter-task dependencies may degrade performance, because one core has to wait for other cores and may miss deadlines as a result.
      o Inter-task dependencies may affect your scheduler decisions
      o Define which management and I/O tasks you assign to a “master” core and what is shared between several cores
        - Memory management can be done either by the master core or by all cores
        - Ethernet and other I/O
        - DMA
      o Define scheduling policies
        - Take cache behaviour on multicore into account:
          1. For example, it may be more efficient to co-schedule two tasks that use the same working set in the L2 cache
          2. Running several “big” working sets on different cores at the same time, thrashing each other in L2, may be painful
          3. Data cache affinity: it is sometimes worth giving a task priority to run on the same core again so that it benefits from a “hot” cache
    What is Cache Coherency?
    • Cache coherency is the property that every processor in a multiprocessor system sees the same value for a data item, whether that value comes from its own cache or from system memory.
    • The hardware maintains this state transparently to software, but the work of maintaining it affects software performance
    For example:
    • Processors A and B both cache address x
    • A writes to x, updating only its own cache
    • How does B find out?
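The A/B example above can be played through with a toy model. The following is only a sketch under stated assumptions: the names (cache_line, cpu_read, cpu_write, value_seen_by_b) are hypothetical, memory is updated write-through to keep the model small, and real hardware performs these steps in the cache controller rather than in software. It shows B reading a stale value when nothing tells it about A's write, and reading the new value when the write invalidates B's copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one memory word "x" and one cache line per processor.
 * All names are hypothetical and for illustration only. */
#define NUM_CPUS 2

struct cache_line {
    bool valid;   /* does this CPU hold a usable copy of x? */
    int  value;   /* the cached copy of x */
};

static int memory_x;                        /* x in system memory */
static struct cache_line cache[NUM_CPUS];   /* one line per CPU */

/* Read x: use the local copy if it is valid, otherwise fetch from memory. */
int cpu_read(int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    if (!cache[cpu].valid) {
        cache[cpu].value = memory_x;
        cache[cpu].valid = true;
    }
    return cache[cpu].value;
}

/* Write x. If coherent is true, every other cached copy is invalidated
 * (a write-invalidate scheme); if false, other caches keep their old copy. */
void cpu_write(int cpu, int value, bool coherent)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    cache[cpu].value = value;
    cache[cpu].valid = true;
    memory_x = value;   /* write-through: a simplification of this sketch */
    if (coherent) {
        for (int other = 0; other < NUM_CPUS; other++)
            if (other != cpu)
                cache[other].valid = false;
    }
}

/* A (CPU 0) and B (CPU 1) both cache x, then A writes 42.
 * Returns the value B reads afterwards. */
int value_seen_by_b(bool coherent)
{
    memory_x = 1;
    cache[0].valid = false;
    cache[1].valid = false;
    cpu_read(0);                  /* A caches x */
    cpu_read(1);                  /* B caches x */
    cpu_write(0, 42, coherent);   /* A writes to x */
    return cpu_read(1);           /* what does B see? */
}
```

Calling value_seen_by_b(false) returns the old value 1, because B still trusts its own copy; value_seen_by_b(true) returns 42, because B's line was invalidated and had to be fetched again.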
  • 9. There are many cache coherence protocols, for example MESI.
    MESI
    • Modified
      o This cache has modified the cached data and must write it back to memory
    • Exclusive
      o No other processor has it cached; it can be modified without notifying other processors
    • Shared
      o Not modified; other processors may have cached it too. Before changing it, the other processors must be told to invalidate their copy of the cache line
    • Invalid
      o The cached line is no longer valid (for example, another processor may have updated it)
    Specifics required to work with an MP core:
    • Identification
      o CPUID to uniquely identify each CPU to software
      o Ability to indicate that memory needs to be coherent
    • Can maintain memory coherency
      o Caches can participate in the MESI protocol
    • Provides a consistent view of memory
      o With a defined memory ordering
      o Atomic and synchronization primitives
    • Communication with peers
      o Inter-processor interrupts (IPI)
      o Message passing
    • Interrupt distribution
  • 10.
      o An interrupt distribution unit that controls the individual processors' interrupt controller units
    Multi-core/Multi-Processor design issues:
    • Cache coherency
    • Design of multi-threaded applications for multi-core
      o Functional decomposition
      o Domain decomposition (independent data sets)
    • Snooping (cache/memory snooping)
    • Interrupt distribution
    • Processor affinity
    • Inter-processor interrupts
    • Memory access
    • Concurrency
      o Interrupt
      o Instruction/data
      o Memory/peripherals
    • Memory consistency/memory ordering model (affected by both hardware and compiler optimization)
    • SMP protection by OS/HW
      o Spinlocks
      o Atomic operations form the basis of all protection tools (ARM LL/SC operations)
    • Debugging tools
    • Performance
    • Profiling
    Linux SMP Design:
    • Process affinity
  • 11.
      o Each processor has its own runqueue
      o The runqueue is the list of all active processes waiting to be scheduled
    • Load Balancing
      o Shifts processes from an overloaded processor to another, less loaded processor
      o Part of the scheduler
      o Should maintain processor affinity for cache efficiency
    • Interrupt Affinity
      o Requires help from the hardware interrupt distribution system (APIC)
      o The APIC can direct an interrupt to only one of the cores
      o The Linux interrupt code provides a cpu_set function to change APIC behavior
    • smp_processor_id
      o Returns the identifier of the CPU on which the current code is executing
    • Per-CPU variable
      o A per-CPU memory region is defined at kernel start, and per-CPU variables are placed in it
      o Each variable is associated with a single core
      o Defining a variable as per-CPU creates an array of variables, one instance per CPU
    • Spinlock
      o Disabling preemption and interrupts only protects against code on the local CPU, so on its own it does not help in an MP environment
    • Big-lock
      o Introduced in the 2.2 kernel to serialize access across the system
    • What about bottom halves (BH)?
      o A tasklet is executed on the processor that scheduled it
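The spinlock item above, together with the earlier remark that atomic operations form the basis of all protection tools, can be illustrated in user space. This is a minimal sketch and not the kernel's spinlock implementation: it assumes C11 atomics and POSIX threads, and the names spin_lock_sketch, spin_unlock_sketch and run_counter_demo are hypothetical. Several threads, which may run on different cores, increment one shared counter; the atomic test-and-set is what keeps them out of the critical section at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

/* Hypothetical user-space spinlock built on one atomic flag. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long shared_counter;   /* protected by lock_flag */

static void spin_lock_sketch(void)
{
    /* test-and-set returns the previous value, so keep spinning
     * until the flag was clear when we set it */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        ;   /* busy-wait */
}

static void spin_unlock_sketch(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        spin_lock_sketch();
        shared_counter++;       /* critical section */
        spin_unlock_sketch();
    }
    return NULL;
}

/* Runs NUM_THREADS workers in parallel and returns the final counter. */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];

    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

With the lock in place, run_counter_demo() returns NUM_THREADS * INCREMENTS. Removing the lock and unlock calls from worker would typically lose updates on a multi-core machine, which is the situation that disabling preemption or interrupts on one CPU cannot prevent.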