INTRODUCTION TO
HETEROGENEOUS SYSTEM
ARCHITECTURE
Presenter: BingRu Wu
Outline
◻ Introduction
◻ Goal
◻ Concept
◻ Memory Model
◻ System Components
Introduction
◻ HSA: Heterogeneous System Architecture
◻ Promising future:
◻ ARM processor vendors
◻ GPU vendors: AMD, Imagination Technologies
◻ Fully utilize computation resources
◻ By supporting HSA, our system may connect to a
major application base
Goal of HSA
◻ Remove programmability barrier
◻ Memory space barrier
◻ Access latency among devices
◻ Backward compatible
◻ Utilize existing programming models
Concept of HSA
Abstract
◻ Two kinds of compute units
◻ LCU: Latency Compute Unit (e.g., CPU)
◻ TCU: Throughput Compute Unit (e.g., GPU)
◻ Merged memory space
Memory Management (1/2)
◻ Shared page table
◻ Memory is shared by all devices
◻ No more host-to-device copies, or vice versa
◻ Supports pointer-based data structures (e.g., linked lists)
◻ Page faulting
◻ Virtual memory space for all devices
◻ e.g., the GPU can now use memory as if it
owned the whole memory space
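As a rough illustration of why a shared page table matters for pointer-based structures, the sketch below models one address space as a Python dict. The names (`SharedMemory`, `host_build_list`, `device_sum_list`) and the starting address are hypothetical and are not part of any HSA API. It only shows that a device can follow the same pointers the host wrote, with no copy and no pointer translation.

```python
# Toy model only: a dict stands in for one virtual address space that
# the host (LCU) and the device (TCU) both see. All names are hypothetical.

class SharedMemory:
    def __init__(self):
        self.cells = {}          # address -> (value, next_address)
        self.next_free = 0x1000  # arbitrary starting address

    def alloc_node(self, value, next_addr):
        addr = self.next_free
        self.next_free += 16
        self.cells[addr] = (value, next_addr)
        return addr


def host_build_list(mem, values):
    """Host side: build a linked list and return the head address."""
    head = None
    for v in reversed(values):
        head = mem.alloc_node(v, head)
    return head


def device_sum_list(mem, head):
    """Device side: follow the same addresses the host wrote."""
    total = 0
    addr = head
    while addr is not None:
        value, addr = mem.cells[addr]
        total += value
    return total


mem = SharedMemory()
head = host_build_list(mem, [1, 2, 3, 4])
result = device_sum_list(mem, head)  # no copy, no pointer fix-up
```

In a copy-based model the list would have to be flattened or its pointers rewritten before the device could use it. Here the head address alone is enough.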
Memory Management (2/2)
◻ Coherent memory regions
◻ The memory is coherent
◻ Shared among all devices (CUs)
◻ Unified address space
◻ Memory type is determined by address
◻ Private / local / global memory is decided by
memory region
◻ No special instruction is required
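The idea that the address alone selects the memory type can be sketched as a simple range lookup. The address ranges below are invented for illustration and do not describe any real device layout; `region_of` is a hypothetical helper.

```python
# Hypothetical layout: the ranges below are invented for illustration.
REGIONS = [
    ("private", 0x0000_0000, 0x0FFF_FFFF),
    ("local",   0x1000_0000, 0x1FFF_FFFF),
    ("global",  0x2000_0000, 0xFFFF_FFFF),
]


def region_of(address):
    """Return the memory type implied by the address alone."""
    for name, low, high in REGIONS:
        if low <= address <= high:
            return name
    raise ValueError(f"address {address:#x} is outside the modeled space")


kinds = [region_of(a) for a in (0x0000_0040, 0x1000_0000, 0x8000_0000)]
```

Because the region is decoded from the address, the same load or store operation can serve all three memory types in this model.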
User-Level Command Queue
◻ Queues for communication
◻ User to device
◻ Device to device
◻ HSA runtime handles the queue
◻ Allocation & destruction
◻ One queue per application
◻ Vendor dependent implementation
◻ Direct access to devices
◻ No OS syscall
◻ No task management
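A user-level queue can be pictured as a ring buffer that the application fills with plain memory writes and the device drains on its own. The sketch below is a toy model under that assumption. `UserQueue`, its fields, and the packet strings are hypothetical names, not the HSA runtime API, and a real implementation is vendor dependent.

```python
# Toy model of a user-level command queue: a fixed-size ring buffer
# with a write index (owned by the application) and a read index
# (owned by the device). Names are hypothetical, not the HSA runtime API.

class UserQueue:
    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.write_index = 0   # next slot the application fills
        self.read_index = 0    # next slot the device consumes
        self.doorbell = 0      # last write index announced to the device

    def enqueue(self, packet):
        """Application side: plain memory writes, no system call."""
        if self.write_index - self.read_index == self.size:
            return False       # queue full, caller must retry later
        self.slots[self.write_index % self.size] = packet
        self.write_index += 1
        self.doorbell = self.write_index   # "ring the doorbell"
        return True

    def device_drain(self):
        """Device side: consume every packet announced by the doorbell."""
        done = []
        while self.read_index < self.doorbell:
            done.append(self.slots[self.read_index % self.size])
            self.read_index += 1
        return done


q = UserQueue(size=2)
accepted = [q.enqueue("kernel_a"), q.enqueue("kernel_b"), q.enqueue("kernel_c")]
first_batch = q.device_drain()
accepted_after = q.enqueue("kernel_c")
second_batch = q.device_drain()
```

The third enqueue is rejected because the two-slot queue is full. Once the device has drained it, the same packet is accepted.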
Hardware Scheduler (1/3)
◻ No real scheduling on TCU (GPU)
◻ Task scheduling
◻ Task preemption
◻ Current implementation
◻ Execute without lock:
◻ All threads execute
◻ Multiple concurrent tasks produce erroneous results
Hardware Scheduler (2/3)
◻ Current implementation
◻ Execute with lock:
◻ A code exception may leave the resource
locked
◻ Long-running tasks prevent others from
executing
◻ We may fail to finish critical jobs
Hardware Scheduler (3/3)
HSA runtime guarantees:
◻ Bounded execution time
◻ Any process ceases within a reasonable time
◻ Fast switch among applications
◻ Use hardware to save time
◻ Application level parallelism
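The effect of preemption on a critical job can be sketched with a small simulation. It compares run-to-completion execution with round-robin time slicing. The task names, work amounts, and quantum are made up, and context-switch cost is assumed to be zero. This is an illustration of the guarantee, not a description of any actual HSA scheduler.

```python
from collections import deque


def run_to_completion(tasks):
    """No preemption: each task holds the device until it finishes."""
    clock, finish = 0, {}
    for name, work in tasks:
        clock += work
        finish[name] = clock
    return finish


def run_time_sliced(tasks, quantum):
    """Preemptive round-robin: a task is switched out after `quantum` units."""
    clock, finish = 0, {}
    ready = deque(tasks)
    while ready:
        name, remaining = ready.popleft()
        step = min(quantum, remaining)
        clock += step
        if remaining > step:
            ready.append((name, remaining - step))  # preempted, requeued
        else:
            finish[name] = clock
    return finish


tasks = [("long_job", 100), ("critical_job", 5)]
no_preempt = run_to_completion(tasks)
sliced = run_time_sliced(tasks, quantum=10)
```

Without preemption the critical job waits for the whole long job. With time slicing it finishes after one quantum of the long job plus its own work, while the long job still completes at the same total time.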
HSAIL (1/2)
◻ HSA Intermediate Language
◻ The language for TCU
◻ Similar to “PTX” code
◻ No graphics-specific instructions
◻ Further translated to HW ISA (by Finalizer)
◻ The abstract platform is similar to OpenCL
◻ Work item (thread)
◻ Work group (block)
◻ NDRange (grid)
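The work item / work group / NDRange hierarchy can be made concrete with index arithmetic. The sketch below covers the one-dimensional case and assumes the grid size is an exact multiple of the group size. `work_item_ids` and `vector_add` are hypothetical helper names, and the loop runs sequentially, whereas a device would run the work items in parallel.

```python
def work_item_ids(grid_size, group_size):
    """Enumerate (group_id, local_id, global_id) for a 1-D NDRange."""
    assert grid_size % group_size == 0, "sketch assumes an exact multiple"
    ids = []
    for group_id in range(grid_size // group_size):
        for local_id in range(group_size):
            global_id = group_id * group_size + local_id
            ids.append((group_id, local_id, global_id))
    return ids


def vector_add(a, b, group_size=4):
    """Run one 'work item' per element, as a kernel would."""
    out = [0] * len(a)
    for _, _, gid in work_item_ids(len(a), group_size):
        out[gid] = a[gid] + b[gid]
    return out


ids = work_item_ids(grid_size=8, group_size=4)
summed = vector_add(list(range(8)), [10] * 8)
```

Each work item uses its global id to pick the element it owns, which is the same pattern as a thread index within a block within a grid.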
HSAIL (2/2)
Memory Model
Virtual Memory Address
◻ All types of memory use the same address space
◻ Memory access behavior
◻ Not all regions are accessible by all devices
◻ OS kernel memory should not be accessible
◻ Mapping to a region in the kernel is still possible
◻ Accessing the same address may give
different values
◻ Work item private memory
◻ Work group local memory
◻ Accessing another work item's or work group's memory is not valid
Memory Region
◻ Global
◻ Memory shared by all LCUs and TCUs
◻ Accessible from any work item / group
◻ Group
◻ Memory shared by all work items in the
same group
◻ Private
◻ Memory visible only to a single work item
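The global, group, and private visibility rules can be sketched by keying a toy memory on who owns each location. `SegmentedMemory` and its method names are hypothetical, and the addresses are arbitrary. The point is only that the same private or group address resolves to different storage for different owners, while a global address resolves to the same storage for everyone.

```python
class SegmentedMemory:
    def __init__(self):
        self.global_mem = {}    # address -> value, visible to everyone
        self.group_mem = {}     # (group_id, address) -> value
        self.private_mem = {}   # (group_id, item_id, address) -> value

    def _key(self, segment, group_id, item_id, address):
        if segment == "global":
            return self.global_mem, address
        if segment == "group":
            return self.group_mem, (group_id, address)
        if segment == "private":
            return self.private_mem, (group_id, item_id, address)
        raise ValueError(f"unknown segment: {segment}")

    def store(self, segment, group_id, item_id, address, value):
        table, key = self._key(segment, group_id, item_id, address)
        table[key] = value

    def load(self, segment, group_id, item_id, address):
        table, key = self._key(segment, group_id, item_id, address)
        return table.get(key)   # None if this owner never wrote there


mem = SegmentedMemory()
mem.store("private", 0, 0, 0x10, "item0")
mem.store("private", 0, 1, 0x10, "item1")
mem.store("group", 0, 0, 0x20, "shared-in-group-0")
mem.store("global", 0, 0, 0x30, "everyone")

private_views = (mem.load("private", 0, 0, 0x10), mem.load("private", 0, 1, 0x10))
group_views = (mem.load("group", 0, 1, 0x20), mem.load("group", 1, 0, 0x20))
global_view = mem.load("global", 1, 3, 0x30)
```

Two work items reading private address `0x10` see their own values, a work item in another group sees nothing at group address `0x20`, and any work item sees the global value.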
Memory Region
◻ Kernarg
◻ The memory for kernel arguments
◻ A kernel is the code fragment we ask a device
to run
◻ Readonly
◻ Read-only type of global memory
◻ Spill
◻ Memory for register spill
◻ Arg
◻ Memory for function call arguments
Memory Consistency
◻ LCU
◻ LCU maintains its own consistency
◻ Shares global memory
◻ Work item
◻ Memory operations to the same address by a single
work item are in order
◻ Memory operations to different addresses may
be reordered
◻ Other than that, nothing is guaranteed
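What reordering of operations to different addresses means in practice can be sketched with a small message-passing test: one work item stores `data` and then `flag`, and another loads `flag` and then `data`. The sketch is a deliberately simplified model that treats reordering as permuting each work item's operations. It is not the formal HSA memory model, and all names in it are hypothetical.

```python
from itertools import permutations


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(allow_reordering):
    """Collect (flag_seen, data_seen) pairs the reader can observe."""
    writer = [("store", "data", 1), ("store", "flag", 1)]
    reader = [("load", "flag"), ("load", "data")]
    # Operations touch different addresses, so a weak model may reorder them.
    writer_orders = list(permutations(writer)) if allow_reordering else [tuple(writer)]
    reader_orders = list(permutations(reader)) if allow_reordering else [tuple(reader)]
    seen = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(list(w), list(r)):
                memory = {"data": 0, "flag": 0}
                loaded = {}
                for op in schedule:
                    if op[0] == "store":
                        memory[op[1]] = op[2]
                    else:
                        loaded[op[1]] = memory[op[1]]
                seen.add((loaded["flag"], loaded["data"]))
    return seen


in_order = outcomes(allow_reordering=False)
relaxed = outcomes(allow_reordering=True)
```

With in-order execution the reader can never see the flag set while the data is still zero. Once reordering is allowed, that outcome `(1, 0)` becomes possible, which is why code that relies on such a flag needs explicit synchronization.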
System Components
HSA System
Compilation
◻ Frontend
◻ LLVM IR
◻ No data dependency
◻ Backend
◻ Convert IR to HSAIL
◻ Optimization happens
here
◻ Binary format
◻ ELF format
◻ Embedded container for
HSAIL (BRIG)
Runtime
◻ HSA runtime
◻ Issue tasks to device
protocol
◻ Device
◻ Convert HSAIL to ISA with
Finalizer
HSAIL Program Features
◻ Backward Compatible
◻ A system without HSA support should still
run the executable
◻ Function Invocation
◻ LCU functions may call LCU ones
◻ TCU functions may call TCU ones with
Finalizer support
◻ LCU-to-TCU and TCU-to-LCU calls are supported
through queues
◻ C++ compatible
Conclusion
◻ HSA is an open, standard layer
between software and hardware
◻ The cardinal feature of HSA is the unified
virtual memory space
◻ It does not replace current programming
frameworks, and no new language is required
Reference
◻ Heterogeneous System Architecture: A
Technical Review
◻ HSA Programmer’s Reference Manual
◻ HSAIL: Write-Once-Run-Everywhere for
Heterogeneous Systems

