This document discusses trends in software platforms for heterogeneous multi-core systems and open source community activities. It describes how hardware limitations led to the rise of multi-core processors and classifications of multi-processor systems. Issues with programming heterogeneous systems are outlined along with solutions like OpenCL and efforts by organizations like ETRI and OpenSEED to develop software for these systems.
Trends of SW Platforms for Heterogeneous Systems
1. Trends of SW Platforms for Heterogeneous Multi-core Systems and Open Source Community Activities
Seung-hwa Song
2. Trends of SW technologies for Heterogeneous Multi-core systems
Research and efforts to overcome issues with heterogeneous systems
Open source community activities
3. Why has the Multi-core Era Arrived?
Limitations of Moore’s Law (Performance Issue)
Performance-oriented design is no longer the main market need.
Performance per watt is the main requirement. (Energy Issue)
Lower power consumption of computers is becoming more and more important.
4. Why has the Multi-core Era Arrived?
Performance per watt and power consumption of three kinds of processors
5. Classification of Multi Processor
SMP (Symmetric Multi Processor)
- Homogeneous architecture
AMP (Asymmetric Multi Processor)
- Heterogeneous architecture
6. SMP (homogeneous architecture)
- Each processor can access a common memory map
- Any task can be allocated to any given processor
7. ASMP system (heterogeneous architecture)
- Each processor has a different structure for a specialized purpose.
- Each system consists of a master processor and slave processors. The slave processors communicate with the master processor.
- Some (slave) processors cannot access the common memory map and are assigned only a special role instead.
- Programmers should understand the unique tasks of each processor and try to create an efficient communication mechanism.
8. Comparisons between Homogeneous and Heterogeneous Computing
Homogeneous
- Symmetric, same cores (usually CPUs)
- Operation is guaranteed to be the same on each core
- Easy to offload tasks
- Good compatibility
Heterogeneous
- Asymmetric, different cores (CPUs, GPUs, DSPs and accelerators)
- Operation cannot be assumed to be the same on each core
- More complicated to offload tasks
- Less compatibility
- Specialized for specific tasks
9. Overview of MPSoC solutions
Lucent Daytona (2000)
- First MPSoC
- Application: wireless communication router
- Symmetric
- A common memory map
C-5 Network Processor (2001)
- Application: network packet processor
- Asymmetric
10. Overview of MPSoC solutions
Texas Instruments OMAP architecture (2004)
- Application : cell phone processor
- ARM9 (master) and TMS320C55x DSP (slave)
- Asymmetric
Texas Instruments’ DaVinci
- Application : multimedia processor
- ARM Cortex-A8, ARM M3, DSP, codec accelerator
ARM MPCore
- Main applications (networking, file I/O, UI)
- Controls slave cores
Video codec accelerators
- Video compression
DSP
- Image processing
Data bus
11. Overview of MPSoC solutions
Today, most embedded system processors are heterogeneous.
Even though ASMP is specialized for the designer’s goal, higher performance is always required.
Recent MPSoC architectures integrate both SMP and ASMP structures.
12. Overview of MPSoC solutions
AMD’s APU (Accelerated Processing Unit) Llano (2011)
First CPU-GPU fused processor
Intel’s Sandy Bridge processor
CPU-GPU fused processing unit
14. Software Issues with Heterogeneous Systems
Offloading
- Task offloading is a main goal of multi-core systems
- In heterogeneous systems, task offloading is not easy
Data sharing
- The overhead of transferring data over the memory bus is an important issue
- The results from each processing unit should be integrated
- The number of memory copies should be minimized
Programmability
- S/W development productivity is important
15. Software Issues with Heterogeneous Systems, Continued
How can programmers easily develop S/W for each different processor? (Usability)
How can we move code from one system to other systems? (Portability)
16. HSA Foundation
HSA creates an improved processor design that exposes the benefits and capabilities of mainstream programmable compute elements. Each part works together seamlessly.
19. Open projects for parallel computing
OpenMP (CPU only)
OpenACC (CPU, GPU)
OpenCL (various processors)
20. Introduction to OpenCL
Open Computing Language (OpenCL) is a framework for writing programs
that execute across heterogeneous platforms consisting of CPUs, GPUs,
DSPs, FPGAs and other processors. (Source: Wikipedia)
OpenCL is an open standard maintained by the Khronos Group.
Programming model executable across various types of processors
21. Introduction to OpenCL
OpenCL abandons much of the hardware abstraction of modern high-level programming languages.
Instead, OpenCL provides an abstract programming model for heterogeneous hardware so that programmers can control processor resources more flexibly.
While Nvidia’s CUDA is a solution that targets only GPUs, the goal of OpenCL is to utilize any available processor resources.
However, OpenCL is currently used mainly on GPUs.
25. Activities of ETRI
Industrial S/W platform technology for heterogeneous systems was weak.
OS, platforms and software libraries for heterogeneous multi-core systems are becoming more and more important.
28. Research by ETRI
- Advanced OS kernel
- CPU-GPU load balancing enhancement
- IDE tool supporting S/W development for heterogeneous multi-core systems
- Power consumption measurement of multi-core processors
34. Conclusion
The heterogeneous system era has already arrived.
Open projects and organizations are supporting SW platform standards for heterogeneous systems.
Software platforms and their advances are essential because heterogeneous systems are sophisticated.
New technologies should be distributed to contribute to the industry based on heterogeneous systems.
35. Thank you
You can download this presentation file at
http://sshlab.blogspot.com
Good morning, my name is Seung-hwa Song.
Today, I am going to talk about a really interesting emerging trend in the computer industry.
The computer industry has grown enormously alongside semiconductor technology.
Processors became faster and faster over the past decades. However, the multi-core processor era has now arrived because boosting the speed of a single-core processor is no longer sufficient.
First, I will talk about multi-core systems and their SW technology trends.
Then, I will present some interesting research and the efforts to solve software issues in heterogeneous systems.
Finally, I will discuss open platforms for heterogeneous systems and the role of open source organizations.
The most important change in market trends is that purely performance-oriented development of computers is no longer the main requirement; performance per watt is.
With the rapid proliferation of mobile devices, from laptops to the smartphone boom, people want to use their mobile devices for a long time without having to recharge their batteries.
The growth of small-scale SoC technology has helped boost that change.
It is common knowledge that we cannot easily boost the clock speed of a processing unit with modern semiconductor technology, so people started to consider increasing the number of processors instead of the clock speed.
As we boost clock speed, electric power consumption rises much faster than the clock speed itself, because higher frequencies generally also require a higher supply voltage.
Too much power consumption creates overheating, which may damage semiconductors. But faster computers are a necessity, and that is why the multi-core system is becoming increasingly popular.
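To make this trade-off concrete, here is a minimal sketch in Python. It uses the classic CMOS dynamic power estimate P = C * V^2 * f and assumes, as a simplification, that the supply voltage has to rise linearly with the clock frequency. The function names and the normalized units are hypothetical and are not taken from the slides.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Classic CMOS dynamic power estimate: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def scaled_core_power(base_power, freq_ratio):
    """Power of one core after scaling its clock by freq_ratio.

    ASSUMPTION: voltage scales linearly with frequency, so power
    grows with the cube of the frequency ratio.
    """
    return base_power * freq_ratio ** 3


# Normalized units: one baseline core consumes 1.0.
base = dynamic_power(capacitance=1.0, voltage=1.0, frequency=1.0)

one_fast_core = scaled_core_power(base, 2.0)        # one core at twice the clock
two_slow_cores = 2 * scaled_core_power(base, 1.0)   # two cores at the baseline clock

print(one_fast_core, two_slow_cores)  # prints: 8.0 2.0
```

Under these assumptions, doubling the clock costs about four times the power of adding a second core, which is the intuition behind moving to multi-core designs. Real chips also have static (leakage) power, which this sketch ignores.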
This picture describes the performance per watt of each processor.
We can see that many-core processors use much less power than GPUs or quad-core processors.
There are two kinds of multi-processor design: symmetric and asymmetric multi-processors.
In SMP design, every processor is identical.
All processors can access a common main memory and share data through a memory bus.
ASMP is heterogeneous.
Each processor has a different architecture.
Usually, one core works as a master processor and communicates with the other slave processors.
In some designs, some slave processors are not connected to the main memory and work separately.
Because homogeneous systems are symmetric, each core is expected to operate in the same way.
So we can easily offload tasks from one core to another without additional effort.
On the other hand, heterogeneous systems consist of various cores specialized for specific tasks, such as GPUs or codec accelerators.
These special cores are less compatible but consume less power than symmetric CPUs.
Multi-core systems have developed thanks to advances in SoC technology.
Let’s take a look at the history of the multi-processor system-on-chip (MPSoC).
Lucent Daytona, the first MPSoC, was designed for wireless communication routers in 2000.
Early MPSoC designs were symmetric and had a common memory map, as one would expect.
OMAP is a well-known architecture for cell phones developed by Texas Instruments.
It includes an ARM9 processor as the master processor and a DSP as a slave processor.
The DaVinci core is another chipset model, optimized for multimedia processing.
I used this a few years ago. I ported the main application to the ARM processor, where I could easily use a legacy Linux OS and software libraries.
The graphics processing jobs, however, were offloaded to the DSP and video codec accelerators.
Today, most embedded system processors are heterogeneous, because heterogeneous systems are better at saving energy.
Even though ASMP is specialized for the designer’s goal, higher performance is always required.
This is why many recent MPSoC architectures integrate both SMP and ASMP.
The first CPU-GPU fused design was launched in the personal computer industry.
AMD announced the first APU, in which a low-power multi-core CPU and an advanced GPU are fused on one die.
Interestingly, it seems that PC users do not only want high-performance computers but also worry about their electricity bills.
This architecture addresses both the performance and the power consumption problem, and Intel is following that trend as well.
The most pressing market issue is mobile devices.
Many processor vendors have launched various cores to satisfy market needs.
These processors are all heterogeneous multi-processors that include a GPU and symmetric CPUs.
There are some important software issues with heterogeneous systems.
The main goal of a multi-core system is task offloading.
This was not a big problem in homogeneous multi-core systems. Since all processors were the same, tasks could be computed on any core without any change to the software.
However, in a heterogeneous system, we cannot simply port legacy software libraries that run on CPUs to GPUs or DSPs.
The second issue is data sharing.
For better or worse, task offloading adds the overhead of transferring data through the memory bus.
The results from each processing unit have to be integrated, so the final computation is significantly limited by the memory bus speed.
This is why software engineers should design software so that the number of memory copies between cores is as small as possible.
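The effect of memory copies on offloading can be sketched with a very simple cost model in Python. Everything here is an illustrative assumption rather than a measurement: the function names, the fixed bus bandwidth, and the example numbers are all hypothetical.

```python
def offload_time(accel_compute_s, bytes_per_copy, copies, bus_bytes_per_s):
    """Time for an offloaded task: accelerator compute time plus
    the time spent copying data over the memory bus."""
    transfer_s = copies * bytes_per_copy / bus_bytes_per_s
    return accel_compute_s + transfer_s


def worth_offloading(cpu_s, accel_compute_s, bytes_per_copy, copies, bus_bytes_per_s):
    """True if offloading beats simply running the task on the CPU."""
    total = offload_time(accel_compute_s, bytes_per_copy, copies, bus_bytes_per_s)
    return total < cpu_s


# Hypothetical task: 1.0 s on the CPU, 0.2 s on an accelerator,
# 100 MB per copy over a 1 GB/s bus (0.1 s per copy).
few_copies = worth_offloading(1.0, 0.2, 100e6, copies=2, bus_bytes_per_s=1e9)
many_copies = worth_offloading(1.0, 0.2, 100e6, copies=10, bus_bytes_per_s=1e9)

print(few_copies, many_copies)  # prints: True False
```

In this sketch the accelerator is five times faster than the CPU, yet with ten copies the transfers alone eat up the whole advantage. That is why the number of copies matters as much as raw compute speed.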
What I really want to talk about today is programmability, because programmability defines productivity.
When we program heterogeneous systems, we have to learn the characteristics of each core and its software development environment.
Even though processor vendors provide support packages for programmers, it is difficult for programmers to learn every programming environment.
How can we develop software easily for each different environment?
Many legacy software platforms, OSs and libraries support only major CPU families such as Intel, ARM, and PowerPC.
We have to rewrite code for minor processors if we want to offload tasks to them.
To overcome these issues, the HSA (Heterogeneous System Architecture) Foundation was created.
This organization proposes a software platform architecture for heterogeneous systems.
This is the HSA Solution Stack.
While the legacy OS and applications sit on the CPU and GPU hardware, the HSA Runtime Infrastructure covers the GPU, ACC and legacy OS.
This low-level software stack is an abstraction layer for heterogeneous processors.
The HSA-accelerated applications layer, placed over the infrastructure layer, defines programming languages such as OpenCL, C++ AMP, Python, and JavaScript.
These are commercial solutions for parallel computing.
AMD’s APP SDK supports AMD processors. Since AMD is a more open-source-oriented company than Intel, it is based on OpenCL.
Intel provides Parallel Studio for programming on Intel processors.
Nvidia provides CUDA, which supports Nvidia’s GPUs. Its language grammar is similar to that of OpenCL, but it is designed only for GPU utilization.
There are also many open source solutions for parallel computing.
I cannot cover them all in this presentation, but I will go into more detail about OpenCL.
OpenCL is a programming framework that was first proposed by Apple.
Now its standard is maintained by the Khronos Group.
OpenCL is designed for writing programs that are executable across various types of processors.
This is the OpenCL platform model.
An OpenCL platform has a single host, which is connected to one or more OpenCL devices.
The host usually runs on a master CPU.
Each OpenCL device includes one or more compute units.
Each compute unit consists of one or more ‘processing elements’.
Actual computation on a device occurs within the processing elements.
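The hierarchy just described (host, devices, compute units, processing elements) can be written down as a small data model. This is only an illustrative sketch in plain Python. It does not use the OpenCL API, and the class names, device names and counts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeUnit:
    processing_elements: int  # number of processing elements in this unit


@dataclass
class Device:
    name: str
    compute_units: List[ComputeUnit] = field(default_factory=list)

    def total_processing_elements(self) -> int:
        # Computation happens in the processing elements, so this is
        # the number of work-items the device can run at the same time.
        return sum(cu.processing_elements for cu in self.compute_units)


@dataclass
class Platform:
    host: str  # exactly one host per platform
    devices: List[Device] = field(default_factory=list)


# Hypothetical platform: a CPU host driving a GPU and a DSP.
gpu = Device("gpu", [ComputeUnit(16) for _ in range(4)])
dsp = Device("dsp", [ComputeUnit(8)])
platform = Platform(host="master CPU", devices=[gpu, dsp])

for device in platform.devices:
    print(device.name, device.total_processing_elements())
```

With these made-up numbers the GPU exposes 64 processing elements and the DSP 8. A real OpenCL program would query the same kind of information from the runtime instead of hard-coding it.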
There are open projects supporting OpenCL.
WebCL, WebGL and OpenGL are also standardized by the Khronos Group.
Some projects related to image processing, such as OpenCV and FFmpeg, also support OpenCL.
Because of this movement, these trends, and market needs, people at ETRI became busy. (I think they are always busy.)
There was no base technology, infrastructure or educational program in Korea.
Thus, Dr. Jung, the leader of a research team at ETRI, and his brilliant coworkers planned several projects for the growth and development of the future software industry.
Their research work covers a variety of software areas such as OS, service libraries, development tools, management tools, and applications.
This is ETRI’s R&D roadmap.
In the first year, some prototype technologies and products were developed.
Some of the work was related to OpenCL and Linux kernels.
In the next year, the research progressed far enough to produce substantial results:
an OpenCL IDE, web engines based on heterogeneous multi-core systems, an advanced OS kernel, and so on.
Finally, this year, we are about to complete those projects and open them to the public.
Now, let me introduce some interesting research carried out by ETRI.
As I mentioned before, there are some notable issues with heterogeneous systems, such as task offloading, load balancing, and power efficiency.
After tasks are offloaded to each core, efficient load balancing definitely determines the multi-core system’s performance.
The advanced OS kernel includes an improved task scheduler called Distributed Weighted Round-Robin.
This research shows how to utilize processors with a high efficiency rate.
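To give a feeling for the idea behind the scheduler's name, here is a minimal sketch of plain weighted round-robin on a single queue. It is not the Distributed Weighted Round-Robin scheduler itself and it is not ETRI's kernel code. The task names and weights are hypothetical.

```python
def weighted_round_robin(tasks, rounds):
    """Return the order of time slices for the given tasks.

    tasks:  list of (name, weight) pairs with positive integer weights.
    rounds: number of scheduling rounds to simulate.
    In every round each task receives `weight` consecutive time slices,
    so CPU time is shared in proportion to the weights.
    """
    schedule = []
    for _ in range(rounds):
        for name, weight in tasks:
            schedule.extend([name] * weight)
    return schedule


# Hypothetical workload: video decoding is weighted 3x the UI task.
schedule = weighted_round_robin([("video", 3), ("ui", 1)], rounds=2)
print(schedule)
```

Over the two rounds the video task receives six slices and the UI task two, which matches the 3:1 weights. A distributed variant additionally has to keep this proportion fair across several cores, which this sketch does not attempt.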
GPUs have parallel vector processing capabilities that enable them to compute large sets of data, so data transfer between the CPU and GPU causes a considerable bottleneck.
This is why ETRI also tried to improve load balancing between the CPU and GPU.
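One simple way to think about CPU-GPU load balancing is a static split in proportion to each processor's throughput. The sketch below illustrates only that idea and is not ETRI's method. The function name and the throughput figures are hypothetical, and transfer cost is deliberately left out.

```python
def split_work(total_items, cpu_rate, gpu_rate):
    """Split work so that the CPU and GPU finish at about the same time.

    cpu_rate, gpu_rate: throughput in items per second
    (hypothetical values, e.g. taken from a short benchmark run).
    Returns (cpu_items, gpu_items).
    """
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("rates must be positive")
    cpu_items = round(total_items * cpu_rate / (cpu_rate + gpu_rate))
    gpu_items = total_items - cpu_items
    return cpu_items, gpu_items


# Hypothetical case: the GPU processes items four times faster than the CPU.
cpu_items, gpu_items = split_work(1000, cpu_rate=100.0, gpu_rate=400.0)
print(cpu_items, gpu_items)  # prints: 200 800
```

With this split both processors need about two seconds, so neither sits idle waiting for the other. A practical balancer would also have to account for the CPU-GPU transfer bottleneck mentioned above.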
Another main research topic at ETRI is a power consumption measurement algorithm.
The new algorithm showed better accuracy than Google Power Tutor.
There is one more really important mission of the ETRI team.
A few years ago, some passionate engineers, including myself, were invited to a small group meeting.
It was a tiny but significant kick-off meeting of an open source project community called OpenSEED.
The main goal of this community is to test and evaluate recent techniques developed by ETRI.
All of the research work is open to the public and available for download at the OpenSEED site.
My team members and I have evaluated the advanced Linux kernel since the first year of the project.
We tried to test some image processing application programs with the Linux kernel.
Now, we are testing the ocl module of the OpenCV library, which adds OpenCL support to OpenCV.