SlideShare a Scribd company logo
1 of 7
Download to read offline
OpenCL Kernel
Optimization Tips
Champ Yen (champ.yen@gmail.com)
http://champyen.blogspot.com
ver.20140820
Optimization - a form of balance
Device/Platform
Features
Runtime
Toolchain
Problem
Algorithm
Optimization
Optimization is not only greedy
searching in single direction. It is
more like to find a good balance
point between device, toolchain
and the problem.
Device - Computation
● device type
○ cpu - powerful single thread performance
○ gpu - many threads, great total throughput
● ISA design
○ scalar-based
○ vector-based
● # of compute unit/processing elements
● estimate impact of using divergence & barrier
● capability of asynchronous data transfer
Device - Memory
● get basic memory characteristics:
○ size
○ latency
○ throughput
○ coalescing effect
○ addressing mode
● global memory - unified or not
● local memory - real or not
● penalty of oversize
Toolchain/Runtime
● document/tutorial/guide for debugging, profiling and optimization.
● there is no perfect runtime/toolchain
● profiling/debugging tools.
● it is not always a good idea to debug/optimization on different
platforms.
● automatic optimization MAY NOT HELP the thinking of optimization
● tricky forms of computation/memory operations.
○ MAD operations
○ memory access mode
Problem/Algorithms
● DATA PARALLEL!
● multi-stages is not always bad.
○ doing all things together uses more memory resource in one workitem.
● vectorized is not always a good idea
● use appropriate work group size
○ bad memory access pattern, less coalescing
○ may cause lower cache hit rate
○ less local memory for each workitem
○ may be less private memory for each workitem.
● different form of implementation
● do optimization things manually.
○ DO NOT relies on automatic features.
Q & A

More Related Content

Similar to OpenCL Kernel Optimization Tips

Machine Learning & Graph Processing w/ Spark and Accumulo
Machine Learning & Graph Processing w/ Spark and AccumuloMachine Learning & Graph Processing w/ Spark and Accumulo
Machine Learning & Graph Processing w/ Spark and AccumuloRahul Singh
 
Software Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale AutomationSoftware Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale AutomationHao Xu
 
Ad109 - XPages Performance and Scalability
Ad109 - XPages Performance and ScalabilityAd109 - XPages Performance and Scalability
Ad109 - XPages Performance and Scalabilityddrschiw
 
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...Red Hat Developers
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopTamas K Lengyel
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automationRicardo Bánffy
 
Concurrency - Why it's hard ?
Concurrency - Why it's hard ?Concurrency - Why it's hard ?
Concurrency - Why it's hard ?Ramith Jayasinghe
 
Liferay portals in real projects
Liferay portals  in real projectsLiferay portals  in real projects
Liferay portals in real projectsIBACZ
 
Anurag Awasthi - Machine Learning applications for CloudStack
Anurag Awasthi - Machine Learning applications for CloudStackAnurag Awasthi - Machine Learning applications for CloudStack
Anurag Awasthi - Machine Learning applications for CloudStackShapeBlue
 
Memory Management in Amoeba
Memory Management in AmoebaMemory Management in Amoeba
Memory Management in AmoebaRamesh Adhikari
 
Lecture01 algorithm analysis
Lecture01 algorithm analysisLecture01 algorithm analysis
Lecture01 algorithm analysisZara Nawaz
 
Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Omid Vahdaty
 
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big DataVoxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big DataVoxxed Athens
 
Effective cplusplus
Effective cplusplusEffective cplusplus
Effective cplusplusMark Veltzer
 

Similar to OpenCL Kernel Optimization Tips (20)

Machine Learning & Graph Processing w/ Spark and Accumulo
Machine Learning & Graph Processing w/ Spark and AccumuloMachine Learning & Graph Processing w/ Spark and Accumulo
Machine Learning & Graph Processing w/ Spark and Accumulo
 
Software Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale AutomationSoftware Design Practices for Large-Scale Automation
Software Design Practices for Large-Scale Automation
 
Ad109 - XPages Performance and Scalability
Ad109 - XPages Performance and ScalabilityAd109 - XPages Performance and Scalability
Ad109 - XPages Performance and Scalability
 
Programming for Problem Solving
Programming for Problem Solving Programming for Problem Solving
Programming for Problem Solving
 
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
 
Multicore
MulticoreMulticore
Multicore
 
2016-01-16 03 Денис Нелюбин. How to test a million
2016-01-16 03 Денис Нелюбин. How to test a million2016-01-16 03 Денис Нелюбин. How to test a million
2016-01-16 03 Денис Нелюбин. How to test a million
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
 
Apache Singa AI
Apache Singa AIApache Singa AI
Apache Singa AI
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automation
 
Why Concurrency is hard ?
Why Concurrency is hard ?Why Concurrency is hard ?
Why Concurrency is hard ?
 
Concurrency - Why it's hard ?
Concurrency - Why it's hard ?Concurrency - Why it's hard ?
Concurrency - Why it's hard ?
 
C3 w3
C3 w3C3 w3
C3 w3
 
Liferay portals in real projects
Liferay portals  in real projectsLiferay portals  in real projects
Liferay portals in real projects
 
Anurag Awasthi - Machine Learning applications for CloudStack
Anurag Awasthi - Machine Learning applications for CloudStackAnurag Awasthi - Machine Learning applications for CloudStack
Anurag Awasthi - Machine Learning applications for CloudStack
 
Memory Management in Amoeba
Memory Management in AmoebaMemory Management in Amoeba
Memory Management in Amoeba
 
Lecture01 algorithm analysis
Lecture01 algorithm analysisLecture01 algorithm analysis
Lecture01 algorithm analysis
 
Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...
 
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big DataVoxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
 
Effective cplusplus
Effective cplusplusEffective cplusplus
Effective cplusplus
 

More from Champ Yen

Halide tutorial 2019
Halide tutorial 2019Halide tutorial 2019
Halide tutorial 2019Champ Yen
 
Linux SD/MMC Driver Stack
Linux SD/MMC Driver Stack Linux SD/MMC Driver Stack
Linux SD/MMC Driver Stack Champ Yen
 
OpenGL ES 2.x Programming Introduction
OpenGL ES 2.x Programming IntroductionOpenGL ES 2.x Programming Introduction
OpenGL ES 2.x Programming IntroductionChamp Yen
 
Chrome OS Observation
Chrome OS ObservationChrome OS Observation
Chrome OS ObservationChamp Yen
 
Play With Android
Play With AndroidPlay With Android
Play With AndroidChamp Yen
 
Linux Porting
Linux PortingLinux Porting
Linux PortingChamp Yen
 

More from Champ Yen (6)

Halide tutorial 2019
Halide tutorial 2019Halide tutorial 2019
Halide tutorial 2019
 
Linux SD/MMC Driver Stack
Linux SD/MMC Driver Stack Linux SD/MMC Driver Stack
Linux SD/MMC Driver Stack
 
OpenGL ES 2.x Programming Introduction
OpenGL ES 2.x Programming IntroductionOpenGL ES 2.x Programming Introduction
OpenGL ES 2.x Programming Introduction
 
Chrome OS Observation
Chrome OS ObservationChrome OS Observation
Chrome OS Observation
 
Play With Android
Play With AndroidPlay With Android
Play With Android
 
Linux Porting
Linux PortingLinux Porting
Linux Porting
 

Recently uploaded

Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 

Recently uploaded (20)

Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 

OpenCL Kernel Optimization Tips

  • 1. OpenCL Kernel Optimization Tips Champ Yen (champ.yen@gmail.com) http://champyen.blogspot.com ver.20140820
  • 2. Optimization - a form of balance Device/Platform Features Runtime Toolchain Problem Algorithm Optimization Optimization is not only greedy searching in single direction. It is more like to find a good balance point between device, toolchain and the problem.
  • 3. Device - Computation ● device type ○ cpu - powerful single thread performance ○ gpu - many threads, great total throughput ● ISA design ○ scalar-based ○ vector-based ● # of compute unit/processing elements ● estimate impact of using divergence & barrier ● capability of asynchronous data transfer
  • 4. Device - Memory ● get basic memory characteristics: ○ size ○ latency ○ throughput ○ coalescing effect ○ addressing mode ● global memory - unified or not ● local memory - real or not ● penalty of oversize
  • 5. Toolchain/Runtime ● document/tutorial/guide for debugging, profiling and optimization. ● there is no perfect runtime/toolchain ● profiling/debugging tools. ● it is not always a good idea to debug/optimization on different platforms. ● automatic optimization MAY NOT HELP the thinking of optimization ● tricky forms of computation/memory operations. ○ MAD operations ○ memory access mode
  • 6. Problem/Algorithms ● DATA PARALLEL! ● multi-stages is not always bad. ○ doing all things together uses more memory resource in one workitem. ● vectorized is not always a good idea ● use appropriate work group size ○ bad memory access pattern, less coalescing ○ may cause lower cache hit rate ○ less local memory for each workitem ○ may be less private memory for each workitem. ● different form of implementation ● do optimization things manually. ○ DO NOT relies on automatic features.