OpenCL Kernel Optimization Tips
Champ Yen (champ.yen@gmail.com)
http://champyen.blogspot.com
ver.20140820

Tips on what must be considered when optimizing OpenCL kernels.

  1. OpenCL Kernel Optimization Tips
     Champ Yen (champ.yen@gmail.com)
     http://champyen.blogspot.com
     ver.20140820
  2. Optimization - a form of balance
     (diagram labels: Device/Platform Features, Runtime/Toolchain, Problem/Algorithm, Optimization)
     Optimization is not just a greedy search in a single direction. It is more like finding a good balance point between the device, the toolchain, and the problem.
  3. Device - Computation
     ● device type
       ○ CPU - powerful single-thread performance
       ○ GPU - many threads, great total throughput
     ● ISA design
       ○ scalar-based
       ○ vector-based
     ● number of compute units/processing elements
     ● estimate the impact of divergence and barriers
     ● capability of asynchronous data transfer
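One rough way to estimate the impact of divergence is a lockstep cost model. The sketch below is plain C rather than OpenCL, and the model itself is an assumption (it does not describe any specific device): work-items in one SIMD group are taken to execute together, so a branch costs the sum of every path that at least one work-item takes. The function name `branch_cost` and the cycle counts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified lockstep model (an assumption, not a vendor specification):
 * all lanes of a SIMD group step through a branch together, so the group
 * pays for each path that at least one lane takes. */
int branch_cost(const int *takes_then, size_t lanes,
                int cost_then, int cost_else)
{
    assert(takes_then != NULL && lanes > 0);

    int any_then = 0;
    int any_else = 0;
    for (size_t i = 0; i < lanes; i++) {
        if (takes_then[i])
            any_then = 1;   /* some lane needs the "then" path */
        else
            any_else = 1;   /* some lane needs the "else" path */
    }
    /* Uniform groups pay for one path, divergent groups pay for both. */
    return any_then * cost_then + any_else * cost_else;
}
```

Under this model a group whose work-items all agree pays for a single path, while one dissenting work-item makes the whole group pay for both.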
  4. Device - Memory
     ● get the basic memory characteristics:
       ○ size
       ○ latency
       ○ throughput
       ○ coalescing effect
       ○ addressing mode
     ● global memory - unified or not
     ● local memory - real or not
     ● penalty of exceeding the available size
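The coalescing effect can be illustrated by counting how many memory segments a group of work-items touches for a given access stride. This is a plain C sketch under stated assumptions: the segment size is a made-up parameter (real devices differ and document their own rules), elements are assumed to be aligned so that none straddles a segment boundary, and `segments_touched` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct memory segments touched when work-item i reads the
 * element at index i * stride_elems. Assumes seg_bytes is a multiple of
 * elem_bytes, so each element lies entirely inside one segment. */
size_t segments_touched(size_t n_items, size_t stride_elems,
                        size_t elem_bytes, size_t seg_bytes)
{
    assert(n_items > 0 && stride_elems > 0 && elem_bytes > 0);
    assert(seg_bytes >= elem_bytes && seg_bytes % elem_bytes == 0);

    size_t count = 0;
    size_t last_seg = 0;
    for (size_t i = 0; i < n_items; i++) {
        size_t seg = (i * stride_elems * elem_bytes) / seg_bytes;
        /* Addresses only grow with i, so a new segment always differs
         * from the previous one. */
        if (i == 0 || seg != last_seg) {
            count++;
            last_seg = seg;
        }
    }
    return count;
}
```

With 16 work-items reading 4-byte values and an assumed 64-byte segment, a stride of 1 touches one segment, while a stride of 16 touches sixteen, which is the kind of difference that makes access patterns worth measuring on the target device.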
  5. Toolchain/Runtime
     ● documents/tutorials/guides for debugging, profiling, and optimization
     ● there is no perfect runtime/toolchain
     ● profiling/debugging tools
     ● it is not always a good idea to debug/optimize on different platforms
     ● automatic optimization MAY NOT HELP with thinking about optimization
     ● tricky forms of computation/memory operations
       ○ MAD operations
       ○ memory access mode
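As one possible illustration of shaping a computation around MAD operations, the sketch below evaluates a polynomial with Horner's rule, so that every step has the multiply-add form a * b + c. It is written in plain C; in an OpenCL C kernel the marked line could be expressed with the built-in `mad(acc, x, coeffs[i])`. Whether this is faster on a given device is an assumption to verify by profiling, and `horner` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate coeffs[0]*x^(n-1) + ... + coeffs[n-1] using Horner's rule.
 * Each loop iteration is a single multiply-add. */
float horner(const float *coeffs, size_t n, float x)
{
    assert(coeffs != NULL && n > 0);

    float acc = coeffs[0];
    for (size_t i = 1; i < n; i++) {
        acc = acc * x + coeffs[i];   /* multiply-add step */
    }
    return acc;
}
```

For example, with coefficients {3, 2, 1} and x = 2 the steps are 3, then 3*2+2 = 8, then 8*2+1 = 17, using two multiply-adds instead of separate power and sum computations.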
  6. Problem/Algorithms
     ● DATA PARALLEL!
     ● multiple stages are not always bad
       ○ doing everything at once uses more memory resources in one work-item
     ● vectorizing is not always a good idea
     ● use an appropriate work-group size
       ○ bad memory access pattern, less coalescing
       ○ may cause a lower cache hit rate
       ○ less local memory for each work-item
       ○ may leave less private memory for each work-item
     ● different forms of implementation
     ● do optimizations manually
       ○ DO NOT rely on automatic features
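Two small host-side calculations often come up when trying different work-group sizes. The sketch below is plain C with hypothetical helper names. The first rounds the global size up to a multiple of the work-group size, which assumes the kernel guards its body with a bounds check against the real element count. The second shows why a larger group leaves less local memory per work-item, since local memory is shared by the whole work-group. The sizes in the usage note are illustrative, not taken from any device.

```c
#include <assert.h>
#include <stddef.h>

/* Round the number of elements up to the next multiple of the work-group
 * size. The extra work-items must be skipped inside the kernel. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}

/* Local memory is shared by one work-group, so the bytes available per
 * work-item shrink as the group grows. */
size_t local_bytes_per_item(size_t local_mem_bytes, size_t group_size)
{
    assert(group_size > 0);
    return local_mem_bytes / group_size;
}
```

For instance, 1000 elements with a work-group size of 64 give a global size of 1024, and an assumed 32768 bytes of local memory leave 512 bytes per work-item at a group size of 64 but only 128 bytes at a group size of 256.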
  7. Q & A