Page 1
GPU Ecosystem
Introduction & Case Study
Ofer Rosenberg
October 2013
Page 2
Content
 GPU Ecosystem
 Ecosystem on Mobile/Embedded Platforms
 NSIGHT - Tools case study
 Libraries
Page 3
GPU Ecosystem
Software product development cycle: Design -> Write Code -> Debug -> Profile -> Product
The role of the GPU ecosystem is to support, speed up, and improve this cycle for GPU compute
Page 4
GPU Ecosystem
 Support writing code by:
 IDE integration – compiler, parser, wizards
 Libraries: math (BLAS, IPP-like, matrix, etc.) and STL-like (Thrust, BOLT)
 Support debugging by:
 Integrating the debugger into the IDE (preferred)
 Providing usable execution control (breakpoints, pause/resume, etc.)
 Providing a reliable memory view of the various address spaces
 Support profiling by:
 Providing two levels of profiling: system tracing and kernel profiling
 System tracing – quick highlighting of hotspots and optimal device access
 Statistical and timeline-based kernel profiling (using performance counters)
Page 5
Ecosystem on
Mobile/Embedded Platforms
Page 6
ARM MALI
 Part of ARM SoC
 OpenCL 1.1 Full Profile (Linux, Android)
 Renderscript (Android only)
 OpenCL SDK – Samples, Tutorials, etc.
 No GPU debugging capability
 ARM DS-5 (Developer Suite 5)
 Eclipse IDE integration
 Compiler, Debugger (CPU only)
 System Trace – CPU & GPU
 Deep Profiling - CPU & GPU
Page 7
Intel Haswell GPU
 Part of Haswell (CPU & GPU)
 OpenCL 1.2 Full Profile
 Windows only for now (Linux at alpha stage)
 OpenCL SDK
 Samples
 Tools: Kernel Builder, VS/Eclipse Integration, Offline Compiler, GDB support (CPU Only)
 No GPU debugging capability
 VTune Amplifier XE supports OpenCL (CPU & GPU)
 System level tracing (Application, Memory, Kernel launch)
 Kernel Profiling
Page 8
Intel BayTrail platform (Atom)
 BayTrail < 13W, BayTrail-M < 6.5W
 Valleyview SoC (Z37xx)
 GPU is based on Gen7 (same architecture as Ivy Bridge)
 Same as previous slide:
 OpenCL 1.2 (Windows only for now)
 OpenCL SDK
 VTune support
 System level tracing
 Kernel Profiling
Page 9
NVIDIA Tegra 5 ? (Codename: Logan)
 Disclaimer: Logan is due in early 2014; some of this information is speculation
 Development boards and samples are available to selected customers
 Logan SoC – 2W
 ARM Cortex-A15 CPU, 4+1 configuration: speculated
 Kepler-based GPU: verified
 CUDA support: verified
 CUDA SDK – dozens of samples
 CUDA libraries: Thrust, cuBLAS, NPP, etc.
 NSIGHT: speculated
 System Trace
 Profiling, Debugging
Page 10
NSIGHT
TOOLS CASE STUDY
Page 11
Nsight Highlights
 “NVIDIA® Nsight™ is the ultimate development platform for heterogeneous computing” (taken from the Nsight page)
 IDE integration
 Windows – integration with Visual Studio
 Linux – specialized Eclipse version
 Debugging, System Trace, Profiling
 Graphics (DX, OpenGL)
 Computing (OpenCL, CUDA, C++ AMP)
 Profiling only on CUDA kernels
 Debug/Trace/Profile information is carefully presented
 Highly efficient information fields, windows, diagrams
 Feedback from professional users is visibly taken into account
Page 12
Debugging
 Much more than “just integrated” with the IDE
 Well-designed windows showing valuable information
[Screenshot callouts: Assembly (GPU!); Variables across all warps; Visible layout of the stopped thread]
Page 13
Debugging – Eclipse edition
 Eclipse integration appears to be deeper than the Visual Studio integration
 Unified CPU / GPU debugging
 Simultaneous visibility into both CPU and GPU state
 Multi-GPU support
 Full GPU debugging
 Set kernel breakpoints
 Single-step, run until, etc.
 View values across multiple GPU threads at the same time
 Examine thread, warp, block state
 Source and assembly level debugging
Slides from: “CUDA Development Using NVIDIA Nsight, Eclipse Edition” by David Goodwin, SC12
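The thread, warp and block views listed on this slide rest on a fixed mapping from a thread's index inside its block to a warp and a lane. A minimal host-side C++ sketch of that mapping, assuming a one-dimensional block and the 32-thread warp width of NVIDIA GPUs. The names `ThreadCoord`, `locate_thread` and `warps_per_block` are illustrative only and are not part of any Nsight or CUDA API:

```cpp
#include <cassert>

// Hypothetical helper type, for illustration only.
struct ThreadCoord {
    int warp;  // index of the warp inside the block
    int lane;  // position of the thread inside its warp
};

// Maps a linear thread index inside a 1-D block to (warp, lane).
// warp_size defaults to 32, the warp width on NVIDIA GPUs.
inline ThreadCoord locate_thread(int thread_in_block, int warp_size = 32) {
    assert(thread_in_block >= 0 && warp_size > 0);
    ThreadCoord c;
    c.warp = thread_in_block / warp_size;
    c.lane = thread_in_block % warp_size;
    return c;
}

// Number of warps needed to cover a block (the last warp may be partly empty).
inline int warps_per_block(int block_size, int warp_size = 32) {
    assert(block_size > 0 && warp_size > 0);
    return (block_size + warp_size - 1) / warp_size;
}
```

For example, thread 70 of a block sits in warp 2 at lane 6, which is the kind of coordinate a debugger can use when showing per-warp and per-thread state.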
Page 14
System Trace
Page 15
Kernel Profiling
 Choose a kernel to profile
 Skip N kernels, Profile M kernels
 Choose “experiments”
 Experiment – a type of profiling/analysis
 NVIDIA Nsight runs each kernel launch dozens of times with the same data
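The "Skip N kernels, Profile M kernels" selection can be read as a window over the sequence of kernel launches. A minimal C++ sketch of that reading, assuming launches are numbered from zero in issue order. `should_profile` and `profiled_launches` are hypothetical names for illustration, not Nsight functions:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// True when launch number `launch_index` (0-based) falls inside the profiled
// window: the first `skip` launches are ignored, the next `count` are profiled.
inline bool should_profile(std::size_t launch_index,
                           std::size_t skip, std::size_t count) {
    return launch_index >= skip && launch_index - skip < count;
}

// How many launches end up profiled when the application issues `total` launches.
inline std::size_t profiled_launches(std::size_t total,
                                     std::size_t skip, std::size_t count) {
    if (total <= skip) return 0;  // the run ended before the window opened
    std::size_t result = std::min(count, total - skip);
    assert(result <= count && result <= total);
    return result;
}
```

With N = 2 and M = 3, launches 2, 3 and 4 are profiled and every other launch runs unprofiled.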
Page 16
Profiling Results
 Experiment list
 Each experiment is a tabbed window
 Profiling information is presented as graphs, pie charts, diagrams, etc.
 HW counters are turned into easy-to-understand graphics
 Information targets known HW bottlenecks, code inefficiencies, etc.
 Amazingly well presented…
Page 17
Profiling Results
 The information provides a quick, easy and methodical way to identify performance bottlenecks
[Figure: four screenshots numbered 1-4]
Page 18
Eclipse Edition - Source Code Editor
 Project Templates
 CUDA code highlighting
 CUDA-aware refactoring
 CUDA-aware code completion and inline help
Page 19
LIBRARIES EXAMPLES
Page 20
CUDA Libraries – Part of the SDK
 cuFFT
 cuBLAS
 cuRAND
 cuSPARSE
 NPP (like IPP)
 Math Library
 Thrust (next slide)
Page 21
Thrust Library
 https://developer.nvidia.com/thrust
 Works on top of CUDA
 Open-source version is available on GitHub
 http://thrust.github.io/
 Presentations:
 http://on-demand.gputechconf.com/gtc-express/2011/presentations/introductiontothrust.pdf
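Thrust is described earlier in the deck as STL-like: its algorithms follow the C++ standard library ones in name and argument order (`thrust::transform`, `thrust::reduce`, `thrust::sort`). The sketch below uses the standard-library counterparts on the host so that it builds without a GPU toolchain. With Thrust, one would typically replace `std::vector` with `thrust::device_vector` and the `std::` calls with their `thrust::` counterparts. The function names `sum_of_squares` and `sorted_copy` are illustrative only:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Host-side stand-in for a transform-then-reduce pipeline.
// Thrust counterparts: thrust::transform and thrust::reduce.
inline float sum_of_squares(const std::vector<float>& x) {
    std::vector<float> squared(x.size());
    // Element-wise product x[i] * x[i]
    std::transform(x.begin(), x.end(), x.begin(), squared.begin(),
                   std::multiplies<float>());
    // Sum of all elements, starting from 0
    return std::accumulate(squared.begin(), squared.end(), 0.0f);
}

// Returns a sorted copy of the input (Thrust counterpart: thrust::sort).
inline std::vector<int> sorted_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}
```

The point of the STL-like design is that code of this shape moves to the GPU mostly by changing container and namespace names, not by rewriting the algorithm.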
Page 22
OPENCL LIBRARIES
Page 23
CLPP
 OpenCL Data Parallel Primitives Library (similar to Thrust)
 Source: https://code.google.com/p/clpp/
 7 committers; last commit about 1.5 years ago
Page 24
OpenCL BLAS
 OpenCL BLAS
 http://openclblas.sourceforge.net/
 Code is available here (GPLv2):
 http://sourceforge.net/projects/openclblas/
Page 25
ViennaCL
 BLAS implementation
 http://viennacl.sourceforge.net/
 Looks very promising
Page 26
REFERENCES
Page 27
Platform links:
 ARM
 Developer site : http://malideveloper.arm.com
 OpenCL tracing : http://malideveloper.arm.com/develop-for-mali/tools/mali-graphics-debugger/
 DS-5 suite : http://www.arm.com/products/tools/software-tools/ds-5/index.php
 OpenCL SDK : http://malideveloper.arm.com/develop-for-mali/sdks/mali-opencl-sdk/
 OpenCL developer guide:
 Online: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538e/index.html
 PDF: http://infocenter.arm.com/help/topic/com.arm.doc.dui0538e/DUI0538E_mali_t600_opencl_dg.pdf
 NVIDIA
 http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
 http://www.slashgear.com/nvidia-tegra-logan-detailed-with-game-changing-cuda-integration-19274630/
 http://www.ubergizmo.com/2013/07/nvidia-tegra-5-release-date-specs-news/
Page 28
Links:
 Intel
 OpenCL SDK http://software.intel.com/en-us/vcsource/tools/opencl-sdk
 GPA http://software.intel.com/en-us/vcsource/tools/intel-gpa
 VTune support in OpenCL http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-getting-started-with-opencl-performance-analysis-on-intel-hd-graphics
 http://www.theinquirer.net/inquirer/news/2266966/intel-releases-opencl-sdk-for-windows-and-linux
 Haswell Linux support: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NDc
 OpenCL “Beignet” – open-source Linux compiler:
 http://software.intel.com/en-us/forums/topic/402118
 http://linux.slashdot.org/story/13/04/16/014233/intel-releases-new-opencl-implementation-for-gnulinux
 ATOM BayTrail:
 http://arstechnica.com/gadgets/2013/02/intel-gets-aggressive-with-new-smartphone-and-tablet-chips/
 http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested
 http://www.tomshardware.com/reviews/bay-trail-celeron-j1750-performance,3614-6.html
 http://software.intel.com/en-us/forums/topic/476221
 http://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors#.22Bay_Trail.22_.2822_nm.29
Page 29
NSIGHT Links
 http://www.nvidia.com/object/nsight.html
 https://developer.nvidia.com/nsight-visual-studio-edition-videos
 https://developer.nvidia.com/developer-webinars
 http://on-demand.gputechconf.com/supercomputing/2012/presentation/SB006-Goodwin-CUDA-Development-Nsight.pdf
 http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization-With-Nsight-VSE.pdf
More Related Content

What's hot

MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...AMD Developer Central
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLinaro
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...AMD Developer Central
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerAMD Developer Central
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...AMD Developer Central
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...AMD Developer Central
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...AMD Developer Central
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...AMD Developer Central
 
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin CoumansGS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin CoumansAMD Developer Central
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...AMD Developer Central
 
Getting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsGetting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsIntel® Software
 
Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wallugur candan
 
Ostech war story using mainline linux for an android tv bsp
Ostech  war story  using mainline linux  for an android tv bspOstech  war story  using mainline linux  for an android tv bsp
Ostech war story using mainline linux for an android tv bspNeil Armstrong
 
GPU power consumption and performance trends
GPU power consumption and performance trendsGPU power consumption and performance trends
GPU power consumption and performance trendsAlessio Villardita
 

What's hot (20)

MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
 
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W...
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
 
Hands on OpenCL
Hands on OpenCLHands on OpenCL
Hands on OpenCL
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
 
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri S...
 
Cuda
CudaCuda
Cuda
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
 
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
PL-4047, Big Data Workload Analysis Using SWAT and Ipython Notebooks, by Moni...
 
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin CoumansGS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
 
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
GS-4136, Optimizing Game Development using AMD’s GPU PerfStudio 2, by Gordon ...
 
Getting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® GraphicsGetting Space Pirate Trainer* to Perform on Intel® Graphics
Getting Space Pirate Trainer* to Perform on Intel® Graphics
 
Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wall
 
Ostech war story using mainline linux for an android tv bsp
Ostech  war story  using mainline linux  for an android tv bspOstech  war story  using mainline linux  for an android tv bsp
Ostech war story using mainline linux for an android tv bsp
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
GPU power consumption and performance trends
GPU power consumption and performance trendsGPU power consumption and performance trends
GPU power consumption and performance trends
 

Viewers also liked

GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014StampedeCon
 
Accelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsAccelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsIBM
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingMark Kilgard
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in Rherbps10
 
GTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path RenderingGTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path Rendering Mark Kilgard
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on sparkSatyendra Rana
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrKohei KaiGai
 
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...
Enabling Graph Analytics at Scale:  The Opportunity for GPU-Acceleration of D...Enabling Graph Analytics at Scale:  The Opportunity for GPU-Acceleration of D...
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...odsc
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overviewinside-BigData.com
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015Kohei KaiGai
 
PyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at ScalePyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at ScaleGoDataDriven
 
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...Spark Summit
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillotsparktc
 
How to Solve Real-Time Data Problems
How to Solve Real-Time Data ProblemsHow to Solve Real-Time Data Problems
How to Solve Real-Time Data ProblemsIBM Power Systems
 
Containerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the CloudContainerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the CloudSubbu Rama
 
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...Chris Fregly
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkSpark Summit
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit
 

Viewers also liked (20)

GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014GPUs in Big Data - StampedeCon 2014
GPUs in Big Data - StampedeCon 2014
 
Accelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsAccelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUs
 
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web RenderingSIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering
 
Computational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in RComputational Techniques for the Statistical Analysis of Big Data in R
Computational Techniques for the Statistical Analysis of Big Data in R
 
GTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path RenderingGTC 2012: GPU-Accelerated Path Rendering
GTC 2012: GPU-Accelerated Path Rendering
 
Deep learning on spark
Deep learning on sparkDeep learning on spark
Deep learning on spark
 
PG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated AsyncrPG-Strom - GPU Accelerated Asyncr
PG-Strom - GPU Accelerated Asyncr
 
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...
Enabling Graph Analytics at Scale:  The Opportunity for GPU-Acceleration of D...Enabling Graph Analytics at Scale:  The Opportunity for GPU-Acceleration of D...
Enabling Graph Analytics at Scale: The Opportunity for GPU-Acceleration of D...
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overview
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
PyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at ScalePyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at Scale
 
Deep Learning on Hadoop
Deep Learning on HadoopDeep Learning on Hadoop
Deep Learning on Hadoop
 
Hadoop + GPU
Hadoop + GPUHadoop + GPU
Hadoop + GPU
 
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
How to Solve Real-Time Data Problems
How to Solve Real-Time Data ProblemsHow to Solve Real-Time Data Problems
How to Solve Real-Time Data Problems
 
Containerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the CloudContainerizing GPU Applications with Docker for Scaling to the Cloud
Containerizing GPU Applications with Docker for Scaling to the Cloud
 
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
Tallinn Estonia Advanced Java Meetup Spark + TensorFlow = TensorFrames Oct 24...
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in Spark
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
 

Similar to GPU Ecosystem

PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyAMD Developer Central
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...Edge AI and Vision Alliance
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018NVIDIA
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsnARUNACHALAM468781
 
Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Intel® Software
 
Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Codemotion
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaKazuaki Ishizaki
 
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...
Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA...Stefano Di Carlo
 
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio [Unite Seoul 2019] Mali GPU Architecture and Mobile Studio
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio Owen Wu
 
Performance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesPerformance and Power Profiling on Intel Android Devices
Performance and Power Profiling on Intel Android DevicesIntel® Software
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseKazuaki Ishizaki
 
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsight
[03 2][gpu용 개발자 도구 - parallel nsight 및 axe] gateau parallel-nsightlaparuma
 
Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & FutureOfer Rosenberg
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0Ganesan Narayanasamy
 

Similar to GPU Ecosystem (20)

PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
 
DRIVE PX 2
DRIVE PX 2DRIVE PX 2
DRIVE PX 2
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
 
Choosing the right processor
Choosing the right processorChoosing the right processor
Choosing the right processor
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsn
 
Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)
 
GPU Ecosystem

  • 1. Page 1 GPU Ecosystem Introduction & Case Study Ofer Rosenberg October 2013
  • 2. Page 2 Content  GPU Ecosystem  Ecosystem on Mobile/Embedded Platforms  NSIGHT - Tools case study  Libraries
  • 3. Page 3 Product GPU Ecosystem Software Product Development cycle: The GPU Ecosystem role is to support, speed up, and improve this cycle for GPU Compute Design Write Code Debug Profile
  • 4. Page 4 GPU Ecosystem  Support writing code by:  IDE integration – Compiler, Parser, Wizards  Libraries: Math (BLAS, IPP-like, Matrix, etc.), STL-like (Thrust, BOLT)  Support Debugging by:  IDE integration of the debugger (preferred)  Provide usable execution control (breakpoints, pause/resume, etc.)  Provide a reliable memory view of the various address spaces  Support Profiling by:  Provide two levels of profiling: System Tracing and Kernel Profiling  System Tracing - quick highlighting of hotspots and device optimal access  Statistical and TimeLine-based Kernel Profiling (using perf. counters) Design Write Code Debug Profile
  • 6. Page 6 ARM MALI  Part of ARM SoC  OpenCL 1.1 Full Profile (Linux, Android)  Renderscript (Android only)  OpenCL SDK – Samples, Tutorials, etc.  No GPU debugging capability  ARM DS-5 (Development Studio 5)  Eclipse IDE integration  Compiler, Debugger (CPU only)  System Trace – CPU & GPU  Deep Profiling - CPU & GPU
  • 7. Page 7 Intel Haswell GPU  Part of Haswell (CPU & GPU)  OpenCL 1.2 Full Profile  Windows only for now (Linux @ alpha stage)  OpenCL SDK  Samples  Tools: Kernel Builder, VS/Eclipse Integration, Offline Compiler, GDB support (CPU Only)  No GPU debugging capability  VTune Amplifier XE supports OpenCL (CPU & GPU)  System level tracing (Application, Memory, Kernel launch)  Kernel Profiling
  • 8. Page 8 Intel BayTrail platform (Atom)  BayTrail < 13W, BayTrail-M < 6.5W  Valleyview SoC (Z37xx)  GPU is based on Gen7 (same arch as IvyBridge)  Same as previous slide:  OpenCL 1.2 (Windows only for now)  OpenCL SDK  VTune support  System level tracing  Kernel Profiling
  • 9. Page 9 NVIDIA Tegra 5 ? (Codename: Logan)  Disclaimer: Logan is due early 2014. Part of the information is speculation  Development Boards and Samples available to selected customers  Logan SoC – 2W  ARM CPU A15 4+1 : speculated  Kepler based GPU : verified  CUDA Support : verified  CUDA SDK – Dozens of samples  CUDA Libraries: Thrust, cuBLAS, cuNVPP, etc.  NSIGHT : speculated  System Trace  Profiling, Debugging
  • 10. Page 10 NSIGHT TOOLS CASE STUDY Design Write Code Debug Profile
  • 11. Page 11 Nsight Highlights  “NVIDIA® Nsight™ is the ultimate development platform for heterogeneous computing” (taken from the Nsight page)  IDE integration  Windows – integration with Visual Studio  Linux – specialized Eclipse version  Debugging, System Trace, Profiling  Graphics (DX, OpenGL)  Computing (OpenCL, CUDA, C++ AMP)  Profiling only on CUDA kernels  Debug/Trace/Profile Information is highly shaped  Highly efficient information fields, windows, diagrams  Feedback from professional users is noticed
  • 12. Page 12 Debugging  Much more than “just integrated” with the IDE  Shaped windows showing valuable info: Assembly (GPU!); variables across all warps; visible layout of the stopped thread
  • 13. Page 13 Debugging – Eclipse edition  Seems that Eclipse integration is deeper than Visual Studio  Unified CPU / GPU Debugging  Simultaneous visibility into both CPU and GPU state  Multi-GPU support Slides from: “CUDA Development Using NVIDIA Nsight, Eclipse Edition” by David Goodwin, SC12  Full GPU debugging  Set kernel breakpoints  Single-step, run until, etc.  View values across multiple GPU threads at the same time  Examine thread, warp, block state  Source and assembly level debugging
  • 15. Page 15 Kernel Profiling  Choose a kernel to profile  Skip N kernels, Profile M kernels  Choose “experiments”  Experiment - Types of profiling/analysis  NVIDIA runs each kernel launch dozens of times with the same data
  • 16. Page 16 Profiling Results  Experiment list  Each experiment is a tabbed window  Profiling information is shaped in graphs, pie charts, diagrams, etc.  Taking HW counters and shaping them to easy-to-understand graphics  Information targets known HW bottlenecks, Code inefficiencies, etc.  Amazingly shaped…
  • 17. Page 17 Profiling Results  The information provides a quick, easy, and methodical way to identify the performance bottlenecks
  • 18. Page 18 Eclipse Edition - Source Code Editor  Project Templates  CUDA code highlighting  CUDA aware refactoring  CUDA aware code completion and inline help
  • 20. Page 20 CUDA Libraries – Part of the SDK  cuFFT  cuBLAS  cuRAND  cuSPARSE  NPP (like IPP)  Math Library  Thrust (next slide)
  • 21. Page 21 Thrust Library  https://developer.nvidia.com/thrust  Works on top of CUDA  Open-source version is available on GitHub  http://thrust.github.io/  Presentations:  http://on-demand.gputechconf.com/gtc-express/2011/presentations/introductiontothrust.pdf
  • 23. Page 23 CLPP  OpenCL Data Parallel Primitives Library (similar to Thrust)  Source : https://code.google.com/p/clpp/  7 committers, last commit 1.5 years ago
  • 24. Page 24 OpenCL BLAS  OpenCL BLAS  http://openclblas.sourceforge.net/  Code is available here (GPLv2):  http://sourceforge.net/projects/openclblas/
  • 25. Page 25 ViennaCL  BLAS implementation  http://viennacl.sourceforge.net/  Looks very promising
  • 27. Page 27 Platform links:  ARM  Developer site : http://malideveloper.arm.com  OpenCL tracing : http://malideveloper.arm.com/develop-for-mali/tools/mali-graphics-debugger/  DS-5 suite : http://www.arm.com/products/tools/software-tools/ds-5/index.php  OpenCL SDK : http://malideveloper.arm.com/develop-for-mali/sdks/mali-opencl-sdk/  OpenCL developer guide:  Online: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538e/index.html  PDF: http://infocenter.arm.com/help/topic/com.arm.doc.dui0538e/DUI0538E_mali_t600_opencl_dg.pdf  NVIDIA  http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler  http://www.slashgear.com/nvidia-tegra-logan-detailed-with-game-changing-cuda-integration-19274630/  http://www.ubergizmo.com/2013/07/nvidia-tegra-5-release-date-specs-news/
  • 28. Page 28 Links:  Intel  OpenCL SDK http://software.intel.com/en-us/vcsource/tools/opencl-sdk  GPA http://software.intel.com/en-us/vcsource/tools/intel-gpa  VTune support in OpenCL http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-getting-started-with-opencl-performance-analysis-on-intel-hd-graphics  http://www.theinquirer.net/inquirer/news/2266966/intel-releases-opencl-sdk-for-windows-and-linux  Haswell Linux support: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NDc  OpenCL “Beignet” – open source Linux compiler :  http://software.intel.com/en-us/forums/topic/402118  http://linux.slashdot.org/story/13/04/16/014233/intel-releases-new-opencl-implementation-for-gnulinux  ATOM BayTrail:  http://arstechnica.com/gadgets/2013/02/intel-gets-aggressive-with-new-smartphone-and-tablet-chips/  http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested  http://www.tomshardware.com/reviews/bay-trail-celeron-j1750-performance,3614-6.html  http://software.intel.com/en-us/forums/topic/476221  http://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors#.22Bay_Trail.22_.2822_nm.29
  • 29. Page 29 NSIGHT Links  http://www.nvidia.com/object/nsight.html  https://developer.nvidia.com/nsight-visual-studio-edition-videos  https://developer.nvidia.com/developer-webinars  http://on-demand.gputechconf.com/supercomputing/2012/presentation/SB006-Goodwin-CUDA-Development-Nsight.pdf  http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization-With-Nsight-VSE.pdf