Copyright © 2016 Imagination Technologies 1
Efficient Convolutional Neural Network
Inference on Mobile GPUs
Paul Brasnett
May 3, 2016
Overview
• About Imagination Technologies
• PowerVR GPUs
• Case study: Implementing Convolutions
• Performance Analysis
• Conclusions
• Resources
About Imagination Technologies
• Imagination Technologies is a leading IP supplier for multimedia, processors and communications
• More than 8bn units containing Imagination IP shipped
[Diagram: SoC fabric with PowerVR Graphics & GPU Compute Processors, Ensigma Communications Processors, PowerVR Vision Processors, MIPS Processors and PowerVR Video Processors]
What is a Mobile GPU?
• Mobile GPU: optimised for high performance at low power
• Areas shown: mobile devices, automotive, consumer multimedia, wearables, Internet of Things, augmented reality
Why Mobile GPUs for Vision Processing?
• CPUs can deliver high peak/burst performance
  • But generate large amounts of heat
• PowerVR Mobile GPUs provide
  • Lowest power FP16 & int pipelines
  • Local memory for highly efficient data access for compute operations
  • Power-saving features such as gating of non-compute parts of GPU for efficient compute operation
Why Mobile GPUs for Vision Processing?
Performance relative to CPU (CPU = 100%):
Benchmark                   CPU    PowerVR Series6
Provence (raytracing)       100%   265%
Particle Simulation – 32k   100%   407%
Particle Simulation – 4k    100%   517%
Julia Set                   100%   963%
Ambient Occlusion           100%   1126%
Denoise                     100%   482%
Gaussian Blur               100%   383%
Moving the CNN Workload to the GPU
[Diagram: a CPU (CPU0, CPU1, few threads, large cache) enqueues a compute kernel to the PowerVR GPU (graphics and compute) through the host interface. GPU blocks shown: coarse grain scheduler, multiprocessors (Unified Shading Clusters) each with a scheduler, residency slots, common store and compute store, texture processing unit, L2 cache unit, system level cache and system memory interface. CPU and GPU sit on unified system memory.]
Evolution of Mobile GPU
• PowerVR Series 6 GPU, PowerVR Series 7 GPU, PowerVR Series 8 GPU, …
• New APIs: OpenCL 1.2, OpenCV, OpenVX, Vulkan, OpenCL 2.0
GPU Dominates Compute in Modern SoCs
• Mobile GPU increasingly dominating compute performance in SoCs
[Diagram: CPU and GPU blocks. Illustrative diagram only, to show relative CPU/GPU size]
Why CNNs?
• State-of-the-art performance
• Rapid development cycles
• Range of vision tasks
  • Classification
  • Localisation
  • Other applications…
[Figure: Camera Localisation. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, Kendall, A., Grimes, M., Cipolla, R., ICCV 2015]
What is a CNN?
• CNN Architecture Building Blocks: Convolution, Activation, Normalization, Pooling, Fully Connected
• CNN Example Network
[Diagram: an Image passes through repeated Convolution, Activation, Pooling and Normalization stages, followed by Fully Connected and Soft Max]
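The building blocks named on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the presenter's implementation: the function names, the toy 5x5 image, the 2x2 kernel and the fully connected weights are all made up for the example, and the normalization block is left out for brevity.

```python
import math

def conv2d(image, kernel):
    """'Valid' sliding-window convolution (no kernel flip, as CNN libraries usually do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]

def relu(fmap):
    """Activation: clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]

def fully_connected(vector, weights):
    """One output per weight row: a dot product with the input vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy inputs, chosen only to keep the arithmetic easy to follow.
image = [[1.0, 2.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 3.0, 1.0, 2.0],
         [2.0, 1.0, 0.0, 0.0, 1.0],
         [1.0, 0.0, 1.0, 2.0, 0.0],
         [0.0, 2.0, 1.0, 1.0, 1.0]]
kernel = [[1.0, -1.0],
          [-1.0, 1.0]]
fc_weights = [[0.5, 0.0, -0.25, 0.0],
              [0.0, 0.25, 0.0, 0.5]]

pooled = max_pool(relu(conv2d(image, kernel)))       # 5x5 -> 4x4 -> 2x2
flat = [v for row in pooled for v in row]            # flatten for the FC layer
probabilities = softmax(fully_connected(flat, fc_weights))
print(pooled, probabilities)
```

Each stage consumes the previous stage's output, which is why a CNN library can treat a network as a chain of such layers.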
CNN Object Classification
• Training: Offline
[Diagram: Architecture and Data feed the CNN Library, which uses Compute + Time to produce Model Coefficients]
• Inference: Online
[Diagram: Architecture, Model Coefficients and an Image feed the CNN Library, which uses Compute on the Mobile GPU to produce a Classification]
Where is the Cost in CNN Inference?
[Chart: Flops by layer-type (AlexNet), split into Convolution, Normalisation, Pooling and Fully Connected]
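A rough way to see where the flops go is to count multiply-accumulates (MACs) per layer from the layer shape. The sketch below is an illustration only: the helper names are hypothetical, and the two layer shapes are the commonly reported ones for AlexNet's first convolution and first fully connected layer, which is an assumption here because the slides do not list them.

```python
def conv_macs(out_h, out_w, out_channels, k_h, k_w, in_channels):
    """MACs for a convolution: one k_h x k_w x in_channels dot product per output value."""
    return out_h * out_w * out_channels * k_h * k_w * in_channels

def conv_weights(out_channels, k_h, k_w, in_channels):
    """Number of filter coefficients (biases ignored)."""
    return out_channels * k_h * k_w * in_channels

def fc_macs(inputs, outputs):
    """MACs for a fully connected layer: every weight is used exactly once."""
    return inputs * outputs

# Assumed shapes: 96 filters of 11x11x3 producing a 55x55 map, and a 9216 -> 4096 layer.
conv1_macs = conv_macs(55, 55, 96, 11, 11, 3)
conv1_weights = conv_weights(96, 11, 11, 3)
fc6_macs = fc_macs(9216, 4096)
fc6_weights = 9216 * 4096

print("conv1: MACs per weight =", conv1_macs // conv1_weights)
print("fc6:   MACs per weight =", fc6_macs // fc6_weights)
```

Under these assumptions each convolution weight is reused at every output position, while each fully connected weight is fetched for a single multiply-accumulate, so convolutions tend to be compute-heavy and fully connected layers bandwidth-heavy.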
Matrix Multiply: Naïve
• Create as many work-items as there are elements in the output matrix
• Each work-item reads its row of A and its column of B and produces their dot product
• Requires a large number of accesses to memory
[Diagram: A x B = C]
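The naïve scheme can be emulated on the host to make the memory traffic visible. This is a minimal sketch, assuming one work-item per output element; the function name and the read counter are hypothetical, and the two outer loops stand in for what would be the 2D global work size in OpenCL.

```python
def naive_matmul(A, B):
    """Emulate one work-item per element of C and count global memory reads."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    global_reads = 0
    for row in range(M):          # on a GPU these two loops are the work-items,
        for col in range(N):      # all running in parallel
            acc = 0.0
            for k in range(K):
                acc += A[row][k] * B[k][col]   # one read of A and one of B
                global_reads += 2
            C[row][col] = acc
    return C, global_reads

A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
C, reads = naive_matmul(A, B)
print(C, reads)
```

Every work-item fetches a full row and a full column from global memory, so the total is 2 x M x N x K reads even though the inputs hold only M x K + K x N values.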
OpenCL Memory Model
• The OpenCL memory model closely maps to GPU architecture
• Private Memory: per work-item
• Local Memory
  • Shared within a work-group
• Global Memory / Constant Memory
  • Visible to all work-groups
• Host memory
  • Typically shared between CPU and GPU on a mobile SoC
Matrix Multiply: Tiling Approach
• Work-items load A data into private memory
• Work-groups load B data into local memory
• Each work-item reads from local memory and produces a dot product
• Significantly reduces global memory accesses
[Diagram: A x B = C]
Tiling approach based on “2008. Volkov and Demmel. Using GPUs to accelerate linear algebra runtime”
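The effect of tiling on global memory traffic can be sketched with a host-side emulation. This is an illustration under assumptions, not the kernel from the talk or from Volkov and Demmel: it uses square tiles, one work-group per tile of C and one work-item per tile row, and the names `tiled_matmul`, `local_b` and `private_a` are hypothetical.

```python
def tiled_matmul(A, B, tile):
    """Emulate the tiled scheme and count reads from global memory."""
    M, K, N = len(A), len(B), len(B[0])
    assert M % tile == 0 and K % tile == 0 and N % tile == 0
    C = [[0.0] * N for _ in range(M)]
    global_reads = 0
    for r0 in range(0, M, tile):              # one work-group per tile of C
        for c0 in range(0, N, tile):
            for k0 in range(0, K, tile):      # walk along the shared dimension
                # The work-group copies one tile of B into local memory, once.
                local_b = [B[k0 + k][c0:c0 + tile] for k in range(tile)]
                global_reads += tile * tile
                for i in range(tile):         # one work-item per row of the tile
                    # The work-item copies its slice of A into private memory.
                    private_a = A[r0 + i][k0:k0 + tile]
                    global_reads += tile
                    for j in range(tile):
                        C[r0 + i][c0 + j] += sum(
                            private_a[k] * local_b[k][j] for k in range(tile))
    return C, global_reads

A = [[i + j for j in range(4)] for i in range(4)]
B = [[i * j + 1 for j in range(4)] for i in range(4)]
C, reads = tiled_matmul(A, B, tile=2)
print(reads)   # compare with 2 * 4 * 4 * 4 = 128 reads for the naive scheme
```

In this emulation the read count is 2 x M x N x K divided by the tile size, because every value fetched from global memory is reused by a whole tile instead of by a single dot product.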
Matrix Multiply: OpenCL Tips
• Choose the work-group size to fit the GPU; 32 work-items is typically a good choice for PowerVR GPUs
• Read multiple items (e.g. 4 or 8) into private memory at a time to optimise memory transfers
• Consider using the half data type in place of float
  • Most PowerVR platforms provide up to 2x the flops
• Define the work-group size at compile time
  • __attribute__((reqd_work_group_size(SIZE, 1, 1)))
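When weighing the half-versus-float tip, it can help to check how much precision a given accumulation loses. The slides do not discuss accuracy, so the sketch below is only a host-side illustration under assumptions: it emulates half precision by rounding through Python's `struct` format `'e'` (IEEE 754 binary16), and the helper names are hypothetical.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def accumulate(values, rounder=float):
    """Sum values, rounding the inputs and the running total with `rounder`."""
    acc = rounder(0.0)
    for v in values:
        acc = rounder(acc + rounder(v))
    return acc

values = [0.1] * 1000                 # exact sum would be 100
float_sum = accumulate(values)        # double-precision reference
half_sum = accumulate(values, to_half)

print("0.1 stored as half:", to_half(0.1))
print("reference sum:", float_sum)
print("half-precision sum:", half_sum)
```

A long accumulation held entirely in half precision drifts noticeably, so one option worth evaluating is to keep inputs in half while accumulating in a wider type, if the target hardware makes that cheap.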
Matrix Multiply: Tiling Approach
[Chart: Time (s), log scale from 0.1 to 1000, against Matrix Size, for Naïve and Tiled matrix multiply]
CNN Classification: AlexNet & GoogLeNet
                                AlexNet   GoogLeNet
Model Coefficients (Millions)   60        5.5
Operations (Billions)           1.3       3.1
Top-5 Error Rate (%)            18.2      10.07
• Bandwidth and compute: GoogLeNet has fewer coefficients to fetch but performs more operations
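The bandwidth and compute figures on this slide can be combined into a single ratio: operations performed per model coefficient. The sketch below uses only the numbers shown on the slide; the dictionary layout, the helper names and the choice of 2 bytes per coefficient (half precision) are assumptions for illustration.

```python
networks = {
    "AlexNet":   {"coefficients_millions": 60.0, "operations_billions": 1.3},
    "GoogLeNet": {"coefficients_millions": 5.5,  "operations_billions": 3.1},
}

def ops_per_coefficient(net):
    """Operations performed for every coefficient that has to be fetched."""
    return (net["operations_billions"] * 1e9) / (net["coefficients_millions"] * 1e6)

def coefficient_bytes(net, bytes_per_value):
    """Storage needed for the coefficients at a given precision."""
    return net["coefficients_millions"] * 1e6 * bytes_per_value

alexnet_ratio = ops_per_coefficient(networks["AlexNet"])
googlenet_ratio = ops_per_coefficient(networks["GoogLeNet"])

for name, net in networks.items():
    print(name,
          "ops per coefficient:", round(ops_per_coefficient(net)),
          "coefficient MB at 2 bytes each:", coefficient_bytes(net, 2) / 1e6)
```

A higher ratio means more arithmetic per value fetched, which suits a GPU whose compute throughput is high relative to its memory bandwidth.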
Performance Analysis: CNN Inference
• Time consumed by layer type
[Charts: time split across Convolutions, Pooling, Normalisation and Fully Connected layers for GoogLeNet and for AlexNet; the two charts are labelled Reference Time*: 1.36 and Reference Time*: 1.00]
Performance Analysis: GPU v CPU*
[Chart: Relative FPS Performance (higher is better) on AlexNet, axis from 0 to 14, for GPU - PowerVR 2 Cluster GPU (480MHz) and CPU - ARM A15 (1.6GHz)]
* CPU results based on Caffe (with ATLAS)
Efficiency Analysis: GPU v CPU
[Chart: Relative Efficiency (higher is better) on AlexNet, axis from 0 to 3.5, for GPU - PowerVR 2 Cluster GPU (480MHz) and CPU - ARM A15 (1.6GHz)]
Conclusions
• Mobile GPUs are widely available in a range of SoCs across numerous markets today
• Compared to mobile CPUs, PowerVR Mobile GPUs offer
  • up to 3x higher efficiency and
  • up to 12x higher performance when deploying CNNs
• Newer CNN architectures with smaller fully connected layers help to make more efficient use of compute resources
• PowerVR GPUs scale to allow for higher levels of performance & lower power for current and future generations of vision enabled products
• COME & SEE THE DEMO DURING THE NEXT BREAK
Resources
• PowerVR GPU Compute
  • https://imgtec.com/tools/powervr-gpu-compute/
• Guide to writing OpenCL
  • http://blog.imgtec.com/powervr/a-quick-guide-to-writing-opencl-kernels-for-rogue
• PowerVR Imaging Framework
  • http://blog.imgtec.com/powervr/powervr-imaging-framework-sdk
• PowerVR CNN Demo
  • See our stand
• OpenCL Tutorial
  • https://handsonopencl.github.io/

"Efficient Convolutional Neural Network Inference on Mobile GPUs," a Presentation from Imagination Technologies

  • 1. Copyright © 2016 Imagination Technologies 1 Efficient Convolutional Neural Network Inference on Mobile GPUs Paul Brasnett May 3, 2016
  • 2. Copyright © 2016 Imagination Technologies 2 • About Imagination Technologies • PowerVR GPUs • Case study: Implementing Convolutions • Performance Analysis • Conclusions • Resources Overview
  • 3. Copyright © 2016 Imagination Technologies 3 • Imagination Technologies is a leading IP supplier for multimedia, processors and communications • More than 8bn units containing Imagination IP shipped About Imagination Technologies SoCfabric PowerVR Graphics & GPU Compute Processors Ensigma Communications Processors PowerVR Vision Processors MIPS Processors PowerVR Video Processors
  • 4. What is a Mobile GPU? Mobile GPU: optimised for high performance at low power
  • 5. What is a Mobile GPU? Mobile GPU: optimised for high performance at low power. Used in: Mobile Devices, Automotive, Consumer Multimedia, Wearables, Internet of Things, Augmented Reality
  • 6. Why Mobile GPUs for Vision Processing?
      • CPUs can deliver high peak/burst performance
      • But generate large amounts of heat
      • PowerVR Mobile GPUs provide
          • Lowest power FP16 & int pipelines
          • Local memory for highly efficient data access for compute operations
          • Power-saving features such as gating of non-compute parts of GPU for efficient compute operation
  • 7. Why Mobile GPUs for Vision Processing? Performance of PowerVR Series6 relative to CPU (CPU = 100% in every benchmark):
      • Provence (raytracing): 265%
      • Particle Simulation – 32k: 407%
      • Particle Simulation – 4k: 517%
      • Julia Set: 963%
      • Ambient Occlusion: 1126%
      • Denoise: 482%
      • Gaussian Blur: 383%
  • 8. Moving the CNN Workload to the GPU
      • [Block diagram, CPU side: CPU0 and CPU1, few threads, large cache, scheduler, system memory interface; the CPU enqueues a compute kernel to the GPU]
      • [Block diagram, PowerVR GPU side (graphics and compute): host interface, coarse grain scheduler, multiprocessors (Unified Shading Clusters) each with scheduler, residency slots, common store and compute store, texture processing unit, cache unit, L2, system level cache, system memory interface]
      • [Both sides attach to unified system memory]
  • 9. Evolution of Mobile GPU: PowerVR Series 6 GPU, PowerVR Series 7 GPU, PowerVR Series 8 GPU, …
  • 10. Evolution of Mobile GPU, new APIs: OpenCL 1.2, OpenCV, OpenVX, Vulkan, OpenCL 2.0
  • 11. GPU Dominates Compute in Modern SoCs
      • Mobile GPUs increasingly dominate compute performance in SoCs
      • [Illustrative diagram only, to show relative CPU/GPU size]
  • 12. Why CNNs?
      • State-of-the-art performance
      • Rapid development cycles
      • Range of vision tasks
          • Classification
          • Localisation
          • Other applications…
      • [Figure: camera localisation, from "PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization", Kendall, A., Grimes, M., Cipolla, R., ICCV 2015]
  • 13. What is a CNN?
      • CNN architecture building blocks: Convolution, Activation, Normalization, Pooling, Fully Connected, Soft Max
      • [Diagram: CNN example network taking an image through repeated Convolution, Activation, Pooling and Normalization stages, then Fully Connected and Soft Max]
  • 14. CNN Object Classification
      • Training (offline) [Diagram: Architecture and Data feed a CNN Library which, given Compute + Time, produces Model Coefficients]
  • 15. CNN Object Classification
      • Training (offline) [Diagram: Architecture and Data feed a CNN Library which, given Compute + Time, produces Model Coefficients]
      • Inference (online) [Diagram, first build step: Architecture and Model Coefficients]
  • 16. CNN Object Classification
      • Training (offline) [Diagram: Architecture and Data feed a CNN Library which, given Compute + Time, produces Model Coefficients]
      • Inference (online) [Diagram: Architecture, Model Coefficients and an Image feed a CNN Library which, given Compute on a Mobile GPU, produces a Classification]
  • 17. Where is the Cost in CNN Inference? [Chart: flops by layer type (AlexNet): Convolution, Normalisation, Pooling, Fully Connected]
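As a rough illustration of how flops split across layer types, the C sketch below counts multiply-accumulates for one convolution layer and one fully connected layer. The helper names are hypothetical, and the layer shapes used in the note afterwards are assumptions taken from the commonly cited AlexNet description, not from the slides.

```c
#include <stdint.h>

/* Multiply-accumulates for a convolution layer: every output element is a
   k x k x c_in dot product. Hypothetical helper, for illustration only. */
uint64_t conv_macs(uint64_t out_h, uint64_t out_w, uint64_t c_out,
                   uint64_t k, uint64_t c_in)
{
    return out_h * out_w * c_out * k * k * c_in;
}

/* Multiply-accumulates for a fully connected layer: one per weight. */
uint64_t fc_macs(uint64_t n_in, uint64_t n_out)
{
    return n_in * n_out;
}
```

Assuming a first convolution with 96 filters of 11x11x3 and a 55x55 output, `conv_macs(55, 55, 96, 11, 3)` gives 105,415,200. Assuming a first fully connected layer with 9216 inputs and 4096 outputs, `fc_macs(9216, 4096)` gives 37,748,736. The convolution weights are reused at every output position, whereas each fully connected weight is used once per image.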
  • 18. Matrix Multiply: Naïve (A × B = C)
      • Create as many work-items as there are elements in the output matrix
      • Each work-item reads its row of A and its column of B and produces their dot product
      • Requires a large number of memory accesses
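The naïve scheme can be sketched in plain C by emulating the work-item grid with two loops. This is a minimal sketch, not the presenter's kernel: the function names are hypothetical, the matrices are assumed square and row-major, and the ordinary arrays stand in for global memory.

```c
#include <stddef.h>

/* Body of one emulated work-item: computes C[row][col] as the dot product
   of row `row` of A and column `col` of B (n x n, row-major). Every operand
   is read from the full arrays, which stand in for global memory. */
static float naive_work_item(const float *a, const float *b,
                             size_t n, size_t row, size_t col)
{
    float acc = 0.0f;
    for (size_t k = 0; k < n; ++k)
        acc += a[row * n + k] * b[k * n + col];
    return acc;
}

/* Emulates launching an n x n grid of work-items, one per output element. */
void matmul_naive(const float *a, const float *b, float *c, size_t n)
{
    for (size_t row = 0; row < n; ++row)
        for (size_t col = 0; col < n; ++col)
            c[row * n + col] = naive_work_item(a, b, n, row, col);
}
```

Each emulated work-item performs 2n reads, so the n × n grid performs 2n³ reads in total. That is the "large number of memory accesses" the slide refers to.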
  • 19. OpenCL Memory Model
      • The OpenCL memory model maps closely to the GPU architecture
      • Private Memory
          • Per work-item
      • Local Memory
          • Shared within a work-group
      • Global Memory / Constant Memory
          • Visible to all work-groups
      • Host memory
          • Typically shared between CPU and GPU on a mobile SoC
  • 20. Matrix Multiply: Tiling Approach (A × B = C)
      • Work-items load A data into private memory
      • Tiling approach based on “2008. Volkov and Demmel. Using GPUs to accelerate linear algebra runtime”
  • 21. Matrix Multiply: Tiling Approach (A × B = C)
      • Work-items load A data into private memory
      • Work-groups load B data into local memory
      • Each work-item reads from local memory and produces a dot product
      • Significantly reduces global memory accesses
      • Tiling approach based on “2008. Volkov and Demmel. Using GPUs to accelerate linear algebra runtime”
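The tiling steps can be sketched in plain C in the same emulated style. The following is a sketch under stated assumptions rather than the presenter's kernel: `TILE` (both the tile edge and the emulated work-group size) is an arbitrary choice, `matmul_tiled` is a hypothetical name, the matrix size is assumed to be a multiple of `TILE`, and small stack arrays stand in for local and private memory.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* assumed tile edge; also the emulated work-group size */

/* Emulated tiled multiply of n x n row-major matrices, n a multiple of TILE.
   Each work-group produces a TILE x TILE block of C; work-item `li` owns
   row `li` of that block. */
void matmul_tiled(const float *a, const float *b, float *c, size_t n)
{
    assert(n % TILE == 0);
    for (size_t gr = 0; gr < n; gr += TILE) {          /* work-group row    */
        for (size_t gc = 0; gc < n; gc += TILE) {      /* work-group column */
            float acc[TILE][TILE] = {{0}};             /* accumulators, one row per work-item */
            for (size_t kt = 0; kt < n; kt += TILE) {
                float b_local[TILE][TILE];             /* stands in for local memory */
                /* The work-group loads one tile of B; work-item li copies row li. */
                for (size_t li = 0; li < TILE; ++li)
                    for (size_t j = 0; j < TILE; ++j)
                        b_local[li][j] = b[(kt + li) * n + gc + j];
                /* A real kernel needs a work-group barrier at this point. */
                for (size_t li = 0; li < TILE; ++li) {
                    float a_priv[TILE];                /* stands in for private memory */
                    for (size_t k = 0; k < TILE; ++k)
                        a_priv[k] = a[(gr + li) * n + kt + k];
                    /* Dot products read only private and local data. */
                    for (size_t k = 0; k < TILE; ++k)
                        for (size_t j = 0; j < TILE; ++j)
                            acc[li][j] += a_priv[k] * b_local[k][j];
                }
            }
            for (size_t li = 0; li < TILE; ++li)
                for (size_t j = 0; j < TILE; ++j)
                    c[(gr + li) * n + gc + j] = acc[li][j];
        }
    }
}
```

In this sketch each work-group reads 2·TILE² values from the global arrays per tile step while performing TILE³ multiply-accumulates. Total global reads are therefore 2n³/TILE, compared with 2n³ for the naïve scheme.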
  • 22. Matrix Multiply: OpenCL Tips
      • Choose a work-group size to fit the GPU; 32 work-items is typically a good choice for PowerVR GPUs
      • Read multiple items (e.g. 4 or 8) into private memory at a time to optimise memory transfers
      • Consider using the half data type in place of float
          • Most PowerVR platforms provide up to 2x the flops
      • Define the work-group size at compile time
          • __attribute__((reqd_work_group_size(SIZE, 1, 1)))
  • 23. Matrix Multiply: Tiling Approach [Chart: time (s), axis from 0.1 to 1000, against matrix size, for the naïve and tiled matrix multiply]
  • 24. CNN Classification: AlexNet & GoogLeNet
      • Model coefficients (millions): AlexNet 60, GoogLeNet 5.5
      • Operations (billions): AlexNet 1.3, GoogLeNet 3.1
      • Top-5 error rate (%): AlexNet 18.2, GoogLeNet 10.07
      • Chart annotations: Bandwidth, Compute
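One way to read the coefficient and operation counts together is as operations per coefficient, a rough indicator of how much compute each fetched coefficient is amortised over. The sketch below uses hypothetical helper names, and the 2-byte (FP16) storage size in the note afterwards is an assumption rather than a figure from the slides.

```c
/* Operations performed per model coefficient. */
double ops_per_coefficient(double operations, double coefficients)
{
    return operations / coefficients;
}

/* Storage needed for the coefficients at a given element size in bytes. */
double coefficient_bytes(double coefficients, double bytes_per_coefficient)
{
    return coefficients * bytes_per_coefficient;
}
```

With the slide's figures, `ops_per_coefficient(1.3e9, 60e6)` is about 21.7 for AlexNet and `ops_per_coefficient(3.1e9, 5.5e6)` is about 564 for GoogLeNet. Assuming 2 bytes per coefficient, `coefficient_bytes` gives 120 MB for AlexNet and 11 MB for GoogLeNet. GoogLeNet therefore moves far less coefficient data while doing more arithmetic.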
  • 25. Performance Analysis: CNN Inference
      • Time consumed by layer type
      • [Chart, AlexNet: time split across Convolutions, Pooling, Normalisation, Fully Connected; Reference Time*: 1.00]
      • [Chart, GoogLeNet: time split across Convolutions, Pooling, Normalisation, Fully Connected; Reference Time*: 1.36]
  • 26. Performance Analysis: GPU v CPU*
      • [Chart: relative FPS performance (higher is better) on AlexNet, axis from 0 to 14, for GPU: PowerVR 2 Cluster GPU (480MHz) and CPU: ARM A15 (1.6GHz)]
      • * CPU results based on Caffe (with ATLAS)
  • 27. Efficiency Analysis: GPU v CPU
      • [Chart: relative efficiency (higher is better) on AlexNet, axis from 0 to 3.5, for GPU: PowerVR 2 Cluster GPU (480MHz) and CPU: ARM A15 (1.6GHz)]
  • 28. Conclusions
      • Mobile GPUs are widely available in a range of SoCs across numerous markets today
      • Compared to mobile CPUs, PowerVR Mobile GPUs offer, for CNN deployment,
          • up to 3x higher efficiency and
          • up to 12x higher performance
      • Newer CNN architectures with smaller fully connected layers help to make more efficient use of compute resources
      • PowerVR GPUs scale to allow for higher levels of performance & lower power for current and future generations of vision-enabled products
      • COME & SEE THE DEMO DURING THE NEXT BREAK
  • 29. Resources
      • PowerVR GPU Compute: https://imgtec.com/tools/powervr-gpu-compute/
      • Guide to writing OpenCL: http://blog.imgtec.com/powervr/a-quick-guide-to-writing-opencl-kernels-for-rogue
      • PowerVR Imaging Framework: http://blog.imgtec.com/powervr/powervr-imaging-framework-sdk
      • PowerVR CNN Demo: see our stand
      • OpenCL Tutorial: https://handsonopencl.github.io/