Copyright © 2015 Advanced Micro Devices 1
Harris Gasparakis, Ph.D.
12 May 2015
Understanding Adaptive Machine Learning Vision
Algorithms and Implementing them on GPUs and
Heterogeneous Platforms
• Machine Learning (ML)
• Constrained optimization problems
• Heterogeneous computing
• OpenCL 2.0, HSA
• Synthesis
• OpenCL programming tips for ML
• Conclusions
Agenda
Can you find an algorithm to describe an object, and detect it?
Why Machine Learning (ML)?
Sometimes Not Needed…
Most Often Indispensable!
Vidit Jain and Erik Learned-Miller.
FDDB: A Benchmark for Face Detection in Unconstrained Settings.
• Learn from examples!
• Model the universe using functions with (possibly many) parameters “w”
that you learn from training data
• “x” is a (multi-dimensional) function of the image data
• Pixel patches
• A priori Features
• Features in a learned dictionary (basis)
• PCA
• Sparse coding/LASSO/LARS
• DNN
• “y” is our value judgment on the data
• Object category
• Object identity, etc.
Formalism
Tune parameters “w” to best explain the “N” observations $(y_n, x_n)$
Machine learning typically involves constrained functional minimization
• Bias/variance
• Overcompleteness/sparsity
• How much learning is too much?
• N = ? |w| = ?
• Graphical models/subspace updates
Formalism
$$E(w) = \sum_{n=1}^{N} D(y_n, x_n; w) + \lambda C(w) + \cdots$$
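As a concrete instance of this energy, the following minimal sketch assumes a linear model with a squared-error data term, $D(y_n, x_n; w) = (y_n - w \cdot x_n)^2$, and an L2 penalty, $C(w) = \lVert w \rVert^2$. These choices, and the names `Sample` and `energy`, are illustrative assumptions; the slides do not fix $D$ or $C$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One observation (y_n, x_n): a target value and its feature vector.
struct Sample {
    double y;
    std::vector<double> x;
};

// E(w) = sum_n D(y_n, x_n; w) + lambda * C(w), with the assumed choices
// D = squared error of a linear model and C = squared L2 norm of w.
double energy(const std::vector<Sample>& data,
              const std::vector<double>& w,
              double lambda) {
    double data_term = 0.0;
    for (const Sample& s : data) {
        assert(s.x.size() == w.size());
        double prediction = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) prediction += w[i] * s.x[i];
        const double residual = s.y - prediction;
        data_term += residual * residual;
    }
    double penalty = 0.0;
    for (double wi : w) penalty += wi * wi;
    return data_term + lambda * penalty;
}
```

Raising `lambda` trades a worse fit to the observations for smaller parameters, which is one way to act on the bias/variance bullet above.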
It is a Jungle of Minima!
Start with initial guess: $w_0$
Iteratively improve it: $w_t = w_{t-1} + \delta w_t$
Local minima, with basins of attraction
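To make basins of attraction tangible, here is a minimal sketch on a toy one-dimensional energy, $E(w) = (w^2 - 1)^2$, which has local minima at $w = -1$ and $w = +1$. The energy, the fixed step size `kappa`, and the function names are assumptions chosen for illustration, not taken from the slides.

```cpp
#include <cassert>

// Gradient of the double-well energy E(w) = (w^2 - 1)^2, which has two
// local minima, at w = -1 and w = +1.
double gradient(double w) { return 4.0 * w * (w * w - 1.0); }

// Iterates w_t = w_{t-1} + delta_w_t with delta_w_t = -kappa * g(w_{t-1}).
double descend(double w0, double kappa, int steps) {
    assert(kappa > 0.0 && steps >= 0);
    double w = w0;
    for (int t = 0; t < steps; ++t) w += -kappa * gradient(w);
    return w;
}
```

With `kappa = 0.05`, an initial guess of `0.5` settles near `+1` while `-0.5` settles near `-1`: the starting point alone decides which minimum is found.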
• Second order methods:
$\delta w_t = -H(w_{t-1})\, g(w_{t-1})$
• First order methods:
$\delta w_t = -\kappa\, g(w_{t-1})$
• Tweaks:
Line minimization, momentum, heat, homotopy, multiresolution
• Modern first order methods (AdaGrad, AdaDelta, etc.):
$\delta w_t = -H(g_{1:t})\, g(w_{t-1})$
History
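The difference between the plain first-order rule and the history-dependent one can be sketched as follows. The AdaGrad-style step assumes the common diagonal form, in which each coordinate of the gradient is divided by the square root of its running sum of squared gradients; the function names and the `eps` safeguard are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// First-order rule: delta_w = -kappa * g.
Vec first_order_step(const Vec& g, double kappa) {
    Vec dw(g.size());
    for (std::size_t i = 0; i < g.size(); ++i) dw[i] = -kappa * g[i];
    return dw;
}

// AdaGrad-style rule: delta_w = -H(g_{1:t}) g, with H diagonal and built
// from the running sum of squared gradients (updated here in place).
Vec adagrad_step(const Vec& g, Vec& sum_sq, double kappa, double eps = 1e-8) {
    assert(g.size() == sum_sq.size());
    Vec dw(g.size());
    for (std::size_t i = 0; i < g.size(); ++i) {
        sum_sq[i] += g[i] * g[i];
        dw[i] = -kappa * g[i] / (std::sqrt(sum_sq[i]) + eps);
    }
    return dw;
}
```

On the very first step, every coordinate moves by roughly `kappa` in magnitude under the AdaGrad-style rule, however large or small its gradient, whereas the plain rule scales the move with the gradient.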
• Start from a pool of multiple initial conditions, and multiple update
rules (“configurations”)
• Explore them simultaneously (GPU thread)
• On each update step, reason about the progress of each (CPU threads)
• Eliminate configurations that are:
• Dead ends
• In the same basin of attraction as another configuration
• Replace them with other random configurations
• Give preference to configurations that progress the most
What if?
Let’s Explore the Jungle!
Some Dead Ends…
Reinitialize them!
Continue Exploring…
One Visitor Per Attractor is Enough...
• CPU as adaptive GPU supervisor
• GPU computes an ensemble of updates
• CPU reasons about the ensemble of updates
• Coalesce if in the same basin of attraction
• Prune or “kick” if trapped in local minimum
• Test and rank according to generalization error
• Is it practical?
The Master Adaptive Strategy
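The supervisor loop can be sketched on the CPU alone. In this toy version, the ensemble update that the strategy assigns to the GPU is an ordinary loop, the energy is an assumed double well $E(w) = (w^2 - 1)^2$, and "same basin of attraction" is approximated by two configurations coming within a tolerance of each other. That proximity test is a crude stand-in, and the sketch drops duplicates rather than re-seeding them; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gradient of the toy energy E(w) = (w^2 - 1)^2 (minima at w = -1, +1).
double toy_gradient(double w) { return 4.0 * w * (w * w - 1.0); }

// "GPU" role: advance every configuration in the ensemble by one step.
void update_ensemble(std::vector<double>& pool, double kappa) {
    for (double& w : pool) w -= kappa * toy_gradient(w);
}

// "CPU" role: keep one visitor per attractor by dropping any configuration
// that has come within `tol` of one already kept.
void coalesce(std::vector<double>& pool, double tol) {
    std::vector<double> kept;
    for (double w : pool) {
        bool duplicate = false;
        for (double k : kept) {
            if (std::fabs(w - k) < tol) { duplicate = true; break; }
        }
        if (!duplicate) kept.push_back(w);
    }
    pool.swap(kept);
}

// Alternate ensemble updates with supervision for a fixed number of rounds.
std::vector<double> explore(std::vector<double> pool, double kappa,
                            double tol, int rounds) {
    assert(kappa > 0.0 && tol > 0.0 && rounds >= 0);
    for (int r = 0; r < rounds; ++r) {
        update_ensemble(pool, kappa);
        coalesce(pool, tol);
    }
    return pool;
}
```

Starting from four initial guesses spread over both wells, the pool shrinks to two survivors, one per minimum. A fuller version would replace the dropped configurations with fresh random ones and rank survivors by generalization error, as the bullets describe.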
Know Thy (HSA) Hardware!
[Diagram: a CPU and an HSA iGPU side by side. Each has its own scheduler and L2 cache; the CPU's CCs each have an L1, and the iGPU's CCs each have an L1/LDS. Both sit on unified (bidirectionally coherent, pageable) virtual memory over shared physical memory (hUMA).]
Heterogeneous System Architecture (exposed via OpenCL 2.0)
[Diagram: the same CPU and HSA iGPU layout (schedulers, L2 caches, CCs with L1, unified virtual memory over physical memory), now labeled with hQ.]
Dynamic parallelism, context switching, preemption, concurrent execution
Know Thy (HSA) Hardware!
• Scope
• Thread, workgroup, device, all HSA devices
• Semantics
• Acquire (require that memory writes of other threads within the
scope become visible in current thread)
• Release (writes of current thread become visible to other threads in
current scope)
C++11 Atomics/OpenCL 2.0 Atomics
• Initialize pool of $w_t$ as fine-grain SVM with atomics enabled:
clSVMAlloc (…, CL_MEM_READ_WRITE |
CL_MEM_SVM_FINE_GRAIN_BUFFER |
CL_MEM_SVM_ATOMICS,…);
• CPU waits for GPU to finish an iteration:
done = std::atomic_load_explicit (..,
std::memory_order_acquire );
• GPU kernel “signals” when done with an iteration:
atomic_store_explicit ( (global atomic_int *)(…), …
memory_order_release,
memory_scope_all_svm_devices );
C++11 Atomics/OpenCL 2.0 Atomics
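The release/acquire handshake can be tried out with standard C++ alone. In the sketch below, an ordinary `std::thread` stands in for the GPU kernel and a `std::vector` stands in for the SVM allocation; both substitutions, and the function name `run_one_iteration`, are assumptions made so the example runs without an OpenCL device.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runs one producer iteration on a worker thread (standing in for the GPU
// kernel) and waits for it on the calling thread (the CPU supervisor).
std::vector<float> run_one_iteration(std::size_t n) {
    assert(n > 0);
    std::vector<float> w(n, 0.0f);   // shared pool of parameters
    std::atomic<int> done{0};        // completion flag

    std::thread producer([&] {
        for (std::size_t i = 0; i < n; ++i) w[i] = static_cast<float>(i);
        // Release: all writes to w above become visible to any thread
        // that observes done == 1 with an acquire load.
        std::atomic_store_explicit(&done, 1, std::memory_order_release);
    });

    // Acquire: spin until the producer signals the iteration is complete.
    while (std::atomic_load_explicit(&done, std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    // Safe to read w here without a data race.
    std::vector<float> snapshot = w;
    producer.join();
    return snapshot;
}
```

The pairing matters: the release store publishes the writes to `w`, and the acquire load is what entitles the supervisor to read them. On a real HSA system the store would be issued from the kernel with `memory_scope_all_svm_devices`, as in the fragment above.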
• The optimal partitioning of problem to threads may be non-obvious
• Depends a lot on cache line size
• Do not incur memory latency multiple times, align threads with
cache lines.
OpenCL Tips
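K=2 Means, N=10000, in F=128 dims
Data points, one row per point:
• $X_{0,0}, X_{0,1}, \ldots, X_{0,15}, \ldots, X_{0,127}$
• $X_{1,0}, X_{1,1}, \ldots, X_{1,15}, \ldots, X_{1,127}$
• $X_{2,0}, X_{2,1}, \ldots, X_{2,15}, \ldots, X_{2,127}$
• …
• $X_{N-1,0}, X_{N-1,1}, \ldots, X_{N-1,15}, \ldots, X_{N-1,127}$
Means, one row per cluster:
• $M_{0,0}, M_{0,1}, \ldots, M_{0,15}, \ldots, M_{0,127}$
• $M_{1,0}, M_{1,1}, \ldots, M_{1,15}, \ldots, M_{1,127}$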
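One plausible reading of the layout above is that 16 consecutive 4-byte floats fill a 64-byte cache line, which would explain why feature index 15 is singled out. Under that assumption (the line size is hardware-dependent), the following CPU-side sketch of the K-means assignment step reads each point's features one line-sized chunk at a time and reuses the chunk for every mean before moving on. The function name and the row-major layout are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kFloatsPerLine = 16;  // assumed: 64-byte line / 4-byte float

// Returns, for each of the N points, the index of its nearest mean.
// X is row-major N x F, M is row-major K x F.
std::vector<int> assign_clusters(const std::vector<float>& X,
                                 const std::vector<float>& M,
                                 std::size_t N, std::size_t K, std::size_t F) {
    assert(X.size() == N * F && M.size() == K * F && K > 0);
    std::vector<int> label(N, 0);
    std::vector<float> dist(K);
    for (std::size_t n = 0; n < N; ++n) {
        for (float& d : dist) d = 0.0f;
        // Walk the feature row one cache-line-sized chunk at a time and
        // reuse the chunk for every mean before moving on.
        for (std::size_t f0 = 0; f0 < F; f0 += kFloatsPerLine) {
            const std::size_t f1 =
                (f0 + kFloatsPerLine < F) ? f0 + kFloatsPerLine : F;
            for (std::size_t k = 0; k < K; ++k) {
                for (std::size_t f = f0; f < f1; ++f) {
                    const float diff = X[n * F + f] - M[k * F + f];
                    dist[k] += diff * diff;
                }
            }
        }
        for (std::size_t k = 1; k < K; ++k) {
            if (dist[k] < dist[label[n]]) label[n] = static_cast<int>(k);
        }
    }
    return label;
}
```

In an OpenCL kernel the same idea would map a group of 16 work-items, or one work-item per chunk, onto each cache line; which partitioning wins depends on the device, as the tips above caution.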
• The optimal partitioning of problem to threads may be non-obvious
• Depends a lot on cache line size
• Depends a lot on L2 size (and for virtual memory, on page size)
• Don’t jump around virtual pages
• Ensure you stay within L2
Know Thy Hardware!
[Diagram: Input → Kernel 1 → Kernel 2 → Output, with the input, the intermediate result, and the output each labeled Device/Main memory and addressed through virtual memory.]
Programmer’s View
[Diagram: Input → Kernel 1 → Kernel 2 → Output, with an L2 cache drawn between each stage and device/main memory.]
Ideal Physical View
[Diagram: the Input → Kernel 1 → Kernel 2 → Output pipeline again, with L2 caches between the kernels and device/main memory.]
Be Mindful of Your L2
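One way to act on the advice to stay within L2 is to run both kernels over one cache-sized tile before moving to the next, so the intermediate result never grows to the size of the whole input. The sketch below shows the idea in plain CPU code; the two element-wise stages, the tile size, and the function names are hypothetical stand-ins, not the kernels from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Two hypothetical element-wise stages standing in for Kernel 1 and Kernel 2.
inline float stage1(float v) { return v * 2.0f; }
inline float stage2(float v) { return v + 1.0f; }

// Applies stage1 then stage2 one tile at a time, so the intermediate
// buffer is only `tile` elements long instead of input.size().
std::vector<float> run_tiled(const std::vector<float>& input, std::size_t tile) {
    assert(tile > 0);
    std::vector<float> output(input.size());
    std::vector<float> scratch(tile);  // sized to fit in cache (assumption)
    for (std::size_t begin = 0; begin < input.size(); begin += tile) {
        const std::size_t count = std::min(tile, input.size() - begin);
        for (std::size_t i = 0; i < count; ++i) {
            scratch[i] = stage1(input[begin + i]);
        }
        for (std::size_t i = 0; i < count; ++i) {
            output[begin + i] = stage2(scratch[i]);
        }
    }
    return output;
}
```

The result is identical to running each stage over the full array; only the memory traffic pattern changes. A suitable `tile` would be chosen from the device's reported cache size rather than hard-coded.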
• Consumer/producer paradigm…
• GPU: number crunching producer
• CPU: supervises GPU to global convergence
• mediated via C++11 platform atomics
• Very easy to transition to OpenCL right NOW!
• Replace all malloc code with:
clSVMAlloc and clEnqueueSVMMap (if needed)
• That’s it! No need to change any CPU code, and you can start
writing kernels!
Conclusions
• Ready for prime time in real time!
• Detection
• Recognition
• Tracking
• Real-time learning
Conclusions
The information presented in this document is for informational purposes only and may contain technical inaccuracies,
omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right
to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify
any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO
RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN
NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES
ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY
OF SUCH DAMAGES.
ATTRIBUTION
© 2015 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks
of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes
only and may be trademarks of their respective owners. OpenCL is a trademark of Apple Inc. used by permission by
Khronos.
Disclaimer & Attribution

"Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD

  • 1. Understanding Adaptive Machine Learning Vision Algorithms and Implementing them on GPUs and Heterogeneous Platforms. Harris Gasparakis, Ph.D. 12 May 2015. Copyright © 2015 Advanced Micro Devices
  • 2. Agenda
      • Machine Learning (ML)
      • Constrained optimization problems
      • Heterogeneous computing
      • OpenCL 2.0, HSA
      • Synthesis
      • OpenCL programming tips for ML
      • Conclusions
  • 3. Why Machine Learning (ML)? Can you find an algorithm to describe an object, and detect it?
  • 4. Sometimes Not Needed… Can you find an algorithm to describe an object, and detect it?
  • 5. Most Often Indispensable! Can you find an algorithm to describe an object, and detect it? Vidit Jain and Erik Learned-Miller. FDDB: A Benchmark for Face Detection in Unconstrained Settings.
  • 6. Formalism
      • Learn from examples!
      • Model the universe using functions with (possibly many) parameters “w” that you learn from training data
      • “x” is a (multi-dimensional) function of the image data
          • Pixel patches
          • A priori features
          • Features in a learned dictionary (basis)
              • PCA
              • Sparse coding/LASSO/LARS
              • DNN
      • “y” is our value judgment on the data
          • Object category
          • Object identity, etc.
  • 7. Formalism
      Tune parameters “w” to best explain the “N” observations $(y_n, x_n)$:
      $E(w) = \sum_{n=1}^{N} D(y_n, x_n; w) + \lambda C(w) + \cdots$
      Machine learning typically involves constrained functional minimization
      • Bias/variance
      • Overcompleteness/sparsity
      • How much learning is too much?
      • N = ? |w| = ?
      • Graphical models/subspace updates
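The objective on slide 7 can be made concrete with a small sketch. The slide does not specify the data term D or the constraint term C, so the squared error of a linear model and the squared L2 norm used here are assumptions chosen for brevity, and the names `Sample`, `data_term`, `penalty` and `objective` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// One training observation: feature vector x and target y.
struct Sample {
    std::vector<double> x;
    double y;
};

// Data term D(y_n, x_n; w): squared error of a linear model (assumed choice).
double data_term(const Sample& s, const std::vector<double>& w) {
    double pred = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) pred += w[i] * s.x[i];
    const double r = s.y - pred;
    return r * r;
}

// Constraint term C(w): squared L2 norm (assumed choice; an L1 norm
// would favor sparsity instead).
double penalty(const std::vector<double>& w) {
    double c = 0.0;
    for (double wi : w) c += wi * wi;
    return c;
}

// E(w) = sum_n D(y_n, x_n; w) + lambda * C(w)
double objective(const std::vector<Sample>& data,
                 const std::vector<double>& w, double lambda) {
    double e = 0.0;
    for (const Sample& s : data) e += data_term(s, w);
    return e + lambda * penalty(w);
}
```

Raising `lambda` trades fit to the observations against the size of `w`, which is one way to think about the bias/variance bullet.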
  • 8. It is a Jungle of Minima!
      • Start with an initial guess: $w_0$
      • Iteratively improve it: $w_t = w_{t-1} + \delta w_t$
      • Local minima, with basins of attraction
  • 9. History
      • Second order methods: $\delta w_t = -H(w_{t-1})\, g(w_{t-1})$
      • First order methods: $\delta w_t = -\kappa\, g(w_{t-1})$
      • Tweaks: line minimization, momentum, heat, homotopy, multiresolution
      • Modern first order methods (AdaGrad, AdaDelta, etc.): $\delta w_t = -H(g_{1:t})\, g(w_{t-1})$
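As a rough illustration of the update rules on slide 9, the sketch below implements a plain first-order step, a momentum variant, and an AdaGrad-style step. It assumes g denotes the gradient of E at the current parameters. The function names, the momentum coefficient `mu` and the stabilizer `eps` are hypothetical and not taken from the slides. The AdaGrad form shown is the common diagonal variant, used here as one example of a scaling built from the gradient history $g_{1:t}$.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Plain first-order step: delta_w = -kappa * g.
void gradient_step(Vec& w, const Vec& g, double kappa) {
    for (std::size_t i = 0; i < w.size(); ++i) w[i] -= kappa * g[i];
}

// Momentum tweak: the velocity keeps a decaying sum of past steps.
void momentum_step(Vec& w, Vec& velocity, const Vec& g,
                   double kappa, double mu) {
    for (std::size_t i = 0; i < w.size(); ++i) {
        velocity[i] = mu * velocity[i] - kappa * g[i];
        w[i] += velocity[i];
    }
}

// AdaGrad-style step: each coordinate is scaled by the inverse square
// root of its accumulated squared gradients.
void adagrad_step(Vec& w, Vec& sum_sq, const Vec& g,
                  double kappa, double eps = 1e-8) {
    for (std::size_t i = 0; i < w.size(); ++i) {
        sum_sq[i] += g[i] * g[i];
        w[i] -= kappa * g[i] / (std::sqrt(sum_sq[i]) + eps);
    }
}
```

All three share the same shape (read the gradient, write the parameters), so a pool of configurations can mix update rules without changing the surrounding loop.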
  • 10. What if?
      • Start from a pool of multiple initial conditions, and multiple update rules (“configurations”)
      • Explore them simultaneously (GPU threads)
      • On each update step, reason about the progress of each (CPU threads)
      • Eliminate configurations:
          • Dead ends
          • In the same basin of attraction
      • Replace them with other random configurations
      • Give preference to configurations that progress the most
  • 11. Let’s Explore the Jungle!
  • 12. Let’s Explore the Jungle!
  • 13. Let’s Explore the Jungle!
  • 14. Let’s Explore the Jungle!
  • 15. Some Dead Ends…
  • 16. Reinitialize them!
  • 17. Continue Exploring…
  • 18. One Visitor Per Attractor is Enough...
  • 19. The Master Adaptive Strategy
      • CPU as adaptive GPU supervisor
      • GPU computes an ensemble of updates
      • CPU reasons about the ensemble of updates
          • Coalesce if in the same basin of attraction
          • Prune or “kick” if trapped in local minimum
          • Test and rank according to generalization error
      • Is it practical?
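The CPU-side reasoning on slide 19 might look roughly like the sketch below. The slides do not say how basin membership or "trapped" is detected, so the criteria here are assumptions: two configurations count as sharing a basin when their parameters lie within `basin_radius`, and a configuration is pruned when its latest step made less than `min_progress` while its energy is still above `energy_cap`. The one-dimensional parameter and all names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One explorer in the ensemble (1-D parameter for brevity).
struct Config {
    double w;          // current parameter estimate
    double energy;     // objective value E(w)
    double last_drop;  // decrease in E during the latest update step
};

// Returns the configurations worth keeping. The caller would refill the
// pool with fresh random configurations to replace the dropped ones.
std::vector<Config> supervise(std::vector<Config> pool, double basin_radius,
                              double min_progress, double energy_cap) {
    // Visit the best (lowest-energy) configurations first, so that the
    // survivor of each basin is its best visitor.
    std::sort(pool.begin(), pool.end(),
              [](const Config& a, const Config& b) { return a.energy < b.energy; });
    std::vector<Config> survivors;
    for (const Config& c : pool) {
        // Prune: stalled in a poor local minimum.
        const bool stalled = c.last_drop < min_progress;
        if (stalled && c.energy > energy_cap) continue;
        // Coalesce: another survivor already covers this basin.
        bool duplicate = false;
        for (const Config& s : survivors) {
            if (std::fabs(c.w - s.w) < basin_radius) { duplicate = true; break; }
        }
        if (!duplicate) survivors.push_back(c);
    }
    return survivors;
}
```

Ranking by generalization error would replace the `energy` field with an error measured on held-out data; that step is omitted here.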
  • 20. Know Thy (HSA) Hardware! [Diagram: CPU and HSA iGPU side by side; each has CC units with L1 caches (L1/LDS on the iGPU), an L2 cache, and a scheduler; both share unified (bidirectionally coherent, pageable) virtual memory over physical memory (hUMA).] Heterogeneous System Architecture (exposed via OpenCL 2.0)
  • 21. Know Thy (HSA) Hardware! [Diagram: the same CPU and HSA iGPU layout over unified virtual memory, adding hQ.] Dynamic parallelism, context switching, preemption, concurrent execution
  • 22. C++11 Atomics/OpenCL 2.0 Atomics
      • Scope
          • Thread, workgroup, device, all HSA devices
      • Semantics
          • Acquire (require that memory writes of other threads within the scope become visible in the current thread)
          • Release (writes of the current thread become visible to other threads in the current scope)
  • 23. C++11 Atomics/OpenCL 2.0 Atomics
      • Initialize pool of $w_t$ as fine-grain SVM with atomics enabled: clSVMAlloc(…, CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS, …);
      • CPU waits for GPU to finish an iteration: done = std::atomic_load_explicit(.., std::memory_order_acquire);
      • GPU kernel “signals” when done with an iteration: atomic_store_explicit((global atomic_int *)(…), …, memory_order_release, memory_scope_all_svm_devices);
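The release/acquire handshake on slide 23 can be tried on the CPU alone. In the sketch below a second CPU thread stands in for the GPU kernel, which is an assumption made so the example runs without an OpenCL device; the function name `run_handshake` is hypothetical. The producer writes the parameter pool, then publishes a flag with release semantics; the consumer spins with acquire semantics and only then reads the pool.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Runs one "iteration" and returns the sum the consumer observed.
double run_handshake(std::size_t n) {
    std::vector<double> w(n, 0.0);  // stands in for the shared SVM pool
    std::atomic<int> done{0};

    // Producer (the GPU kernel in the slides): write, then release.
    std::thread producer([&] {
        for (std::size_t i = 0; i < n; ++i) w[i] = 1.0;
        std::atomic_store_explicit(&done, 1, std::memory_order_release);
    });

    // Consumer (the CPU supervisor): acquire, then read.
    while (std::atomic_load_explicit(&done, std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    double sum = 0.0;
    for (double v : w) sum += v;  // every write to w is visible here

    producer.join();
    return sum;
}
```

Because the acquire load that sees `1` synchronizes with the release store, the consumer is guaranteed to see all of the producer's earlier writes to `w`. With a real device the flag would live in the fine-grain SVM allocation and the kernel would use `memory_scope_all_svm_devices`, as the slide shows.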
  • 24. OpenCL Tips
      • The optimal partitioning of problem to threads may be non-obvious
      • Depends a lot on cache line size
      • Do not incur memory latency multiple times; align threads with cache lines.
  • 25. K=2 Means, N=10000, in F=128 dims
      Data points, one row per point:
      • $X_{0,0}, X_{0,1}, \ldots, X_{0,15}, \ldots, X_{0,127}$
      • $X_{1,0}, X_{1,1}, \ldots, X_{1,15}, \ldots, X_{1,127}$
      • $X_{2,0}, X_{2,1}, \ldots, X_{2,15}, \ldots, X_{2,127}$
      • $X_{N-1,0}, X_{N-1,1}, \ldots, X_{N-1,15}, \ldots, X_{N-1,127}$
      Means, one row per cluster:
      • $M_{0,0}, M_{0,1}, \ldots, M_{0,15}, \ldots, M_{0,127}$
      • $M_{1,0}, M_{1,1}, \ldots, M_{1,15}, \ldots, M_{1,127}$
  • 26. K=2 Means, N=10000, in F=128 dims (same data layout as slide 25)
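A minimal sketch of the assignment step for this layout follows. It assumes row-major storage, 4-byte floats and 64-byte cache lines, so one cache line holds 16 values; that may be why the slides single out index 15, though the slides do not say so. The names `squared_distance`, `assign`, `kDims` and `kFloatsPerLine` are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

constexpr std::size_t kDims = 128;          // F in the slides
constexpr std::size_t kFloatsPerLine = 16;  // assumes 64-byte lines, 4-byte floats

// Squared distance between one point and one mean, consumed one cache
// line (16 consecutive floats) at a time so each line is fetched once.
float squared_distance(const float* x, const float* m) {
    float total = 0.0f;
    for (std::size_t line = 0; line < kDims; line += kFloatsPerLine) {
        float partial = 0.0f;
        for (std::size_t j = 0; j < kFloatsPerLine; ++j) {
            const float d = x[line + j] - m[line + j];
            partial += d * d;
        }
        total += partial;
    }
    return total;
}

// Assignment step: label each of the n points with its nearest of k means.
std::vector<int> assign(const std::vector<float>& points,
                        const std::vector<float>& means,
                        std::size_t n, std::size_t k) {
    std::vector<int> labels(n, 0);
    for (std::size_t p = 0; p < n; ++p) {
        float best = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < k; ++c) {
            const float d = squared_distance(&points[p * kDims], &means[c * kDims]);
            if (d < best) { best = d; labels[p] = static_cast<int>(c); }
        }
    }
    return labels;
}
```

On a GPU, the inner 16-element loop is the natural unit to hand to a group of adjacent work-items, so that their loads fall in the same cache line.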
  • 27. Know Thy Hardware!
      • The optimal partitioning of problem to threads may be non-obvious
      • Depends a lot on cache line size
      • Depends a lot on L2 size (and for virtual memory, on page size)
          • Don’t jump around virtual pages
          • Ensure you stay within L2
  • 28. Programmer’s View [Diagram: Input → Kernel 1 → Kernel 2 → Output, with device/main memory holding the input, the intermediate result and the output, all seen as virtual memory.]
  • 29. Ideal Physical View [Diagram: Input → Kernel 1 → Kernel 2 → Output, with an L2 cache in front of device/main memory at each stage.]
  • 30. Be Mindful of your L2 [Diagram: the same Input → Kernel 1 → Kernel 2 → Output pipeline, showing the L2 caches and device/main memory at each stage.]
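One reading of the pipeline diagrams is that the data passed from Kernel 1 to Kernel 2 should stay small enough to remain in L2. The sketch below shows that idea on the CPU: the input is processed in tiles, and both stages run on a tile before moving to the next one. The stage functions, the sizing rule (half the budget for the input tile, half for the intermediate tile) and all names are hypothetical, and real L2 behavior depends on the device.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-element work of the two kernels.
inline float stage1(float v) { return v * 2.0f; }
inline float stage2(float v) { return v + 1.0f; }

// Elements per tile so that an input tile plus an intermediate tile fit
// in the given cache budget (an assumed, simplified sizing rule).
std::size_t tile_elems(std::size_t l2_bytes) {
    return std::max<std::size_t>(1, l2_bytes / (2 * sizeof(float)));
}

// Runs both stages tile by tile, so the intermediate result of a tile is
// consumed while it is still likely to be cached.
std::vector<float> run_tiled(const std::vector<float>& input, std::size_t l2_bytes) {
    const std::size_t tile = tile_elems(l2_bytes);
    std::vector<float> scratch(tile);
    std::vector<float> output(input.size());
    for (std::size_t begin = 0; begin < input.size(); begin += tile) {
        const std::size_t count = std::min(tile, input.size() - begin);
        for (std::size_t i = 0; i < count; ++i) {
            scratch[i] = stage1(input[begin + i]);   // "Kernel 1" on this tile
        }
        for (std::size_t i = 0; i < count; ++i) {
            output[begin + i] = stage2(scratch[i]);  // "Kernel 2" on the same tile
        }
    }
    return output;
}
```

The alternative, running Kernel 1 over the whole input before Kernel 2 starts, writes the full intermediate buffer out to device or main memory and reads it back.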
  • 31. Conclusions
      • Consumer/producer paradigm…
          • GPU: number crunching producer
          • CPU: supervises GPU to global convergence
          • Mediated via C++11 platform atomics
      • Very easy to transition to OpenCL right NOW!
          • Replace all malloc code with clSVMAlloc, and clEnqueueSVMMap (if needed)
          • That’s it! No need to change any CPU code, and you can start writing kernels!
  • 32. Conclusions
      • Ready for prime time in real time!
          • Detection
          • Recognition
          • Tracking
          • Real-time learning
  • 33. Disclaimer & Attribution
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
      AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION © 2015 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners. OpenCL is a trademark of Apple Inc. used by permission by Khronos.