SlideShare a Scribd company logo
1 of 18
Download to read offline
Copyright © 2017 Xilinx, Inc. 1
Nick Ni and Vinod Kathail
May 2017
Caffe to Zynq:
State-of-the-Art Machine Learning Inference
Performance in Less Than 5 Watts
Copyright © 2017 Xilinx, Inc. 2
• Why Zynq SoCs for Deep Learning Inference
• Caffe to Zynq SoC in Seconds
• A Full System Example
Agenda
Copyright © 2017 Xilinx, Inc. 3
Diverse Applications with Diverse Design Targets
ADAS
High accuracy
Low latency
Medical
Diagnosis:
Small networks
Robotics:
Real-time
Hearing Aids:
Small network
Low latency
Translate &
AlphaGo:
Huge networks
Copyright © 2017 Xilinx, Inc. 4
Zynq Offers the Most Efficient Deep Learning
Inference
Copyright © 2017 Xilinx, Inc. 5
Zynq SoCs Offer Superior Throughput, Latency
6x
Images/sec/watt
1/5
Latency (ms)
Machine
Learning
Inference
Xilinx
Benchmark
Xilinx
Benchmark
• eGPU = nVidia TX1: nVidia GoogLeNet performance: https://devblogs.nvidia.com/parallelforall/jetpack-doubles-jetson-tx1-deep-learning-inference/
GoogLeNet
@ batch = 1
Xilinx ZU9 Xilinx ZU5 eGPU*
Images/s 370.0 155.0 70
Power (W) 7.0 4.5 7.9
Images/s/watt 53.0 34.5 8.9
Real Time
Applications
Latency
GoogLeNet
@ batch = 1
Xilinx ZU9 Xilinx ZU5 eGPU*
Images/s 370.0 155.0 70
Latency (ms) 2.7 6.4 14.2
GoogLeNet
@ batch = 8
Xilinx ZU9 Xilinx ZU5 eGPU*
Images/s 370.0 155.0 163
Latency (ms) 2.7 6.4 49.0
For large batch,
CPU/GPU/DSPs latency
increases significantly
Copyright © 2017 Xilinx, Inc. 6
Training: Process for machine to
“learn” and optimize model from data
Inference: Using trained models to
predict/estimate outcomes from new
observations in efficient deployments
INFERENCE
Fewer
“dog” ✓
“dog”
Input
”cat”
TRAINING
= ?
Many
labels
Error
Input
FP-32
FIXED-16
(INT16)
FIXED-8
(INT8)
Difference
vs FP32
VGG-16 86.6% 86.6% 86.4% (0.2%)
GoogLeNet 88.6% 88.5% 85.7% (2.9%)
SqueezeNet 81.4% 81.4% 80.3% (1.1%)
Inference now 8 bit and below
for maximum efficiency
The Divergence of Training and Inference
Top-5
Accuracy
https://arxiv.org/pdf/1510.00149v5.pdf
Copyright © 2017 Xilinx, Inc. 7
Inference Precisions Moving to Lower and Variable
Precision
Citation: https://arxiv.org/pdf/1510.00149.pdf
# of weight bits # of weight bits
Convolution Layers Fully Connected Layers
Copyright © 2017 Xilinx, Inc. 8
Future Proof Architecture for Any Precisions
Limited to 32 bit operations
SIMD operations at INT8/16
New devices required to
support change in precision efficiently
Reconfigurable to scale and
optimize for different precisions
Copyright © 2017 Xilinx, Inc. 9
TOPS
8b
• Reducing precision from 8b to 1b shrinks LUT cost by 40x
• Potential to scale CNN performance to above 23TOPS (ZU9)
BNN: Unparalleled Performance
8b
4b
1b1b1b1b1b1b1b1b
Q3TX2 ZU9
1.5
0.3 2.7
1b1b1b1b1b1b1b1b
1b1b1b1b1b1b1b1b
1b1b1b1b1b1b1b1b
1b1b1b1b1b1b1b1b
1b1b1b1b1b1b1b1b
1b1b1b1b1b1b1b1b
1b1b1b1b1b1b1b1b
1b1b1b1b1b1b1b1b
TOPS
(1b)
23
ZU9Q3TX2
1.5
0.3
Copyright © 2017 Xilinx, Inc. 10
Low Latency Inference by Layer to Layer Dataflow
On Chip
GPU/CPU: Dataflow using
Off chip memory
Xilinx: Maximum local data reuse
“layer merging” between layers
On-chip Memory
Nvidia Tegra X1 (GPU Regfile + L2) Xilinx ZU7 (BRAM + URAM)
6 Mb 38 Mb
Up to 6x More On-chip Memory than SoCs and eGPUs
Nvidia TX1 spec: http://wccftech.com/nvidia-tegra-x1-super-chip-announced-ces-2015-features-maxwell-core-architecture-256-cuda-cores/
Copyright © 2017 Xilinx, Inc. 11
Frameworks
Libraries and Tools
Development Kits
DNN
CNN
GoogLeNet
SSD
FCN …
OpenVX
integration
Copyright © 2017 Xilinx, Inc. 12
xFdnn: Direct Deep Learning Inference from Caffe
Compiles only ARM software code in minutes. No hardware compilation.
1 2 3Import .prototxt and
trained weights
Call prototxt
runtime API in
your application
Cross-compile for
Cortex-A53 and run
on a board
Copyright © 2017 Xilinx, Inc. 13
Deep Learning Design Examples
Copyright © 2017 Xilinx, Inc. 14
Building a Full Embedded Vision System
System Optimizing
Compiler Machine Learning
.prototxt
& Trained
Weights
DNN
CNN
GoogLeNet
SSD
FCN …
Scheduling of Pre-Optimized
Neural Network Layers
Optimized Accelerators
& Data Motion Network
Copyright © 2017 Xilinx, Inc. 15
Building a Full Embedded Vision System
C/C++/OpenCL
Creation
Profiling to Identify
Bottlenecks
System Optimizing
Compiler
Computer Vision
Machine Learning
Scheduling of Pre-Optimized
Neural Network Layers
Optimized Accelerators
& Data Motion Network
.prototxt
& Trained
Weights
DNN
CNN
GoogLeNet
SSD
FCN …
OpenVX
integration
Copyright © 2017 Xilinx, Inc. 16
Putting It All Together: CV and CNN with Multiple
Sensors
USB-3
Stereo
Vision
MIPI CSI ISP
SD Card
File Read
VPSS
Scaler
Video
Mixer
Frame
Buffer
Optical
Flow
CNN
500x500x10
RGB
1280x720p60
YUYV 4:2:2
2x720p60
YUYV 4:2:2
Frame
Buffer
Frame
Buffer
Frame
Buffer
HDMI TX
3840x2160p60
RGB
Copyright © 2017 Xilinx, Inc. 17
• Zynq SoCs offer superior performance and lower latency compared to
other SoC offerings
• reVISION stack provides seamless inference of custom deep learning
networks from Caffe to Zynq SoCs
• Available publically in end of May 2017
• Visit Xilinx.com/reVISION for more information
Summary
Come Visit the Xilinx Booth in the
Technology Showcase to see the
reVISION stack in action!!
Copyright © 2017 Xilinx, Inc. 18
• INT8 Whitepaper
• Machine Learning Whitepaper
• reVISION Backgrounder
• Additional Papers & Tutorials
• Xilinx Embedded Vision Videos
• Forums
Resources

More Related Content

More from Edge AI and Vision Alliance

“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...Edge AI and Vision Alliance
 
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...Edge AI and Vision Alliance
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...Edge AI and Vision Alliance
 
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic LeapEdge AI and Vision Alliance
 
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ..."Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
 
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
 
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
 
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ..."Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 

Recently uploaded (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

"Caffe to Zynq: State-of-the-Art Machine Learning Inference Performance in Less Than 5 Watts," a Presentation from Xilinx

  • 1. Copyright © 2017 Xilinx, Inc. 1 Nick Ni and Vinod Kathail May 2017 Caffe to Zynq: State-of-the-Art Machine Learning Inference Performance in Less Than 5 Watts
  • 2. Copyright © 2017 Xilinx, Inc. 2 • Why Zynq SoCs for Deep Learning Inference • Caffe to Zynq SoC in Seconds • A Full System Example Agenda
  • 3. Copyright © 2017 Xilinx, Inc. 3 Diverse Applications with Diverse Design Targets ADAS High accuracy Low latency Medical Diagnosis: Small networks Robotics: Real-time Hearing Aids: Small network Low latency Translate & AlphaGo: Huge networks
  • 4. Copyright © 2017 Xilinx, Inc. 4 Zynq Offers the Most Efficient Deep Learning Inference
  • 5. Copyright © 2017 Xilinx, Inc. 5 Zynq SoCs Offer Superior Throughput, Latency 6x Images/sec/watt 1/5 Latency (ms) Machine Learning Inference Xilinx Benchmark Xilinx Benchmark • eGPU = nVidia TX1: nVidia GoogLeNet performance: https://devblogs.nvidia.com/parallelforall/jetpack-doubles-jetson-tx1-deep-learning-inference/ GoogLeNet @ batch = 1 Xilinx ZU9 Xilinx ZU5 eGPU* Images/s 370.0 155.0 70 Power (W) 7.0 4.5 7.9 Images/s/watt 53.0 34.5 8.9 Real Time Applications Latency GoogLeNet @ batch = 1 Xilinx ZU9 Xilinx ZU5 eGPU* Images/s 370.0 155.0 70 Latency (ms) 2.7 6.4 14.2 GoogLeNet @ batch = 8 Xilinx ZU9 Xilinx ZU5 eGPU* Images/s 370.0 155.0 163 Latency (ms) 2.7 6.4 49.0 For large batch, CPU/GPU/DSPs latency increases significantly
  • 6. Copyright © 2017 Xilinx, Inc. 6 Training: Process for machine to “learn” and optimize model from data Inference: Using trained models to predict/estimate outcomes from new observations in efficient deployments INFERENCE Fewer “dog” ✓ “dog” Input ”cat” TRAINING = ? Many labels Error Input FP-32 FIXED-16 (INT16) FIXED-8 (INT8) Difference vs FP32 VGG-16 86.6% 86.6% 86.4% (0.2%) GoogLeNet 88.6% 88.5% 85.7% (2.9%) SqueezeNet 81.4% 81.4% 80.3% (1.1%) Inference now 8 bit and below for maximum efficiency The Divergence of Training and Inference Top-5 Accuracy https://arxiv.org/pdf/1510.00149v5.pdf
  • 7. Copyright © 2017 Xilinx, Inc. 7 Inference Precisions Moving to Lower and Variable Precision Citation: https://arxiv.org/pdf/1510.00149.pdf # of weight bits # of weight bits Convolution Layers Fully Connected Layers
  • 8. Copyright © 2017 Xilinx, Inc. 8 Future Proof Architecture for Any Precisions Limited to 32 bit operations SIMD operations at INT8/16 New devices required to support change in precision efficiently Reconfigurable to scale and optimize for different precisions
  • 9. Copyright © 2017 Xilinx, Inc. 9 TOPS 8b • Reducing precision from 8b to 1b shrinks LUT cost by 40x • Potential to scale CNN performance to above 23TOPS (ZU9) BNN: Unparalleled Performance 8b 4b 1b1b1b1b1b1b1b1b Q3TX2 ZU9 1.5 0.3 2.7 1b1b1b1b1b1b1b1b 1b1b1b1b1b1b1b1b 1b1b1b1b1b1b1b1b 1b1b1b1b1b1b1b1b 1b1b1b1b1b1b1b1b 1b1b1b1b1b1b1b1b 1b1b1b1b1b1b1b1b 1b1b1b1b1b1b1b1b TOPS (1b) 23 ZU9Q3TX2 1.5 0.3
  • 10. Copyright © 2017 Xilinx, Inc. 10 Low Latency Inference by Layer to Layer Dataflow On Chip GPU/CPU: Dataflow using Off chip memory Xilinx: Maximum local data reuse “layer merging” between layers On-chip Memory Nvidia Tegra X1 (GPU Regfile + L2) Xilinx ZU7 (BRAM + URAM) 6 Mb 38 Mb Up to 6x More On-chip Memory than SoCs and eGPUs Nvidia TX1 spec: http://wccftech.com/nvidia-tegra-x1-super-chip-announced-ces-2015-features-maxwell-core-architecture-256-cuda-cores/
  • 11. Copyright © 2017 Xilinx, Inc. 11 Frameworks Libraries and Tools Development Kits DNN CNN GoogLeNet SSD FCN … OpenVX integration
  • 12. Copyright © 2017 Xilinx, Inc. 12 xFdnn: Direct Deep Learning Inference from Caffe Compiles only ARM software code in minutes. No hardware compilation. 1 2 3Import .prototxt and trained weights Call prototxt runtime API in your application Cross-compile for Cortex-A53 and run on a board
  • 13. Copyright © 2017 Xilinx, Inc. 13 Deep Learning Design Examples
  • 14. Copyright © 2017 Xilinx, Inc. 14 Building a Full Embedded Vision System System Optimizing Compiler Machine Learning .prototxt & Trained Weights DNN CNN GoogLeNet SSD FCN … Scheduling of Pre-Optimized Neural Network Layers Optimized Accelerators & Data Motion Network
  • 15. Copyright © 2017 Xilinx, Inc. 15 Building a Full Embedded Vision System C/C++/OpenCL Creation Profiling to Identify Bottlenecks System Optimizing Compiler Computer Vision Machine Learning Scheduling of Pre-Optimized Neural Network Layers Optimized Accelerators & Data Motion Network .prototxt & Trained Weights DNN CNN GoogLeNet SSD FCN … OpenVX integration
  • 16. Copyright © 2017 Xilinx, Inc. 16 Putting It All Together: CV and CNN with Multiple Sensors USB-3 Stereo Vision MIPI CSI ISP SD Card File Read VPSS Scaler Video Mixer Frame Buffer Optical Flow CNN 500x500x10 RGB 1280x720p60 YUYV 4:2:2 2x720p60 YUYV 4:2:2 Frame Buffer Frame Buffer Frame Buffer HDMI TX 3840x2160p60 RGB
  • 17. Copyright © 2017 Xilinx, Inc. 17 • Zynq SoCs offer superior performance and lower latency compared to other SoC offerings • reVISION stack provides seamless inference of custom deep learning networks from Caffe to Zynq SoCs • Available publically in end of May 2017 • Visit Xilinx.com/reVISION for more information Summary Come Visit the Xilinx Booth in the Technology Showcase to see the reVISION stack in action!!
  • 18. Copyright © 2017 Xilinx, Inc. 18 • INT8 Whitepaper • Machine Learning Whitepaper • reVISION Backgrounder • Additional Papers & Tutorials • Xilinx Embedded Vision Videos • Forums Resources