Deep Learning Neural
Network Acceleration at
the Edge
Andrea Gallo
VP Segments and Strategic Initiatives
29-Aug-2018
Vancouver
LEADING
COLLABORATION
IN THE ARM
ECOSYSTEM
Disclaimer
All information in this session is public
No confidential information has been disclosed
from private communication between Linaro
and Linaro members
URLs to the original sources are provided on each slide
Why Deep Learning?
End-to-End Learning for Many Tasks
Slide from DIY Deep Learning for Vision: a Hands-On Tutorial with Caffe
It’s complex!!!
Slide from DIY Deep Learning for Vision: a Hands-On Tutorial with Caffe
From cloud to edge devices
Always online
Uplink bandwidth and traffic
Latency vs real time constraints
Privacy concerns
AI/ML Frameworks
TensorFlow
Developed in-house by the Google Brain team
● Started as DistBelief in 2011
● Evolved into TensorFlow with its first commit in November 2015
● V1.0.0 released on Feb 11, 2017
TensorFlow can be built as → Supports multiple accelerators
● TensorFlow for cloud and datacenters → GPU and TPU
● TensorFlow Lite for mobile devices → Android NNAPI and NN HAL
● TensorFlow.js for AI in web browsers → WebGL
TensorFlow models on tensorflow github
31,713 commits
1,624 contributors
1,610,734 lines of code
456 years of effort
1st Commit Nov ‘15
From TensorFlow to TensorFlow Lite
TensorFlow Lite uses FlatBuffers
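As a rough illustration of what "uses FlatBuffers" means for a model file: a FlatBuffers binary starts with a 4-byte little-endian offset to the root table, optionally followed by a 4-byte file identifier. The sketch below assumes the identifier `TFL3` declared in the TensorFlow Lite schema. The function name `looks_like_tflite` and the sample bytes are made up for illustration and do not come from the slides.

```python
import struct

TFLITE_IDENTIFIER = b"TFL3"  # assumption: identifier declared in the TensorFlow Lite schema


def looks_like_tflite(buf: bytes) -> bool:
    """Return True if buf has a plausible FlatBuffers header with the TFLite identifier."""
    if len(buf) < 8:
        return False
    # Bytes 0-3: little-endian offset from the start of the buffer to the root table.
    (root_offset,) = struct.unpack_from("<I", buf, 0)
    # Bytes 4-7: optional file identifier.
    identifier = buf[4:8]
    return identifier == TFLITE_IDENTIFIER and 8 <= root_offset < len(buf)


# Hypothetical 16-byte buffer, not a real model: header followed by padding.
fake_model = struct.pack("<I", 8) + TFLITE_IDENTIFIER + bytes(8)
print(looks_like_tflite(fake_model))
```

A check like this only inspects the header. It says nothing about whether the rest of the file is a valid model.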
TensorFlow 1st Commit in November 2015
Caffe
● Made with expression, speed, and modularity in mind
● Developed by Berkeley AI Research (BAIR) and by community contributors
○ Yangqing Jia created the project during his PhD at UC Berkeley
○ Caffe is released under the BSD 2-Clause license
● Focus has been vision, but also handles sequences, speech, text
● Tools, reference models, demos, and recipes → Caffe Zoo
● Seamless switch between CPU and GPU
caffe.berkeleyvision.org github.com/BVLC/caffe
4,137 commits
314 contributors
76,076 lines of code
19 years of effort
1st commit in Sept‘13
15,000+ forks
Caffe2
Caffe2 improves on Caffe 1.0 in several directions
● First-class support for large-scale distributed training
● Mobile deployment
● New hardware support (in addition to CPU and CUDA)
● Flexibility for future directions such as quantized computation
● Stress tested by the vast scale of Facebook applications
● Examples and pre-trained models available from the Caffe2 Zoo
● Running on mobile devices with Android and iOS
○ Step-by-step tutorial with camera demo
● Caffe1 models do not run with Caffe2
○ Converter tool available
3,678 commits
332 contributors
275,560 lines of code
73 years of effort
1st commit in June ‘15
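The slides do not say how the "years of effort" figures are derived. They are consistent with the Basic COCOMO model in organic mode, effort = 2.4 × KLOC^1.05 person-months, which is an assumption here rather than something the source states. A minimal sketch using the Caffe and Caffe2 line counts quoted on their slides:

```python
def cocomo_years(lines_of_code: int, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO (organic mode): estimated effort in person-years from a line count."""
    kloc = lines_of_code / 1000.0
    person_months = a * kloc ** b
    return person_months / 12.0


# Line counts taken from the Caffe and Caffe2 slides.
caffe_years = cocomo_years(76_076)
caffe2_years = cocomo_years(275_560)
print(round(caffe_years), round(caffe2_years))
```

Under these assumptions the two estimates round to 19 and 73 years, matching the slides. The same formula gives roughly 466 years for the TensorFlow line count, against 456 on the slide, so treat the model as an approximation of whatever the source tool does.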
Caffe2 1st commit in June 2015
MXNet
MXNet is a multi-language machine learning (ML) library to ease the development of
ML algorithms, especially for deep neural networks. MXNet is computation and
memory efficient and runs on various heterogeneous systems, ranging from mobile
devices to distributed GPU clusters.
Currently, MXNet is supported by Intel, Dato, Baidu, Microsoft, Wolfram Research,
and research institutions such as Carnegie Mellon, MIT, the University of
Washington, and the Hong Kong University of Science and Technology.
Gluon API, examples, tutorials and pre-trained models from the Gluon model zoo
MXNet 1st Commit in April 2015
Deep Learning framework comparison
https://www.openhub.net/p/_compare?project_0=MXNet&project_1=caffe2&project_2=TensorFlow
Observations
● Each cloud player has its own deep learning framework
● Each AI framework has its own entire ecosystem of formats, tools, model store
● Each AI framework represents a significant investment
● Scaling and acceleration are fundamental to performance
If you want a really cool job like Manjunath, Yangqing or Mu Li…
INVENT A GREAT NEW AI/ML FRAMEWORK
NN accelerators and
software solutions
Google Edge TPU
The Edge TPU is Google's purpose-built
ASIC chip designed to run TensorFlow Lite
ML inference at the edge
● AIY Edge TPU Dev Board
● AIY Edge TPU Accelerator
https://aiyprojects.withgoogle.com/edge-tpu/
Arm Mali-G72
Arm Mali-G72 is the second generation
Bifrost-based GPU for High Performance
products. Benefitting from advanced
technologies such as claused shaders and full
system coherency, Mali-G72 adds increased tile
buffer memory supporting up to 16 x
Multi-Sample Anti-Aliasing at minimal
performance cost. Arithmetic optimizations
tailored to complex Machine Learning and High
Fidelity Mobile Gaming use cases provide 25%
higher energy efficiency, 20% better
performance density and 40% greater overall
performance than devices based on previous
generation Bifrost GPU.
https://developer.arm.com/products/graphics-and-multimedia/mali-gpus/mali-g72-gpu
Arm ML processor
The Arm Machine Learning processor is an
optimized, ground-up design for machine learning
acceleration, targeting mobile and adjacent
markets:
● optimized fixed-function engines for
best-in-class performance
● additional programmable layer engines
support the execution of non-convolution
layers, and the implementation of selected
primitives and operators
The network control unit manages the overall
execution and traversal of the network and the DMA
moves data in and out of the main memory.
Onboard memory allows central storage for weights
and feature maps
https://developer.arm.com/products/processors/machine-learning/arm-ml-processor
Arm OD processor
● Detects objects in real time in Full HD at 60 fps.
● Object sizes from 50x60 pixels to full screen.
● Virtually unlimited objects detected per frame.
● Detailed people model provides rich metadata
and allows detection of direction, trajectory,
pose and gesture.
● Advanced software running on accompanying
application processor allows for higher-level
behaviour to be determined, including
sophisticated inter-frame tracking.
● Additional software libraries enable higher-level,
on-device features, such as face recognition.
https://developer.arm.com/products/processors/machine-learning/arm-od-processor
Arm NN
Arm NN SDK is a set of open-source Linux software and
tools that enables machine learning workloads on
power-efficient devices. It provides a bridge between
existing neural network frameworks and power-efficient
Arm Cortex CPUs, Arm Mali GPUs or the Arm Machine
Learning processor.
Arm NN SDK utilizes the Compute Library to target
programmable cores, such as Cortex-A CPUs and Mali
GPUs, as efficiently as possible. It includes support for the
Arm Machine Learning processor and, via CMSIS-NN,
support for Cortex-M CPUs.
https://developer.arm.com/products/processors/machine-learning/arm-nn
Qualcomm
https://connect.linaro.org/resources/hkg18/hkg18-306/
HiSilicon
● 99 operators
● Caffe, TensorFlow, TensorFlow Lite, Huawei HiAI SDK, Android NN
● Converter tools from AI models to serialized offline model
https://connect.linaro.org/resources/hkg18/hkg18-302/
MediaTek
https://www.forbes.com/sites/tiriasresearch/2017/03/31/mediatek-brings-neural-networks-to-devices/#6468bd5f3eac
An ecosystem of 3rd parties providing NN IP and tools
Observations
● Complete offload vs heterogeneous computing
● Shared memory vs sub-system memories and DMA
● Fixed operators and software fallback
● Graph split vs cost of context switch
● Serialized models and converter tools
● Forked and accelerated inference engine for each NN IP and each framework
→ high total cost of ownership
→ delayed rebases and updates
→ delayed security fixes
[Figure: block diagrams combining CPU, NPU, GPU, DSP and DLA blocks with RAM]
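The trade-off between fixed operators, software fallback and graph splitting can be sketched as follows. The operator names, the `NPU_SUPPORTED` set and the functions `partition` and `count_switches` are hypothetical, chosen only to illustrate how unsupported operators split a graph into segments and how each boundary implies a hand-over between devices.

```python
from itertools import groupby

# Hypothetical set of operators an accelerator implements in fixed-function hardware.
NPU_SUPPORTED = {"conv2d", "relu", "maxpool", "fully_connected"}


def partition(ops, supported=NPU_SUPPORTED):
    """Split a linear sequence of operators into contiguous per-device segments.

    Operators in `supported` run on the accelerator ("npu"); everything else
    falls back to software on the CPU ("cpu").
    """
    def device_of(op):
        return "npu" if op in supported else "cpu"

    return [(device, list(group)) for device, group in groupby(ops, key=device_of)]


def count_switches(segments):
    """Each boundary between adjacent segments is one hand-over between devices."""
    return max(len(segments) - 1, 0)


# Illustrative model: "custom_norm" and "softmax" are unsupported and force fallbacks.
model = ["conv2d", "relu", "custom_norm", "conv2d", "relu", "maxpool", "softmax"]
segments = partition(model)
for device, ops in segments:
    print(device, ops)
print("device switches:", count_switches(segments))
```

With shared memory a hand-over may cost little more than cache maintenance. With sub-system memories each one can add a DMA copy, so a single unsupported operator in the middle of a graph may erase much of the benefit of offloading.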
Call to Action
Linaro Collaboration
Members fund Linaro and drive work
through engineering steering committees
Member and Linaro engineers
collaborate to develop work once, for all
Linaro delivers output to members,
into open source projects, and
into the community
Now ~25 members, up from 6 in 2010
Over 300 OSS engineers globally,
including 140 Linaro staff
Core Members
Club Members
Group Members
Community Members
Linaro works Upstream
Delivering high value collaboration
Top 5 company contributor to Linux and
Zephyr kernels
Contributor to >70 open source projects;
many maintained by Linaro engineers
Rank  Company   Changesets (kernels 4.8-4.13)   %
1     Intel     10,833                          13.1%
2     Red Hat   5,965                           7.2%
3     Linaro    4,636                           5.6%
Source: 2017 Linux Kernel Development Report, Linux Foundation
Selected projects Linaro contributes to
Open Neural Network Exchange (ONNX)
An open source format for AI models
An extensible computation graph model
Definitions of built-in operators and standard data types
Initial focus on inference
ONNX Interface for Framework Integration (ONNXIFI)
Standardized interface for neural network inference on special-purpose
accelerators, CPUs, GPUs, DSPs, and FPGAs
Dynamic discovery of available backends and supported ONNX operators
Initialize and deinitialize backends
Specify memory locations and metadata
Run an ONNX graph
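The sequence of steps listed for ONNXIFI (discover, initialize, describe memory, run, deinitialize) can be sketched with a toy model. This is not the ONNXIFI C API: the `Backend` class, the `discover` function and the backend names are hypothetical stand-ins. Only the operator names `Conv`, `Relu` and `Softmax` are taken from ONNX.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Toy stand-in for an accelerator backend; not the ONNXIFI C API."""
    name: str
    supported_ops: frozenset
    initialized: bool = False

    def init(self):
        self.initialized = True

    def deinit(self):
        self.initialized = False

    def run(self, graph, io):
        if not self.initialized:
            raise RuntimeError("backend not initialized")
        # A real backend would execute the graph; this one only records what it was given.
        return {"backend": self.name, "executed": list(graph), "io": io}


def discover(backends, graph):
    """Return the backends that support every operator used by the graph."""
    return [b for b in backends if set(graph) <= b.supported_ops]


backends = [
    Backend("npu", frozenset({"Conv", "Relu"})),
    Backend("cpu", frozenset({"Conv", "Relu", "Softmax"})),
]
graph = ["Conv", "Relu", "Softmax"]

candidates = discover(backends, graph)  # 1. discover compatible backends
backend = candidates[0]
backend.init()                          # 2. initialize the chosen backend
io = {"input": [0.0] * 4}               # 3. describe memory locations and metadata
result = backend.run(graph, io)         # 4. run the graph
backend.deinit()                        # 5. deinitialize the backend
print(result["backend"])
```

In this toy setup the hypothetical "npu" backend is rejected because it lacks `Softmax`, which is the kind of decision dynamic discovery of supported operators is meant to enable.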
ONNXIFI API Call Flow
https://developer.android.com/ndk/guides/neuralnetworks/
Android NN API
Areas of Collaboration
● Common model description format and APIs to the runtime
● Common optimized runtime inference engine for Arm-based SoCs
● Dynamic plug-in framework to support multiple 3rd party NPU, CPU, GPU, DSP
● CI loops on reference development boards to measure accuracy and performance speed-up, and for regression testing
Discussions started last March
AI/ML Resources from HKG18
HKG18-417 - OpenCL support by NNVM & TVM
HKG18-413 - AI and Machine Learning BoF
HKG18-405 - Accelerating Neural Networks with...
HKG18-312 - CMSIS-NN
HKG18-306 - Overview of Qualcomm SNPE
HKG18-304 - Scalable AI server
HKG18-302 - Huawei HiAI : Unlock The Future
HKG18-200K2 - Keynote: Accelerating AI from Cloud to Edge
https://connect.linaro.org/ai-neural-networks-arm-summit/


Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: Introduction
 
HKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersHKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 Servers
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with Coresight
 

Recently uploaded

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Français Patch Tuesday - Avril
Français Patch Tuesday - AvrilFrançais Patch Tuesday - Avril
Français Patch Tuesday - AvrilIvanti
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 

Recently uploaded (20)

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Français Patch Tuesday - Avril
Français Patch Tuesday - AvrilFrançais Patch Tuesday - Avril
Français Patch Tuesday - Avril
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo

  • 1. Deep Learning Neural Network Acceleration at the Edge Andrea Gallo VP Segments and Strategic Initiatives 29-Aug-2018 Vancouver
  • 2. LEADING COLLABORATION IN THE ARM ECOSYSTEM Disclaimer: All information in this session is public. No confidential information has been disclosed from private communication between Linaro and Linaro members. URLs to the original source are provided in each slide.
  • 3. Why Deep Learning? End-to-End Learning for Many Tasks Slide from DIY Deep Learning for Vision: a Hands-On Tutorial with Caffe
  • 4. It’s complex!!! Slide from DIY Deep Learning for Vision: a Hands-On Tutorial with Caffe
  • 6. From cloud to edge devices: always online; uplink bandwidth and traffic; latency vs real-time constraints; privacy concerns
  • 14. TensorFlow
    Developed in-house by the Google Brain team
    ● Started as DistBelief in 2011
    ● Evolved into TensorFlow with its first commit in November 2015
    ● V1.0.0 released on Feb 11, 2017
    TensorFlow can be built in several variants, each supporting its own accelerators
    ● TensorFlow for cloud and datacenters → GPU and TPU
    ● TensorFlow Lite for mobile devices → Android NNAPI and NN HAL
    ● TensorFlow.js for AI in web browsers → WebGL
    TensorFlow models on tensorflow github
    31,713 commits, 1,624 contributors, 1,610,734 lines of code, 456 years of effort, 1st commit Nov ‘15
  • 15. From TensorFlow to TensorFlow Lite: TensorFlow Lite uses FlatBuffers
  • 16. TensorFlow: 1st commit in November 2015
  • 18. Caffe
    ● Made with expression, speed, and modularity in mind
    ● Developed by Berkeley AI Research (BAIR) and by community contributors
      ○ Yangqing Jia created the project during his PhD at UC Berkeley
      ○ Caffe is released under the BSD 2-Clause license
    ● Focus has been vision, but also handles sequences, speech, text
    ● Tools, reference models, demos, and recipes → Caffe Zoo
    ● Seamless switch between CPU and GPU
    caffe.berkeleyvision.org
    github.com/BVLC/caffe
    4,137 commits, 314 contributors, 76,076 lines of code, 19 years of effort, 1st commit in Sept ‘13, 15,000+ forks
  • 19. Caffe2
    Caffe2 improves Caffe 1.0 in a series of directions
    ● First-class support for large-scale distributed training
    ● Mobile deployment
    ● New hardware support (in addition to CPU and CUDA)
    ● Flexibility for future directions such as quantized computation
    ● Stress tested by the vast scale of Facebook applications
    ● Examples and pre-trained models available from the Caffe2 Zoo
    ● Running on mobile devices with Android and iOS
      ○ Step-by-step tutorial with camera demo
    ● Caffe1 models do not run with Caffe2
      ○ Converter tool available
    3,678 commits, 332 contributors, 275,560 lines of code, 73 years of effort, 1st commit in June ‘15
  • 20. Caffe2: 1st commit in June 2015
  • 22. MXNet
    MXNet is a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters.
    Currently, MXNet is supported by Intel, Dato, Baidu, Microsoft, Wolfram Research, and research institutions such as Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science and Technology.
    Gluon API, examples, tutorials and pre-trained models from the Gluon model zoo
  • 23. MXNet: 1st commit in April 2015
  • 26. Deep Learning framework comparison https://www.openhub.net/p/_compare?project_0=MXNet&project_1=caffe2&project_2=TensorFlow
  • 30. Observations
    ● Each cloud player has its own deep learning framework
    ● Each AI framework has its own entire ecosystem of formats, tools, and model store
    ● Each AI framework represents a significant investment
    ● Scaling and acceleration are fundamental to performance
    If you want a really cool job like Manjunath, Yangqing or Mu Li… INVENT A GREAT NEW AI/ML FRAMEWORK
  • 32. Google Edge TPU
    The Edge TPU is Google's purpose-built ASIC chip designed to run TensorFlow Lite ML inference at the edge
    ● AIY Edge TPU Dev Board
    ● AIY Edge TPU Accelerator
    https://aiyprojects.withgoogle.com/edge-tpu/
  • 33. Arm Mali-G72
    Arm Mali-G72 is the second generation Bifrost-based GPU for High Performance products. Benefitting from advanced technologies such as claused shaders and full system coherency, Mali-G72 adds increased tile buffer memory supporting up to 16 x Multi-Sample Anti-Aliasing at minimal performance cost. Arithmetic optimizations tailored to complex Machine Learning and High Fidelity Mobile Gaming use cases provide 25% higher energy efficiency, 20% better performance density and 40% greater overall performance than devices based on the previous generation Bifrost GPU.
    https://developer.arm.com/products/graphics-and-multimedia/mali-gpus/mali-g72-gpu
  • 34. Arm ML processor
    The Arm Machine Learning processor is an optimized, ground-up design for machine learning acceleration, targeting mobile and adjacent markets:
    ● optimized fixed-function engines for best-in-class performance
    ● additional programmable layer engines support the execution of non-convolution layers, and the implementation of selected primitives and operators
    The network control unit manages the overall execution and traversal of the network and the DMA moves data in and out of the main memory. Onboard memory allows central storage for weights and feature maps.
    https://developer.arm.com/products/processors/machine-learning/arm-ml-processor
  • 35. Arm OD processor
    ● Detects objects in real time with Full HD at 60fps.
    ● Object sizes from 50x60 pixels to full screen.
    ● Virtually unlimited objects detected per frame.
    ● Detailed people model provides rich metadata and allows detection of direction, trajectory, pose and gesture.
    ● Advanced software running on accompanying application processor allows for higher-level behaviour to be determined, including sophisticated inter-frame tracking.
    ● Additional software libraries enable higher-level, on-device features, such as face recognition.
    https://developer.arm.com/products/processors/machine-learning/arm-od-processor
  • 36. Arm NN
    Arm NN SDK is a set of open-source Linux software and tools that enables machine learning workloads on power-efficient devices. It provides a bridge between existing neural network frameworks and power-efficient Arm Cortex CPUs, Arm Mali GPUs or the Arm Machine Learning processor.
    Arm NN SDK utilizes the Compute Library to target programmable cores, such as Cortex-A CPUs and Mali GPUs, as efficiently as possible. It includes support for the Arm Machine Learning processor and, via CMSIS-NN, support for Cortex-M CPUs.
    https://developer.arm.com/products/processors/machine-learning/arm-nn
  • 38. Qualcomm https://connect.linaro.org/resources/hkg18/hkg18-306/
  • 39. HiSilicon
    ● 99 operators
    ● Caffe, TensorFlow, TensorFlow Lite, Huawei HiAI SDK, Android NN
    ● Converter tools from AI models to serialized offline model
    https://connect.linaro.org/resources/hkg18/hkg18-302/
  • 40. MediaTek https://www.forbes.com/sites/tiriasresearch/2017/03/31/mediatek-brings-neural-networks-to-devices/#6468bd5f3eac
  • 41. An ecosystem of 3rd parties providing NN IP and tools
  • 43. Observations
    ● Complete offload vs heterogeneous computing
    ● Shared memory vs sub-system memories and DMA
    ● Fixed operators and software fallback
    ● Graph split vs cost of context switch
    ● Serialized models and converter tools
    ● Forked and accelerated inference engine for each NN IP and each framework
      → high total cost of ownership
      → delayed rebases and updates
      → delayed security fixes
    [Diagram: CPU, GPU, DSP, NPU and DLA blocks with their RAM]
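The interplay between fixed operator sets, software fallback, and graph splitting in the observations above can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's partitioner: the operator names, the `NPU_SUPPORTED` set, and the `partition` and `context_switches` helpers are all hypothetical. It walks a linear sequence of operators, groups consecutive ones by the device that can run them, and counts the boundaries between groups, each of which stands for one hand-over between accelerator and CPU.

```python
from itertools import groupby

# Hypothetical set of operators that a fixed-function NPU can execute.
NPU_SUPPORTED = {"conv2d", "relu", "maxpool", "fully_connected"}


def partition(ops, supported=NPU_SUPPORTED):
    """Split a linear operator sequence into (device, [ops]) segments.

    Operators outside `supported` fall back to the CPU.
    """
    def device(op):
        return "npu" if op in supported else "cpu"

    return [(dev, list(group)) for dev, group in groupby(ops, key=device)]


def context_switches(segments):
    """Each boundary between adjacent segments is one device hand-over."""
    return max(len(segments) - 1, 0)


# Made-up model: two operators are not in the NPU's fixed set.
model = ["conv2d", "relu", "maxpool", "custom_norm", "conv2d", "relu", "softmax"]
segments = partition(model)
for dev, ops in segments:
    print(dev, ops)
print("context switches:", context_switches(segments))
```

In this toy model two unsupported operators split the graph into four segments and three hand-overs, which is why the cost of a context switch can matter as much as raw accelerator speed.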
  • 45. Linaro Collaboration
    Members fund Linaro and drive work through engineering steering committees
    Member and Linaro engineers collaborate to develop work once, for all
    Linaro delivers output to members, into open source projects, and into the community
    Now ~25 members, up from 6 in 2010
    Over 300 OSS engineers globally, including 140 Linaro staff
    Core Members, Club Members, Group Members, Community Members
  • 46. Linaro works Upstream
    Delivering high value collaboration
    Top 5 company contributor to Linux and Zephyr kernels
    Contributor to >70 open source projects; many maintained by Linaro engineers
    Rank  Company  4.8-4.13 Changesets  %
    1     Intel    10,833               13.1%
    2     Red Hat  5,965                7.2%
    3     Linaro   4,636                5.6%
    Source: 2017 Linux Kernel Development Report, Linux Foundation
    Selected projects Linaro contributes to
  • 47. Open Neural Network Exchange (ONNX)
    An open source format for AI models
    An extensible computation graph model
    Definitions of built-in operators and standard data types
    Initial focus on inference
  • 48. ONNX Interface for Framework Integration (ONNXIFI)
    Standardized interface for neural network inference on special-purpose accelerators, CPUs, GPUs, DSPs, and FPGAs
    Dynamic discovery of available backends and supported ONNX operators
    Initialize and deinitialize backends
    Specify memory locations and metadata
    Run an ONNX graph
  • 49. ONNXIFI API Call Flow
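The steps listed for ONNXIFI (discover backends and their supported operators, initialize a backend, describe the inputs, run the graph, deinitialize) can be sketched as a small Python mock. This is only an illustration of the flow under stated assumptions: the `Backend` class, the `discover` and `run_on_first_compatible` helpers, and the backend names are hypothetical and are not the ONNXIFI C API.

```python
class Backend:
    """Hypothetical accelerator backend; not the real ONNXIFI C API."""

    def __init__(self, name, supported_ops):
        self.name = name
        self.supported_ops = frozenset(supported_ops)
        self.initialized = False

    def init(self):
        self.initialized = True

    def deinit(self):
        self.initialized = False

    def run(self, graph, inputs):
        if not self.initialized:
            raise RuntimeError("backend not initialized")
        # Stand-in for real execution: report what would have run where.
        return {"backend": self.name, "ops_run": list(graph), "inputs": inputs}


def discover(backends, graph):
    """Return the backends that support every operator in the graph."""
    needed = set(graph)
    return [b for b in backends if needed <= b.supported_ops]


def run_on_first_compatible(backends, graph, inputs):
    """Pick the first compatible backend, then init, run, and deinit it."""
    candidates = discover(backends, graph)
    if not candidates:
        raise LookupError("no backend supports all operators in the graph")
    backend = candidates[0]
    backend.init()
    try:
        return backend.run(graph, inputs)
    finally:
        backend.deinit()


# Made-up backends, listed in order of preference.
backends = [
    Backend("dsp", {"Conv", "Relu"}),
    Backend("npu", {"Conv", "Relu", "MaxPool", "Gemm"}),
    Backend("cpu", {"Conv", "Relu", "MaxPool", "Gemm", "Softmax"}),
]
graph = ["Conv", "Relu", "MaxPool", "Gemm"]
result = run_on_first_compatible(backends, graph, inputs={"data": [0.0, 1.0]})
print(result["backend"])
```

The point of the design is that the framework never hard-codes a vendor: it asks at run time which backend can take the graph, so one inference engine can serve several NN IP blocks.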
  • 50. Android NN API https://developer.android.com/ndk/guides/neuralnetworks/
  • 51. https://developer.arm.com/products/processors/machine-learning/arm-nn
  • 52. Areas of Collaboration
    ● Common model description format and APIs to the runtime
    ● Common optimized runtime inference engine for Arm-based SoC
    ● Dynamic plug-in framework to support multiple 3rd party NPU, CPU, GPU, DSP
    ● CI loops on reference development boards to measure accuracy, performance speed up and regression testing
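The CI loop in the last bullet can be sketched as a comparison of each benchmark run against a stored baseline. This is a minimal sketch under assumptions: the `check_run` and `speedup` helpers, the dictionary keys, and the thresholds are hypothetical, and the numbers are placeholders rather than measurements from any board.

```python
def check_run(baseline, current, max_accuracy_drop=0.01, max_slowdown=1.10):
    """Compare one benchmark run against a stored baseline.

    Returns a list of regression messages; the list is empty if the run passes.
    """
    problems = []
    if baseline["accuracy"] - current["accuracy"] > max_accuracy_drop:
        problems.append("accuracy regressed")
    if current["latency_ms"] > baseline["latency_ms"] * max_slowdown:
        problems.append("latency regressed")
    return problems


def speedup(cpu_latency_ms, accelerated_latency_ms):
    """Speed-up of the accelerated path relative to the CPU-only path."""
    return cpu_latency_ms / accelerated_latency_ms


# Placeholder numbers for illustration only.
baseline = {"accuracy": 0.71, "latency_ms": 20.0}
current = {"accuracy": 0.705, "latency_ms": 25.0}

print(check_run(baseline, current))
print(speedup(cpu_latency_ms=80.0, accelerated_latency_ms=20.0))
```

Tracking accuracy alongside latency matters because an accelerated or quantized path can get faster while silently losing accuracy; a CI gate that checks only one of the two would miss that.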
  • 53. Discussions started last March
    AI/ML Resources from HKG18
    HKG18-417 - OpenCL support by NNVM & TVM
    HKG18-413 - AI and Machine Learning BoF
    HKG18-405 - Accelerating Neural Networks with...
    HKG18-312 - CMSIS-NN
    HKG18-306 - Overview of Qualcomm SNPE
    HKG18-304 - Scalable AI server
    HKG18-302 - Huawei HiAI : Unlock The Future
    HKG18-200K2 - Keynote: Accelerating AI from Cloud to Edge
  • 54. https://connect.linaro.org/ai-neural-networks-arm-summit/