Koan-Sin Tan,
freedom@computer.org
COSCUP, Aug 1st, 2020
Running TFLite on Your Mobile Devices
• disclaimer: opinions are my own

• feel free to interrupt me if you have any questions during the presentation

• questions can be in Taiwanese, English, or Mandarin
Who I am
• Used open source before the term “open source” was used
• A software guy; learned to use Unix and open source software on a VAX-11/780 running 4.3BSD
• Used to be a programming language junkie
• Worked on various system software, e.g., CPU scheduling and power management of non-CPU components
• Recently, working on NN performance on edge devices
• Contribute to TensorFlow Lite from time to time
• started a command-line label_image for TFLite
https://gunkies.org/w/images/c/c1/DEC-VAX-11-780.jpg
VAX 11/780 CPU consists of TTL ICs
https://en.wikipedia.org/wiki/Transistor%E2%80%93transistor_logic
https://en.wikipedia.org/wiki/7400-series_integrated_circuits
Why TFLite?
• TensorFlow Lite

• TensorFlow is one of the most popular machine learning frameworks

• TFLite: a lightweight runtime for edge devices

• originally mobile devices -> mobile and IoT/embedded devices

• could be accelerated by GPU, DSP, or ASIC accelerator

• How about PyTorch?

• yes, it is popular, but not on mobile devices yet

• Yes, there are other open source NN frameworks. None is as comprehensive as TF Lite, as far as I can tell

• See my COSCUP 2019 slide deck for more discussion, https://www.slideshare.net/kstan2/status-quo-of-tensor-flow-lite-on-edge-devices-coscup-2019
Outline
• Overview of TFLite on Android and iOS devices,

• TFLite metadata and TFLite Android code generator,

• Some new features: CoreML delegate and XNNPACK delegate
What is TensorFlow Lite
• TensorFlow Lite is a cross-platform framework for deploying ML on mobile devices and embedded systems

• Mobile devices -> mobile and IoT/embedded devices

• TFLite for Android and iOS

• TFLu: TFLite micro for micro-controllers
Why ML on Edge devices
• Low latency & close-knit interactions

• “There is an old network saying: Bandwidth problems can be cured with money.
 Latency problems are harder because the speed of light is fixed — you can't bribe
God.” -- David D. Clark,

• network connectivity

• you have probably heard “always-on” since the 3G days; you know that is still not true in the 5G era

• privacy preserving

• sensors
from TF Dev Summit 2020, https://youtu.be/27Zx-4GOQA8
• We’ll talk about

• TFLite metadata and codegen, which are in the TFLite Support Library

• two delegates that enable the use of hardware capabilities

• Other topics you may want to dig into

• quantization, fixed point, integer

• ARM dot product instructions, Apple A13 matrix operations in CPUs (yes, CPU)

• the GPU delegate started supporting quantized models a couple of months ago

• GPUs usually support fp16 first

• new MLIR-based runtimes, such as TFRT and IREE

• I’ll talk a little bit about TFRT tomorrow
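
For the quantization item above, a minimal sketch of the affine (scale and zero-point) mapping commonly used for 8-bit quantized models may help. The function names, the unsigned 8-bit range, and the example scale and zero point are illustrative assumptions, not values taken from any particular model.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an integer: q = round(x / scale) + zero_point, clamped."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map the integer back to an approximate real value: x ~ scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Hypothetical parameters for illustration only.
SCALE, ZERO_POINT = 0.5, 128

q = quantize(0.3, SCALE, ZERO_POINT)
x_approx = dequantize(q, SCALE, ZERO_POINT)
# The round trip loses at most about scale / 2 for values inside the representable range.
print(q, x_approx, abs(x_approx - 0.3))
```

Values outside the representable range are clamped, which is one reason the choice of scale and zero point matters for accuracy.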
So how to start using TFLite
• TFLite actually has two main parts

• interpreter: loads and runs a model on various hardware

• converter: converts TF models to a TFLite specific format to be used by the interpreter

• see https://www.tensorflow.org/lite/guide for more introduction materials

• There is a good guide on how to load a model and do inference on devices using TFLite interpreter, in Java, Swift, Objective-C, C++, and Python

• https://www.tensorflow.org/lite/guide/inference
load and run a model in C++
other APIs are wrappers around C++ code
https://www.tensorflow.org/lite/guide/inference
TFLite metadata and TFLite Android code generator
TFLite Metadata
• before TFLite Metadata was introduced, when we loaded and ran a model

• it was the user’s/developer’s responsibility to figure out what the input tensors and output tensors are. E.g.,

• an image classifier usually expects preprocessed (resized, cropped, padded, etc.) and normalized ([0, 1] or [-1, 1]) data

• the label file is not included in the model

• in TFLite metadata, there are three parts in the schema:

• Model information - Overall description of the model as well as items such as licence terms. See ModelMetadata.

• Input information - Description of the inputs and pre-processing required such as normalization. See SubGraphMetadata.input_tensor_metadata.

• Output information - Description of the output and post-processing required such as mapping to labels. See SubGraphMetadata.output_tensor_metadata.
https://www.tensorflow.org/lite/convert/metadata
• Supported Input / Output types

• Feature - Numbers which are unsigned integers or float32.

• Image - Metadata currently supports RGB and greyscale images.

• Bounding box - Rectangular shape bounding boxes. The schema supports a variety of numbering schemes.

• Pack the associated files, e.g.,

• label file(s)

• Normalization and quantization parameters
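
To make the normalization parameters concrete, here is a minimal sketch of what a consumer of the metadata would do with a per-channel mean and standard deviation: compute (pixel - mean) / std. The helper name and the two parameter pairs are illustrative assumptions; real models ship their own values.

```python
def normalize(pixels, mean, std):
    """Apply (pixel - mean) / std to every raw pixel value (0..255)."""
    return [(p - mean) / std for p in pixels]


raw = [0, 127.5, 255]

# mean = 0, std = 255 maps raw pixels to [0, 1]
print(normalize(raw, mean=0.0, std=255.0))

# mean = 127.5, std = 127.5 maps raw pixels to [-1, 1]
print(normalize(raw, mean=127.5, std=127.5))
```

With the parameters stored next to the model, the app no longer has to guess which of the two ranges a given classifier expects.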
• With the example at https://www.tensorflow.org/lite/convert/metadata, we can create an image classifier with

• image input, and

• label output
https://www.tensorflow.org/lite/convert/metadata
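
The "label output" side is essentially a lookup from output index to a line of the packed label file. A minimal sketch of that post-processing step, assuming one score per label and a hypothetical three-entry label list (the function name and data are made up for illustration):

```python
def top_k_labels(scores, labels, k=3):
    """Pair each score with its label and return the k highest-scoring pairs."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical label file contents and model output.
labels = ["cat", "dog", "bird"]
scores = [0.1, 0.7, 0.2]

for label, score in top_k_labels(scores, labels, k=2):
    print(f"{label}: {score:.2f}")
```

A generated model wrapper hides this step, which is why having the label file travel with the model is convenient.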
• https://developer.android.com/studio/preview/features#tensor-flow-lite-models
CoreML Classifier model and autogen headers for Objective-C
My exercise to use Android CameraX and TFLite codegen in Kotlin
• To test TFLite metadata and codegen, I need an Android app that can

• grab camera inputs and

• convert them into Android Bitmap to feed into the generated model wrapper.

• Since I know nothing about Android Camera and Kotlin, I started this from the CameraX tutorial. It seems quite easy.

• https://github.com/freedomtan/CameraxTFLite
https://github.com/freedomtan/CameraxTFLite/blob/master/my_classify_wrapper/myclassifiermodel.md
https://github.com/freedomtan/CameraxTFLite/blob/master/app/src/main/java/com/mediatek/cameraxtflite/MainActivity.kt#L182-L215
screenshot of the simple app
Other new things
What is a TFLite delegate?
• “A TensorFlow Lite delegate is a way to delegate part or all of graph execution to another executor.”

• Why delegates?

• running computation-intensive NN models on mobile devices is demanding for mobile CPUs; processing power and energy consumption could be problems

• and matrix multiplication, which is the core of convolution and fully connected ops, is highly parallel

• Thus, some devices have hardware accelerators, such as GPUs or DSPs, that provide better performance and higher energy efficiency through Android NNAPI

• To use NNAPI, TFLite has had an NNAPI delegate from the very beginning. Then, there are GPU delegates (GL ES, OpenCL, and Metal for now; a Vulkan one is coming) and others.

• my COSCUP 2019 slide deck on how NNAPI and GPU delegates work, https://www.slideshare.net/kstan2/tflite-nnapi-and-gpu-delegates
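
To illustrate "delegate part or all of graph execution", here is a toy sketch that splits a linear list of ops into runs handled by a delegate and runs that fall back to the CPU. This is an assumption-laden simplification: the real runtime works on a graph rather than a flat list, and the op names and the supported set below are made up for the example.

```python
def partition(ops, supported):
    """Group consecutive ops into (executor, [ops]) runs.

    Ops in `supported` go to the delegate; everything else stays on the CPU.
    """
    runs = []
    for op in ops:
        target = "delegate" if op in supported else "cpu"
        if runs and runs[-1][0] == target:
            runs[-1][1].append(op)  # extend the current run
        else:
            runs.append((target, [op]))  # start a new run
    return runs


# Hypothetical model and delegate capabilities.
model_ops = ["CONV_2D", "RELU", "CUSTOM_OP", "SOFTMAX"]
delegate_supports = {"CONV_2D", "RELU", "SOFTMAX"}

for executor, ops in partition(model_ops, delegate_supports):
    print(executor, ops)
```

Each switch between executors usually costs a data transfer, so a model with many unsupported ops scattered through it may gain little from a delegate.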
XNNPACK and CoreML Delegates
• “XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 platforms.”

• “XNNPACK is not intended for direct use by deep learning practitioners and researchers; instead it provides low-level performance primitives for accelerating high-level machine learning frameworks, such as TensorFlow Lite, TensorFlow.js, PyTorch, and MediaPipe.”, https://github.com/google/XNNPACK

• NNPACK -> QNNPACK -> XNNPACK

• In TFLite, there is an XNNPACK delegate

• CoreML is Apple’s machine learning framework

• the only official way to use the Neural Engine, Apple’s NN accelerator introduced with the A11

• nope, CoreML cannot use the A11 Neural Engine, https://www.slideshare.net/kstan2/why-you-cannot-use-neural-engine-to-run-your-nn-models-on-a11-devices
XNNPACK
• convolution is at the core of current neural network models

• How convolution is implemented, either in SW or HW

• “direct convolution”: 6- or 7-layer nested for loops,

• im2col, then GEMM,

• other transforms, e.g., Winograd

• XNNPACK found a way to efficiently reuse GEMM
https://arxiv.org/pdf/1907.02129.pdf
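
The first two options can be compared in a few lines of plain Python. This is a minimal sketch assuming a single image in height x width x channel layout, stride 1, and no padding (so six loops; a batch dimension would make it seven). The function names are illustrative, and the sketch does not attempt the approach described in the paper linked above.

```python
def direct_conv(x, w):
    """x[h][w][c_in], w[kh][kw][c_in][c_out] -> y[oh][ow][c_out]."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    KH, KW, CO = len(w), len(w[0]), len(w[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    y = [[[0.0] * CO for _ in range(OW)] for _ in range(OH)]
    for oh in range(OH):
        for ow in range(OW):
            for co in range(CO):
                for kh in range(KH):
                    for kw in range(KW):
                        for c in range(C):
                            y[oh][ow][co] += x[oh + kh][ow + kw][c] * w[kh][kw][c][co]
    return y


def im2col(x, KH, KW):
    """Copy every KH x KW x C input patch into one row of a matrix."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    return [
        [x[oh + kh][ow + kw][c] for kh in range(KH) for kw in range(KW) for c in range(C)]
        for oh in range(H - KH + 1)
        for ow in range(W - KW + 1)
    ]


def gemm(a, b):
    """Naive matrix multiply: (M x K) times (K x N)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def im2col_conv(x, w):
    """Same convolution, expressed as im2col followed by one GEMM."""
    KH, KW, C = len(w), len(w[0]), len(w[0][0])
    patches = im2col(x, KH, KW)  # (OH*OW) x (KH*KW*C)
    # Flatten the filter to (KH*KW*C) x C_out, in the same order as the patch rows.
    w_mat = [w[kh][kw][c] for kh in range(KH) for kw in range(KW) for c in range(C)]
    out = gemm(patches, w_mat)  # (OH*OW) x C_out
    OW = len(x[0]) - KW + 1
    return [out[i:i + OW] for i in range(0, len(out), OW)]


# 3x3 single-channel input, one 2x2 all-ones filter.
x = [[[1], [2], [3]], [[4], [5], [6]], [[7], [8], [9]]]
w = [[[[1]], [[1]]], [[[1]], [[1]]]]
print(direct_conv(x, w) == im2col_conv(x, w))
```

The im2col route buys a single large GEMM at the price of materializing the patch matrix, which duplicates each input element up to KH x KW times.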
Using XNNPACK in label_image.cc
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/label_image/label_image.cc#L109-L116
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/evaluation/utils.h#L64-L88
Using CoreML delegate
https://github.com/freedomtan/glDelegateBenchmark/blob/master/glDelegateBenchmark/ViewController.mm#L61-L68
On iPhone 11 Pro, I got:

Model name             CPU 1 thread (ms)  CPU 2 threads (ms)  GPU (ms)  CoreML delegate (ms) [4]
MobileNet V1 1.0 224   26.54              18.21               10.91     2.03
PoseNet                34.14              23.62               16.75     3.34
DeepLab V3 (257x257)   39.65              29.87               20.43     9.10
MobileNet V2 SSD COCO  44.94              34.05               19.73     11.54
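
To read these numbers as speedups, a small sketch that divides the single-thread CPU latency by each other column. The latencies are copied from the iPhone 11 Pro measurements above; the dictionary layout and function name are just for illustration.

```python
# Latencies in milliseconds, copied from the iPhone 11 Pro measurements.
latency_ms = {
    "MobileNet V1 1.0 224": {"cpu1": 26.54, "cpu2": 18.21, "gpu": 10.91, "coreml": 2.03},
    "PoseNet": {"cpu1": 34.14, "cpu2": 23.62, "gpu": 16.75, "coreml": 3.34},
    "DeepLab V3 (257x257)": {"cpu1": 39.65, "cpu2": 29.87, "gpu": 20.43, "coreml": 9.10},
    "MobileNet V2 SSD COCO": {"cpu1": 44.94, "cpu2": 34.05, "gpu": 19.73, "coreml": 11.54},
}


def speedups(row, baseline="cpu1"):
    """Return how many times faster each configuration is than the baseline."""
    return {name: row[baseline] / ms for name, ms in row.items()}


for model, row in latency_ms.items():
    s = speedups(row)
    print(f"{model}: 2 threads {s['cpu2']:.1f}x, GPU {s['gpu']:.1f}x, CoreML {s['coreml']:.1f}x")
```

The gain from the CoreML delegate varies a lot between models, which fits the point that how much of a model a delegate can take over matters.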
Concluding remarks
• TFLite is getting more and more mature and comprehensive

• If you haven’t started using it, you may want to start with TFLite metadata and Android code generators

• nope, there is no iOS code generator (yet)

• To speed up execution of NN models, use TFLite delegates

• note that not all accelerators are created equal

• some are fp only; some are int/quant only
Fin
A13 AMX (Apple Matrix Extension?)
sgemm and dgemm in BLAS
• For (2048x4096) x (40x96x4096) matrix multiplication
• sgemm (32-bit floating point) speed: A13 > My MBP >> A12 > A11
• dgemm (64-bit floating point) speed: My MBP > A13 >> A12 > A11

More Related Content

What's hot

Pre trained language model
Pre trained language modelPre trained language model
Pre trained language modelJiWenKim
 
Seamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflowSeamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflowDatabricks
 
Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF)
Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF)Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF)
Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF)Cisco DevNet
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...Edge AI and Vision Alliance
 
Java tricks for high-load server programming
Java tricks for high-load server programmingJava tricks for high-load server programming
Java tricks for high-load server programmingAndrei Pangin
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowPaolo Tomeo
 
Onnx and onnx runtime
Onnx and onnx runtimeOnnx and onnx runtime
Onnx and onnx runtimeVishwas N
 
⼤語⾔模型 LLM 應⽤開發入⾨
⼤語⾔模型 LLM 應⽤開發入⾨⼤語⾔模型 LLM 應⽤開發入⾨
⼤語⾔模型 LLM 應⽤開發入⾨Wen-Tien Chang
 
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an OverviewNatural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overviewalessio_ferrari
 
RDMA programming design and case studies – for better performance distributed...
RDMA programming design and case studies – for better performance distributed...RDMA programming design and case studies – for better performance distributed...
RDMA programming design and case studies – for better performance distributed...NTT Software Innovation Center
 
Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer ModelsDatabricks
 
Gemini Introduction
Gemini IntroductionGemini Introduction
Gemini IntroductionLynn Langit
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
 
A Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlowA Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlowKoan-Sin Tan
 
HuggingFace AI - Hugging Face lets users create interactive, in-browser demos...
HuggingFace AI - Hugging Face lets users create interactive, in-browser demos...HuggingFace AI - Hugging Face lets users create interactive, in-browser demos...
HuggingFace AI - Hugging Face lets users create interactive, in-browser demos...Bluechip Technologies
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...Chris Fregly
 
Overview of kubernetes network functions
Overview of kubernetes network functionsOverview of kubernetes network functions
Overview of kubernetes network functionsHungWei Chiu
 

What's hot (20)

Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
 
Seamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflowSeamless MLOps with Seldon and MLflow
Seamless MLOps with Seldon and MLflow
 
Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF)
Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF)Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF)
Open Device Programmability: Hands-on Intro to RESTCONF (and a bit of NETCONF)
 
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio..."Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
"Quantizing Deep Networks for Efficient Inference at the Edge," a Presentatio...
 
Java tricks for high-load server programming
Java tricks for high-load server programmingJava tricks for high-load server programming
Java tricks for high-load server programming
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlow
 
Onnx and onnx runtime
Onnx and onnx runtimeOnnx and onnx runtime
Onnx and onnx runtime
 
⼤語⾔模型 LLM 應⽤開發入⾨
⼤語⾔模型 LLM 應⽤開發入⾨⼤語⾔模型 LLM 應⽤開發入⾨
⼤語⾔模型 LLM 應⽤開發入⾨
 
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an OverviewNatural Language Processing (NLP) for Requirements Engineering (RE): an Overview
Natural Language Processing (NLP) for Requirements Engineering (RE): an Overview
 
RDMA programming design and case studies – for better performance distributed...
RDMA programming design and case studies – for better performance distributed...RDMA programming design and case studies – for better performance distributed...
RDMA programming design and case studies – for better performance distributed...
 
Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer Models
 
A Peek into TFRT
A Peek into TFRTA Peek into TFRT
A Peek into TFRT
 
Gemini Introduction
Gemini IntroductionGemini Introduction
Gemini Introduction
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
A Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlowA Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlow
 
HuggingFace AI - Hugging Face lets users create interactive, in-browser demos...
HuggingFace AI - Hugging Face lets users create interactive, in-browser demos...HuggingFace AI - Hugging Face lets users create interactive, in-browser demos...
HuggingFace AI - Hugging Face lets users create interactive, in-browser demos...
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
 
Overview of kubernetes network functions
Overview of kubernetes network functionsOverview of kubernetes network functions
Overview of kubernetes network functions
 
L4 Microkernel :: Design Overview
L4 Microkernel :: Design OverviewL4 Microkernel :: Design Overview
L4 Microkernel :: Design Overview
 

Similar to Running TFLite on Your Mobile Devices, 2020

Machine learning and Deep learning on edge devices using TensorFlow
Machine learning and Deep learning on edge devices using TensorFlowMachine learning and Deep learning on edge devices using TensorFlow
Machine learning and Deep learning on edge devices using TensorFlowAditya Bhattacharya
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyserAlex Moskvin
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on AndroidKoan-Sin Tan
 
Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?GetInData
 
TEE - kernel support is now upstream. What this means for open source security
TEE - kernel support is now upstream. What this means for open source securityTEE - kernel support is now upstream. What this means for open source security
TEE - kernel support is now upstream. What this means for open source securityLinaro
 
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...Adam Dunkels
 
The Flink - Apache Bigtop integration
The Flink - Apache Bigtop integrationThe Flink - Apache Bigtop integration
The Flink - Apache Bigtop integrationMárton Balassi
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyftmarkgrover
 
Scale machine learning deployment
Scale machine learning deploymentScale machine learning deployment
Scale machine learning deploymentGang Tao
 
Bit_Bucket_x31_Final
Bit_Bucket_x31_FinalBit_Bucket_x31_Final
Bit_Bucket_x31_FinalSam Knutson
 
TLDK - FD.io Sept 2016
TLDK - FD.io Sept 2016 TLDK - FD.io Sept 2016
TLDK - FD.io Sept 2016 Benoit Hudzia
 
It's a Jungle Out There – IoT and MRuby
It's a Jungle Out There – IoT and MRubyIt's a Jungle Out There – IoT and MRuby
It's a Jungle Out There – IoT and MRubymatustomlein
 

Similar to Running TFLite on Your Mobile Devices, 2020 (20)

Machine learning and Deep learning on edge devices using TensorFlow
Machine learning and Deep learning on edge devices using TensorFlowMachine learning and Deep learning on edge devices using TensorFlow
Machine learning and Deep learning on edge devices using TensorFlow
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on Android
 
Stackato v6
Stackato v6Stackato v6
Stackato v6
 
Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?
 
TEE - kernel support is now upstream. What this means for open source security
TEE - kernel support is now upstream. What this means for open source securityTEE - kernel support is now upstream. What this means for open source security
TEE - kernel support is now upstream. What this means for open source security
 
Smart Object Architecture
Smart Object ArchitectureSmart Object Architecture
Smart Object Architecture
 
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
 
Stackato v4
Stackato v4Stackato v4
Stackato v4
 
Opnet tutorial
Opnet tutorialOpnet tutorial
Opnet tutorial
 
The Flink - Apache Bigtop integration
The Flink - Apache Bigtop integrationThe Flink - Apache Bigtop integration
The Flink - Apache Bigtop integration
 
gcdtmp
gcdtmpgcdtmp
gcdtmp
 
Near real-time anomaly detection at Lyft
Near real-time anomaly detection at LyftNear real-time anomaly detection at Lyft
Near real-time anomaly detection at Lyft
 
Stackato
StackatoStackato
Stackato
 
Stackato v5
Stackato v5Stackato v5
Stackato v5
 
An Introduction to OMNeT++ 5.1
An Introduction to OMNeT++ 5.1An Introduction to OMNeT++ 5.1
An Introduction to OMNeT++ 5.1
 
Scale machine learning deployment
Scale machine learning deploymentScale machine learning deployment
Scale machine learning deployment
 
Bit_Bucket_x31_Final
Bit_Bucket_x31_FinalBit_Bucket_x31_Final
Bit_Bucket_x31_Final
 
TLDK - FD.io Sept 2016
TLDK - FD.io Sept 2016 TLDK - FD.io Sept 2016
TLDK - FD.io Sept 2016
 
It's a Jungle Out There – IoT and MRuby
It's a Jungle Out There – IoT and MRubyIt's a Jungle Out There – IoT and MRuby
It's a Jungle Out There – IoT and MRuby
 

More from Koan-Sin Tan

running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on androidKoan-Sin Tan
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsKoan-Sin Tan
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolKoan-Sin Tan
 
A Peek into Google's Edge TPU
A Peek into Google's Edge TPUA Peek into Google's Edge TPU
A Peek into Google's Edge TPUKoan-Sin Tan
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Koan-Sin Tan
 
open source nn frameworks on cellphones
open source nn frameworks on cellphonesopen source nn frameworks on cellphones
open source nn frameworks on cellphonesKoan-Sin Tan
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016Koan-Sin Tan
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserKoan-Sin Tan
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchKoan-Sin Tan
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android BenchmarksKoan-Sin Tan
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsKoan-Sin Tan
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08Koan-Sin Tan
 

More from Koan-Sin Tan (13)

running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source Tools
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
 
A Peek into Google's Edge TPU
A Peek into Google's Edge TPUA Peek into Google's Edge TPU
A Peek into Google's Edge TPU
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
 
open source nn frameworks on cellphones
open source nn frameworks on cellphonesopen source nn frameworks on cellphones
open source nn frameworks on cellphones
 
Caffe2 on Android
Caffe2 on AndroidCaffe2 on Android
Caffe2 on Android
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk User
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of Smartwatch
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android Benchmarks
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts


Running TFLite on Your Mobile Devices, 2020

  • 1. Koan-Sin Tan, freedom@computer.org COSCUP, Aug 1st, 2020 Running TFLite on Your Mobile Devices
  • 2. • disclaimer: opinions are my own • feel free to interrupt me if you have any questions during the presentation • questions could be Taiwanese, English, or Mandarin
  • 3. who i am • Used open source before the term “open source” was coined • A software guy, learned to use Unix and open source software on VAX-11/780 running 4.3BSD • Used to be a programming language junkie • Worked on various system software, e.g., CPU scheduling and power management of non-CPU components • Recently, working on NN performance on edge devices • Contributed from time to time to TensorFlow Lite • started a command line label_image for TFLite https://gunkies.org/w/images/c/c1/DEC-VAX-11-780.jpg
  • 4. VAX 11/780 CPU consists of TTL ICs https://en.wikipedia.org/wiki/Transistor%E2%80%93transistor_logic https://en.wikipedia.org/wiki/7400-series_integrated_circuits
  • 5. Why TFLite? • TensorFlow Lite • TensorFlow is one of the most popular machine learning frameworks • TFLite: a lightweight runtime for edge devices • originally mobile devices -> mobile and IoT/embedded devices • could be accelerated by GPU, DSP, or ASIC accelerator • How about PyTorch? • yes, it is popular, but not on mobile devices yet • Yes, there are other open source NN frameworks. None is as comprehensive as TF Lite, as far as I can tell • See my talk slide deck at COSCUP 2019 for more discussion, https://www.slideshare.net/kstan2/status-quo-of-tensor-flow-lite-on-edge-devices-coscup-2019
  • 6. Outline • Overview of TFLite on Android and iOS devices, • TFLite metadata and TFLite Android code generator, • Some new features: CoreML delegate and XNNPACK delegate
  • 7. What is TensorFlow Lite • TensorFlow Lite is a cross-platform framework for deploying ML on mobile devices and embedded systems • Mobile devices -> mobile and IoT/embedded devices • TFLite for Android and iOS • TFLu: TFLite micro for micro-controllers
  • 8. Why ML on Edge devices • Low latency & close-knit interactions • “There is an old network saying: Bandwidth problems can be cured with money. Latency problems are harder because the speed of light is fixed - you can't bribe God.” -- David D. Clark • network connectivity • you probably heard “always-on” back in the 3G days; you know that’s not true even in the 5G era • privacy preserving • sensors
  • 9. from TF Dev Summit 2020, https://youtu.be/27Zx-4GOQA8
  • 10. • We’ll talk about • TFLite metadata and codegen, which are in the tflite support library • two delegates which enable using hardware capabilities • Other topics you may want to dig into • quantization, fixed point, integer • ARM dot product instruction, Apple A13 matrix operations in CPUs (yes, CPU) • the GPU delegate started supporting quantized models a couple of months ago • GPUs usually support fp16 first • new MLIR-based runtimes, such as TFRT and IREE • I’ll talk a little bit about TFRT tomorrow
  • 11. So how to start using TFLite • TFLite actually has two main parts • interpreter: loads and runs a model on various hardware • converter: converts TF models to a TFLite-specific format to be used by the interpreter • see https://www.tensorflow.org/lite/guide for more introductory material • There is a good guide on how to load a model and do inference on devices using the TFLite interpreter, in Java, Swift, Objective-C, C++, and Python • https://www.tensorflow.org/lite/guide/inference
  • 12. load and run a model in C++ other APIs are wrappers around C++ code https://www.tensorflow.org/lite/guide/inference
  • 13. TFLite metadata and TFLite Android code generator
  • 14. TFLite Metadata • before TFLite Metadata was introduced, when we load and run a model • it’s the user’s/developer’s responsibility to figure out what the input tensors and output tensors are. E.g., • we know an image classifier usually expects preprocessed (resizing, cropping, padding, etc.) and normalized ([0, 1] or [-1, 1]) data • label file is not included • in TFLite metadata, there are three parts in the schema: • Model information - Overall description of the model as well as items such as licence terms. See ModelMetadata. • Input information - Description of the inputs and pre-processing required such as normalization. See SubGraphMetadata.input_tensor_metadata. • Output information - Description of the output and post-processing required such as mapping to labels. See SubGraphMetadata.output_tensor_metadata. https://www.tensorflow.org/lite/convert/metadata
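To make the two normalization ranges mentioned above concrete, here is a minimal sketch that assumes normalization is expressed as a mean and a standard deviation applied per value, `(pixel - mean) / std`. The function name and the sample pixel values are illustrative only.

```python
def normalize(pixels, mean, std):
    """Apply (pixel - mean) / std to every 8-bit pixel value."""
    return [(p - mean) / std for p in pixels]


pixels = [0, 128, 255]

# mean=0, std=255 maps [0, 255] onto [0, 1]
unit_range = normalize(pixels, mean=0.0, std=255.0)

# mean=127.5, std=127.5 maps [0, 255] onto [-1, 1]
signed_range = normalize(pixels, mean=127.5, std=127.5)
```

Storing the mean and std with the model is what lets a generic wrapper pick the right range without the developer having to remember how the model was trained.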
  • 15. • Supported Input / Output types • Feature - Numbers which are unsigned integers or float32. • Image - Metadata currently supports RGB and greyscale images. • Bounding box - Rectangular shape bounding boxes. The schema supports a variety of numbering schemes. • Pack the associated files, e.g., • label file(s) • Normalization and quantization parameters
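As a sketch of what the quantization parameters are used for, the usual affine scheme maps a stored integer `q` to a real value with `real = scale * (q - zero_point)`. The scale and zero point below are made-up example values, and the 0 to 255 range assumes an unsigned 8-bit tensor.

```python
def dequantize(q, scale, zero_point):
    """Map a stored integer back to the real value it represents."""
    return scale * (q - zero_point)


def quantize(real, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to the nearest representable integer, clamped to range."""
    q = round(real / scale) + zero_point
    return max(qmin, min(qmax, q))


scale, zero_point = 0.5, 128          # example parameters, not from a real model
stored = quantize(1.0, scale, zero_point)
recovered = dequantize(stored, scale, zero_point)
```

Values outside the representable range saturate at `qmin` or `qmax` rather than wrapping around.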
  • 16. • With the example at https://www.tensorflow.org/lite/convert/metadata, we can create an image classifier with • image input, and • label output
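The "label output" side amounts to pairing each output score with the matching line of the label file. A minimal sketch of that post-processing step follows; the labels and scores are invented for illustration.

```python
def top_k(scores, labels, k=3):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [(label, score) for score, label in ranked[:k]]


labels = ["cat", "dog", "bird"]       # one label per output index
scores = [0.1, 0.7, 0.2]              # classifier output for one image
best = top_k(scores, labels, k=2)
```

When the label file is packed into the model, a generated wrapper can do this mapping itself instead of leaving it to every app.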
  • 18. CoreML Classifier model and autogen headers for Objective-C
  • 19. My exercise to use Android CameraX and TFLite codegen in Kotlin • To test TFLite metadata and codegen, I need an Android app that can • grab camera inputs and • convert them into Android Bitmap to feed into the generated model wrapper. • Since I know nothing about Android Camera and Kotlin, I started this from the CameraX tutorial. It seems quite easy. • https://github.com/freedomtan/CameraxTFLite
  • 22. screenshot of the simple app
  • 24. What is a TFLite delegate? • “A TensorFlow Lite delegate is a way to delegate part or all of graph execution to another executor.” • Why delegates? • running computation-intensive NN models on mobile devices is resource demanding for mobile CPUs; processing power and energy consumption could be problems • and matrix multiplication, which is the core of convolution and fully connected ops, is highly parallel • Thus, some devices have hardware accelerators, such as GPU or DSP, that provide better performance and higher energy efficiency through Android NNAPI • To use NNAPI, TFLite has had an NNAPI delegate from the very beginning. Then, there are GPU delegates (GL ES, OpenCL, and Metal for now. Vulkan one is coming) and others. • my COSCUP 2019 slide deck on how NNAPI and GPU delegates work, https://www.slideshare.net/kstan2/tflite-nnapi-and-gpu-delegates
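To illustrate "part or all of graph execution", here is a toy sketch that splits a linear list of ops into consecutive runs that a delegate claims and runs that stay on the CPU. It is a simplification under stated assumptions: a real model is a graph rather than a list, the `supported` set is invented, and `CUSTOM_OP` is a hypothetical op name.

```python
def partition(ops, supported):
    """Group ops (in execution order) into consecutive delegate or cpu runs."""
    segments = []
    for op in ops:
        target = "delegate" if op in supported else "cpu"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)       # extend the current run
        else:
            segments.append((target, [op]))  # start a new run
    return segments


ops = ["CONV_2D", "DEPTHWISE_CONV_2D", "CUSTOM_OP", "SOFTMAX"]
supported = {"CONV_2D", "DEPTHWISE_CONV_2D", "SOFTMAX"}   # assumed delegate coverage
plan = partition(ops, supported)
```

Every boundary between a delegate run and a CPU run implies handing tensors back and forth, which is one reason an unsupported op in the middle of a model can cost more than its own run time.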
  • 25. XNNPACK and CoreML Delegates • “XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 platforms.” • “XNNPACK is not intended for direct use by deep learning practitioners and researchers; instead it provides low-level performance primitives for accelerating high-level machine learning frameworks, such as TensorFlow Lite, TensorFlow.js, PyTorch, and MediaPipe.”, https://github.com/google/XNNPACK • NNPACK -> QNNPACK -> XNNPACK • In TFLite, there is an XNNPACK delegate • CoreML is Apple’s machine learning framework • the only formal way to use the Neural Engine, Apple’s NN accelerator introduced with the A11 • nope, CoreML cannot use the A11 Neural Engine, https://www.slideshare.net/kstan2/why-you-cannot-use-neural-engine-to-run-your-nn-models-on-a11-devices
  • 26. XNNPACK • convolution is at the core of current neural network models • How convolution is implemented either in SW or HW • “direct convolution”: 6- or 7-layer nested for loops, • im2col, then GEMM, • other transforms, e.g., Winograd • XNNPACK found a way to efficiently reuse GEMM https://arxiv.org/pdf/1907.02129.pdf
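The first two approaches can be compared in a small sketch. It is deliberately reduced to one image, one input channel, one output channel, stride 1, and no padding, so the direct version needs only four loops instead of six or seven, and the GEMM step degenerates to a matrix-vector product. As in NN frameworks, the kernel is not flipped. This is not how XNNPACK implements convolution.

```python
def direct_conv2d(image, kernel):
    """Direct convolution: loop over output pixels, then over the kernel window."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[y][x] += image[y + i][x + j] * kernel[i][j]
    return out


def im2col(image, kh, kw):
    """Copy every kh x kw window into one row of a patch matrix."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[image[y + i][x + j] for i in range(kh) for j in range(kw)]
            for y in range(oh) for x in range(ow)]


def im2col_conv2d(image, kernel):
    """im2col followed by a matrix product with the flattened kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ow = len(image[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel))
                for patch in im2col(image, kh, kw)]
    # reshape the flat result back into rows of the output image
    return [flat_out[r:r + ow] for r in range(0, len(flat_out), ow)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
direct = direct_conv2d(image, kernel)
via_gemm = im2col_conv2d(image, kernel)
```

The patch matrix repeats each input pixel once per window that covers it, which is the memory overhead that GEMM-based convolution pays for being able to reuse a tuned matrix-multiply kernel.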
  • 27. Using XNNPACK in label_image.cc https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/label_image/label_image.cc#L109-L116 https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/evaluation/utils.h#L64-L88
  • 28. Using CoreML delegate https://github.com/freedomtan/glDelegateBenchmark/blob/master/glDelegateBenchmark/ViewController.mm#L61-L68 On iPhone 11 Pro, I got
      model name            | CPU 1 thread (ms) | CPU 2 threads (ms) | GPU (ms) | CoreML Delegate (ms) [4]
      Mobilenet V1 1.0 224  | 26.54             | 18.21              | 10.91    | 2.03
      PoseNet               | 34.14             | 23.62              | 16.75    | 3.34
      DeepLab V3 (257x257)  | 39.65             | 29.87              | 20.43    | 9.10
      Mobilenet V2 SSD COCO | 44.94             | 34.05              | 19.73    | 11.54
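To read the numbers above as speedups, the short sketch below divides the single-thread CPU latency by the CoreML delegate latency for each model. The figures are copied from the slide; nothing here is a new measurement.

```python
# (CPU 1 thread ms, CoreML delegate ms), as reported on the slide
latency_ms = {
    "Mobilenet V1 1.0 224": (26.54, 2.03),
    "PoseNet": (34.14, 3.34),
    "DeepLab V3 (257x257)": (39.65, 9.10),
    "Mobilenet V2 SSD COCO": (44.94, 11.54),
}

# speedup of the CoreML delegate over one CPU thread
speedup = {name: cpu / coreml for name, (cpu, coreml) in latency_ms.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.1f}x")
```

The gain ranges from roughly 13x for the classifier down to under 4x for the SSD detector, so the benefit of the delegate depends heavily on the model.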
  • 29. Concluding remarks • TFLite is getting more and more mature and comprehensive • If you haven’t started using it, you may want to start with TFLite metadata and the Android code generator • nope, there is no iOS code generator (yet) • To speed up execution of NN models, use TFLite delegates • note that not all accelerators are created equal • some are fp only; some are int/quant only
  • 30. Fin
  • 31. A13 AMX (Apple Matrix Extension?)
  • 32. sgemm and dgemm in BLAS • For (2048x4096) x (40x96x4096) matrix multiplication • sgemm (32-bit floating point) speed: A13 > My MBP >> A12 > A11 • dgemm (64-bit floating point) speed: My MBP > A13 >> A12 > A11