More Related Content Similar to “Deep Learning on Mobile Devices,” a Presentation from Siddha Ganju (20) More from Edge AI and Vision Alliance (20) “Deep Learning on Mobile Devices,” a Presentation from Siddha Ganju1. © 2020 Your Company Name
Deep Learning
On Mobile
A Practitioner’s Guide
Siddha Ganju
September 2020
2. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 2
3. © 2020 Your Company Name
Deep Learning
On Mobile
A Practitioner’s Guide
Siddha Ganju
September 2020
4. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 4
@SiddhaGanju
@MeherKasam
@AnirudhKoul
5. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Why Deep Learning on Mobile?
5
Privacy Reliability Cost Latency
6. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Latency Is Expensive!
6
[Amazon 2008]
100
ms
1%
loss
7. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Latency Is Expensive!
8
[Google Research, Webpagetest.org]
>3 sec
Load time
53%
Bounce
Mobile Site Visits
8. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 9
9. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Power of 10
10
[Miller 1968; Card et al. 1991; Nielsen 1993]
0.1s
Seamless
1s
Uninterrupted flow
of thought
10s
Limit of attention
10. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 11
High Quality Dataset Hardware
Efficient Mobile
Inference Engine
Efficient Model DL App+ + + =
11. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
How do I train my model?
12
12. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Learn to
Play Melodica
13
3 Months
13. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 15
Already
Play Piano?
14. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
FINE TUNE
Your skills
16
3 1
Months Week
15. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Fine tuning
17
Assemble
a dataset
Find a pre-
trained model
Fine-tune a pre-
trained model
Run using existing
frameworks
Don’t Be A Hero
— Andrej Karpathy
16. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
CustomVision.ai
18
Use Fatkun Browser Extension to download images from Search Engine,
or use Bing Image Search API to programmatically download photos with proper rights
Upload Images
Bring your own labeled images, or
use custom vision to quickly add tags
to any unlabeled images
Train
Use your labeled images to teach
custom vision the concepts you care
about
Evaluate
Use simple REST API calls to
quickly tag images with your new
custom computer vision model
89% 93% 91%
19. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
How do I run my models?
21
20. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Core ML TF Lite ML Kit
21. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Apple Ecosystem
23
Metal BNNS + MPS Core ML Core ML 2 Core ML 3
2014 2016 2017 2018 2019
• Tiny models (~ KB)!
• 1-bit model quantization support
• Batch API for improved performance
• Conversion support for MXNet, ONNX
• tf-coreml
22. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Apple Ecosystem
24
Metal BNNS + MPS Core ML Core ML 2 Core ML 3
2014 2016 2017 2018 2019
• On-device training
• Personalization
• Create ML UI
23. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Core ML Benchmark
25
Core ML Runtime Speed by OS
0
20
40
60
80
100
120
iOS 11 iOS 12
Relative speed across devices
https://heartbeat.fritz.ai/ios-12-core-ml-benchmarks-b7a79811aac1
Execution Time (MS) ON Apple Devices
727
538
129
637
557
109114
77
44
90 74
35
78 71
3328 26 1824 20 14
0
100
200
300
400
500
600
700
800
Inception v3 Resnet-50 MobileNet
iPhone 5s (2013) iPhone 6 (2014) iPhone 6s (2015) iPhone 7 (2016)
iPhone X (2017) iPhone XS (2018) iPhone 11 Pro (2019)
GPUs became a
thing here!
24. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Ecosystem
26
TensorFlow TensorFlow Mobile TensorFlow Lite
2015 2016 2018
Smaller Faster Minimal dependencies
Allows running custom
operators
25. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Lite is small
27
1.5MB
TensorFlow
Mobile
300KB
Core Interpreter +
Supported Operations
26. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Lite is Fast
28
Takes advantage of on-
device hardware
acceleration
Static memory and
static execution plan
• Decreases
load time
FlatBuffers
• Reduces code footprint,
memory usage
• Reduces CPU cycles on
serialization and
deserialization
• Improves
startup time
Pre-fused activations
• Combining batch
normalization layer
with previous
convolution
27. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Ecosystem
29
Smaller Faster Minimal dependencies
Allows running custom
operators
TensorFlow TensorFlow Mobile TensorFlow Lite
2015 2016 2018
28. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Ecosystem
30
$ tflite_convert --keras_model_file = keras_model.h5 --output_file=foo.tflite
TensorFlow TensorFlow Mobile TensorFlow Lite
2015 2016 2018
29. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Ecosystem
31
TensorFlow TensorFlow Mobile TensorFlow Lite
2015 2016 2018
Trained TensorFlow
Model
TF Lite Converter .tflite model
Android App
iOS App
30. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
ML Kit
32
var vision = Vision.vision()
let faceDetector = vision.faceDetector(options: options)
let image = VisionImage(image: uiImage)
faceDetector.process(visionImage) { // callback }
Easy to use Abstraction over
TensorFlow Lite
Built-in APIs for image
labeling, OCR, face
detection, barcode
scanning, landmark
detection, smart reply
Model management
with Firebase
A/B testing
31. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
How do I keep
my IP safe?
33
32. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Fritz
Full fledged mobile lifecycle support
Deployment, instrumentation, etc. from Python
Image
Labeling
Pose
Estimation
Image
Segmentation
Analytics +
Monitoring
Object
Detection
Model
Management
Style
Transfer
Model
Protection
34
33. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 36
Does my model make me look fat?
Apple does not allow
apps over 200 MB to
be downloaded over
cellular network
Download on
demand, and
interpret on device
instead
34. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
What effect does hardware
have on performance?
37
35. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Big things come in small packages
38
36. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Effect of Hardware
39
L-R: iPhone XS, iPhone
X, iPhone 5
https://twitter.com/matthieurouif/status/1126575118812110854?s=11
37. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Lite benchmarks
40
Alpha Lab releases Numericcal: http://alpha.lab.numericcal.com/
38. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Lite benchmarks
41
Crowdsourcing AI Benchmark App by Andrey Ignatov from ETH Zurich. http://ai-benchmark.com/
39. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Alchemy by Fritz
42
https://alchemy.fritz.ai/
Python library to analyze and estimate
mobile performance
No need to deploy on mobile
40. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Which devices should I support?
43
No need to deploy on mobile
To get 95% device coverage,
support phones released in the last
4 years
For unsupported phones, offer
graceful degradation (lower frame
rate, cloud inference, etc.)
41. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Could all of this result in heavy energy use?
44
42. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Energy considerations
45
You don’t usually run AI models constantly; you run it for a
few seconds
Bigger question — do you really need to run it at 30 FPS? Could it be run at 1 FPS?
With a modern flagship phone, running MobileNet at 30 FPS should burn battery in 2–
3 hours
43. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Energy reduction from 30 FPS to 1 FPS
46
Percentage GPU utilization with varying frames per second
iPad Pro 2017
0
20
40
60
0 5 10 15 20 25
PercentageGPUUtilization
Frames Per Second
SqueezeNet MobileNet ResNet-50
30
44. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
What exciting applications can
I build?
47
45. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Seeing AI
48
Aim: Help blind users identify products using barcode
Issue: Blind users don’t know where the barcode is
Solution: Guide user in finding a barcode with audio cues
Audible Barcode recognition
46. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 49
AR Hand Puppets,
Hart Woolery from
2020CV, Object
Detection (Hand) + Key
Point Estimation
[https://twitter.com/2020cv_inc/status/1093219359676280832]
47. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 50
Object Detection (Ball,
Hoop, Player) + Body
Pose + Perspective
Transformation
[HomeCourt.ai]
48. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
https://twitter.com/smashfactory/status/1139461813710442496
Remove objects
51
Brian Schulman, Adventurous Co.
Object Segmentation + Image Inpainting
49. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
https://twitter.com/braddwyer/status/910030265006923776
Magic Sudoku App
52
Edge Detection + Classification + AR Kit
50. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Snapchat
53
Face Swap GANs
51. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Can I make my model even
more efficient?
54
52. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
How To Find Efficient Pre-Trained Models
55
Papers with Code
https://paperswithcode.com/sota
Model Zoo
https://modelzoo.co
53. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Model Pruning
57
Song Han, Jeff Pool, John Tran, William J. Dally, "Learning both Weights and Connections for Efficient Neural Networks", 2015
Aim: Remove all connections with absolute weights below a threshold
54. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Pruning in Keras
58
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
prune.Prune(tf.keras.layers.Dense(512, activation=tf.nn.relu)),
tf.keras.layers.Dropout(0.2),
prune.Prune(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
])
55. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
So many techniques — So little time!
60
Channel pruning01
Model quantization02
ThiNet (Filter pruning)03
Weight sharing04
Automatic Mixed Precision05
Network distillation06
56. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Pocket Flow – 1 Line to Make a Model Efficient
61
Tencent AI Labs created an Automatic Model Compression (AutoMC) framework
57. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
AutoML – Let AI Design an Efficient Arch
64
• Neural Architecture Search (NAS) — An automated
approach for designing models using reinforcement
learning while maximizing accuracy.
• Hardware Aware NAS = Maximizes accuracy while
minimizing run-time on device
• Incorporates latency information into the reward
objective function
• Measure real-world inference latency by executing on a
particular platform
• 1.5x faster than MobileNetV2 (MnasNet)
• ResNet-50 accuracy with 19x less parameters
• SSD300 mAP with 35x fewer FLOPs
58. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Evolution of Mobile NAS Methods
65
Method Top-1 Acc (%) Pixel-1 Runtime
Search Cost
(GPU Hours)
MobileNetV1 70.6 113 Manual
MobileNetV2 72.0 75 Manual
MnasNet 74.0 76 40,000 (4 years+)
ProxylessNas-R 74.6 78 200
Single-Path NAS 74.9 79.5 3.75 hours
59. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
ProxylessNAS – Per Hardware Tuned CNNs
66
Han Cai and Ligeng Zhu and Song Han, "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware", ICLR 2019
60. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Can I improve my model without
accessing user data?
68
61. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
On-Device Training in Core ML
• Core ML 3 introduced on device learning
• Never have to send training data to the server with the help of MLUpdateTask
• Schedule training when device is charging to save power
69
let updateTask = try MLUpdateTask(
forModelAt: modelUrl,
trainingData: trainingData,
configuration: configuration,
completionHandler: { [weak self]
self.model = context.model context.model.write(to: newModelUrl)
})
62. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Federated Learning!!!
70
https://federated.withgoogle.com/
63. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
TensorFlow Federated
71
https://github.com/tensorflow/federated
Train a global model using 1000s
of devices without access to data
Encryption + Secure Aggregation
Protocol
Can take a few days to wait for
aggregations to build up
64. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
Mobile AI Development Lifecycle
72
01
Collect Data
Label Data
Train Model
Convert ModelOptimize Performance
Deploy
Monitor 02
03
0405
06
07
65. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
What we learned today
73
Why deep learning on mobile?01 Benchmarking05
Building a model02 State-of-the-art applications06
Running a model03 Making a model more efficient07
Hardware factors04 Federated learning08
66. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
How do I access the slides instantly?
74
@PracticalDLBook
http://PracticalDeepLearning.ai
Icons credit - Orion Icon Library https://orioniconlibrary.com
67. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/ 75
@SiddhaGanju
@MeherKasam
@AnirudhKoul
68. © 2020 Practical Deep Learning for Cloud, Mobile and Edge; http://practicaldeeplearning.ai/
That’s all, folks!
76