SlideShare a Scribd company logo
1 of 12
Download to read offline
Caffe2 on Android
Koan-Sin Tan

freedom@computer.org

March 8th, 2018

Hsinchu Coding Serfs Meeting
Quick Intro
• Caffe 2

• 2nd generation of Caffe, which was the most popular deep learning
framework (before TensorFlow) from Berkeley

• What's the difference? Caffe2 improves Caffe 1.0 in a series of directions:

• first-class support for large-scale distributed training

• mobile deployment
• new hardware support (in addition to CPU and CUDA)

• flexibility for future directions such as quantized computation

• stress tested by the vast scale of Facebook applications
https://caffe2.ai/docs/caffe-migration.html
Caffe2 on Android
• Official Android demo

• https://caffe2.ai/docs/AI-Camera-demo-android.html, https://github.com/caffe2/
AICamera

• SqueezeNet 1.1:

• 5.8/5.7 fps on Samsung S7 and Google Pixel

• not very impressive

• OpenGL backend

• https://www.facebook.com/Caffe2AI/videos/126340488008269/

• up to 6X speedup (24 FPS) compared to CPU on high-end Android devices (e.g.
Galaxy S8) for style transfer models
https://trends.google.com/trends/explore?q=tensorflow,caffe2
• Tensorflow Lite is also looking for the possibility of
OpenGL ES backend

• https://github.com/tensorflow/tensorflow/issues/16189
What can we use on
Android now
https://github.com/caffe2/caffe2/tree/master/caffe2/mobile/contrib
Caffe2 backends for
Android I know
• ARM CPU:

• NNPACK, Eigen: quite mature

• OpenGL ES:

• OpenGL: not actively maintained (?)

• ARM Compute Library (GL ES part): newly added, still growing

• NEON, and OpenCL

• NNAPI: not fully integrated yet.
How to build
• > scripts/build_android.sh
• With that, no test command line binary test

• Caffe 2 has some tests and a simple command line benchmark tool
called speed_benchmark
> scripts/build_android.sh -DBUILD_TEST -DBUILD_BINARY

• then we can get build_android/bin/speed_benchmark and
other test binaries

• Pytorch has a good tutorial on using it, http://pytorch.org/tutorials/
advanced/super_resolution_with_caffe2.html
Some results
• > ./speed_benchmark --input_file input.blobproto --input
data --init_net init_net.pb --net predict_net.pb --
caffe2_log_level=0

01-06 23:15:42.073 32623 32623 I native : [I net_simple.cc:101] Starting benchmark.
01-06 23:15:42.074 32623 32623 I native : [I net_simple.cc:102] Running warmup runs.
01-06 23:15:42.074 32623 32623 I native : [I net_simple.cc:112] Main runs.
01-06 23:15:43.805 32623 32623 I native : [I net_simple.cc:123] Main run finished. Milliseconds per iter:
173.15. Iters per second: 5.77535
Some results
• ARM Compute Library backend: Caffe2 addend a Compute Libarry backend on in the end of Februrary 2018. With some tweaks, it's
possible to run SqueezeNet 1.1 faster than CPU (NNPAC) with OpenGL

01-04 03:41:38.297 25523 25523 I native : [I gl_model_test.h:52] [C2DEBUG] Benchmarking OpenGL Net

01-04 03:41:38.297 25523 25523 I native : [I net_gl.cc:104] Starting benchmark.

01-04 03:41:38.297 25523 25523 I native : [I net_gl.cc:105] Running warmup runs.

01-04 03:41:38.796 25523 25523 I native : [I net_gl.cc:121] Main runs.

01-04 03:41:43.107 25523 25523 I native : [I net_gl.cc:134] [C2DEBUG] Main run finished. Milliseconds per iter: 43.1077. Iters per
second: 23.1977

01-04 03:41:43.110 25523 25523 I native : [I gl_model_test.h:66] [C2DEBUG] Benchmarking CPU Net

01-04 03:41:43.110 25523 25523 I native : [I net_simple.cc:101] Starting benchmark.

01-04 03:41:43.110 25523 25523 I native : [I net_simple.cc:102] Running warmup runs.

01-04 03:41:43.768 25523 25523 I native : [I net_simple.cc:112] Main runs.

01-04 03:41:50.229 25523 25523 I native : [I net_simple.cc:123] Main run finished. Milliseconds per iter: 64.6136. Iters per
second: 15.4766
Comparing with TF Lite
• cmake is easier than bazel :-)

• Relatively large, or say comprehensive. If you want to enable something like on-device learning. It's
easier to start with TFLite.

• binary could be large

• Code looks cleaning

• Review process, or say, software engineering not as rigid as TensorFlow

• TF has a larger team (?)

• See, https://www.oreilly.com/ideas/how-the-tensorflow-team-handles-open-source-support

• Some interesting code,

• The Observer design pattern could be used to measure performance, https://en.wikipedia.org/wiki/
Observer_pattern

• https://github.com/caffe2/caffe2/tree/master/caffe2/observers
fin

More Related Content

What's hot

TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesKoan-Sin Tan
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Koan-Sin Tan
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android BenchmarksKoan-Sin Tan
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsKoan-Sin Tan
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsKoan-Sin Tan
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolKoan-Sin Tan
 
OpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingOpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingAndreas Schreiber
 
Hacking - high school intro
Hacking - high school introHacking - high school intro
Hacking - high school introPeter Hlavaty
 
One bite and all your dreams will come true: Analyzing and Attacking Apple Ke...
One bite and all your dreams will come true: Analyzing and Attacking Apple Ke...One bite and all your dreams will come true: Analyzing and Attacking Apple Ke...
One bite and all your dreams will come true: Analyzing and Attacking Apple Ke...Priyanka Aash
 
Minko - Scripting 3D apps with Lua and C++
Minko - Scripting 3D apps with Lua and C++Minko - Scripting 3D apps with Lua and C++
Minko - Scripting 3D apps with Lua and C++Minko3D
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISAGanesan Narayanasamy
 
Rainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectRainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectPeter Hlavaty
 
Cloud Deep Learning Chips Training & Inference
Cloud Deep Learning Chips Training & InferenceCloud Deep Learning Chips Training & Inference
Cloud Deep Learning Chips Training & InferenceMr. Vengineer
 
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
Introduction to Python GUI development with Delphi for Python - Part 1:   Del...Introduction to Python GUI development with Delphi for Python - Part 1:   Del...
Introduction to Python GUI development with Delphi for Python - Part 1: Del...Embarcadero Technologies
 
Python on Android with Delphi FMX - The Cross Platform GUI Framework
Python on Android with Delphi FMX - The Cross Platform GUI Framework Python on Android with Delphi FMX - The Cross Platform GUI Framework
Python on Android with Delphi FMX - The Cross Platform GUI Framework Embarcadero Technologies
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioMuralidharan Deenathayalan
 
Memory Optimization
Memory OptimizationMemory Optimization
Memory Optimizationguest3eed30
 

What's hot (20)

TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU Delegates
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android Benchmarks
 
Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source Tools
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
 
OpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingOpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel Programming
 
Hacking - high school intro
Hacking - high school introHacking - high school intro
Hacking - high school intro
 
One bite and all your dreams will come true: Analyzing and Attacking Apple Ke...
One bite and all your dreams will come true: Analyzing and Attacking Apple Ke...One bite and all your dreams will come true: Analyzing and Attacking Apple Ke...
One bite and all your dreams will come true: Analyzing and Attacking Apple Ke...
 
Minko - Scripting 3D apps with Lua and C++
Minko - Scripting 3D apps with Lua and C++Minko - Scripting 3D apps with Lua and C++
Minko - Scripting 3D apps with Lua and C++
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA
 
Rainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could ExpectRainbow Over the Windows: More Colors Than You Could Expect
Rainbow Over the Windows: More Colors Than You Could Expect
 
Is Python still production ready ? Ludovic Gasc
Is Python still production ready ? Ludovic GascIs Python still production ready ? Ludovic Gasc
Is Python still production ready ? Ludovic Gasc
 
Cloud Deep Learning Chips Training & Inference
Cloud Deep Learning Chips Training & InferenceCloud Deep Learning Chips Training & Inference
Cloud Deep Learning Chips Training & Inference
 
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
Introduction to Python GUI development with Delphi for Python - Part 1:   Del...Introduction to Python GUI development with Delphi for Python - Part 1:   Del...
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
 
Python for Delphi Developers - Part 2
Python for Delphi Developers - Part 2Python for Delphi Developers - Part 2
Python for Delphi Developers - Part 2
 
Python on Android with Delphi FMX - The Cross Platform GUI Framework
Python on Android with Delphi FMX - The Cross Platform GUI Framework Python on Android with Delphi FMX - The Cross Platform GUI Framework
Python on Android with Delphi FMX - The Cross Platform GUI Framework
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
 
Memory Optimization
Memory OptimizationMemory Optimization
Memory Optimization
 

Similar to Caffe2 on Android

Community works for muli core embedded image processing
Community works for muli core embedded image processingCommunity works for muli core embedded image processing
Community works for muli core embedded image processingJeongpyo Kong
 
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013Nick Galbreath
 
"The Caffe2 Framework for Mobile and Embedded Deep Learning," a Presentation ...
"The Caffe2 Framework for Mobile and Embedded Deep Learning," a Presentation ..."The Caffe2 Framework for Mobile and Embedded Deep Learning," a Presentation ...
"The Caffe2 Framework for Mobile and Embedded Deep Learning," a Presentation ...Edge AI and Vision Alliance
 
Mesa and Its Debugging, Вадим Шовкопляс
Mesa and Its Debugging, Вадим ШовкоплясMesa and Its Debugging, Вадим Шовкопляс
Mesa and Its Debugging, Вадим ШовкоплясSigma Software
 
SenchaCon 2016: Develop, Test & Deploy with Docker - Jonas Schwabe
SenchaCon 2016: Develop, Test & Deploy with Docker - Jonas Schwabe SenchaCon 2016: Develop, Test & Deploy with Docker - Jonas Schwabe
SenchaCon 2016: Develop, Test & Deploy with Docker - Jonas Schwabe Sencha
 
How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014Puppet
 
From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R Kai Lichtenberg
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performanceinside-BigData.com
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development PipelineGlobalLogic Ukraine
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesSeungYong Oh
 
ASP.NET Core: The best of the new bits
ASP.NET Core: The best of the new bitsASP.NET Core: The best of the new bits
ASP.NET Core: The best of the new bitsKen Cenerelli
 
Introduction to Civil Infrastructure Platform
Introduction to Civil Infrastructure PlatformIntroduction to Civil Infrastructure Platform
Introduction to Civil Infrastructure PlatformSZ Lin
 
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...gree_tech
 
Bring-your-ML-Project-into-Production-v2.pdf
Bring-your-ML-Project-into-Production-v2.pdfBring-your-ML-Project-into-Production-v2.pdf
Bring-your-ML-Project-into-Production-v2.pdfLiang Yan
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesDatabricks
 
Using Deep Learning Toolkits with Kubernetes clusters
Using Deep Learning Toolkits with Kubernetes clustersUsing Deep Learning Toolkits with Kubernetes clusters
Using Deep Learning Toolkits with Kubernetes clustersJoy Qiao
 

Similar to Caffe2 on Android (20)

Community works for muli core embedded image processing
Community works for muli core embedded image processingCommunity works for muli core embedded image processing
Community works for muli core embedded image processing
 
Cuda
CudaCuda
Cuda
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
 
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
Care and Feeding of Large Scale Graphite Installations - DevOpsDays Austin 2013
 
"The Caffe2 Framework for Mobile and Embedded Deep Learning," a Presentation ...
"The Caffe2 Framework for Mobile and Embedded Deep Learning," a Presentation ..."The Caffe2 Framework for Mobile and Embedded Deep Learning," a Presentation ...
"The Caffe2 Framework for Mobile and Embedded Deep Learning," a Presentation ...
 
Mesa and Its Debugging, Вадим Шовкопляс
Mesa and Its Debugging, Вадим ШовкоплясMesa and Its Debugging, Вадим Шовкопляс
Mesa and Its Debugging, Вадим Шовкопляс
 
SenchaCon 2016: Develop, Test & Deploy with Docker - Jonas Schwabe
SenchaCon 2016: Develop, Test & Deploy with Docker - Jonas Schwabe SenchaCon 2016: Develop, Test & Deploy with Docker - Jonas Schwabe
SenchaCon 2016: Develop, Test & Deploy with Docker - Jonas Schwabe
 
How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014
 
From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R From Zero to Hero - All you need to do serious deep learning stuff in R
From Zero to Hero - All you need to do serious deep learning stuff in R
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performance
 
september13.ppt
september13.pptseptember13.ppt
september13.ppt
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development Pipeline
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
ASP.NET Core: The best of the new bits
ASP.NET Core: The best of the new bitsASP.NET Core: The best of the new bits
ASP.NET Core: The best of the new bits
 
Introduction to Civil Infrastructure Platform
Introduction to Civil Infrastructure PlatformIntroduction to Civil Infrastructure Platform
Introduction to Civil Infrastructure Platform
 
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
 
CG-Orientation ppt.pptx
CG-Orientation ppt.pptxCG-Orientation ppt.pptx
CG-Orientation ppt.pptx
 
Bring-your-ML-Project-into-Production-v2.pdf
Bring-your-ML-Project-into-Production-v2.pdfBring-your-ML-Project-into-Production-v2.pdf
Bring-your-ML-Project-into-Production-v2.pdf
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on Kubernetes
 
Using Deep Learning Toolkits with Kubernetes clusters
Using Deep Learning Toolkits with Kubernetes clustersUsing Deep Learning Toolkits with Kubernetes clusters
Using Deep Learning Toolkits with Kubernetes clusters
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Caffe2 on Android

  • 1. Caffe2 on Android Koan-Sin Tan freedom@computer.org March 8th, 2018 Hsinchu Coding Serfs Meeting
  • 2. Quick Intro • Caffe 2 • 2nd generation of Caffe, which was the most popular deep learning framework (before TensorFlow) from Berkeley • What's the difference? Caffe2 improves Caffe 1.0 in a series of directions: • first-class support for large-scale distributed training • mobile deployment • new hardware support (in addition to CPU and CUDA) • flexibility for future directions such as quantized computation • stress tested by the vast scale of Facebook applications https://caffe2.ai/docs/caffe-migration.html
  • 3. Caffe2 on Android • Official Android demo • https://caffe2.ai/docs/AI-Camera-demo-android.html, https://github.com/caffe2/ AICamera • SqueezeNet 1.1: • 5.8/5.7 fps on Samsung S7 and Google Pixel • not very impressive • OpenGL backend • https://www.facebook.com/Caffe2AI/videos/126340488008269/ • up to 6X speedup (24 FPS) compared to CPU on high-end Android devices (e.g. Galaxy S8) for style transfer models
  • 5. • Tensorflow Lite is also looking for the possibility of OpenGL ES backend • https://github.com/tensorflow/tensorflow/issues/16189
  • 6. What can we use on Android now https://github.com/caffe2/caffe2/tree/master/caffe2/mobile/contrib
  • 7. Caffe2 backends for Android I know • ARM CPU: • NNPACK, Eigen: quite mature • OpenGL ES: • OpenGL: not actively maintained (?) • ARM Compute Library (GL ES part): newly added, still growing • NEON, and OpenCL • NNAPI: not fully integrated yet.
  • 8. How to build • > scripts/build_android.sh • With that, no test command line binary test • Caffe 2 has some tests and a simple command line benchmark tool called speed_benchmark > scripts/build_android.sh -DBUILD_TEST -DBUILD_BINARY • then we can get build_android/bin/speed_benchmark and other test binaries • Pytorch has a good tutorial on using it, http://pytorch.org/tutorials/ advanced/super_resolution_with_caffe2.html
  • 9. Some results • > ./speed_benchmark --input_file input.blobproto --input data --init_net init_net.pb --net predict_net.pb -- caffe2_log_level=0 01-06 23:15:42.073 32623 32623 I native : [I net_simple.cc:101] Starting benchmark. 01-06 23:15:42.074 32623 32623 I native : [I net_simple.cc:102] Running warmup runs. 01-06 23:15:42.074 32623 32623 I native : [I net_simple.cc:112] Main runs. 01-06 23:15:43.805 32623 32623 I native : [I net_simple.cc:123] Main run finished. Milliseconds per iter: 173.15. Iters per second: 5.77535
  • 10. Some results • ARM Compute Library backend: Caffe2 addend a Compute Libarry backend on in the end of Februrary 2018. With some tweaks, it's possible to run SqueezeNet 1.1 faster than CPU (NNPAC) with OpenGL 01-04 03:41:38.297 25523 25523 I native : [I gl_model_test.h:52] [C2DEBUG] Benchmarking OpenGL Net 01-04 03:41:38.297 25523 25523 I native : [I net_gl.cc:104] Starting benchmark. 01-04 03:41:38.297 25523 25523 I native : [I net_gl.cc:105] Running warmup runs. 01-04 03:41:38.796 25523 25523 I native : [I net_gl.cc:121] Main runs. 01-04 03:41:43.107 25523 25523 I native : [I net_gl.cc:134] [C2DEBUG] Main run finished. Milliseconds per iter: 43.1077. Iters per second: 23.1977 01-04 03:41:43.110 25523 25523 I native : [I gl_model_test.h:66] [C2DEBUG] Benchmarking CPU Net 01-04 03:41:43.110 25523 25523 I native : [I net_simple.cc:101] Starting benchmark. 01-04 03:41:43.110 25523 25523 I native : [I net_simple.cc:102] Running warmup runs. 01-04 03:41:43.768 25523 25523 I native : [I net_simple.cc:112] Main runs. 01-04 03:41:50.229 25523 25523 I native : [I net_simple.cc:123] Main run finished. Milliseconds per iter: 64.6136. Iters per second: 15.4766
  • 11. Comparing with TF Lite • cmake is easier than bazel :-) • Relatively large, or say comprehensive. If you want to enable something like on-device learning. It's easier to start with TFLite. • binary could be large • Code looks cleaning • Review process, or say, software engineering not as rigid as TensorFlow • TF has a larger team (?) • See, https://www.oreilly.com/ideas/how-the-tensorflow-team-handles-open-source-support • Some interesting code, • The Observer design pattern could be used to measure performance, https://en.wikipedia.org/wiki/ Observer_pattern • https://github.com/caffe2/caffe2/tree/master/caffe2/observers
  • 12. fin