Deep learning in production with the best

Getting deep learning adopted at your company. The current landscape of academia vs industry. Presentation at AI with the best (online conference):
http://ai.withthebest.com/

skymind.io | deeplearning.org | gitter.im/deeplearning4j
Deep Learning in Production
Building Production-Class Deep Learning Workflows for the Enterprise
Adam Gibson / CTO, Skymind
AI With the Best / The Internet

Topics
• Deep Learning in Production vs Academia
• Data Scientists vs Engineers
• Defining Production
• A solution

Deep Learning in Production vs Academia

Academia/Research
• Focus on accuracy and the latest architectures
• Build proofs of concept quickly to validate an assumption
• Prototype as many ideas as possible, as quickly as possible, to find a solution to a problem
• Publish results often, even incremental ones, to increase the number of publications

Current state of research
• Mostly funded by large consumer companies (Amazon, Google, Facebook, ...)
• Scant pockets of deep learning at academic institutions (CMU, Stanford, NYU, ...)
• Large focus on audio and vision, somewhat spreading into natural language processing
• Starting to focus more on reinforcement learning and better ways of tuning

People in Deep Learning
• Talent is still sparse
• Most are in research labs
• Some are enthusiasts or startup founders
• Reality: deep learning hasn't reached most of the world yet. It affects a lot of people, but most aren't doing it.

Industry (MOST companies doing data science)
• Most use linear regression and random forests
• Prototyping happens in Python; these are the data scientists
• Data engineers hold the keys to the cluster (and write code in Java)
• Most problems are simple: analytics, churn prediction, maybe recommendation engines or price forecasting
• Deep learning is seen as overkill; there are no GPUs in the cluster

Data Scientists vs Engineers

Data Scientists
• Math or stats background; know R or Python
• Often beginning coders; may have started in SQL and moved up to analytics
• Know basic machine learning; problems focus on replacing Excel spreadsheets or solving business problems

Data Engineers
• Computer science background
• Build data pipelines and know how to set up production systems
• Don't really know machine learning that well, but are usually willing to learn
• Usually closer to the product team; may port Python algorithms to Java, depending on their level of ability

The hybrid
• Has been in the game a while; knows CS and stats
• Knows SQL, machine learning, and how to operate a Spark cluster
• Can formulate problems and figure out which projects to tackle next
• Either understands business objectives or can implement machine learning algorithms themselves

Most companies
• 2 separate teams
• Data scientists use Python/R and SQL, experiment with data, and come up with new models (very little machine learning)
• Data engineers use Java (sometimes .NET) and work on terabytes of data; most of their time is spent writing integrations and data pipelines

Startups
• Tend to employ generalists
• Usually 3-5 people who can sort of do both; startups usually aren't ready to hire specialists
• Sometimes have a product where something like deep learning is needed
• Usually a Ruby or Python stack, without many users or much scale
• Usually just want something simple to set up
• Not much need for compiled languages or scale yet; this comes later

Defining Production

Defining “Production”
• Varying degrees of scale
• Not everyone has terabytes of data
• MySQL and outsourced cloud services are “machine learning” for most startups
• Many will start out with scikit-learn and Flask, maybe adding Python-based deep learning later. This is “good enough”, and it is also what most tutorials cover
• Larger companies care more about other things: security, scale, and return on investment for projects. These companies use Java
• If you're Google you use C++; if you're Facebook you use your own version of PHP that you wrote and maintain

Hardware
• GPUs have very little market penetration
• Deep learning also has very little market penetration (despite the marketing)
• Most of the world runs on CPUs (this is changing very slowly)
• Startups are fine with the cloud; on-prem data centers are usually Dell or HP servers running Red Hat or Ubuntu

Typical stack
• Web-based product (Go, Ruby, Python, Scala, Java, or a mix)
• Storage (one or more SQL databases, Elasticsearch/Solr)
• Cloud infrastructure or on-prem (bare metal)
• Machine learning: ???

Machine Learning at startups
• Random one-off scripts for analysis
• Random one-off notebooks
• One-off ETL pipelines written in Java
• One or more models tied to a REST API that talks to your product stack
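
The last item above (a model tied to a REST API) is often the entire serving story at a startup. The sketch below uses only Python's standard-library WSGI support. It assumes two hypothetical pieces: a `predict` function with made-up weights that stands in for a trained model, and a JSON contract of the form `{"features": [...]}` in and `{"score": ...}` out. In the scikit-learn and Flask setup mentioned earlier, Flask would usually fill this role.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical stand-in for a trained model; swap in your real model's predict call.
WEIGHTS = [0.4, -0.2, 0.1]  # illustrative values only


def predict(features):
    """Score a feature vector with a fixed linear model."""
    return sum(w * x for w, x in zip(WEIGHTS, features))


def _json_response(start_response, status, payload):
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI app exposing POST /predict with a JSON body like {"features": [1, 2, 3]}."""
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        return _json_response(start_response, "404 Not Found", {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        score = predict(payload["features"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing "features", or non-numeric features.
        return _json_response(start_response, "400 Bad Request", {"error": "bad request"})
    return _json_response(start_response, "200 OK", {"score": score})


def serve(host="127.0.0.1", port=8000):
    """Blocking development server; the product stack would call POST /predict."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` blocks and listens on localhost:8000. The `wsgiref` server is meant for development. A production deployment would run the same `app` under a proper WSGI server.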

Machine Learning at big companies
• Random one-off scripts for analysis
• Random one-off notebooks
• Large numbers of separate databases and applications run by different teams
• Multiple disconnected APIs
• Some models connected to a Spark or Hadoop cluster

Challenges in Production
• Serving user traffic (latency)
• Data access (connecting everything together)
• Large amounts of time spent on data pipeline code
• Unclear metrics of success for the data team
• Lack of innovation, or “too much” of it, e.g. chasing the shiny new thing
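
Serving latency is usually tracked as percentiles rather than an average, because the slowest requests are the ones users notice. Below is a small standard-library sketch for measuring this around any prediction callable. `predict` and the request list are placeholders for your own model and a sample of realistic traffic.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def latency_percentiles(predict: Callable, requests: Sequence, warmup: int = 10) -> Dict[str, float]:
    """Time each call to `predict` and return p50/p95/p99 latency in milliseconds."""
    for r in requests[:warmup]:  # warm caches and any lazy initialization before measuring
        predict(r)
    timings_ms = []
    for r in requests:
        start = time.perf_counter()
        predict(r)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # n=100 yields 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    # "inclusive" keeps every cut point within the observed min/max.
    cuts = statistics.quantiles(timings_ms, n=100, method="inclusive")
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "p99_ms": cuts[98]}
```

Use payloads of realistic size. A p99 measured on toy inputs says little about production traffic.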

Challenges of Deep Learning in Production
• Same problems as machine learning
• Hard-to-interpret models
• Requires specialized hardware
• Not a lot of best practices
• Lack of expertise (machine learning is hard enough)

Closing the gap

Establish some best practices
• Kaggle is a good start for this: start with “somewhat real” problems
• Use higher-level tools such as Keras; otherwise it's easy to get lost in the weeds
• Consider having a real-world goal, e.g. if you're in real estate, figure out how to use a simple CNN (not the latest algorithm) for image search
• Depending on need, consider integration with Hadoop/Spark
• Lastly, don't treat deep learning as special; it's still a subfield of machine learning
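
One common way to build the real-estate image search with a simple CNN works in two steps. First, run each listing photo through a pretrained network once and keep a fixed-length embedding vector. Then answer queries by nearest-neighbor search over those vectors. The sketch below covers only the search step, in plain Python. It assumes the embeddings already exist; the `catalog` vectors and listing IDs are made up for illustration. A brute-force scan like this is only reasonable for a modest number of listings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Sequence[float], catalog: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k catalog items whose embeddings are most similar to the query."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 4-dimensional embeddings; a real CNN would emit hundreds of dimensions.
catalog = {
    "listing-1-kitchen": [0.9, 0.1, 0.0, 0.2],
    "listing-2-kitchen": [0.8, 0.2, 0.1, 0.1],
    "listing-3-garden": [0.0, 0.1, 0.9, 0.3],
}
print(search([0.85, 0.15, 0.05, 0.15], catalog, k=2))
```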

Going to production
• Sometimes Python is enough for simple stuff
• Data engineering teams should consider Java/Scala-based solutions (disclaimer: highly opinionated here)
• Follow the same workflow: prototype in Python, then port to production
• Overall, scope to a core problem where deep learning is worth it
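
A low-tech way to keep the prototype-then-port workflow honest is to hand the production team two things: a language-neutral model artifact and a handful of reference predictions. The Java/Scala port can then be checked against the Python prototype. This is a minimal sketch assuming a simple linear model. The JSON layout (`model_type`, `weights`, `bias`, `test_vectors`) is a hypothetical format invented for illustration, not a standard. `verify_port` takes a Python callable as a stand-in for the ported scoring code. On the JVM side, the same check would live in a unit test.

```python
import json
from typing import Dict, List, Sequence


def predict(model: Dict, features: Sequence[float]) -> float:
    """Linear model: dot(weights, features) + bias."""
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]


def export_model(model: Dict, samples: List[List[float]]) -> str:
    """Serialize weights plus reference predictions that any port must reproduce."""
    artifact = {
        "model_type": "linear",  # hypothetical format tag
        "weights": model["weights"],
        "bias": model["bias"],
        "test_vectors": [{"features": s, "expected": predict(model, s)} for s in samples],
    }
    return json.dumps(artifact, indent=2)


def verify_port(artifact_json: str, ported_predict, tol: float = 1e-9) -> bool:
    """Check a re-implementation of the scoring code against the reference predictions."""
    artifact = json.loads(artifact_json)
    model = {"weights": artifact["weights"], "bias": artifact["bias"]}
    return all(abs(ported_predict(model, tv["features"]) - tv["expected"]) <= tol
               for tv in artifact["test_vectors"])
```

Larger deep learning models would normally travel in a framework's own serialization format. The idea of shipping reference inputs and outputs alongside the model still applies.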

Newer hardware
• Prototype on cloud infrastructure with a toy problem
• Try out this “GPU thing” and see what might be involved
• Learn the trade-offs between CPUs and GPUs; don't believe the marketing
• Buy new hardware as needed
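
“Don't believe the marketing” is easiest to act on with your own numbers. Below is a hardware-agnostic harness sketch that measures throughput (items per second) at several batch sizes for any batch-prediction callable. The same script can therefore be pointed at a CPU-backed and a GPU-backed implementation of your model on the same toy problem. `predict_batch` is a placeholder. GPU throughput often depends heavily on batch size, which is the kind of trade-off this makes visible.

```python
import time
from typing import Callable, Dict, List, Sequence


def throughput_by_batch_size(predict_batch: Callable[[List], object],
                             items: Sequence,
                             batch_sizes: Sequence[int] = (1, 8, 32, 128),
                             repeats: int = 3) -> Dict[int, float]:
    """Return the best observed items/second for each batch size."""
    results = {}
    for bs in batch_sizes:
        best = 0.0
        for _ in range(repeats):  # keep the best run to reduce noise from other processes
            start = time.perf_counter()
            for i in range(0, len(items), bs):
                predict_batch(list(items[i:i + bs]))
            elapsed = time.perf_counter() - start
            if elapsed > 0:
                best = max(best, len(items) / elapsed)
        results[bs] = best
    return results
```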

In closing
• Use something open source to start off with
• Use something *supported*; keep an eye on open source activity
• Don't just believe the research. Papers are not your company. Do due diligence

Thank you!
Please visit skymind.io/learn for more information