prototype -> production
Make your ML app rock
Agenda
• Problems with current workflow
• Interactive exploration to enterprise API
• Data Science Platforms
• My recommendation
About me @geoHeil
• Data Scientist at T-Mobile Austria
• Business Informatics at Vienna University of Technology
• Built predictive startup (predictr.eu)
• Data science projects at university
Ed, 41
Professional developer
Cares about Testing, CI,
stability
John, 28
PhD, cool kid
Wants to build
awesome app
Simple?
Goal: smart application improves business processes
John’s
Smart app
Ed’s
Business
process
Simple?
Goal: smart application improves business processes
Ed’s
Business
process
ML modes: similarity of environments?
Exploration
• Flexibility
• Easy to use
• Reusability
Production
• Performance
• Scalability
• Monitoring
• API
Interaction required to improve business process
ML modes
from https://www.youtube.com/watch?v=R-6nAwLyWCI
flexibility performance
Stackup
Problems
• Move to production means
redevelopment from scratch
Solutions
• Notebooks as API
Prototype problem at current project
Easy move to the JVM?
Consultant
R
Me
Python
Production
JVM
native C dependencies
Stackup
Problems
• Move to production means
redevelopment from scratch
• Enterprise operations handle JVM
only
Solutions
• Notebooks as API
• Redevelop from scratch
Data exchange possibilities (API)
Pickle – python only
Hadoop file formats (avro/parquet)
Thrift, protobuf
Message queue
REST
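The trade-off in this list can be sketched in a few lines: a pickle payload can only be read back by Python, while a JSON payload sent over REST or a message queue can be parsed by a JVM backend as well. The request layout and the field names `customer_id` and `features` are hypothetical, chosen only for illustration.

```python
import json
import pickle


def build_request(customer_id, features):
    """Hypothetical prediction request exchanged between a backend and a model."""
    return {"customer_id": customer_id, "features": list(features)}


request = build_request("c-1", [0.5, 1.25])

# Language-neutral: any JSON library on the JVM can parse this text.
json_payload = json.dumps(request, sort_keys=True)

# Python-only: the byte stream follows Python's pickle protocol.
pickle_payload = pickle.dumps(request)

# Round trip through JSON, as a consumer on the other side would do.
decoded = json.loads(json_payload)
print(json_payload)
```

Avro, Parquet, Thrift and protobuf follow the same idea with a binary encoding and an explicit schema instead of JSON text.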
Stackup
Problems
• Move to production means
redevelopment from scratch
• Enterprise operations handle JVM
only
Solutions
• Notebooks as API
• Use analytics via an API
Big data starts at
20GB. Want to use
fancy hadoop cluster
We can buy a
server with 6 TB
RAM
3 types of big data
1. Fits in memory (6 TB of RAM …)
2. Raw data too large for memory, but aggregated data works
well
3. Too big => ml needs to be big as well
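Type 2 can be illustrated with a minimal sketch: the raw events are streamed once and only one running total per key stays in memory, so the aggregated result is small enough for local Python or R. The event source and its keys are made-up stand-ins, not data from the project.

```python
from collections import defaultdict


def raw_events():
    """Stand-in for a raw source that is too large to load at once (made-up data)."""
    yield ("user_a", 3.0)
    yield ("user_b", 1.5)
    yield ("user_a", 2.0)
    yield ("user_b", 0.5)
    yield ("user_c", 4.0)


def aggregate(events):
    """Stream over the events and keep only a count and a sum per key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, value in events:
        totals[key] += value
        counts[key] += 1
    return {
        key: {"count": counts[key], "mean": totals[key] / counts[key]}
        for key in totals
    }


summary = aggregate(raw_events())
print(summary)
```

In the setup described in this deck, the aggregation step would run in SQL or Spark and only `summary` would be handed to the modelling code.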
Stackup
Problems
• Move to production means
redevelopment from scratch
• Enterprise operations handle JVM
only
• Inflexible big data tools
Solutions
• Notebooks as API
• Use analytics via an API
• Your data is not “really big” and
still fits in memory
Security is
not my job
Disagree /
infoSec
Stackup
Problems
• Move to production means
redevelopment from scratch
• Enterprise operations handle JVM
only
• Inflexible big data tools
• Security not taken care of
Solutions
• Notebooks as API
• Use analytics via an API
• Your data is not “really big” and
still fits in memory ->keep using
python / R / notebooks
• Kerberized hadoop cluster :(
Exploration to
Enterprise API
small data & R prototype
Separation of concerns.
Startup data science – predicting cash flows
• Custom backend (JVM)
• Data science via an API (OpenCPU / R)
• Partly in backend (Renjin)
Other possibilities
• JNI (java native interface) :(
• JNA (java native access)
• Rkafka (did not have a MQ in infrastructure)
• Custom service (rest call) to JNA enabled server (too
costly)
Music streaming
Anomaly detection big data
Source
https://www.youtube.com/watch?v=t63SF2UkD0A&feature=youtu.be
project facts
• We were using a ms-sql backup (600 GB)
• Spark + parquet compressed it to 3 GB
• No cluster during development of the project, only laptops
+ 60 GB RAM server
• Most of the time spent in garbage collection (15 sec on
real cluster, 17 Minutes on laptop)
Data science stack
• Type 2 big data (aggregation allows for local in memory
processing in python/R)
• Spark as (REST) API
POST /jars/app_name jobserver:port/jars/myjob
POST jobserver:port/contexts/context_for_myapp
POST "paramKey = paramValue"
jobserver:port/myjob?appName=myjob&classPath=path.to.main&context=context_for_myapp
• Aggregated data fed to R via REST-API
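The three job-server calls can be sketched as a small helper that only builds the requests and sends nothing. The host name, port 8090 and the helper name `jobserver_requests` are assumptions for illustration; the `/jobs` path segment is also an assumption based on the usual spark-jobserver layout, whereas the slide writes `/myjob` at that position.

```python
from urllib.parse import urlencode


def jobserver_requests(base_url, app_name, context, class_path, params):
    """Return (method, url, body) tuples for the three calls; nothing is sent."""
    # 1. Upload the application jar under an app name.
    upload_jar = ("POST", f"{base_url}/jars/{app_name}", "<binary jar contents>")
    # 2. Create a long-running context that later jobs can reuse.
    create_context = ("POST", f"{base_url}/contexts/{context}", "")
    # 3. Start the job; parameters travel in the body as "key = value" lines.
    query = urlencode(
        {"appName": app_name, "classPath": class_path, "context": context}
    )
    body = "\n".join(f"{key} = {value}" for key, value in params.items())
    # Assumption: "/jobs" is the job endpoint; the slide shows "/myjob" here.
    run_job = ("POST", f"{base_url}/jobs?{query}", body)
    return [upload_jar, create_context, run_job]


calls = jobserver_requests(
    base_url="http://jobserver:8090",  # hypothetical host and port
    app_name="myjob",
    context="context_for_myapp",
    class_path="path.to.main",
    params={"paramKey": "paramValue"},
)
for method, url, body in calls:
    print(method, url)
```

Keeping the request construction separate from the HTTP client makes it easy to test the URLs before pointing them at a real server.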
Frontend Backend
Data-science
SQL aggregation / spark job-server
Spark cluster
Laptop :)
R
via opencpu
Spark aggregation & R as API
REST call
API
incompatibilities
:(
Data science platform
Can the architecture be simplified?
Cloud solutions
• Notebook as API: Databricks workflows / Domino data lab
• Google, Microsoft, Amazon
• Several data science platform startups bigml, dataiku,
...
(+) cluster deploy on click
(+) some integrate notebooks well
(-) control over data?
What is missing?
Custom models, Control over data,
Testing, CI, AB testing, retraining
Several solutions – same problem
Let's try lean
Back to spark architecture overview …
Missing API layer / model deployment
Hydrospheredata/mist notebook, CI -> e2e
CI & testing +1
Notebook e2e +1
But again: a lot of
moving parts
Highly experimental
Seldon – e2e ML platform for enterprise
Seldon architecture
K8s for high availability
Hot model deployments
A-B testing
Holdout group
Containerized micro
services conforming to
seldon’s REST API
Overall very good
But: outdated python
2.xx
Kubernetes
mandatory
In an ideal world
What I dream of …
Wish list
• Flexibility to experiment (notebooks) on big enough
hardware
• Make these easily available as an API in a pre-production
environment to gain quick business feedback
• A-B testing, holdout group, containers
• More “developer” mindset (Testing, CI, security) for data
scientists
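The A-B testing and holdout-group item can be sketched as a deterministic traffic split: each user id is hashed into a bucket, so the same user always reaches the same model variant without any stored state. The group names and the share values are hypothetical defaults, not figures from the talk.

```python
import hashlib


def assign_group(user_id, holdout_share=0.1, b_share=0.45):
    """Map a user id to 'holdout', 'model_b' or 'model_a' in a stable way."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # The first 8 hex digits give an integer below 2**32; scale it to [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < holdout_share:
        return "holdout"  # receives no model-driven treatment
    if bucket < holdout_share + b_share:
        return "model_b"  # challenger model
    return "model_a"      # current model


assignments = [assign_group(f"user-{i}") for i in range(10000)]
shares = {
    group: assignments.count(group) / len(assignments)
    for group in ("holdout", "model_a", "model_b")
}
print(shares)
```

Hashing instead of random assignment means the split can be recomputed in any service, including a JVM backend, as long as both sides use the same hash and thresholds.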
Reality is different.
How I will move forward with my current
project
Write a JVM-based custom backend which operations and existing developers
can maintain. Apparently this is a better fit than a platform turnkey solution.
How to integrate spark?
Spark deployment modes revisited ...
Spark deployment scenarios
• Batch / bulk prediction in cluster -> job scheduling
overhead
• Long-running spark application? (SJS, pipeline persistence
-> local spark context)
• Predictive service without spark
• PMML? jpmml/sklearn2pmml
• scoring without spark -> mleap and SPARK-16365
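The "predictive service without spark" scenario can be sketched as follows: the training side exports only the fitted parameters, and the serving side scores single requests with plain arithmetic. This sketch swaps in a hand-written logistic regression and a made-up JSON layout; it is not the MLeap or PMML format, and the feature names and coefficients are invented.

```python
import json
import math

# Hypothetical export written by the training job (values are made up).
exported_model = json.dumps(
    {"intercept": -1.0, "coefficients": {"age": 0.02, "balance": 0.001}}
)


def load_model(payload):
    """Parse the exported parameters; no Spark session is needed here."""
    return json.loads(payload)


def predict_proba(model, features):
    """Logistic regression score for one request, computed locally."""
    z = model["intercept"] + sum(
        coef * features[name] for name, coef in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


model = load_model(exported_model)
probability = predict_proba(model, {"age": 50.0, "balance": 0.0})
print(probability)
```

Scoring one row this way avoids the job scheduling overhead named in the first bullet, at the price of re-implementing or exporting each model type.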
What is your approach?
Thanks. @geoHeil
PMML - Openscoring
• Based on PMML (Predictive Model Markup Language)
(+) stay in java/xml world, enterprise operations :)
(+) quick predictions
(+) mature
(-) not all models suitable for PMML / some algorithms not
implemented
(-) xml
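To make the XML trade-off concrete, here is a minimal sketch of a PMML-style linear regression and a tiny evaluator for it. The document is a simplified toy that has not been validated against the PMML schema, the field names and coefficients are invented, and the evaluator only handles this one model shape; a real deployment would rely on a full evaluator such as the JPMML one behind Openscoring.

```python
import xml.etree.ElementTree as ET

# Simplified toy document in the style of PMML 4.3 (not schema-validated).
PMML_DOC = """<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3">
  <Header description="toy model"/>
  <DataDictionary>
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_3"}


def score(pmml_text, inputs):
    """Evaluate the single regression table: intercept + sum(coef * input)."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * inputs[predictor.get("name")]
    return result


prediction = score(PMML_DOC, {"x": 3.0})
print(prediction)
```

The model is plain data here, which is why it can be trained in Python or R and scored on the JVM, and also why algorithms without a PMML representation cannot take this route.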
PMML + retraining oryx.io
prediction.IO
h2o steam
E2e platform
Build + deploy
interoperability
Enterprise
permissions
Based on h2o-flow
pipeline.io notebook ->
prediction, e2e
“Extend ml pipelines to
serve production users“
How do tools stack up regarding security?
https://www.youtube.com/watch?v=t63SF2UkD0A&feature=youtu.be
Python (what I learnt later on)
• Can easily be deployed on its own (if ops can handle this)
• Python4j/ pyspark/ spylon?
Science in Python, production in java – spylon, Video
• Bring code via custom UDF to data in pySpark
• Model = fitted sk-learn model
• Requires model to be parallelizable
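The "requires model to be parallelizable" point can be sketched without a cluster: the fitted model is serialized once, shipped to each partition, and every partition is scored independently of the others. This is a plain-Python emulation, not the PySpark UDF API; the model is a made-up linear scorer standing in for a fitted scikit-learn model, and the partitions are hand-written lists.

```python
import pickle

# Stand-in for a fitted model (invented weights); in the talk's setup this
# would be a fitted scikit-learn estimator.
fitted_model = {"weights": [0.5, -0.25], "bias": 0.1}


def score_partition(serialized_model, rows):
    """Score one partition; needs only the shipped model and its own rows."""
    model = pickle.loads(serialized_model)
    return [
        model["bias"] + sum(w * x for w, x in zip(model["weights"], row))
        for row in rows
    ]


# Two partitions of feature rows, as a cluster would hold them on two workers.
partitions = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[2.0, 2.0]],
]

serialized = pickle.dumps(fitted_model)  # shipped once to every worker
scores = [score_partition(serialized, part) for part in partitions]
print(scores)
```

The approach works because scoring a row needs no information from other rows; a model that depends on global state across rows would not split this way.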
others
• Jupyter notebook to REST API (IBM interactive dashboard
http://blog.ibmjstart.net/2016/01/28/jupyter-notebooks-as-restful-microservices/)
• Apache toree (interactive spark as notebook)

Machine learning model to production

Editor's Notes

  1. Hi Georg. Talk about how to not have a smart prototype script rot in the corner. First talk ;) Question: Who has played with machine learning? Who is familiar with R / python? Who is using big data technology in production? Who is driving business decisions with ML?
  2. Discussion about how you deploy models
  3. Apache Toree, Jupyter notebooks as REST api (IBM)
  4. Notebooks can execute JVM code as well