robertwdempsey.com
Building a
Production-Level
Machine Learning Pipeline
Robert Dempsey, CEO
Atlantic Dominion Solutions
robertwdempsey.com Production ML Pipelines
Robert Dempsey
Professional: Entrepreneur, Software Engineer
Author: Books and online courses
Instructor: Lotus Guides, District Data Labs
Owner: Atlantic Dominion Solutions, LLC
We’ve mastered three jobs so you can
focus on one - growing your business.
The Three Jobs
At Atlantic Dominion Solutions we perform three functions for our
customers:
Consulting: we assess and advise in the areas of technology, team and
process to determine how machine learning can have the biggest impact on
your business.
Implementation: after a strategy session to determine the work you need we
get to work using our proven methodology and begin delivering smarter
applications.
Training: continuous improvement requires continuous learning. We provide
both on-premises and online training.
Writing the Book
Co-authoring the book Building
Machine Learning Pipelines.
Written for software developers and
data scientists, Building Machine
Learning Pipelines teaches the skills
required to create and use the
infrastructure needed to run modern
intelligent systems.
machinelearningpipelines.com
What’s your biggest issue?
Technology is LEAST important
The REPORT Framework™
REPORT Framework™
Risk Tolerance
Expectations
Product
Operations
Results
Team
Risk Tolerance
Question: How risk averse are you?
Some companies happily deploy beta and release candidate versions of
cutting-edge open source software. Others enjoy the freedom of open source but
look only for mature applications. And a third category swears off open source
altogether and only buys software that comes with a license and a support
contract. Where does your company sit on the risk aversion spectrum?
Question: What are your non-technology risks?
Technology aside, what happens if your project fails? Do you get fired? Does
the entire team get fired? Do the naysayers get to say “I told you so” in a
meeting?
Expectations
Question: What are the expectations around the project?
Here are a few questions to get you started:
• Non-Technical
  • How long do you think the project will take? How much do you
    expect it to cost?
  • What are others expecting the system will be able to do?
• Technical
  • How much volume does the system need to be able to process? In
    what amount of time?
  • What level of downtime can you absorb?
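One way to make the downtime question concrete is to turn an availability target into a time budget. The sketch below is only an illustration: the function name `downtime_budget`, the 30-day period, and the example targets are assumptions, not figures from the talk.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the downtime allowed within `period` at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    allowed_fraction = 1 - availability_pct / 100
    return timedelta(seconds=period.total_seconds() * allowed_fraction)


if __name__ == "__main__":
    month = timedelta(days=30)  # hypothetical reporting period
    for target in (99.0, 99.9, 99.99):  # hypothetical availability targets
        budget = downtime_budget(target, month)
        print(f"{target}% over 30 days allows {budget} of downtime")
```

At 99.9%, for example, the budget works out to roughly 43 minutes per 30 days, which is a useful number to put in front of stakeholders before choosing tools.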
Product
Question: What does the product roadmap say?
At a minimum, a bullet-point list will help set the expectations of others
and allow you to make trade-offs as the project moves forward. It also
helps you measure results (discussed later) on an incremental basis,
which will help your team know whether they are making progress.
Question: What’s the budget and estimated ROI?
As with expectations and the product roadmap, whether formalized or not,
there is always, or should always be, a budget as well as an estimated
ROI. Write it down and use it as one of your metrics.
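As a rough illustration of writing the estimate down, ROI is commonly computed as net gain divided by cost. The function name `estimated_roi` and the dollar figures below are hypothetical and only show the arithmetic.

```python
def estimated_roi(expected_gain: float, budget: float) -> float:
    """Simple ROI as a fraction: (gain - cost) / cost."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (expected_gain - budget) / budget


if __name__ == "__main__":
    budget = 200_000         # hypothetical project budget in dollars
    expected_gain = 260_000  # hypothetical value the project is expected to return
    print(f"Estimated ROI: {estimated_roi(expected_gain, budget):.0%}")
```

Recomputing the same number as actual costs and gains come in gives the team one metric to track alongside the roadmap.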
Operations
Question: Got DevOps?
DevOps, sometimes called TechOps, is the group that manages
and maintains the organization’s technology infrastructure.
Just because you have a DevOps team doesn’t mean you want
to put additional strain on them by firing up more servers.
Even with cloud providers like AWS, you still have to do some
infrastructure support and maintenance. The larger your
business, the more support work there will be.
Results
Question: What does the end result look like?
Here’s a very partial list of results we’ve seen measured:
• The project was completed on X date by X time.
• The project cost $X amount of money to complete.
• The team worked no more than 40 hours each week to get
the project done.
• X, Y and Z features are in the product and have 90%
automated test coverage.
Team
Question: Are the right people on the bus to get the project completed?
Having the right people with the right skills, both hard and soft, can
make or break a project.
Question: Does each team member have the tools and support they
need to be successful?
• Does the team have the support of senior leadership?
• Are they going to encounter a deluge of bureaucratic red tape that
will slow their progress?
• Are development and testing environments available?
ML Pipeline
Toolbox
The “Standard” ML Pipeline
[Diagram: Collect, Store, Enrich, Train / Apply, Visualize; Infrastructure]
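The stages named on this slide can be sketched as plain functions chained in order. This is a toy, in-memory illustration only: every function body, the sample records, and the token-frequency "model" are assumptions made for the example, standing in for the real tools listed on the following slides.

```python
from collections import Counter


def collect():
    """Stand-in for agents or scrapers: return raw records."""
    return [
        {"text": "spark job failed"},
        {"text": "kafka topic created"},
        {"text": "spark job finished"},
    ]


def store(records):
    """Stand-in for a durable store: here, just keep a copy in memory."""
    return list(records)


def enrich(records):
    """Add derived fields; here, a whitespace-tokenized version of the text."""
    return [{**r, "tokens": r["text"].split()} for r in records]


def train_apply(records):
    """'Train' a toy token-frequency model, then apply it to score each record."""
    counts = Counter(token for r in records for token in r["tokens"])
    return [{**r, "score": sum(counts[t] for t in r["tokens"])} for r in records]


def visualize(records):
    """Render results; here, one plain-text line per record."""
    return [f"{r['score']:>3}  {r['text']}" for r in records]


def run_pipeline():
    """Chain the stages in the order the slide lists them."""
    data = collect()
    for stage in (store, enrich, train_apply, visualize):
        data = stage(data)
    return data


if __name__ == "__main__":
    print("\n".join(run_pipeline()))
```

Keeping each stage behind a small function boundary like this is one way to swap a tool (for example, a different store) without touching the other stages.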
Infrastructure
• Servers
  • Amazon EC2
  • Data center
• Container Technologies
  • Docker
  • Amazon Elastic Container Service (ECS)
Collect
• Programming Languages
  • Python
  • Scala
  • Go
  • R
• Pre-Built Tools
  • Pentaho Data Integration
  • Various web scraping tools
Store
• Elasticsearch
• Apache Kafka
• Redis
• Cassandra
• MongoDB
• SQL
• Amazon S3
• HDFS
• Many others
Enrich
• Apache Storm
• Apache Spark
• Amazon Elastic MapReduce (EMR)
• Apache NiFi
• Airflow (Airbnb)
Train / Apply
• Python Libraries
  • Scikit-learn
  • Pandas
• Spark Libraries
  • MLlib
• Deep Learning
  • TensorFlow
  • PyTorch
Visualize
• Kibana
• Grafana
• Amazon Athena (for S3)
• Flask
• D3.js
Machine Learning
Pipeline Architectures
Architecture 1
[Diagram: Agent, File System, Apache Spark, File System, Agent, ES; stages labeled 1, 2, 3]
Architecture 1 Choices
This pipeline was built at a company building a new platform
using all leading-edge technologies, and was a temporary
solution until another pipeline was built.
• Risk Aversion: not an issue.
• Expectations: the pipeline needed to be run in production
and be able to handle the amount of data the company had
in a timely fashion.
• Product: this was a short-term solution to process data until
the desired pipeline was ready to be deployed into
production.
• Operations: due to its simplicity and limited functionality,
the solution became a one-server solution deployed by an
engineer working in unison with an internal DevOps team
member.
• Results: the pipeline was deployed on time and was able to
process all the data within the expected parameters.
• Team: after a consultant built the first version of the
application an internal team member took over and
deployed it into production.
Architecture 2
[Diagram: three Agents, Apache Kafka, Apache Storm, ES, S3, HDFS; stages labeled 1, 2, 3]
Architecture 2 Choices
This pipeline was built at a startup focused on data collection
and was core to the product.
• Risk Aversion: this was the second version of a previously
developed and well-proven pipeline, so risk aversion was low.
• Expectations: as a core product the pipeline was expected to
be continuously evolving, able to be horizontally scaled, able
to handle a growing amount of data, and have 100% uptime.
• Product: the functionality built was in line with a product
roadmap that was reviewed on a monthly basis.
• Operations: an internal DevOps team managed the
infrastructure while engineers were expected to support the
associated applications and data processors.
• Results: the pipeline could be horizontally scaled, handled
between 1 and 2 TB of data per day, and had 99.9% uptime.
• Team: the DevOps and engineering teams worked together
to produce and support it.
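To put the daily volume above in perspective, a per-day figure can be converted into an average sustained rate. The sketch below assumes decimal terabytes (10^12 bytes) and a perfectly even load across the day; real traffic is usually bursty, so peak rates would be higher. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
TB = 1e12  # assumption: decimal terabyte


def sustained_rate_mb_per_s(bytes_per_day: float) -> float:
    """Average throughput in MB/s needed to keep up with a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    for volume_tb in (1, 2):
        rate = sustained_rate_mb_per_s(volume_tb * TB)
        print(f"{volume_tb} TB/day is about {rate:.1f} MB/s on average")
```

Under those assumptions, 1 to 2 TB per day corresponds to an average of roughly 12 to 23 MB/s.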
Architecture 3
[Diagram: three Agents, S3, Apache Spark, S3, Athena; stages labeled 1, 2, 3]
Architecture 3 Choices
This pipeline was built at a company building a new platform
using all leading-edge technologies, and was a temporary
solution until another pipeline was built.
• Risk Aversion: this system was mission critical for delivering
data in real-time to customers. Failure was not an option, so
best-in-class practices needed to be implemented, including
using hosted solutions such as Databricks and S3.
• Expectations: this system would scale as data collection
efforts grew and would be extremely fault tolerant.
• Product: this system would be extended to accommodate
additional product offerings so flexibility was important.
• Operations: this system was maintained by the engineers
who built it, as there was no separate DevOps team.
• Results: the system processed several TBs of data per hour
(need to double check this) with minimal downtime.
• Team: the team supporting the pipeline set up monitoring
and alerting to ensure uptime and worked with other
engineering groups to deconflict deployments that might
impact the pipeline.
Architecture 4
[Diagram: three Agents, Apache Kafka, Apache Spark, HBase, ES, S3, HDFS; stages labeled 1, 2, 3]
Architecture 4 Choices
This pipeline was built at a company building a new platform using all
leading-edge technologies, and was a temporary solution until another
pipeline was built.
• Risk Aversion: this system supported a key customer and was being
implemented as a means to resolve data loss and data discrepancies
that had plagued a legacy system.
• Expectations: this system would be resilient in the event of an outage
so that no data would be lost.
• Product: this system would ultimately be replaced by a more general
system designed to support multiple customers, so it was considered
extremely critical yet a one-off.
• Operations: this system was maintained by the engineers
who built it because, at the time, there was no technical
operations team in place.
• Results: the system processed hundreds of GBs of data per
day with infrequent outages.
• Team: once deployed, the team of developers who built this
pipeline began work on incorporating its features into a
more generalized stream processing platform.
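The talk does not say how the "no data lost" expectation was met. One common approach with a replayable log is to commit the consumer's position only after a record has been written downstream, and to make that write idempotent so replays are harmless. The sketch below illustrates the idea with an in-memory list standing in for a log such as a Kafka topic; the class, function, and parameter names are all hypothetical.

```python
class Log:
    """In-memory stand-in for a durable, replayable log."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # offset of the next record to process


def consume(log, sink, fail_at=None):
    """Process from the committed offset; commit only after the sink write."""
    offset = log.committed
    while offset < len(log.records):
        # Idempotent write: keyed by offset, so a replay overwrites, not duplicates
        sink[offset] = log.records[offset].upper()
        if fail_at == offset:
            raise RuntimeError("simulated crash before commit")
        offset += 1
        log.committed = offset  # commit after the write has succeeded


if __name__ == "__main__":
    log = Log(["a", "b", "c", "d"])
    sink = {}
    try:
        consume(log, sink, fail_at=2)  # crash after writing offset 2, before commit
    except RuntimeError:
        pass
    consume(log, sink)  # restart: resumes at the last committed offset
    print(sink)
```

Because the crash happens before the commit, the restarted consumer reprocesses that record rather than skipping it, which trades possible duplicate work for no loss.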
Q&A
Free Guide
robertwdempsey.com/machineryai
Where to Find Me
Website: robertwdempsey.com
Lotus Guides: lotusguides.com
LinkedIn: robertwdempsey
Twitter: rdempsey
GitHub: rdempsey
Thank You!
Software architecture in a DevOps world
 

Building Production ML Pipelines

  • 1. robertwdempsey.com Building a Production-Level Machine Learning Pipeline Robert Dempsey, CEO Atlantic Dominion Solutions
  • 2. robertwdempsey.com Production ML Pipelines Robert Dempsey. Professional: Entrepreneur, Software Engineer. Author: Books and online courses. Instructor: Lotus Guides, District Data Labs. Owner: Atlantic Dominion Solutions, LLC.
  • 3. We’ve mastered three jobs so you can focus on one - growing your business.
  • 4. The Three Jobs. At Atlantic Dominion Solutions we perform three functions for our customers: Consulting: we assess and advise in the areas of technology, team and process to determine how machine learning can have the biggest impact on your business. Implementation: after a strategy session to determine the work you need we get to work using our proven methodology and begin delivering smarter applications. Training: continuous improvement requires continuous learning. We provide both on-premises and online training.
  • 5. Writing the Book. Co-authoring the book Building Machine Learning Pipelines. Written for software developers and data scientists, Building Machine Learning Pipelines teaches the skills required to create and use the infrastructure needed to run modern intelligent systems. machinelearningpipelines.com
  • 6. What’s your biggest issue?
  • 7. Technology is LEAST important
  • 8. The REPORT Framework™
  • 9. REPORT Framework™: Risk Tolerance, Expectations, Product, Operations, Results, Team
  • 10. Risk Tolerance. Question: How risk averse are you? Some companies happily deploy beta and release candidate versions of cutting-edge open source software. Others enjoy the freedom of open source but look only for mature applications. And yet a third category swears off open source altogether and only buys software that comes with a license and a support contract. Where does your company sit on the risk aversion spectrum? Question: What are your non-technology risks? Technology aside, what happens if your project fails? Do you get fired? Does the entire team get fired? Do the naysayers get to say “I told you so” in a meeting?
  • 11. Expectations. Question: What are the expectations around the project? Here are a few questions to get you started. Non-technical: How long do you think the project will take? How much do you expect it to cost? What are others expecting the system will be able to do? Technical: How much volume does the system need to be able to process? In what amount of time? What level of downtime can you absorb?
  • 12. Product. Question: What does the product roadmap say? At a minimum, a bullet point list will help set the expectations of others and allow you to make trade-offs as the project moves forward. It also helps you measure results - discussed later - on an incremental basis, which will help your team know whether they are making progress. Question: What’s the budget and estimated ROI? As with expectations and the product roadmap, whether formalized or not, there is always, or should always be, a budget as well as an estimated ROI. Write it down and use it as one of your metrics.
  • 13. Operations. Question: Got DevOps? DevOps, sometimes called TechOps, is a group that manages and maintains the technology infrastructure of the organization. Just because you have a DevOps team doesn’t mean you want to put additional strain on them by firing up more servers. Even with cloud providers like AWS, you still have to do some infrastructure support and maintenance. The larger your business, the more support work there will be.
  • 14. Results. Question: What does the end result look like? Here’s a very partial list of results we’ve seen measured: • The project was completed on X date by X time. • The project cost $X to complete. • The team worked no more than 40 hours each week to get the project done. • X, Y and Z features are in the product and have 90% automated test coverage.
  • 15. Team. Question: Are the right people on the bus to get the project completed? Having the right people with the right skills, both hard and soft, can make or break a project. Question: Does each team member have the tools and support they need to be successful? • Does the team have the support of senior leadership? • Are they going to encounter a deluge of bureaucratic red tape that will slow their progress? • Are development and testing environments available?
  • 16. ML Pipeline Toolbox
  • 17. The “Standard” ML Pipeline: Collect, Store, Enrich, Train / Apply, Visualize, Infrastructure
  • 18. Infrastructure. Servers: Amazon EC2, data center. Container technologies: Docker, Amazon Elastic Container Service (ECS).
  • 19. Collect. Programming languages: Python, Scala, Go, R. Pre-built tools: Pentaho Data Integration, various web scraping tools.
  • 20. Store. Elasticsearch, Apache Kafka, Redis, Cassandra, MongoDB, SQL, Amazon S3, HDFS, many others.
  • 21. Enrich. Apache Storm, Apache Spark, Amazon Elastic MapReduce (EMR), Apache NiFi, Airflow (Airbnb).
  • 22. Train / Apply. Python libraries: Scikit-learn, Pandas. Spark libraries: MLlib. Deep learning: TensorFlow, PyTorch.
  • 23. Visualize. Kibana, Grafana, Amazon Athena (for S3), Flask, D3.js.
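The stages named on slides 17 to 23 can be read as a chain of functions, each consuming the previous stage's output. The sketch below is only an illustration of that idea in plain Python: the stage bodies, the sample records, and the `run_pipeline` helper are all hypothetical stand-ins, not the tools listed in the deck (a list takes the place of a datastore, and a word-count rule takes the place of a trained model).

```python
from functools import reduce

def collect(_):
    # Hypothetical source: hard-coded records stand in for scraped or ingested data.
    return [
        {"id": 1, "text": "hello world"},
        {"id": 2, "text": "machine learning pipelines"},
    ]

def store(records):
    # Stand-in for a datastore write: materialize the records in memory.
    return list(records)

def enrich(records):
    # Add a derived field to every record.
    return [{**r, "n_words": len(r["text"].split())} for r in records]

def apply_model(records):
    # A toy rule in place of a trained model (assumption for illustration).
    return [{**r, "label": "long" if r["n_words"] > 2 else "short"} for r in records]

def visualize(records):
    # Stand-in for a dashboard: render one summary line per record.
    return [f'{r["id"]}: {r["label"]}' for r in records]

def run_pipeline(stages, seed=()):
    # Feed each stage's output into the next stage, in order.
    return reduce(lambda data, stage: stage(data), stages, seed)

STAGES = [collect, store, enrich, apply_model, visualize]
report = run_pipeline(STAGES)
print(report)
```

Keeping each stage behind the same call shape is one way to swap a tool (for example, a different store) without touching the other stages; whether that fits a given team depends on the REPORT questions above.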
  • 24. Machine Learning Pipeline Architectures
  • 25. Architecture 1 (diagram). Components: Agent, File System, Apache Spark, File System, Agent, ES; steps numbered 1-3.
  • 26. Architecture 1 Choices. This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built. • Risk Aversion: not an issue. • Expectations: the pipeline needed to run in production and handle the amount of data the company had in a timely fashion. • Product: this was a short-term solution to process data until the desired pipeline was ready to be deployed into production.
  • 27. Architecture 1 Choices. • Operations: due to its simplicity and limited functionality, the solution became a one-server solution deployed by an engineer working in unison with an internal devops team member. • Results: the pipeline was deployed on time and was able to process all the data within the parameters. • Team: after a consultant built the first version of the application, an internal team member took over and deployed it into production.
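One possible reading of the Architecture 1 diagram is a three-step, file-based hand-off: an agent lands raw data on a file system, a batch job processes it into a second location, and another agent loads the result into Elasticsearch. The sketch below illustrates that reading with the standard library only. Everything in it is an assumption for illustration: the function names, the JSON-lines file layout, the doubling "transformation" standing in for the Spark job, and the dictionary standing in for an Elasticsearch index.

```python
import json
import tempfile
from pathlib import Path

def agent_collect(raw_dir):
    # Step 1 (assumed): an agent lands raw records as a JSON-lines file.
    records = [{"id": 1, "value": 10}, {"id": 2, "value": 25}]
    with open(raw_dir / "batch-0001.jsonl", "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def batch_process(raw_dir, out_dir):
    # Step 2 (assumed): plain Python stands in for the Spark job.
    for src in sorted(raw_dir.glob("*.jsonl")):
        with open(src) as fin, open(out_dir / src.name, "w") as fout:
            for line in fin:
                rec = json.loads(line)
                rec["value_doubled"] = rec["value"] * 2
                fout.write(json.dumps(rec) + "\n")

def agent_load(out_dir, index):
    # Step 3 (assumed): a dict keyed by id stands in for an Elasticsearch index.
    for src in sorted(out_dir.glob("*.jsonl")):
        with open(src) as f:
            for line in f:
                rec = json.loads(line)
                index[rec["id"]] = rec

index = {}
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    out_dir = Path(tmp) / "processed"
    raw_dir.mkdir()
    out_dir.mkdir()
    agent_collect(raw_dir)
    batch_process(raw_dir, out_dir)
    agent_load(out_dir, index)

print(index)
```

Because each step only reads and writes files, the whole chain can run on one server, which is consistent with the one-server deployment described in the Operations bullet.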
  • 28. Architecture 2 (diagram). Components: Agent (x3), Apache Kafka, Apache Storm, ES, S3, HDFS; steps numbered 1-3.
  • 29. Architecture 2 Choices. This pipeline was built at a startup focused on data collection and was core to the product. • Risk Aversion: this was the second version of a previously developed and well-proven pipeline, so risk aversion was low. • Expectations: as a core product, the pipeline was expected to be continuously evolving, able to be horizontally scaled, able to handle a growing amount of data, and have 100% uptime. • Product: the functionality built was in line with a product roadmap that was reviewed on a monthly basis.
  • 30. Architecture 2 Choices. • Operations: an internal devops team managed the infrastructure while engineers were expected to support the associated applications and data processors. • Results: the pipeline could be horizontally scaled, handled 1-2 TB of data per day, and had 99.9% uptime. • Team: the devops and engineering teams worked together to produce and support it.
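To put the Architecture 2 results in more familiar units, the arithmetic below converts a daily data volume into an average sustained rate and an uptime percentage into a yearly downtime allowance. It is a back-of-the-envelope sketch under stated assumptions: decimal units (1 TB = 1,000,000 MB), load spread evenly across the day, and a 365-day year. The slides do not say how the figures were measured, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24

def sustained_mb_per_s(tb_per_day):
    # Average rate if the daily volume arrived evenly (decimal units).
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY

def allowed_downtime_hours_per_year(uptime_percent):
    # Hours per year the system may be down at the given uptime level.
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR

for volume in (1, 2):
    print(f"{volume} TB/day is about {sustained_mb_per_s(volume):.1f} MB/s on average")

print(f"99.9% uptime allows about {allowed_downtime_hours_per_year(99.9):.2f} hours of downtime per year")
```

Under these assumptions, 1-2 TB per day averages roughly 11.6 to 23.1 MB/s, and 99.9% uptime leaves about 8.76 hours of downtime per year, compared with zero for the 100% stated under Expectations. Real traffic is rarely uniform, so peak rates would be higher than these averages.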
  • 31. Architecture 3 (diagram). Components: Agent (x3), Apache Spark, S3 (x2), Athena; steps numbered 1-3.
  • 32. Architecture 3 Choices. This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built. • Risk Aversion: this system was mission critical for delivering data in real time to customers. Failure was not an option, so best-in-class practices needed to be implemented, including using hosted solutions such as Databricks and S3. • Expectations: this system would scale as data collection efforts grew and would be extremely fault tolerant.
  • 33. Architecture 3 Choices. • Product: this system would be extended to accommodate additional product offerings, so flexibility was important. • Operations: this system was maintained by the engineers who built it, as there was no separate devops team. • Results: the system processed several TBs of data per hour (need to double check this) with minimal downtime. • Team: the team supporting the pipeline set up monitoring and alerting to ensure uptime and worked with other engineering groups to deconflict deployments that might impact the pipeline.
  • 34. Architecture 4 (diagram). Components: Agent (x3), Apache Kafka, Apache Spark, HBase, ES, S3, HDFS; steps numbered 1-3.
  • 35. Architecture 4 Choices. This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built. • Risk Aversion: this system supported a key customer and was being implemented as a means to resolve data loss and data discrepancies that had plagued a legacy system. • Expectations: this system would be resilient in the event of an outage so that no data would be lost. • Product: this system would ultimately be replaced by a more general system designed to support multiple customers, so it was considered extremely critical yet a one-off.
  • 36. Architecture 4 Choices. • Operations: this system was maintained by the engineers who built it, as at the time there was no technical operations team in place. • Results: the system processed hundreds of GBs of data per day with infrequent outages. • Team: once deployed, the team of developers who built this pipeline began work on incorporating its features into a more generalized stream processing platform.
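The Architecture 4 expectation that no data be lost during an outage is commonly met with a replayable log and a consumer that records its position only after a message has been fully handled. The slides do not say how this system achieved it, so the sketch below is an assumption for illustration, not the author's design: `InMemoryLog`, `consume`, and `FlakyHandler` are hypothetical names, and the in-memory list stands in for a durable log such as a Kafka partition rather than using any Kafka client API.

```python
class InMemoryLog:
    """Stand-in for a durable, replayable log (assumption for illustration)."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message to process

def consume(log, handler, sink):
    # Resume from the last committed offset; commit only after the
    # handler succeeded and the result was written to the sink.
    while log.committed < len(log.messages):
        message = log.messages[log.committed]
        sink.append(handler(message))  # may raise during an outage
        log.committed += 1

class FlakyHandler:
    """Fails once on a chosen message to simulate an outage."""

    def __init__(self, fail_on):
        self.fail_on = fail_on
        self.failed = False

    def __call__(self, message):
        if message == self.fail_on and not self.failed:
            self.failed = True
            raise RuntimeError("simulated outage")
        return message.upper()

log = InMemoryLog(["a", "b", "c"])
sink = []
handler = FlakyHandler(fail_on="b")

try:
    consume(log, handler, sink)
except RuntimeError:
    pass  # the consumer "crashes"; the committed offset still points at "b"

offset_after_crash = log.committed
consume(log, handler, sink)  # a restart resumes from the committed offset
print(sink)
```

This ordering gives at-least-once behavior: nothing is skipped, but a crash between the sink write and the commit would replay that message, so downstream writes generally need to tolerate duplicates.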
  • 38. Free Guide: robertwdempsey.com/machineryai
  • 39. Where to Find Me. Website: robertwdempsey.com. Lotus Guides: lotusguides.com. LinkedIn: robertwdempsey. Twitter: rdempsey. Github: rdempsey.
  • 40. Thank You!