Building Production ML Pipelines

robertwdempsey.com
Building a Production-Level Machine Learning Pipeline
Robert Dempsey, CEO
Atlantic Dominion Solutions

Robert Dempsey
Professional: Entrepreneur, Software Engineer
Author: Books and online courses
Instructor: Lotus Guides, District Data Labs
Owner: Atlantic Dominion Solutions, LLC

We’ve mastered three jobs so you can
focus on one - growing your business.

The Three Jobs
At Atlantic Dominion Solutions we perform three functions for our
customers:
Consulting: we assess and advise in the areas of technology, team and
process to determine how machine learning can have the biggest impact on
your business.
Implementation: after a strategy session to determine the work you need, we
apply our proven methodology and begin delivering smarter
applications.
Training: continuous improvement requires continuous learning. We provide
both on-premises and online training.

Writing the Book
Co-authoring the book Building
Machine Learning Pipelines.
Written for software developers and
data scientists, Building Machine
Learning Pipelines teaches the skills
required to create and use the
infrastructure needed to run modern
intelligent systems.
machinelearningpipelines.com

What’s your biggest issue?

Technology is LEAST important

The REPORT Framework™

REPORT Framework™
Risk Tolerance
Expectations
Product
Operations
Results
Team

Risk Tolerance
Question: How risk averse are you?
Some companies happily deploy beta and release-candidate versions of
cutting-edge open source software. Others enjoy the freedom of open source but
look only for mature applications. A third category swears off open source
altogether and buys only software that comes with a license and a support
contract. Where does your company sit on the risk-aversion spectrum?
Question: What are your non-technology risks?
Technology aside, what happens if your project fails? Do you get fired? Does
the entire team get fired? Do the naysayers get to say “I told you so” in a
meeting?

Expectations
Question: What are the expectations around the project?
Here are a few questions to get you started:
• Non-Technical
    • How long do you think the project will take? How much do you expect it to cost?
    • What are others expecting the system will be able to do?
• Technical
    • How much volume does the system need to be able to process? In what amount of time?
    • What level of downtime can you absorb?

Product
Question: What does the product roadmap say?
At a minimum, a bullet-point list will help set the expectations of others
and allow you to make trade-offs as the project moves forward. It also
helps you measure results (discussed later) on an incremental basis,
which tells your team whether they are making progress.
Question: What’s the budget and estimated ROI?
As with expectations and the product roadmap, whether formalized or not,
there is always, or should always be, a budget as well as an estimated
ROI. Write them down and use them as metrics.

Operations
Question: Got DevOps?
DevOps, sometimes called TechOps, is the group that manages
and maintains the organization’s technology infrastructure.
Having a DevOps team doesn’t mean you want to put
additional strain on them by firing up more servers.
Even with cloud providers like AWS, you still have to do some
infrastructure support and maintenance. The larger your
business, the more support work there will be.

Results
Question: What does the end result look like?
Here’s a very partial list of results we’ve seen measured:
• The project was completed on X date by X time.
• The project cost $X amount of money to complete.
• The team worked no more than 40 hours each week to get
the project done.
• X, Y and Z features are in the product and have 90%
automated test coverage.

Team
Question: Are the right people on the bus to get the project completed?
Having the right people with the right skills, both hard and soft, can
make or break a project.
Question: Does each team member have the tools and support they
need to be successful?
• Does the team have the support of senior leadership?
• Are they going to encounter a deluge of bureaucratic red tape that
will slow their progress?
• Are development and testing environments available?

ML Pipeline Toolbox

The “Standard” ML Pipeline
[Diagram: Collect → Store → Enrich → Train / Apply → Visualize, with Infrastructure as a separate layer]
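
As a rough illustration of how the five stages hand data to one another, here is a minimal Python sketch. Every function name, the sample records, and the toy labeling rule are assumptions made for illustration; none of them come from the talk, and a real pipeline would swap in the tools listed on the following slides.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def collect() -> List[Record]:
    """Collect: stand-in for agents or scrapers (hard-coded sample data)."""
    return [{"text": "spark is fast"}, {"text": "kafka"}]


def store(records: List[Record]) -> List[Record]:
    """Store: stand-in for a datastore; here it just copies the records."""
    return [dict(r) for r in records]


def enrich(records: List[Record]) -> List[Record]:
    """Enrich: add a derived field (word count) to every record."""
    return [{**r, "n_words": len(r["text"].split())} for r in records]


def train_apply(records: List[Record]) -> List[Record]:
    """Train / Apply: a toy rule in place of a trained model (assumption)."""
    return [{**r, "label": "long" if r["n_words"] > 2 else "short"} for r in records]


def visualize(records: List[Record]) -> List[str]:
    """Visualize: render each record as a one-line summary."""
    return [f"{r['text']}: {r['label']}" for r in records]


def run_pipeline() -> List[str]:
    """Chain the stages in the order shown on the slide."""
    return visualize(train_apply(enrich(store(collect()))))
```

Keeping each stage as a plain function with the same input and output shape makes it easier to replace one stage (for example, the store) without touching the others.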

Infrastructure
• Servers
• Amazon EC2
• Data center
• Container Technologies
• Docker
• Amazon Elastic Container Service (ECS)

Collect
• Programming Languages
    • Python
    • Scala
    • Go
    • R
• Pre-Built Tools
    • Pentaho Data Integration
    • Various web scraping tools

Store
• Elasticsearch
• Apache Kafka
• Redis
• Cassandra
• MongoDB
• SQL
• Amazon S3
• HDFS
• Many others

Enrich
• Apache Storm
• Apache Spark
• Amazon Elastic MapReduce (EMR)
• Apache NiFi
• Airflow (Airbnb)

Train / Apply
• Python Libraries
    • Scikit-learn
    • Pandas
• Spark Libraries
    • MLlib
• Deep Learning
    • TensorFlow
    • PyTorch

Visualize
• Kibana
• Grafana
• Amazon Athena (for S3)
• Flask
• D3.js

Machine Learning Pipeline Architectures

Architecture 1
[Diagram: Agent → File System → Apache Spark → File System → Agent → ES, in three numbered stages]
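
Reading the Architecture 1 diagram as agent → file system → Apache Spark → file system → agent → Elasticsearch (ES), the flow can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the code that was deployed: the function names, the JSON-lines file layout, and the `message_length` enrichment are hypothetical, a plain function stands in for the Spark job, and a dict stands in for the Elasticsearch index.

```python
import json
import tempfile
from pathlib import Path


def collecting_agent(events, landing_dir: Path) -> Path:
    """Stage 1: an agent lands raw events on the file system as JSON lines."""
    path = landing_dir / "raw.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")
    return path


def batch_process(raw_path: Path, output_dir: Path) -> Path:
    """Stage 2: stand-in for the Spark job; reads raw data, writes enriched data."""
    out_path = output_dir / "enriched.jsonl"
    with raw_path.open(encoding="utf-8") as src, out_path.open("w", encoding="utf-8") as dst:
        for line in src:
            event = json.loads(line)
            event["message_length"] = len(event["message"])  # hypothetical enrichment
            dst.write(json.dumps(event) + "\n")
    return out_path


def shipping_agent(enriched_path: Path, index: dict) -> int:
    """Stage 3: a second agent loads processed records into the search index."""
    count = 0
    with enriched_path.open(encoding="utf-8") as fh:
        for line in fh:
            doc = json.loads(line)
            index[doc["id"]] = doc  # dict used in place of Elasticsearch
            count += 1
    return count


def run_architecture_1(events) -> dict:
    """Run the three stages end to end in a temporary directory."""
    index: dict = {}
    with tempfile.TemporaryDirectory() as tmp:
        landing = Path(tmp) / "landing"
        processed = Path(tmp) / "processed"
        landing.mkdir()
        processed.mkdir()
        raw = collecting_agent(events, landing)
        enriched = batch_process(raw, processed)
        shipping_agent(enriched, index)
    return index
```

Because the stages communicate only through files, everything fits on a single server, which matches the one-server deployment described for this architecture.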

Architecture 1 Choices
This pipeline was built at a company building a new platform
using all leading-edge technologies, and was a temporary
solution until another pipeline was built.
• Risk Aversion: not an issue.
• Expectations: the pipeline needed to run in production
and handle the company’s data volume in a timely
fashion.
• Product: this was a short-term solution to process data until
the desired pipeline was ready to be deployed into
production.
• Operations: due to its simplicity and limited functionality,
the solution became a one-server solution deployed by an
engineer working in unison with an internal DevOps team
member.
• Results: the pipeline was deployed on time and was able to
process all the data within the required parameters.
• Team: after a consultant built the first version of the
application, an internal team member took over and
deployed it into production.

Architecture 2
[Diagram: three Agents, Apache Kafka, Apache Storm, and ES, S3, and HDFS, in three numbered stages]

This pipeline was built at a startup focused on data collection
and was core to the product.
• Risk Aversion: this was the second version of a previously
developed and well-proven pipeline, so risk aversion was low.
• Expectations: as a core product the pipeline was expected to
be continuously evolving, able to be horizontally scaled, able
to handle a growing amount of data, and have 100% uptime.
• Product: the functionality built was in line with a product
roadmap that was reviewed on a monthly basis.
• Operations: an internal DevOps team managed the
infrastructure while engineers were expected to support the
associated applications and data processors.
• Results: the pipeline could be horizontally scaled, handled
between 1 and 2 TB of data per day, and had 99.9% uptime.
• Team: the DevOps and engineering teams worked together
to produce and support it.
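
The gap between the expected 100% uptime and the achieved 99.9%, and the meaning of 1 to 2 TB per day, are easier to judge as concrete numbers. The short calculation below is a sketch; the helper names are hypothetical, a 365-day year is assumed, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def allowed_downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year a system may be down while still meeting the uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def sustained_rate_mb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s for a daily volume given in TB (decimal units)."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"99.9% uptime allows {allowed_downtime_hours_per_year(99.9):.2f} h/year of downtime")
    print(f"1 TB/day averages {sustained_rate_mb_per_s(1):.1f} MB/s")
    print(f"2 TB/day averages {sustained_rate_mb_per_s(2):.1f} MB/s")
```

Under these assumptions, 99.9% uptime corresponds to about 8.76 hours of downtime per year, and 1 to 2 TB per day averages roughly 11.6 to 23.1 MB/s. Averages hide peaks, so real capacity planning would also need the peak rate.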

Architecture 3
[Diagram: three Agents, S3, Apache Spark, S3, and Athena, in three numbered stages]

This pipeline was built at a company building a new platform
using all leading-edge technologies, and was a temporary
solution until another pipeline was built.
• Risk Aversion: this system was mission critical for delivering
data in real time to customers. Failure was not an option, so
best-in-class practices were required, including the use of
hosted solutions such as Databricks and S3.
• Expectations: this system would scale as data collection
efforts grew and would be extremely fault tolerant.
• Product: this system would be extended to accommodate
additional product offerings, so flexibility was important.
• Operations: this system was maintained by the engineers
who built it, as there was no separate DevOps team.
• Results: the system processed several TB of data per hour
(this figure still needs to be double-checked) with minimal downtime.
• Team: the team supporting the pipeline set up monitoring
and alerting to ensure uptime, and worked with other
engineering groups to deconflict deployments that might
impact the pipeline.

Architecture 4
[Diagram: three Agents, Apache Kafka, Apache Spark, and ES, S3, HDFS, and HBase, in three numbered stages]

This pipeline was built at a company building a new platform using all
leading-edge technologies, and was a temporary solution until another
pipeline was built.
• Risk Aversion: this system supported a key customer and was being
implemented as a means to resolve data loss and data discrepancies
that had plagued a legacy system.
• Expectations: this system would be resilient in the event of an outage
so that no data would be lost.
• Product: this system would ultimately be replaced by a more general
system designed to support multiple customers, so it was considered
extremely critical yet a one-off.
• Operations: this system was maintained by the engineers
who built it because, at the time, there was no technical
operations team in place.
• Results: the system processed hundreds of GB of data per
day with infrequent outages.
• Team: once deployed, the team of developers who built this
pipeline began work on incorporating its features into a
more generalized stream processing platform.
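
One common way to meet an expectation like "no data would be lost" during an outage is to keep incoming records in a durable log and advance the consumer's position only after the downstream write succeeds. The sketch below illustrates that idea with plain Python; it is not the system described in the talk. The `FlakySink` class, the `consume` function, and the in-memory list standing in for a durable log such as Kafka are all hypothetical.

```python
class FlakySink:
    """Hypothetical downstream store that can be switched off to simulate an outage."""

    def __init__(self) -> None:
        self.available = True
        self.rows = []

    def write(self, record) -> None:
        if not self.available:
            raise ConnectionError("sink unavailable")
        self.rows.append(record)


def consume(log, committed_offset: int, sink: FlakySink) -> int:
    """Write records starting at committed_offset and return the new offset.

    The offset advances only after a successful write, so a record that
    could not be written is retried on the next call instead of being lost.
    """
    offset = committed_offset
    while offset < len(log):
        try:
            sink.write(log[offset])
        except ConnectionError:
            break  # stop here; everything from this offset on stays in the log
        offset += 1
    return offset


if __name__ == "__main__":
    log = ["e0", "e1"]
    sink = FlakySink()
    offset = consume(log, 0, sink)       # normal operation
    sink.available = False               # outage begins
    log += ["e2", "e3"]                  # producers keep appending to the log
    offset = consume(log, offset, sink)  # nothing is written, offset is unchanged
    sink.available = True                # outage ends
    offset = consume(log, offset, sink)  # backlog is replayed
    print(offset, sink.rows)
```

In a real deployment a write can succeed while the offset commit fails, so the sink may see duplicates after a restart; making writes idempotent is the usual companion to this pattern.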

Q&A

Free Guide
robertwdempsey.com/machineryai

Where to Find Me
Website: robertwdempsey.com
Lotus Guides: lotusguides.com
LinkedIn: robertwdempsey
Twitter: rdempsey
GitHub: rdempsey

Thank You!
