With so many options to choose from, how do you select the right technologies for your machine learning pipeline? Do you purchase bare metal and hire a DevOps team, install Spark on EC2 instances, use EMR and other AWS services, or combine Spark and Elasticsearch? View this talk for a first-hand account of building ML pipelines: which options were considered, how the final solution was selected, the trade-offs made, and the final results.
2. robertwdempsey.com Production ML Pipelines
Robert Dempsey
Professional: Entrepreneur, Software Engineer
Author: Books and online courses
Instructor: Lotus Guides, District Data Labs
Owner: Atlantic Dominion Solutions, LLC
The Three Jobs
At Atlantic Dominion Solutions we perform three functions for our
customers:
Consulting: we assess and advise in the areas of technology, team and
process to determine how machine learning can have the biggest impact on
your business.
Implementation: after a strategy session to determine the work you need we
get to work using our proven methodology and begin delivering smarter
applications.
Training: continuous improvement requires continuous learning. We provide
both on-premises and online training.
Writing the Book
Co-authoring the book Building
Machine Learning Pipelines.
Written for software developers and
data scientists, Building Machine
Learning Pipelines teaches the skills
required to create and use the
infrastructure needed to run modern
intelligent systems.
machinelearningpipelines.com
Risk Tolerance
Question: How risk averse are you?
Some companies happily deploy beta and release candidate versions of
cutting-edge open source software. Others enjoy the freedom of open source
but look only for mature applications. And a third category swears off open
source altogether and only buys software that comes with a license and a
support contract. Where does your company sit on the risk aversion spectrum?
Question: What are your non-technology risks?
Technology aside, what happens if your project fails? Do you get fired? Does
the entire team get fired? Do the naysayers get to say “I told you so” in a
meeting?
Expectations
Question: What are the expectations around the project?
Here are a few questions to get you started:
• Non-Technical
  • How long do you think the project will take? How much do you expect it to cost?
  • What are others expecting the system will be able to do?
• Technical
  • How much volume does the system need to be able to process? In what amount of time?
  • What level of downtime can you absorb?
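
The volume and time questions above reduce to simple arithmetic. As a minimal sketch, assuming a hypothetical data volume and processing window (neither figure comes from the talk), the sustained throughput a system would need can be estimated like this:

```python
def required_throughput_mb_per_s(volume_gb: float, window_hours: float) -> float:
    """Sustained throughput (MB/s) needed to process volume_gb within window_hours.

    Uses decimal units (1 GB = 1000 MB). Both inputs are illustrative
    assumptions, not figures from the talk.
    """
    if volume_gb < 0 or window_hours <= 0:
        raise ValueError("volume must be non-negative and window must be positive")
    return (volume_gb * 1000) / (window_hours * 3600)


# Hypothetical example: 500 GB must be processed within a 4-hour window.
rate = required_throughput_mb_per_s(500, 4)
print(f"Required sustained throughput: {rate:.1f} MB/s")
```

Writing the number down early gives the team a concrete target to test candidate technologies against.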
Product
Question: What does the product roadmap say?
At a minimum, a bulleted list will help set the expectations of others
and allow you to make trade-offs as the project moves forward. It also
helps you measure results (discussed later) on an incremental basis,
which tells your team whether they are making progress.
Question: What’s the budget and estimated ROI?
As with expectations and the product roadmap, whether formalized or not,
there is always, or always should be, a budget as well as an estimated
ROI. Write it down and use it as one of your metrics.
Operations
Question: Got DevOps?
DevOps, sometimes called TechOps, is a group that manages
and maintains the technology infrastructure of the organization.
Just because you have a DevOps team doesn’t mean you want
to put additional strain on them by firing up more servers.
Even with cloud providers like AWS, you still have to do some
infrastructure support and maintenance. The larger your
business, the more support work there will be.
Results
Question: What does the end result look like?
Here’s a very partial list of results we’ve seen measured:
• The project was completed on X date by X time.
• The project cost $X to complete.
• The team worked no more than 40 hours each week to get
the project done.
• X, Y and Z features are in the product and have 90%
automated test coverage.
Team
Question: Are the right people on the bus to get the project completed?
Having the right people with the right skills, both hard and soft, can
make or break a project.
Question: Does each team member have the tools and support they
need to be successful?
• Does the team have the support of senior leadership?
• Are they going to encounter a deluge of bureaucratic red tape that
will slow their progress?
• Are development and testing environments available?
The “Standard” ML Pipeline
Collect -> Store -> Enrich -> Train / Apply -> Visualize
Infrastructure
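
The stages in the diagram can be pictured as functions chained together. The sketch below is only an illustration: the stage names follow the diagram, but every function body, the sample records, and the `run_pipeline` helper are hypothetical placeholders rather than anything shown in the talk.

```python
from typing import Callable

Record = dict
Stage = Callable[[list[Record]], list[Record]]


def collect() -> list[Record]:
    # Hypothetical source; in practice this might be a scraper or an API client.
    return [{"text": "spark on ec2"}, {"text": "elasticsearch"}]


def store(records: list[Record]) -> list[Record]:
    # Placeholder: a real pipeline would write to a data store here.
    return list(records)


def enrich(records: list[Record]) -> list[Record]:
    # Add a derived field (word count) to each record.
    return [{**r, "n_words": len(r["text"].split())} for r in records]


def apply_model(records: list[Record]) -> list[Record]:
    # Stand-in for a trained model: a fixed rule, not real machine learning.
    return [{**r, "label": "long" if r["n_words"] > 1 else "short"} for r in records]


def visualize(records: list[Record]) -> list[Record]:
    # Print a one-line summary per record in place of a real dashboard.
    for r in records:
        print(f"{r['text']!r}: {r['label']}")
    return records


def run_pipeline(records: list[Record], stages: list[Stage]) -> list[Record]:
    # Feed the output of each stage into the next one, in order.
    for stage in stages:
        records = stage(records)
    return records


results = run_pipeline(collect(), [store, enrich, apply_model, visualize])
```

Keeping each stage behind the same signature makes it easier to swap one technology for another without touching the rest of the chain.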
Infrastructure
• Servers
  • Amazon EC2
  • Data center
• Container Technologies
  • Docker
  • Amazon Elastic Container Service (ECS)
Collect
• Programming Languages
  • Python
  • Scala
  • Go
  • R
• Pre-Built Tools
  • Pentaho Data Integration
  • Various web scraping tools
Store
• Elasticsearch
• Apache Kafka
• Redis
• Cassandra
• MongoDB
• SQL
• Amazon S3
• HDFS
• Many others
Architecture 1 Choices
This pipeline was built at a company building a new platform
using all leading-edge technologies, and was a temporary
solution until another pipeline was built.
• Risk Aversion: not an issue.
• Expectations: the pipeline needed to be run in production
and be able to handle the amount of data the company had
in a timely fashion.
• Product: this was a short-term solution to process data until
the desired pipeline was ready to be deployed into
production.
• Operations: due to its simplicity and limited functionality,
the solution became a one-server solution deployed by an
engineer working in unison with an internal DevOps team
member.
• Results: the pipeline was deployed on time and was able to
process all the data within the expected parameters.
• Team: after a consultant built the first version of the
application an internal team member took over and
deployed it into production.
Architecture 2 Choices
This pipeline was built at a startup focused on data collection
and was core to the product.
• Risk Aversion: this was the second version of a previously
developed and well-proven pipeline, so risk aversion was low.
• Expectations: as a core product the pipeline was expected to
be continuously evolving, able to be horizontally scaled, able
to handle a growing amount of data, and have 100% uptime.
• Product: the functionality built was in line with a product
roadmap that was reviewed on a monthly basis.
• Operations: an internal DevOps team managed the
infrastructure while engineers were expected to support the
associated applications and data processors.
• Results: the pipeline could be horizontally scaled, handled
between 1 and 2 TB of data per day, and had 99.9% uptime.
• Team: the devops and engineering teams worked together
to produce and support it.
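
The gap between the 100% uptime expectation and the 99.9% result can be put in concrete terms. A minimal sketch follows; the 30-day and 365-day periods are illustrative assumptions, not figures from the talk.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted over period_days at a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    if period_days < 0:
        raise ValueError("period_days must be non-negative")
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * (1 - uptime_percent / 100)


# Compare the stated expectation (100%) with the reported result (99.9%).
for uptime in (100.0, 99.9):
    per_month = allowed_downtime_minutes(uptime, 30)
    per_year = allowed_downtime_minutes(uptime, 365)
    print(f"{uptime}% uptime: {per_month:.1f} min per 30 days, {per_year / 60:.2f} h per year")
```

Expressing an uptime target as a downtime budget makes it easier to decide whether it is realistic for the team and infrastructure available.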
Architecture 3 Choices
This pipeline was built at a company building a new platform
using all leading-edge technologies, and was a temporary
solution until another pipeline was built.
• Risk Aversion: this system was mission critical for delivering
data in real time to customers. Failure was not an option, so
best-in-class practices needed to be implemented, including
using hosted solutions such as Databricks and S3.
• Expectations: this system would scale as data collection
efforts grew and would be extremely fault tolerant.
• Product: this system would be extended to accommodate
additional product offerings so flexibility was important.
• Operations: this system was maintained by the engineers
who built it, as there was no separate DevOps team.
• Results: the system processed several TB of data per hour
(figure not yet verified) with minimal downtime.
• Team: the team supporting the pipeline set up monitoring
and alerting to ensure uptime and worked with other
engineering groups to deconflict deployments that might
impact the pipeline.
Architecture 4 Choices
This pipeline was built at a company building a new platform using all
leading-edge technologies, and was a temporary solution until another
pipeline was built.
• Risk Aversion: this system supported a key customer and was being
implemented as a means to resolve data loss and data discrepancies
that had plagued a legacy system.
• Expectations: this system would be resilient in the event of an outage
so that no data would be lost.
• Product: this system would ultimately be replaced by a more general
system designed to support multiple customers, so it was considered
extremely critical yet a one-off.
• Operations: this system was maintained by the engineers
who built it because, at the time, there was no technical operations
team in place.
• Results: the system processed hundreds of GBs of data per
day with infrequent outages.
• Team: once deployed, the team of developers who built this
pipeline began work on incorporating its features into a
more generalized stream processing platform.
Where to Find Me
Website: robertwdempsey.com
Lotus Guides: lotusguides.com
LinkedIn: robertwdempsey
Twitter: rdempsey
GitHub: rdempsey