MATS stack (MLFlow, Airflow, Tensorflow,
Spark) for cross-system orchestration of
machine learning pipelines
João Da Silva & Yury Kasimov
Intro
Yury Kasimov
Data engineer at Avast with background in
Machine Learning and Network security, tennis
player on even days, chess on odd days
João Da Silva
Scala & FP enthusiast, Lead Data Engineer @avast,
DJ @sonuz, capoeirista and co-organizer of
Prague @functional_jvm meetup
Agenda
● Intro: The saga begins
● Problems: Clone wars
● Goals: Insidious plan
● Solutions: Spark of a rebellion
● Challenges: Technologies strike back
● Successes: A new hope
Avast
Avast is dedicated to creating a world
that provides safety and privacy for all,
no matter who you are, where you are,
or how you connect.
Intro: The saga begins
Problems: Clone wars
● A lot of duplicated effort between different teams
● No overview of different experiments in one place
● No automated process for moving from experiments to production
● Scaling and monitoring of deployed models
Goals: Insidious plan
● Define a common ground for the data science and data engineering teams
● Structured, fast and reproducible experiments
● Cross-system orchestration/scheduling
● Automated model serving
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Data: Data Engineering Stages
○ Model: Machine Learning Stages
○ Code: CI/CD
Solutions: Spark of a rebellion
● ML Project Lifecycle, Design and Structure
○ Standard repository structure
○ Standard ML Development at Avast
○ Standard Tooling
Solutions: MATS Stack
● MLFlow
● Airflow
● Tensorflow
● Spark
Solutions: Spark of a rebellion
● MLFlow for experiment tracking and Model management
○ Open Source ML Platform
○ Easy experiment tracking
○ Model packaging, storage, version management and deployment
○ Rich API and CLI which can be used by any language or ML Library
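As an illustration of the "any language" point, here is a minimal sketch that builds (but does not send) a call to the MLflow REST endpoint for logging a metric, using only the Python standard library. The tracking server address, run ID and metric name are hypothetical placeholders, not values from the talk.

```python
import json
import time
import urllib.request

# Hypothetical tracking server address; replace with your own deployment.
TRACKING_URI = "http://mlflow.example.internal:5000"


def build_log_metric_request(run_id, key, value, step=0):
    """Build a POST request for MLflow's runs/log-metric REST endpoint."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    return urllib.request.Request(
        url=f"{TRACKING_URI}/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical run ID and metric, for illustration only.
request = build_log_metric_request("0123456789abcdef", "val_accuracy", 0.93, step=1)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) would require a reachable tracking server and an existing run, so the sketch stops at building it.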
Solutions: Spark of a rebellion
● Airflow for cross-system scheduling
○ Message driven architecture
○ It’s extensible, it’s Python ;-)
○ Templating, default_args and connections remove boilerplate
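The points about default_args and templating can be sketched as follows, assuming Airflow 2.x. The DAG ID, task IDs, owner and echo commands are hypothetical placeholders; the real pipeline would submit Spark and Tensorflow jobs rather than run echo. The Airflow imports are kept inside the function only so the snippet loads without Airflow installed; in a real DAG file they would sit at the top and the DAG would be assigned at module level.

```python
from datetime import datetime, timedelta

# Shared settings applied to every task in the DAG, so they are written once.
DEFAULT_ARGS = {
    "owner": "data-engineering",  # hypothetical owner
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}


def build_dag():
    """Build a two-task DAG; requires Airflow 2.x to be installed."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="mats_example",  # hypothetical DAG name
        default_args=DEFAULT_ARGS,
        start_date=datetime(2021, 1, 1),
        catchup=False,
    ) as dag:
        # {{ ds }} is rendered by Airflow's templating to the run's logical date.
        prepare = BashOperator(
            task_id="prepare_data",
            bash_command="echo preparing data for {{ ds }}",
        )
        train = BashOperator(
            task_id="train_model",
            bash_command="echo training on data from {{ ds }}",
        )
        prepare >> train  # train_model runs after prepare_data succeeds
    return dag
```

Both tasks inherit the retry settings from DEFAULT_ARGS, which is the boilerplate saving the slide refers to.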
Solutions: Spark of a rebellion
(Diagram labels: Spark / HDFS, data_dump, Yarn/Spark, Kubernetes, GPU/Tensorflow)
Solutions: Spark of a rebellion
● Tensorflow for high performance training
○ TFRecords
○ TensorFlow Serving
○ Rich ecosystem
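To make the TensorFlow Serving bullet concrete, here is a minimal sketch that builds (but does not send) a request to TensorFlow Serving's REST predict endpoint with the Python standard library. The server address, model name and feature values are hypothetical; 8501 is the REST port commonly used by the official Docker image.

```python
import json
import urllib.request

# Hypothetical serving address; replace with your own deployment.
SERVING_URL = "http://tf-serving.example.internal:8501"


def build_predict_request(model_name, instances, version=None):
    """Build a POST request for TensorFlow Serving's REST predict API."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        # Pin a specific model version instead of the latest served one.
        path += f"/versions/{version}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{SERVING_URL}{path}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name and feature vectors, for illustration only.
request = build_predict_request("url_classifier", [[0.1, 0.7, 0.2]], version=3)
print(request.full_url)
```

Omitting the version argument targets the latest version the server has loaded.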
Solutions: Spark of a rebellion
● Spark for distributed big data processing
○ Extensive usage and knowledge at Avast
○ Really, Spark is king for big data processing ;-)
Challenges: Technologies strike back
▪ Lack of event-based notifications for model registry changes
▪ https://github.com/mlflow/mlflow/issues/3015
▪ Lack of support for Tensorflow ModelServer for serving
▪ MLFlow does not support Tensorflow model logging in saved_model format
▪ https://github.com/mlflow/mlflow/issues/2740
▪ Airflow deployment, security and quirks
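One possible workaround for the missing registry notifications is to poll the registry and compare snapshots. This is only a sketch under that assumption and is not described in the talk as the authors' approach; the snapshot format (model name mapped to latest version number) and the model names are hypothetical.

```python
def detect_new_versions(previous, current):
    """Return {model_name: version} for models whose latest version increased.

    Both arguments map a registered model name to its latest integer version,
    as captured by two consecutive polls of the model registry.
    """
    return {
        name: version
        for name, version in current.items()
        if version > previous.get(name, 0)  # unseen models count as new
    }


# Hypothetical snapshots from two consecutive polls.
before = {"url_classifier": 3, "spam_filter": 1}
after = {"url_classifier": 4, "spam_filter": 1, "new_model": 1}
print(detect_new_versions(before, after))
```

A scheduler task could run such a comparison periodically and trigger deployment only for the models it returns.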
Successes: A new hope
● Delivered Angler ML pipeline for URL phishing classifier
● Established processes for faster productization of ML Models
● Interest from other teams to adopt our solutions
● MATS Stack
We would like to thank
● Tomas Trnka – our first “customer” and the creator of Angler projects
● Vojtech Tuma – our manager for guiding and supporting us
● Our colleagues for their help and suggestions
● All of you who attended this presentation
Reach out
Yury Kasimov
@LunaticInHall
João Da Silva
@imjsilva
Q&A
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestration of Machine Learning Pipelines

  • 1. MATS stack (MLFlow, Airflow, Tensorflow, Spark) for cross-system orchestration of machine learning pipelines João Da Silva & Yury Kasimov
  • 2. Intro Yury Kasimov Data engineer at Avast with background in Machine Learning and Network security, tennis player on even days, chess on odd days
  • 3. Intro Yury Kasimov Data engineer at Avast with background in Machine Learning and Network security, tennis player on even days, chess on odd days João Da Silva Scala & FP enthusiast, Lead Data Engineer @avast, DJ @sonuz, capoeirista and co-organizer of Prague @functional_jvm meetup
  • 4. Agenda ● Intro: The saga begins ● Problems: Clone wars ● Goals: Insidious plan ● Solutions: Spark of a rebellion ● Challenges: Technologies strike back ● Successes: A new hope
  • 5. Avast Avast is dedicated to creating a world that provides safety and privacy for all, no matter who you are, where you are, or how you connect.
  • 10. Intro: The saga begins
  • 12. Problems: Clone wars ● A lot of duplicated effort between different teams
  • 13. Problems: Clone wars ● A lot of duplicated effort between different teams ● No overview of different experiments in one place
  • 14. Problems: Clone wars ● A lot of duplicated effort between different teams ● No overview of different experiments in one place ● No automated process for moving from experiments to production
  • 15. Problems: Clone wars ● A lot of duplicated effort between different teams ● No overview of different experiments in one place ● No automated process for moving from experiments to production ● Scaling and monitoring of deployed models
  • 22. Goals: Insidious plan ● Define a common ground for data science team and data engineering team ● Structured, fast and reproducible experiments ● Cross-system orchestration/scheduling ● Automated model serving
  • 29. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure ○ Data: Data Engineering Stages ○ Model: Machine Learning Stages ○ Code: CI/CD
  • 32. Solutions: Spark of a rebellion ● ML Project Lifecycle, Design and Structure ○ Standard repository structure ○ Standard ML Development at Avast ○ Standard Tooling
  • 33. Solutions: MATS Stack ● MLFlow ● Airflow ● Tensorflow ● Spark
  • 35. Solutions: Spark of a rebellion ● MLFlow for experiment tracking and Model management ○ Open Source ML Platform ○ Easy experiment tracking ○ Model packaging, storage, version management and deployment ○ Rich API and CLI which can be used by any language or ML Library
  • 41. Solutions: Spark of a rebellion ● Airflow for cross-system scheduling ○ Message driven architecture ○ It’s extensible, it’s Python ;-) ○ Templating, default_args and connections remove boilerplate
  • 42. Solutions: Spark of a rebellion [Diagram labels: data_dump; Spark / HDFS; Yarn/Spark; Kubernetes GPU/Tensorflow]
  • 48. Solutions: Spark of a rebellion ● Tensorflow for high performance training ○ TFRecords ○ TensorFlow Serving ○ Rich ecosystem
  • 50. Solutions: Spark of a rebellion ● Spark for distributed big data processing ○ Extensive usage and knowledge at Avast ○ Really, Spark is king for big data processing ;-)
  • 54. Challenges: Technologies strike back ▪ Lack of event based notifications for model registry changes ▪ https://github.com/mlflow/mlflow/issues/3015 ▪ Lack of support for Tensorflow ModelServer for serving ▪ MLFlow does not support tensorflow model logging in saved_model format ▪ https://github.com/mlflow/mlflow/issues/2740
  • 55. Challenges: Technologies strike back ▪ Lack of event based notifications for model registry changes ▪ https://github.com/mlflow/mlflow/issues/2740 ▪ Lack of support for Tensorflow ModelServer for serving ▪ MLFlow does not support tensorflow model logging in saved_model format ▪ https://github.com/mlflow/mlflow/issues/2740 ▪ Airflow deployment, security and quirks
  • 60. Successes: A new hope ● Delivered Angler ML pipeline for url phishing classifier ● Established processes for faster productization of ML Models ● Interest from other teams to adopt our solutions ● MATS Stack
  • 61. We would like to thank ● Tomas Trnka – our first “customer” and the creator of Angler projects ● Vojtech Tuma – our manager for guiding and supporting us ● Our colleagues for their help and suggestions ● All of you that attended this presentation
  • 63. Q&A
  • 64. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
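
The cross-system scheduling idea from the Airflow slides can be sketched without Airflow itself. The snippet below is a minimal, standard-library illustration, not the speakers' implementation: it orders tasks by their dependencies and merges shared settings into every task, which is roughly the role default_args plays in an Airflow DAG. Only the task name data_dump and the system labels come from the slides; prepare_features, train_model, the owner value and the retry counts are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Shared settings applied to every task, mimicking the idea behind
# Airflow's default_args (the values here are illustrative assumptions).
DEFAULT_ARGS = {"retries": 2, "owner": "data-eng"}

# Hypothetical pipeline: "data_dump" and the system labels appear on the
# slides; the other task names are invented for this sketch.
TASKS = {
    "data_dump": {"system": "Spark / HDFS", "upstream": []},
    "prepare_features": {"system": "Yarn/Spark", "upstream": ["data_dump"]},
    "train_model": {
        "system": "Kubernetes GPU/Tensorflow",
        "upstream": ["prepare_features"],
        "retries": 0,  # task-level override of the shared default
    },
}


def build_plan(tasks, default_args):
    """Return tasks in dependency order, each merged with the shared defaults."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(spec["upstream"]) for name, spec in tasks.items()}
    order = TopologicalSorter(graph).static_order()
    # Later keys win, so task-specific settings override the defaults.
    return [{"task_id": name, **default_args, **tasks[name]} for name in order]


if __name__ == "__main__":
    for step in build_plan(TASKS, DEFAULT_ARGS):
        print(f'{step["task_id"]:<18} on {step["system"]} (retries={step["retries"]})')
```

In a real deployment each entry would map to an Airflow operator that submits work to the named system; the sketch only shows how one dependency graph and one set of defaults can describe steps that run on different platforms.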