SlideShare a Scribd company logo
1 of 16
Download to read offline
CI/CD Templates: Continuous
Delivery of ML-Enabled Data
Pipelines on Databricks
Michael Shtelma, Sr. Solutions Architect
Ivan Trusov, Solutions Architect
Agenda
The Challenges of implementing
CI/CD for ML pipelines
The CI/CD challenges forcing ML teams to choose
between Databricks notebooks or local IDEs
Introducing DatabricksLabs
CI/CD Templates
How CI/CD Templates solves ML team production
challenges
Demo and Next Steps
Problem:
Organisations are struggling to get Business to start using
their models to drive additional revenue
Cause:
Due to complexity of ML lifecycle only few models end up
in production and drive additional revenue for business.
Most of them are either delayed or discontinued during
different ML Project stages
It is challenging for organizations to
gain value from ML due to complexity of
the ML lifecycle
What challenges do ML teams
face when then try to
implement CD4ML?
ML teams struggle to combine traditional CI/CD
tools with Databricks notebooks
1. Benefits to Databricks notebooks
Easy to use
Scalable
Provides access to ML tools such as mlflow for model logging and serving
2. Challenges
Non-trivial to hook into traditional software development tools such as CI tools or local IDEs.
3. Result
Teams find themselves choosing between
using traditional IDE based workflows but struggling to test and deploy at scale or
using Databricks notebooks or other cloud notebooks but then struggling to ensure
testing and deployment reliability via CI/CD pipelines.
What’s the solution?
CI/CD Templates gives you the benefits of
traditional CICD workflows and the scale of
databricks clusters
CI/CD Templates allows you to
● create a production pipeline via template in a few steps
● that automatically hooks to github actions and
● runs tests and deployments on databricks upon git commit or
whatever trigger you define and
● gives you a test success status directly in github so you know if your
commit broke the build
A scalable CI/CD pipeline in 5 easy steps
1. Install and customize with a single command
2. Create a new github repo containing your databricks host
and token secrets
3. Initialize git in your repo and commit the code.
4. Push your new cicd templates project to the repo. Your tests will
start running automatically on Databricks. Upon your tests’ success
or failure you will get a green checkmark or red x next to your commit
status.
5. You’re done! You now have a fully scalable CICD pipeline.
1
2
3
4
5
Project structure
1. Python package where the logic of the project will be developed.
Your models and pipelines will be developed here.
2. Configuration where you can configure define Databricks jobs
which can run pipelines developed in python package
3. Tests directory where local unit tests and integration tests will be
developed
1
2
3
CI/CD Templates execute tests and deployments
directly on databricks while storing packages, model
logging and other artifacts in Mlflow
CI/CD Templates - now powered by dbx
With dbx you can:
● customize project structure and specify it during deployments
● use new CI tools easily (PRs are welcome!)
● run custom data pipelines pipelines directly from IDE on interactive clusters
Push Flow
Release Flow
Demo:
CI/CD Templates
Summary
The Challenges of implementing
CD4ML
The CI/CD challenges forcing ML teams to choose
between Databricks notebooks or local IDEs
Introducing DatabricksLabs
CI/CD Templates
How CI/CD Templates solves ML team production
challenges
Next Steps
Search DatabricksLabs cicd-templates or go
directly to
https://github.com/databrickslabs/cicd-templates
to get started
michael.shtelma@databricks.com
ivan.trusov@databricks.com
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.

More Related Content

What's hot

Optimizing Your Supply Chain with the Neo4j Graph
Optimizing Your Supply Chain with the Neo4j GraphOptimizing Your Supply Chain with the Neo4j Graph
Optimizing Your Supply Chain with the Neo4j GraphNeo4j
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOpsDatabricks
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleDatabricks
 
Introducing MLflow for End-to-End Machine Learning on Databricks
Introducing MLflow for End-to-End Machine Learning on DatabricksIntroducing MLflow for End-to-End Machine Learning on Databricks
Introducing MLflow for End-to-End Machine Learning on DatabricksDatabricks
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Adrien Blind
 
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Training Week: Introduction to Neo4j 2022
Training Week: Introduction to Neo4j 2022Training Week: Introduction to Neo4j 2022
Training Week: Introduction to Neo4j 2022Neo4j
 
Machine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeMachine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeDatabricks
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4jNeo4j
 
MDM & BI Strategy For Large Enterprises
MDM & BI Strategy For Large EnterprisesMDM & BI Strategy For Large Enterprises
MDM & BI Strategy For Large EnterprisesMark Schoeppel
 
Neo4j Graph Use Cases, Bruno Ungermann, Neo4j
Neo4j Graph Use Cases, Bruno Ungermann, Neo4jNeo4j Graph Use Cases, Bruno Ungermann, Neo4j
Neo4j Graph Use Cases, Bruno Ungermann, Neo4jNeo4j
 
Advanced Analytics Governance - Effective Model Management and Stewardship
Advanced Analytics Governance - Effective Model Management and StewardshipAdvanced Analytics Governance - Effective Model Management and Stewardship
Advanced Analytics Governance - Effective Model Management and StewardshipDATAVERSITY
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Liberating data with Talend Data Catalog
Liberating data with Talend Data CatalogLiberating data with Talend Data Catalog
Liberating data with Talend Data CatalogJean-Michel Franco
 
Neo4j y GenAI
Neo4j y GenAI Neo4j y GenAI
Neo4j y GenAI Neo4j
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
warner-DP-203-slides.pptx
warner-DP-203-slides.pptxwarner-DP-203-slides.pptx
warner-DP-203-slides.pptxHibaB2
 

What's hot (20)

Optimizing Your Supply Chain with the Neo4j Graph
Optimizing Your Supply Chain with the Neo4j GraphOptimizing Your Supply Chain with the Neo4j Graph
Optimizing Your Supply Chain with the Neo4j Graph
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
 
Introducing MLflow for End-to-End Machine Learning on Databricks
Introducing MLflow for End-to-End Machine Learning on DatabricksIntroducing MLflow for End-to-End Machine Learning on Databricks
Introducing MLflow for End-to-End Machine Learning on Databricks
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
 
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Training Week: Introduction to Neo4j 2022
Training Week: Introduction to Neo4j 2022Training Week: Introduction to Neo4j 2022
Training Week: Introduction to Neo4j 2022
 
Machine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeMachine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta Lake
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4j
 
MDM & BI Strategy For Large Enterprises
MDM & BI Strategy For Large EnterprisesMDM & BI Strategy For Large Enterprises
MDM & BI Strategy For Large Enterprises
 
Neo4j Graph Use Cases, Bruno Ungermann, Neo4j
Neo4j Graph Use Cases, Bruno Ungermann, Neo4jNeo4j Graph Use Cases, Bruno Ungermann, Neo4j
Neo4j Graph Use Cases, Bruno Ungermann, Neo4j
 
Advanced Analytics Governance - Effective Model Management and Stewardship
Advanced Analytics Governance - Effective Model Management and StewardshipAdvanced Analytics Governance - Effective Model Management and Stewardship
Advanced Analytics Governance - Effective Model Management and Stewardship
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Liberating data with Talend Data Catalog
Liberating data with Talend Data CatalogLiberating data with Talend Data Catalog
Liberating data with Talend Data Catalog
 
Neo4j y GenAI
Neo4j y GenAI Neo4j y GenAI
Neo4j y GenAI
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
warner-DP-203-slides.pptx
warner-DP-203-slides.pptxwarner-DP-203-slides.pptx
warner-DP-203-slides.pptx
 

Similar to CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks

Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...Databricks
 
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowContinuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowDatabricks
 
Continuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryContinuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryDatabricks
 
Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours o...
Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours o...Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours o...
Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours o...Cathrine Wilhelmsen
 
Использование AzureDevOps при разработке микросервисных приложений
Использование AzureDevOps при разработке микросервисных приложенийИспользование AzureDevOps при разработке микросервисных приложений
Использование AzureDevOps при разработке микросервисных приложенийVitebsk Miniq
 
Development workflow guide for building docker apps
Development workflow guide for building docker appsDevelopment workflow guide for building docker apps
Development workflow guide for building docker appsAbdul Khan
 
Development workflow guide for building docker apps
Development workflow guide for building docker appsDevelopment workflow guide for building docker apps
Development workflow guide for building docker appsAbdul Khan
 
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with ConcourseContinuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with ConcourseVMware Tanzu
 
Ship code like a keptn
Ship code like a keptnShip code like a keptn
Ship code like a keptnRob Jahn
 
Continuous Integration for Oracle Database Development
Continuous Integration for Oracle Database DevelopmentContinuous Integration for Oracle Database Development
Continuous Integration for Oracle Database DevelopmentVladimir Bakhov
 
[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson Lin[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson LinHanLing Shen
 
Docs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API ExperiencesDocs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API ExperiencesAnne Gentle
 
Webcast Presentation: Be lean. Be agile. Work together with DevOps Services (...
Webcast Presentation: Be lean. Be agile. Work together with DevOps Services (...Webcast Presentation: Be lean. Be agile. Work together with DevOps Services (...
Webcast Presentation: Be lean. Be agile. Work together with DevOps Services (...GRUC
 
“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentatio...
“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentatio...“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentatio...
“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentatio...Edge AI and Vision Alliance
 
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...danielschulz2005
 
Data Modeling Comparison: Tableau, Cognos and Power BI
Data Modeling Comparison: Tableau, Cognos and Power BIData Modeling Comparison: Tableau, Cognos and Power BI
Data Modeling Comparison: Tableau, Cognos and Power BISenturus
 

Similar to CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks (20)

Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
Developing ML-enabled Data Pipelines on Databricks using IDE & CI/CD at Runta...
 
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowContinuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
 
Continuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryContinuous Integration & Continuous Delivery
Continuous Integration & Continuous Delivery
 
Ds for finance day 4
Ds for finance day 4Ds for finance day 4
Ds for finance day 4
 
Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours o...
Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours o...Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours o...
Don't Repeat Yourself - An Introduction to Agile SSIS Development (24 Hours o...
 
Использование AzureDevOps при разработке микросервисных приложений
Использование AzureDevOps при разработке микросервисных приложенийИспользование AzureDevOps при разработке микросервисных приложений
Использование AzureDevOps при разработке микросервисных приложений
 
Development workflow guide for building docker apps
Development workflow guide for building docker appsDevelopment workflow guide for building docker apps
Development workflow guide for building docker apps
 
Development workflow guide for building docker apps
Development workflow guide for building docker appsDevelopment workflow guide for building docker apps
Development workflow guide for building docker apps
 
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with ConcourseContinuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
 
Ship code like a keptn
Ship code like a keptnShip code like a keptn
Ship code like a keptn
 
Continuous Integration for Oracle Database Development
Continuous Integration for Oracle Database DevelopmentContinuous Integration for Oracle Database Development
Continuous Integration for Oracle Database Development
 
[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson Lin[20200720]cloud native develoment - Nelson Lin
[20200720]cloud native develoment - Nelson Lin
 
SDLC Modernization
SDLC ModernizationSDLC Modernization
SDLC Modernization
 
Gitops Hands On
Gitops Hands OnGitops Hands On
Gitops Hands On
 
Docs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API ExperiencesDocs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API Experiences
 
Webcast Presentation: Be lean. Be agile. Work together with DevOps Services (...
Webcast Presentation: Be lean. Be agile. Work together with DevOps Services (...Webcast Presentation: Be lean. Be agile. Work together with DevOps Services (...
Webcast Presentation: Be lean. Be agile. Work together with DevOps Services (...
 
“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentatio...
“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentatio...“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentatio...
“Data Versioning: Towards Reproducibility in Machine Learning,” a Presentatio...
 
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
 
DevOps: Age Of CI/CD
DevOps: Age Of CI/CDDevOps: Age Of CI/CD
DevOps: Age Of CI/CD
 
Data Modeling Comparison: Tableau, Cognos and Power BI
Data Modeling Comparison: Tableau, Cognos and Power BIData Modeling Comparison: Tableau, Cognos and Power BI
Data Modeling Comparison: Tableau, Cognos and Power BI
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 

Recently uploaded (20)

Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 

CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks

  • 1. CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks Michael Shtelma, Sr. Solutions Architect Ivan Trusov, Solutions Architect
  • 2. Agenda The Challenges of implementing CI/CD for ML pipelines The CI/CD challenges forcing ML teams to choose between Databricks notebooks or local IDEs Introducing DatabricksLabs CI/CD Templates How CI/CD Templates solves ML team production challenges Demo and Next Steps
  • 3. Problem: Organisations are struggling to get Business to start using their models to drive additional revenue Cause: Due to complexity of ML lifecycle only few models end up in production and drive additional revenue for business. Most of them are either delayed or discontinued during different ML Project stages It is challenging for organizations to gain value from ML due to complexity of the ML lifecycle
  • 4. What challenges do ML teams face when then try to implement CD4ML?
  • 5. ML teams struggle to combine traditional CI/CD tools with Databricks notebooks 1. Benefits to Databricks notebooks Easy to use Scalable Provides access to ML tools such as mlflow for model logging and serving 2. Challenges Non-trivial to hook into traditional software development tools such as CI tools or local IDEs. 3. Result Teams find themselves choosing between using traditional IDE based workflows but struggling to test and deploy at scale or using Databricks notebooks or other cloud notebooks but then struggling to ensure testing and deployment reliability via CI/CD pipelines.
  • 7. CI/CD Templates gives you the benefits of traditional CICD workflows and the scale of databricks clusters CI/CD Templates allows you to ● create a production pipeline via template in a few steps ● that automatically hooks to github actions and ● runs tests and deployments on databricks upon git commit or whatever trigger you define and ● gives you a test success status directly in github so you know if your commit broke the build
  • 8. A scalable CI/CD pipeline in 5 easy steps 1. Install and customize with a single command 2. Create a new github repo containing your databricks host and token secrets 3. Initialize git in your repo and commit the code. 4. Push your new cicd templates project to the repo. Your tests will start running automatically on Databricks. Upon your tests’ success or failure you will get a green checkmark or red x next to your commit status. 5. You’re done! You now have a fully scalable CICD pipeline. 1 2 3 4 5
  • 9. Project structure 1. Python package where the logic of the project will be developed. Your models and pipelines will be developed here. 2. Configuration where you can configure define Databricks jobs which can run pipelines developed in python package 3. Tests directory where local unit tests and integration tests will be developed 1 2 3
  • 10. CI/CD Templates execute tests and deployments directly on databricks while storing packages, model logging and other artifacts in Mlflow
  • 11. CI/CD Templates - now powered by dbx With dbx you can: ● customize project structure and specify it during deployments ● use new CI tools easily (PRs are welcome!) ● run custom data pipelines pipelines directly from IDE on interactive clusters
  • 15. Summary The Challenges of implementing CD4ML The CI/CD challenges forcing ML teams to choose between Databricks notebooks or local IDEs Introducing DatabricksLabs CI/CD Templates How CI/CD Templates solves ML team production challenges Next Steps Search DatabricksLabs cicd-templates or go directly to https://github.com/databrickslabs/cicd-templates to get started michael.shtelma@databricks.com ivan.trusov@databricks.com
  • 16. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.