Model Experiments Tracking and Registration using MLflow on Databricks

Machine learning models are only as good as the quality and size of the datasets used to train them. Surveys suggest that data scientists spend around 80% of their time preparing and managing data for analysis, and that 57% of data scientists regard cleaning and organizing data as the least enjoyable part of their work. This supports the case for MLOps and for closer collaboration between data scientists and data engineers.


  1. Model Experiments Tracking and Registration using MLflow on Databricks
     Dash Desai, Director Of Platform And Technical Evangelism, StreamSets
     dash@streamsets.com | @iamontheinet | https://www.linkedin.com/in/dash-desai/
  2. Agenda
     ▪ Overview: The perfect recipe for building machine learning model experiments comes from automating the tasks of data acquisition and preparation, and from being able to track model experiments.
     ▪ Hands-On Demo: Hands-on demo of automating these crucial tasks using StreamSets and MLflow on Databricks.
     ▪ Find Out More: Join me for “StreamSets Live: Demos with Dash.” https://bit.ly/DemosWDash
  3. Data Acquisition And Preparation
     ▪ 80% of a data scientist's time is spent acquiring and preparing data
     ▪ Source: InfoWorld https://www.infoworld.com/article/3228245/the-80-20-data-science-dilemma.html
     ▪ Access to data is controlled by constrained teams
     ▪ The Dice 2020 Tech Job Report suggests data engineer was the fastest growing job in technology, with 50% year-over-year growth in the number of open positions.
     ▪ Source: Smith Hanley https://www.smithhanley.com/2020/06/11/data-engineers-more-in-demand-than-data-scientists/
     ▪ Ability to experiment on large datasets
     ▪ “In machine learning, is more data always better than better algorithms?” - Banko and Brill
     ▪ Source: https://courses.cs.cornell.edu/cs674/2004sp/materials/banko-brill-acl2001.pdf
  4. Model Experiments, Tracking, And Registration
     ▪ Precursor to model development
     ▪ Model experiments run in a rapid, iterative manner
     ▪ Lack of industry standards
     ▪ Manual versioning and tracking of models, inputs, and hyperparameters
     ▪ Long model deployment/release cycles
     ▪ Hinders adapting to dynamic changes and gaining competitive advantage
     ▪ Compliance with changing governance and regulations
  5. Automation - StreamSets
     Data Acquisition, Preparation, Model Experimentation
  6. Automation - MLflow | Databricks | StreamSets
     Data Acquisition, Preparation, Model Experimentation
     ▪ An open source platform for the end-to-end machine learning lifecycle
     ▪ Modern data integration platform for building smart data pipelines
     ▪ Easy to build; self-serve
     ▪ Cloud and platform agnostic
     ▪ 100s of connectors
     ▪ Easy to scale and port
     ▪ Extensible and resilient
     ▪ Built-in orchestration and automation
     ▪ Unified data analytics platform
     ▪ Fully managed Apache Spark and MLflow
  7. Automation - StreamSets Transformer
     Data Acquisition, Preparation, Model Experimentation
  8. Automation - StreamSets Transformer
     Data Acquisition, Preparation, Model Experimentation
  9. Automation - StreamSets Transformer
     Data Acquisition, Preparation, Model Experimentation
  10. Automation - StreamSets Transformer
     Data Acquisition, Preparation, Model Experimentation
  11. Automation - StreamSets Transformer
     Data Acquisition, Preparation, Model Experimentation
  12. Automation - StreamSets Transformer
     Data Acquisition, Preparation, Model Experimentation
  13. Hands-On Demo
     ● Review data acquisition and preparation pipelines
     ● Run model experiments pipelines
     ● How to build pipelines
  14. Feedback
     Your feedback is important to us. Don’t forget to rate and review the sessions.
  15. Thank You!
     Dash Desai, Director Of Platform And Technical Evangelism
     dash@streamsets.com | @iamontheinet | https://www.linkedin.com/in/dash-desai/
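
The "Model Experiments, Tracking, And Registration" slide lists manual tracking of models, inputs, and hyperparameters as a pain point. As a rough illustration of what automated tracking could look like, here is a minimal Python sketch using the MLflow tracking API. It is not the pipeline shown in the demo: the helper names `expand_grid` and `track_experiments`, the `train_and_score` callback, and the experiment name are all hypothetical, and the sketch assumes the `mlflow` package is installed and a tracking server (or the local default) is reachable.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict of hyperparameters per combination in the grid."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def track_experiments(experiment_name, grid, train_and_score):
    """Log one MLflow run per hyperparameter combination.

    train_and_score is a caller-supplied (hypothetical) function that takes
    a dict of hyperparameters and returns a dict of metric name -> float.
    """
    # Imported lazily so expand_grid can be used without MLflow installed.
    import mlflow

    # Assumption: on Databricks the experiment name is typically a workspace
    # path; locally any string works.
    mlflow.set_experiment(experiment_name)

    results = []
    for params in expand_grid(grid):
        with mlflow.start_run():
            mlflow.log_params(params)      # record the inputs of this run
            metrics = train_and_score(params)
            mlflow.log_metrics(metrics)    # record the outcome of this run
            results.append((params, metrics))
    return results
```

A call such as `track_experiments("demo-experiment", {"max_depth": [3, 5], "n_estimators": [10, 50]}, my_train_fn)` would log four runs, one per combination, where `my_train_fn` stands in for whatever training code is being evaluated. Once a run has logged a model artifact, registering it would be a separate step, for example with `mlflow.register_model`.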
