Machine Learning development involves comparing models and storing the artifacts they produce. We often compare several algorithms to select the most efficient one, and we assess different hyper-parameters to fine-tune the model. Git helps us store multiple versions of our code. Additionally, we need to keep track of the datasets we use. This is important not only for audit purposes but also for assessing the performance of models developed at a later time. Git is the standard code versioning tool in software development. It can be used to store datasets, but it does not offer an optimal solution for them.
Data Versioning and Reproducible ML with DVC and MLflow
1. Data Versioning and Reproducible ML with DVC and MLflow
Petra Kaferle Devisschere
Data Scientist, Adaltas
2. Introduction
About me:
▪ Data Scientist
▪ 13 years of experience in Data Analytics in Life Sciences
▪ Developed solutions for automating frequently used protocols
About Adaltas:
▪ Involved with Big Data since 2010
▪ 16 experts in France, 2 in Morocco
▪ Partner with Cloudera, Databricks, D2IQ
▪ Actor in Open Source community
▪ Teaching Big Data at 6 universities
3. Agenda
▪ Versioning in Data Science
▪ Data Versioning
▪ Intro to DVC and MLflow
▪ Demo with DVC and MLflow
4. Problem definition
▪ Dataset
▪ Tabular
▪ Images, video, sound, text
▪ Code
▪ Functionality, Hyper-parameters
▪ Data pre-processing, Modeling
▪ Results
▪ Numeric (metrics)
▪ Plots
▪ Environment
▪ Dependencies
What do we need to track in a Data Science project to be able to reproduce the results?
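The items above can be captured even without a dedicated tool. As a minimal sketch (all names here are hypothetical, using only the standard library), one run record could bundle the dataset checksum, hyper-parameters, metrics, and environment info into a single JSON file:

```python
import hashlib
import json
import sys
from pathlib import Path

def dataset_checksum(path):
    """Hash the dataset file so the exact version used can be identified later."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_run(dataset_path, params, metrics, out="run.json"):
    """Write everything needed to reproduce one run into a single JSON record."""
    record = {
        "dataset": {"path": str(dataset_path),
                    "md5": dataset_checksum(dataset_path)},  # which data?
        "params": params,                  # which hyper-parameters?
        "metrics": metrics,                # which results?
        "python": sys.version.split()[0],  # which environment?
    }
    Path(out).write_text(json.dumps(record, indent=2))
    return record

# Track a tiny example dataset together with params and metrics.
Path("data.csv").write_text("x,y\n1,2\n")
run = record_run("data.csv", params={"alpha": 0.1}, metrics={"rmse": 0.42})
```

Doing this by hand for every run is exactly the bookkeeping that tools like DVC and MLflow automate.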
5. Data versioning use cases
▪ Data engineering during data cleaning and dataset preparation
▪ Test data in software engineering to ensure the functionality of the software
▪ Development and testing of database applications
▪ Ontology versioning for Semantic Web
▪ ...
Data beyond Data Science
6. Data is at the heart of any ML project
▪ Data quality management
▪ Reproducible training and re-training
▪ Automation of testing or deployment
▪ Use in applications and web services
▪ Audit
Why is the exact version of a dataset important?
7. Classical DV solutions and their downsides
▪ Git
▪ Not suitable for big files
▪ Difficulties with binary files
▪ Git-LFS
▪ File size limitation (e.g. 2 GB for a GitHub free account and 5 GB for GitHub Enterprise Cloud)
▪ Data must be stored on the Git provider's servers
▪ External storage
▪ Requires computing a checksum for every change to the dataset and placing the checksums under version control
Classical Software Engineering tools are not enough to solve the ML reproducibility crisis
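The external-storage approach in the last bullet can be sketched in a few lines of standard-library Python (the cache directory and pointer-file format are hypothetical): the data is copied to content-addressed storage, and only a small text pointer is committed to Git. This is, in essence, what DVC automates.

```python
import hashlib
import shutil
from pathlib import Path

CACHE = Path("datacache")  # hypothetical external storage location

def checkin(path):
    """Copy the data file into external storage under its checksum and
    write a small pointer file -- the pointer, not the data, goes into Git."""
    digest = hashlib.md5(Path(path).read_bytes()).hexdigest()
    CACHE.mkdir(exist_ok=True)
    shutil.copy(path, CACHE / digest)                   # content-addressed copy
    Path(f"{path}.ptr").write_text(f"md5: {digest}\n")  # tiny text file for Git
    return digest

# Version a small example dataset.
Path("train.csv").write_text("a,b\n1,2\n")
digest = checkin("train.csv")
```

Maintaining such a scheme by hand for every dataset change is the tedious part that motivates a dedicated tool.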
8. Our proposed solution
Combining Data Version Control (DVC) and MLflow
▪ Solves Git's limitations
▪ Easy to learn
▪ Extensive and easy tracking
▪ User-friendly UI
9. What is DVC?
https://dvc.org/
▪ Tracks datasets and ML projects
▪ Works with many types of storage (cloud, local, HDFS, HTTP…)
▪ Runs on top of a Git repository
▪ Supports building and running pipelines
We will focus on dataset tracking.
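As a sketch of the basic dataset-tracking workflow (assuming DVC is installed inside an existing Git repository; the remote name and S3 URL below are placeholders):

```shell
# Inside an existing Git repository, with DVC installed:
dvc init                       # set up DVC in the repo
dvc add data/raw.csv           # track the dataset; creates data/raw.csv.dvc
git add data/raw.csv.dvc data/.gitignore
git commit -m "Track raw dataset with DVC"

# Push the data itself to remote storage (URL is a placeholder):
dvc remote add -d myremote s3://mybucket/dvcstore
dvc push

# Later, restore the dataset version matching any Git revision:
git checkout <revision>
dvc checkout
```

Git versions only the small `.dvc` pointer files, while the data itself lives in the remote storage of your choice.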