SlideShare a Scribd company logo
1 of 23
Download to read offline
Notebook-based AI Pipelines with
Elyra and Kubeflow
Nick Pentreath
Principal Engineer, IBM
@MLnick
About
DEG / Nov 18, 2020 / © 2020 IBM Corporation
– @MLnick on Twitter, Github, LinkedIn
– Principal Engineer, IBM CODAIT (Center for
Open-Source Data & AI Technologies)
– Machine Learning & AI
– Apache Spark committer & PMC
– Author of Machine Learning with Spark
– Various conferences & meetups
2
Improving the Enterprise AI Lifecycle in Open Source
DEG / Nov 18, 2020 / © 2020 IBM Corporation 3
– CODAIT aims to make AI solutions
dramatically easier to create,
deploy, and manage in the
enterprise.
– We contribute to and advocate for
the open-source technologies that
are foundational to IBM’s AI
offerings.
– 30+ open-source developers!
Center for Open Source Data & AI Technologies
codait.org
CODAIT
Open Source @ IBM
Agenda
4
– Machine learning workflow
– JupyerLab & Elyra
– Demo
– Conclusion
DEG / Nov 18, 2020 / © 2020 IBM Corporation
Machine Learning
Workflow
5
Data Analyze Process Train Deploy
Predict
&
Maintain
DEG / Nov 18, 2020 / © 2020 IBM Corporation
Workflow spans teams …
6
Data Analyze Process Train Deploy
Predict
&
Maintain
DEG / Nov 18, 2020 / © 2020 IBM Corporation
Data Engineers Data Scientists & Researchers
Machine Learning &
Production Engineers
… and tools
7
Data Analyze Process Train Deploy
DEG / Nov 18, 2020 / © 2020 IBM Corporation
Data formats
• CSV, SQL
• JSON,
Parquet,
AVRO
• Binary
(image,
audio)
• …
Data Engineers Data Scientists & Researchers
Machine Learning &
Production Engineers
Analysis & data
viz
• ggplot
• dplyr
• matplotlib
• Pandas
• SparkSQL
• …
Pre-processing
& pipelines
• dplyr
• pandas
• scikit-learn
• SparkSQL /
SparkML
• …
Frameworks
• R, scikit-
learn
• SparkML
• TensorFlow
• PyTorch
• LightGBM,
XGBoost
• …
Formats &
mechanisms
• Variety of
formats
• Containers
• …
Iteration &
Experimentation
8
Data Analyze Process Train Deploy
DEG / Nov 18, 2020 / © 2020 IBM Corporation
Data Scientists & Researchers
Load Clean Explore Interpret
Refine
Iteration &
Experimentation
9
Data Process Train Deploy
DEG / Nov 18, 2020 / © 2020 IBM Corporation
Data Scientists & Researchers
Extract
features
Pre-
process
Train Evaluate
Refine
Analyze
Interactive Notebooks
DEG / Nov 18, 2020 / © 2020 IBM Corporation 10
Notebooks have become
the de-facto standard
for content-rich,
interactive & iterative
work
* Logos trademarks of their respective projects
Elyra Overview
DEG / Nov 18, 2020 / © 2020 IBM Corporation 11
Elyra is a set of AI-
centric extensions to
JupyterLab Notebooks
* Logos trademarks of their respective projects
Elyra Key Features
DEG / Nov 18, 2020 / © 2020 IBM Corporation 12
– Visual Pipeline Editor
Visual editor for building AI pipelines,
enabling the conversion of multiple
notebooks into batch jobs or workflows.
– Notebooks as batch jobs
– Python script execution
– Automated Table of Contents
– Code Snippets
– Git integration
Elyra Key Features
DEG / Nov 18, 2020 / © 2020 IBM Corporation 13
– Visual Pipeline Editor
– Notebooks as batch jobs
Extends the notebook UI to simplify the
submission of notebooks as a batch job
for model training
– Python script execution
– Automated Table of Contents
– Code Snippets
– Git integration
Elyra Key Features
DEG / Nov 18, 2020 / © 2020 IBM Corporation 14
– Visual Pipeline Editor
– Notebooks as batch jobs
– Python script execution
Edit and execute python scripts against
local or cloud-based resources
– Automated Table of Contents
– Code Snippets
– Git integration
Elyra Key Features
DEG / Nov 18, 2020 / © 2020 IBM Corporation 15
– Visual Pipeline Editor
– Notebooks as batch jobs
– Python script execution
– Automated Table of Contents
Generate & navigate table of contents
from notebooks & python scripts
– Code Snippets
– Git integration
Elyra Key Features
DEG / Nov 18, 2020 / © 2020 IBM Corporation 16
– Visual Pipeline Editor
– Notebooks as batch jobs
– Python script execution
– Automated Table of Contents
– Code Snippets
Easy creation and insertion of reusable
code snippets for various languages
– Git integration
Elyra Key Features
DEG / Nov 18, 2020 / © 2020 IBM Corporation 17
– Visual Pipeline Editor
– Notebooks as batch jobs
– Python script execution
– Automated Table of Contents
– Code Snippets
– Git integration
Track project changes and share among
teammates
DEG / Nov 18, 2020 / © 2020 IBM Corporation
Getting started with Elyra
1. Try Elyra from Binder
ibm.biz/elyra-demo
2. Run Elyra from Docker
ibm.biz/elyra-docker-installation
3. Install Elyra on your local machine
ibm.biz/elyra-installation
18
DEG / Nov 18, 2020 / © 2020 IBM Corporation 19
Start using Elyra today!
Getting started with Elyra
ibm.biz/elyra-installation
Elyra on Github
github.com/elyra-ai/elyra
Elyra Notebook projects on Github
github.com/CODAIT/flight-delay-notebooks
github.com/CODAIT/covid-notebooks
Contributing to the projects
• Star and fork, submit bug reports, suggest improvements,
help with code reviews, join our community meetings
ibm.biz/elyra-demo
gitter.im/elyra-ai/community
DEG / Nov 18, 2020 / © 2020 IBM Corporation 20
Thank you
codait.org
twitter.com/codait_org
github.com/CODAIT
developer.ibm.com
21DEG / Nov 18, 2020 / © 2020 IBM Corporation
Check out the Data Asset Exchange
https://ibm.biz/data-exchange
Sign up for IBM Cloud
https://ibm.biz/Bdqkfg
DEG / Nov 18, 2020 / © 2020 IBM Corporation 22
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.

More Related Content

What's hot

MinIO January 2020 Briefing
MinIO January 2020 BriefingMinIO January 2020 Briefing
MinIO January 2020 BriefingJonathan Symonds
 
Extending kubernetes with CustomResourceDefinitions
Extending kubernetes with CustomResourceDefinitionsExtending kubernetes with CustomResourceDefinitions
Extending kubernetes with CustomResourceDefinitionsStefan Schimanski
 
Présentation de git
Présentation de gitPrésentation de git
Présentation de gitJulien Blin
 
An intro to Kubernetes operators
An intro to Kubernetes operatorsAn intro to Kubernetes operators
An intro to Kubernetes operatorsJ On The Beach
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowSid Anand
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishBruno Cornec
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorDatabricks
 
What's New for GitLab CI/CD February 2020
What's New for GitLab CI/CD February 2020What's New for GitLab CI/CD February 2020
What's New for GitLab CI/CD February 2020Noa Harel
 
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...Kim Hammar
 

What's hot (20)

MinIO January 2020 Briefing
MinIO January 2020 BriefingMinIO January 2020 Briefing
MinIO January 2020 Briefing
 
Extending kubernetes with CustomResourceDefinitions
Extending kubernetes with CustomResourceDefinitionsExtending kubernetes with CustomResourceDefinitions
Extending kubernetes with CustomResourceDefinitions
 
Github
GithubGithub
Github
 
GitHub Presentation
GitHub PresentationGitHub Presentation
GitHub Presentation
 
Présentation de git
Présentation de gitPrésentation de git
Présentation de git
 
An intro to Kubernetes operators
An intro to Kubernetes operatorsAn intro to Kubernetes operators
An intro to Kubernetes operators
 
Building aosp
Building aospBuilding aosp
Building aosp
 
Building Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache AirflowBuilding Better Data Pipelines using Apache Airflow
Building Better Data Pipelines using Apache Airflow
 
Git & GitLab
Git & GitLabGit & GitLab
Git & GitLab
 
Les bases de git
Les bases de gitLes bases de git
Les bases de git
 
Git training v10
Git training v10Git training v10
Git training v10
 
Github copilot
Github copilotGithub copilot
Github copilot
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live Redfish
 
Git l'essentiel
Git l'essentielGit l'essentiel
Git l'essentiel
 
Gitlab, GitOps & ArgoCD
Gitlab, GitOps & ArgoCDGitlab, GitOps & ArgoCD
Gitlab, GitOps & ArgoCD
 
Gitops Hands On
Gitops Hands OnGitops Hands On
Gitops Hands On
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
What's New for GitLab CI/CD February 2020
What's New for GitLab CI/CD February 2020What's New for GitLab CI/CD February 2020
What's New for GitLab CI/CD February 2020
 
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
Kim Hammar - Feature Store: the missing data layer in ML pipelines? - HopsML ...
 
Github
GithubGithub
Github
 

Similar to Building Notebook-based AI Pipelines with Elyra and Kubeflow

Notebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNotebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNick Pentreath
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksLuciano Resende
 
Continuous Deployment for Deep Learning
Continuous Deployment for Deep LearningContinuous Deployment for Deep Learning
Continuous Deployment for Deep LearningDatabricks
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeLuciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3DataWorks Summit
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayLuciano Resende
 
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
 Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e... Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...VMware Tanzu
 
SAPTechED 2015 UX114 -Building custom SAP Fiori Apps Using SAP Web IDE
SAPTechED 2015 UX114 -Building custom SAP Fiori Apps Using SAP Web IDESAPTechED 2015 UX114 -Building custom SAP Fiori Apps Using SAP Web IDE
SAPTechED 2015 UX114 -Building custom SAP Fiori Apps Using SAP Web IDEMarkus Van Kempen
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...Luciano Resende
 
AD308: XPages in a Social World
AD308: XPages in a Social WorldAD308: XPages in a Social World
AD308: XPages in a Social Worldpaidi_ed
 
End-to-End Deep Learning Deployment with ONNX
End-to-End Deep Learning Deployment with ONNXEnd-to-End Deep Learning Deployment with ONNX
End-to-End Deep Learning Deployment with ONNXNick Pentreath
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsLuciano Resende
 
IoT Development from Software Developer Perspective
IoT Development from Software Developer PerspectiveIoT Development from Software Developer Perspective
IoT Development from Software Developer PerspectiveAndri Yadi
 
Integrating Service Mesh with Kubernetes-based connected vehicle platform
Integrating Service Mesh with Kubernetes-based connected vehicle platformIntegrating Service Mesh with Kubernetes-based connected vehicle platform
Integrating Service Mesh with Kubernetes-based connected vehicle platformJun Kai Yong
 
How to deploy machine learning models into production
How to deploy machine learning models into productionHow to deploy machine learning models into production
How to deploy machine learning models into productionDataWorks Summit
 
Scaling up Deep Learning by Scaling Down
Scaling up Deep Learning by Scaling DownScaling up Deep Learning by Scaling Down
Scaling up Deep Learning by Scaling DownDatabricks
 
Deploying End-to-End Deep Learning Pipelines with ONNX
Deploying End-to-End Deep Learning Pipelines with ONNXDeploying End-to-End Deep Learning Pipelines with ONNX
Deploying End-to-End Deep Learning Pipelines with ONNXDatabricks
 
IBM Connect AD206 IBM Domino XPages – Embrace, Extend, Integrate
IBM Connect AD206 IBM Domino XPages –  Embrace, Extend, IntegrateIBM Connect AD206 IBM Domino XPages –  Embrace, Extend, Integrate
IBM Connect AD206 IBM Domino XPages – Embrace, Extend, IntegrateNiklas Heidloff
 

Similar to Building Notebook-based AI Pipelines with Elyra and Kubeflow (20)

Notebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and KubeflowNotebook-based AI Pipelines with Elyra and Kubeflow
Notebook-based AI Pipelines with Elyra and Kubeflow
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
 
Continuous Deployment for Deep Learning
Continuous Deployment for Deep LearningContinuous Deployment for Deep Learning
Continuous Deployment for Deep Learning
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
 
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
 Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e... Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
Cloud-Native .Net des applications containerisées .Net sur Linux, Windows e...
 
SAPTechED 2015 UX114 -Building custom SAP Fiori Apps Using SAP Web IDE
SAPTechED 2015 UX114 -Building custom SAP Fiori Apps Using SAP Web IDESAPTechED 2015 UX114 -Building custom SAP Fiori Apps Using SAP Web IDE
SAPTechED 2015 UX114 -Building custom SAP Fiori Apps Using SAP Web IDE
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
Evolve18 | Carmen Sutter & Sarah Xu | Accelerate your Digital Experience with...
Evolve18 | Carmen Sutter & Sarah Xu | Accelerate your Digital Experience with...Evolve18 | Carmen Sutter & Sarah Xu | Accelerate your Digital Experience with...
Evolve18 | Carmen Sutter & Sarah Xu | Accelerate your Digital Experience with...
 
AD308: XPages in a Social World
AD308: XPages in a Social WorldAD308: XPages in a Social World
AD308: XPages in a Social World
 
End-to-End Deep Learning Deployment with ONNX
End-to-End Deep Learning Deployment with ONNXEnd-to-End Deep Learning Deployment with ONNX
End-to-End Deep Learning Deployment with ONNX
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
 
IoT Development from Software Developer Perspective
IoT Development from Software Developer PerspectiveIoT Development from Software Developer Perspective
IoT Development from Software Developer Perspective
 
Integrating Service Mesh with Kubernetes-based connected vehicle platform
Integrating Service Mesh with Kubernetes-based connected vehicle platformIntegrating Service Mesh with Kubernetes-based connected vehicle platform
Integrating Service Mesh with Kubernetes-based connected vehicle platform
 
How to deploy machine learning models into production
How to deploy machine learning models into productionHow to deploy machine learning models into production
How to deploy machine learning models into production
 
Scaling up Deep Learning by Scaling Down
Scaling up Deep Learning by Scaling DownScaling up Deep Learning by Scaling Down
Scaling up Deep Learning by Scaling Down
 
Deploying End-to-End Deep Learning Pipelines with ONNX
Deploying End-to-End Deep Learning Pipelines with ONNXDeploying End-to-End Deep Learning Pipelines with ONNX
Deploying End-to-End Deep Learning Pipelines with ONNX
 
IBM Connect AD206 IBM Domino XPages – Embrace, Extend, Integrate
IBM Connect AD206 IBM Domino XPages –  Embrace, Extend, IntegrateIBM Connect AD206 IBM Domino XPages –  Embrace, Extend, Integrate
IBM Connect AD206 IBM Domino XPages – Embrace, Extend, Integrate
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 

Recently uploaded (20)

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 

Building Notebook-based AI Pipelines with Elyra and Kubeflow

  • 1. Notebook-based AI Pipelines with Elyra and Kubeflow Nick Pentreath Principal Engineer, IBM @MLnick
  • 2. About DEG / Nov 18, 2020 / © 2020 IBM Corporation – @MLnick on Twitter, Github, LinkedIn – Principal Engineer, IBM CODAIT (Center for Open-Source Data & AI Technologies) – Machine Learning & AI – Apache Spark committer & PMC – Author of Machine Learning with Spark – Various conferences & meetups 2
  • 3. Improving the Enterprise AI Lifecycle in Open Source DEG / Nov 18, 2020 / © 2020 IBM Corporation 3 – CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise. – We contribute to and advocate for the open-source technologies that are foundational to IBM’s AI offerings. – 30+ open-source developers! Center for Open Source Data & AI Technologies codait.org CODAIT Open Source @ IBM
  • 4. Agenda 4 – Machine learning workflow – JupyerLab & Elyra – Demo – Conclusion DEG / Nov 18, 2020 / © 2020 IBM Corporation
  • 5. Machine Learning Workflow 5 Data Analyze Process Train Deploy Predict & Maintain DEG / Nov 18, 2020 / © 2020 IBM Corporation
  • 6. Workflow spans teams … 6 Data Analyze Process Train Deploy Predict & Maintain DEG / Nov 18, 2020 / © 2020 IBM Corporation Data Engineers Data Scientists & Researchers Machine Learning & Production Engineers
  • 7. … and tools 7 Data Analyze Process Train Deploy DEG / Nov 18, 2020 / © 2020 IBM Corporation Data formats • CSV, SQL • JSON, Parquet, AVRO • Binary (image, audio) • … Data Engineers Data Scientists & Researchers Machine Learning & Production Engineers Analysis & data viz • ggplot • dplyr • matplotlib • Pandas • SparkSQL • … Pre-processing & pipelines • dplyr • pandas • scikit-learn • SparkSQL / SparkML • … Frameworks • R, scikit- learn • SparkML • TensorFlow • PyTorch • LightGBM, XGBoost • … Formats & mechanisms • Variety of formats • Containers • …
  • 8. Iteration & Experimentation 8 Data Analyze Process Train Deploy DEG / Nov 18, 2020 / © 2020 IBM Corporation Data Scientists & Researchers Load Clean Explore Interpret Refine
  • 9. Iteration & Experimentation 9 Data Process Train Deploy DEG / Nov 18, 2020 / © 2020 IBM Corporation Data Scientists & Researchers Extract features Pre- process Train Evaluate Refine Analyze
  • 10. Interactive Notebooks DEG / Nov 18, 2020 / © 2020 IBM Corporation 10 Notebooks have become the de-facto standard for content-rich, interactive & iterative work * Logos trademarks of their respective projects
  • 11. Elyra Overview DEG / Nov 18, 2020 / © 2020 IBM Corporation 11 Elyra is a set of AI- centric extensions to JupyterLab Notebooks * Logos trademarks of their respective projects
  • 12. Elyra Key Features DEG / Nov 18, 2020 / © 2020 IBM Corporation 12 – Visual Pipeline Editor Visual editor for building AI pipelines, enabling the conversion of multiple notebooks into batch jobs or workflows. – Notebooks as batch jobs – Python script execution – Automated Table of Contents – Code Snippets – Git integration
  • 13. Elyra Key Features DEG / Nov 18, 2020 / © 2020 IBM Corporation 13 – Visual Pipeline Editor – Notebooks as batch jobs Extends the notebook UI to simplify the submission of notebooks as a batch job for model training – Python script execution – Automated Table of Contents – Code Snippets – Git integration
  • 14. Elyra Key Features DEG / Nov 18, 2020 / © 2020 IBM Corporation 14 – Visual Pipeline Editor – Notebooks as batch jobs – Python script execution Edit and execute python scripts against local or cloud-based resources – Automated Table of Contents – Code Snippets – Git integration
  • 15. Elyra Key Features DEG / Nov 18, 2020 / © 2020 IBM Corporation 15 – Visual Pipeline Editor – Notebooks as batch jobs – Python script execution – Automated Table of Contents Generate & navigate table of contents from notebooks & python scripts – Code Snippets – Git integration
  • 16. Elyra Key Features DEG / Nov 18, 2020 / © 2020 IBM Corporation 16 – Visual Pipeline Editor – Notebooks as batch jobs – Python script execution – Automated Table of Contents – Code Snippets Easy creation and insertion of reusable code snippets for various languages – Git integration
  • 17. Elyra Key Features DEG / Nov 18, 2020 / © 2020 IBM Corporation 17 – Visual Pipeline Editor – Notebooks as batch jobs – Python script execution – Automated Table of Contents – Code Snippets – Git integration Track project changes and share among teammates
  • 18. DEG / Nov 18, 2020 / © 2020 IBM Corporation Getting started with Elyra 1. Try Elyra from Binder ibm.biz/elyra-demo 2. Run Elyra from Docker ibm.biz/elyra-docker-installation 3. Install Elyra on your local machine ibm.biz/elyra-installation 18
  • 19. DEG / Nov 18, 2020 / © 2020 IBM Corporation 19
  • 20. Start using Elyra today! Getting started with Elyra ibm.biz/elyra-installation Elyra on Github github.com/elyra-ai/elyra Elyra Notebook projects on Github github.com/CODAIT/flight-delay-notebooks github.com/CODAIT/covid-notebooks Contributing to the projects • Star and fork, submit bug reports, suggest improvements, help with code reviews, join our community meetings ibm.biz/elyra-demo gitter.im/elyra-ai/community DEG / Nov 18, 2020 / © 2020 IBM Corporation 20
  • 21. Thank you codait.org twitter.com/codait_org github.com/CODAIT developer.ibm.com 21DEG / Nov 18, 2020 / © 2020 IBM Corporation Check out the Data Asset Exchange https://ibm.biz/data-exchange Sign up for IBM Cloud https://ibm.biz/Bdqkfg
  • 22. DEG / Nov 18, 2020 / © 2020 IBM Corporation 22
  • 23. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.