GENERAL DISTRIBUTION
Democratizing Data Science
on Kubernetes
RICE DATA SCIENCE CONFERENCE - 2018 - Democratizing Data Science on Kubernetes
DATA SCIENCE PRESSURES
EXPLOSIVE GROWTH
in data analytics teams and analytic tools
MULTIPLE TEAMS COMPETING
for use of the same storage and computing resources
CONGESTION
in busy analytic clusters causing frustration and missed SLAs
EMERGING DATAOPS
Data Scientist Developers vs Full Stack Developer agility and enablement gaps
NEED: SHARE CODE (PRODUCT) WITH USERS
Jupyter Notebooks as a technology we could use to combine Python code, a GUI, and documentation for sharing with customers.
Start of an Interactive Data Science environment.
Red Hat OpenShift PoC at XOM. Could this new technology benefit us in creating a
Reproducible & Interactive Data Science environment?
Prize: This would enable the team not only to obtain customer feedback quickly, but
also to adopt Agile Methodology easily and so deliver MVPs quickly.
Drawback: how does one avoid the setup/configuration issues and reliably deploy the notebook?
PC Setup
• pip install required Anaconda libraries
• Jupyter Notebook, Python 3.x (load onto PC – or set up server)
• Local admin access
• Access to latest source code
• OS? SQL Server
LOCAL PC VS OPENSHIFT PROJECT CONTAINERS
OpenShift Project Containers
Container v2.0 (image): Jupyter Notebook, Python 3.x, SQLite
Libraries
• NumPy
• pandas
• Matplotlib
• ipywidgets
• SciPy
• lmfit
• Seaborn
• Plotly
GIT (Image project, Code project), OpenShift, URL to PoC Code
Local PC Setup
• pip install required Anaconda libraries
• Jupyter Notebook, Python 3.x (load onto PC – or set up server)
• Local admin access
• Access to latest source code
• OS? SQL Server
Reproducible Data Science environment that users interact with via Chrome.
Hardware freedom and easier reproduction!
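One way to see whether a given environment (local PC or container image) actually provides the libraries named on this slide is a small check script. The sketch below is illustrative only and is not taken from the talk: the import names in REQUIRED_MODULES are assumed spellings of the listed libraries, and check_modules and missing_modules are hypothetical helper names.

```python
from importlib.util import find_spec

# Assumed import names for the libraries listed on the slide, plus the
# standard-library sqlite3 module for SQLite access.
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "ipywidgets",
    "scipy", "lmfit", "seaborn", "plotly", "sqlite3",
]


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


def missing_modules(names):
    """Return the sorted module names that cannot be found here."""
    return sorted(name for name, ok in check_modules(names).items() if not ok)


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing if missing else "none")
```

Running the same script on a laptop and inside the container gives a quick, comparable picture of what each environment lacks.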
For a Data Scientist, the ability to rapidly deploy code and quickly obtain feedback from a
user is extremely valuable and Agile! OpenShift facilitates these capabilities!
REPRODUCIBLE & INTERACTIVE SCIENTIFIC ENVIRONMENT
Agile cycle
1. Understand the Problem
2. Suggest Solutions, Deliver POC
3. Refine the Problem
How to Deploy? No worries: Supported Kubernetes with OpenShift
GIT (Image project, Code project), OpenShift, URL to PoC Code
“Interactive” feedback!
Moving Forward: ExxonMobil Data Science Capability today!
As a Data Scientist, all I care about is that, using OpenShift, I can now deploy a common Jupyter Notebook /
Anaconda image (with all required libraries) in a matter of seconds.
This frees me (and other Data Scientists) to perform data science and not worry about architecture and delivery
mechanisms. Now that is Democratizing Data Science!
Selected OpenShift, on premises and in the public cloud, for Container as a Service (CaaS)
OpenShift supports:
• One Click Notebooks and JupyterHub/Lab templates
• Self-service for accessing data & data science packages
• Nexus Repository to allow for Python, Java, R, PHP, .NET Core package managers
• Built-in security process for the Docker public repository – protects against rooted containers and new CVE attacks
• NVIDIA GPU support, which allows these resources to be shared across multiple teams
DATA SCIENTIST DEVELOPERS NEEDS
All Developers need
● Choice of architectures
● Choice of programming languages
● Choice of databases and persistence
● Choice of application services
● Choice of development tools
● Choice of build and deploy workflows
Data Science Additional Needs
● Access to GPUs and TPUs
● Access to Curated Data
● Automated pipelines
● Collaboration with the Business
● Access to specific data science languages and toolsets
They don’t want to have to worry about the infrastructure.
YOUR DIFFERENTIATION DEPENDS ON YOUR
ABILITY TO DELIVER INTELLIGENT APPS FASTER
CONTAINERS, KUBERNETES, DEVOPS & DATAOPS ARE KEY INGREDIENTS
• Innovation Culture
• Cloud-native Applications
• AI & Machine Learning
• Internet of Things
• Virtual GPU
WHY DO CONTAINERS NEED KUBERNETES?
CONTAINERIZED APPLICATIONS
MANAGE CONTAINERS SECURELY
MANAGE CONTAINERS AT SCALE
INTEGRATE IT OPERATIONS
ENABLE HYBRID CLOUD
REFERENCE ARCHITECTURE
FOR ENTERPRISE KUBERNETES
CaaS: Best IT Ops Experience
PaaS: Best Developer Experience
• Application Services: Middleware, Service Mesh, Functions, ISV
• Cluster Services: Metrics, Chargeback, Registry, Logging
• Developer Services: Dev Tools, Automated Builds, CI/CD, IDE
• Automated Operations*
• Kubernetes
• Red Hat Enterprise Linux or Red Hat CoreOS
*coming soon
MODERN DATA ANALYTICS PIPELINE
KEY TERMINOLOGY
• DATA GENERATION
• INGEST
• DATA SCIENCE
• MACHINE LEARNING
• STREAM PROCESSING
• TRANSFORM, MERGE, JOIN
• DATA ANALYTICS
• IoT Telemetry
• G&G - Well Logs
• Transactions
• Production
• NiFi
• Kafka
• MQTT
• Presto
• Impala
• SparkSQL
• Notebooks
• TensorFlow
• PyTorch
• Keras
• scikit-learn
• Kafka
• Hadoop
• Spark
• Pandas
• Apache Arrow
• Spark
• Hadoop
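To make the stage names concrete, here is a toy walk through three of them (ingest; transform, merge, join; data analytics) using only the Python standard library. It is a sketch under stated assumptions, not the pipeline from the talk: the sample telemetry, the well metadata, and every function name are hypothetical, and a real deployment would use the tools named on the slide rather than in-memory lists.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data standing in for the "data generation" stage.
TELEMETRY = [
    {"well_id": "W1", "pressure": 101.0},
    {"well_id": "W1", "pressure": 99.0},
    {"well_id": "W2", "pressure": 150.0},
]
WELLS = {"W1": {"field": "North"}, "W2": {"field": "South"}}


def ingest(records):
    """Ingest: hand records downstream one at a time, as a consumer would."""
    for record in records:
        yield dict(record)


def join_with_wells(records, wells):
    """Transform, merge, join: attach well metadata to each reading."""
    for record in records:
        yield {**record, **wells.get(record["well_id"], {})}


def mean_pressure_by_field(records):
    """Data analytics: average pressure per field."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.get("field", "unknown")].append(record["pressure"])
    return {field: mean(values) for field, values in grouped.items()}


def run_pipeline():
    """Chain the three stages over the sample data."""
    return mean_pressure_by_field(join_with_wells(ingest(TELEMETRY), WELLS))


if __name__ == "__main__":
    print(run_pipeline())
```

Each stage is a plain function over an iterable, so any one of them could be swapped for a real component without changing the others.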
OSS DATA SCIENCE PROJECTS
● Kubeflow
○ TensorFlow
○ Seldon
○ JupyterHub
○ PyTorch
● radanalytics.io
○ Oshinko - Apache Spark Cluster
○ source-to-image (S2I)
● KAML-D - Early Stage JupyterLab Plugin
○ Data Explorer
○ Containerized CURLable data
■ Dotmesh, Minio, Ceph
○ Data Versioning and Metadata
HOW CAN I GET STARTED?
● OpenShift Self Service Education - https://learn.openshift.com
● Install Minishift - https://docs.okd.io/latest/minishift/getting-started/installing.html
○ macOS - brew cask install minishift
○ Manual - https://github.com/minishift/minishift/releases
● Install Jupyter and JupyterHub OpenShift templates
○ https://github.com/jupyter-on-openshift/jupyterhub-quickstart
● Review the projects at https://radanalytics.io

More Related Content

What's hot

Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Dataconomy Media
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsJames Serra
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015DataWorks Summit
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big DataInfochimps, a CSC Big Data Business
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorialrustd
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics SuiteJames Serra
 
Red Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft AzureRed Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft AzureJohn Archer
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing DataWorks Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...James Serra
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAmazon Web Services
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteMark van Rijmenam
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Abhimanyu Singhal
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsInformatica
 

What's hot (20)

Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Cortana Analytics Suite
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics Suite
 
Red Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft AzureRed Hat Openshift on Microsoft Azure
Red Hat Openshift on Microsoft Azure
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
 
Azure Big data
Azure Big data Azure Big data
Azure Big data
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes Keynote
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
 

Similar to Democratizing Data Science on Kubernetes

Delivering Agile Data Science on Openshift - Red Hat Summit 2019
Delivering Agile Data Science on Openshift  - Red Hat Summit 2019Delivering Agile Data Science on Openshift  - Red Hat Summit 2019
Delivering Agile Data Science on Openshift - Red Hat Summit 2019John Archer
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...Abhinav Joshi
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and FutureKeiichiro Ono
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Tushar Katarki
 
DDDP 2019 - Brown to Green
DDDP 2019  - Brown to GreenDDDP 2019  - Brown to Green
DDDP 2019 - Brown to GreenJohn Archer
 
Red hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyRed hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyOrgad Kimchi
 
What's New in Cytoscape
What's New in CytoscapeWhat's New in Cytoscape
What's New in CytoscapeKeiichiro Ono
 
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud ComputingOSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud ComputingMark Hinkle
 
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford ConsortiumSDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford ConsortiumKeiichiro Ono
 
Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618Economic Strategy Institute
 
Bahrain ch9 introduction to docker 5th birthday
Bahrain ch9 introduction to docker 5th birthday Bahrain ch9 introduction to docker 5th birthday
Bahrain ch9 introduction to docker 5th birthday Walid Shaari
 
OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC
 
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Mark Goldstein
 
Oscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectOscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectPatrick Chanezon
 
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...Dana Gardner
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRBWilliam Poos
 

Similar to Democratizing Data Science on Kubernetes (20)

Delivering Agile Data Science on Openshift - Red Hat Summit 2019
Delivering Agile Data Science on Openshift  - Red Hat Summit 2019Delivering Agile Data Science on Openshift  - Red Hat Summit 2019
Delivering Agile Data Science on Openshift - Red Hat Summit 2019
 
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...ODSC East 2020   Accelerate ML Lifecycle with Kubernetes and Containerized Da...
ODSC East 2020 Accelerate ML Lifecycle with Kubernetes and Containerized Da...
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes
 
DDDP 2019 - Brown to Green
DDDP 2019  - Brown to GreenDDDP 2019  - Brown to Green
DDDP 2019 - Brown to Green
 
Red hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategyRed hat's updates on the cloud & infrastructure strategy
Red hat's updates on the cloud & infrastructure strategy
 
What's New in Cytoscape
What's New in CytoscapeWhat's New in Cytoscape
What's New in Cytoscape
 
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud ComputingOSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
OSCON 2013 - The Hitchiker’s Guide to Open Source Cloud Computing
 
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford ConsortiumSDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium
 
CTE Phase III
CTE Phase IIICTE Phase III
CTE Phase III
 
Mundi
MundiMundi
Mundi
 
Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618Future of jobs and digital economy citi conference 090618
Future of jobs and digital economy citi conference 090618
 
Bahrain ch9 introduction to docker 5th birthday
Bahrain ch9 introduction to docker 5th birthday Bahrain ch9 introduction to docker 5th birthday
Bahrain ch9 introduction to docker 5th birthday
 
OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021OpenACC Monthly Highlights: February 2021
OpenACC Monthly Highlights: February 2021
 
Digital transformation and AI @Edge
Digital transformation and AI @EdgeDigital transformation and AI @Edge
Digital transformation and AI @Edge
 
Nimbus Concept
Nimbus ConceptNimbus Concept
Nimbus Concept
 
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
 
Oscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectOscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby project
 
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
HPE’s Erik Vogel on Key Factors for Driving Success in Hybrid Cloud Adoption ...
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 

More from John Archer

Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdfEnabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdfJohn Archer
 
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...John Archer
 
Leveraging IoT as part of your digital transformation
Leveraging IoT as part of your digital transformationLeveraging IoT as part of your digital transformation
Leveraging IoT as part of your digital transformationJohn Archer
 
Locationless data science on a modern secure edge
Locationless data science on a modern secure edgeLocationless data science on a modern secure edge
Locationless data science on a modern secure edgeJohn Archer
 
Red Hat Java Update and Quarkus Introduction
Red Hat Java Update and Quarkus IntroductionRed Hat Java Update and Quarkus Introduction
Red Hat Java Update and Quarkus IntroductionJohn Archer
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceJohn Archer
 
Single View of Well, Production and Assets
Single View of Well, Production and AssetsSingle View of Well, Production and Assets
Single View of Well, Production and AssetsJohn Archer
 
Field development and operational optimization for unconventionals
 Field development and operational optimization for unconventionals Field development and operational optimization for unconventionals
Field development and operational optimization for unconventionalsJohn Archer
 

More from John Archer (8)

Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdfEnabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
 
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
 
Leveraging IoT as part of your digital transformation
Leveraging IoT as part of your digital transformationLeveraging IoT as part of your digital transformation
Leveraging IoT as part of your digital transformation
 
Locationless data science on a modern secure edge
Locationless data science on a modern secure edgeLocationless data science on a modern secure edge
Locationless data science on a modern secure edge
 
Red Hat Java Update and Quarkus Introduction
Red Hat Java Update and Quarkus IntroductionRed Hat Java Update and Quarkus Introduction
Red Hat Java Update and Quarkus Introduction
 
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data ScienceOpenshift 3.10 & Container solutions for Blockchain, IoT and Data Science
Openshift 3.10 & Container solutions for Blockchain, IoT and Data Science
 
Single View of Well, Production and Assets
Single View of Well, Production and AssetsSingle View of Well, Production and Assets
Single View of Well, Production and Assets
 
Field development and operational optimization for unconventionals
 Field development and operational optimization for unconventionals Field development and operational optimization for unconventionals
Field development and operational optimization for unconventionals
 

Recently uploaded

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Democratizing Data Science on Kubernetes

  • 2. RICE DATA SCIENCE CONFERENCE - 2018 - Democratizing Data Science on Kubernetes
  • 3. DATA SCIENCE PRESSURES
      ● EXPLOSIVE GROWTH in data analytics teams and analytic tools
      ● MULTIPLE TEAMS COMPETING for use of the same storage and computing resources
      ● CONGESTION in busy analytic clusters causing frustration and missed SLAs
      ● EMERGING DATAOPS: Data Scientist Developers vs Full Stack Developer agility and enablement gaps
  • 4. NEED: SHARE CODE (PRODUCT) WITH USERS
      Jupyter Notebooks as a technology we could use to combine Python code, a GUI, and documentation for sharing with customers. Start of an Interactive Data Science environment.
      Red Hat OpenShift PoC at XOM. Could this new technology benefit us in creating a Reproducible & Interactive Data Science environment?
      Prize: This would enable the team not only to obtain customer feedback quickly, but also to apply Agile Methodology easily and therefore deliver MVPs quickly.
      Drawback: how does one avoid the setup/configuration issues and reliably deploy the notebook?
      PC Setup
      ● pip install required Anaconda libraries
      ● Jupyter Notebook Python 3.x (load onto PC – or set up server)
      ● Local admin access
      ● Access to latest source code
      ● OS?
      ● SQL Server
  • 5. LOCAL PC VS OPENSHIFT PROJECT CONTAINERS
      Container v2.0
      ● Jupyter Notebook Python 3.x (image)
      ● Libraries
          ○ NumPy
          ○ Pandas
          ○ Matplotlib
          ○ IPyWidgets
          ○ SciPy
          ○ Lmfit
          ○ Seaborn
          ○ Plotly
      ● SQLite
      Diagram labels: GIT, Image project, Code project, OpenShift, URL to PoC Code
      Local PC Setup
      ● pip install required Anaconda libraries
      ● Jupyter Notebook Python 3.x (load onto PC – or set up server)
      ● Local admin access
      ● Access to latest source code
      ● OS?
      ● SQL Server
      Reproducible Data Science environment that users interact with via Chrome. Hardware Freedom & easier Reproduction!
  • 6. REPRODUCIBLE & INTERACTIVE SCIENTIFIC ENVIRONMENT
      For a Data Scientist, the ability to rapidly deploy code and quickly obtain feedback from a user is extremely valuable and Agile! OpenShift facilitates these capabilities!
      Agile cycle
      1. Understand the Problem
      2. Suggest Solutions / Deliver PoC
      3. Refine the Problem
      How to Deploy? No worries: Supported Kubernetes with OpenShift
      Diagram labels: URL to PoC Code, GIT, Image project, Code project, OpenShift
      “Interactive” feedback!
  • 7. Moving Forward: ExxonMobil Data Science Capability today!
      As a Data Scientist, all I care about is that, using OpenShift, I can now deploy a common Jupyter Notebook / Anaconda image (with all required libraries) in a matter of seconds. That frees me (and other Data Scientists) to perform data science and not worry about architecture and delivery mechanisms. Now that is Democratizing Data Science!
      Selected OpenShift on premises and public cloud for Container as a Service (CaaS)
      ● OpenShift supports:
          ○ One Click Notebooks and JupyterHub/Lab templates
          ○ Self-service for accessing data & data science packages
          ○ Nexus Repository to allow for Python, Java, R, PHP, .NET Core package managers
          ○ Docker public repository security built-in process – protects against rooted containers and new CVE attacks
          ○ NVIDIA GPU support allows for sharing these resources across multiple teams
  • 8. DATA SCIENTIST DEVELOPERS NEEDS
      All Developers need
      ● Choice of architectures
      ● Choice of programming languages
      ● Choice of databases and persistence
      ● Choice of application services
      ● Choice of development tools
      ● Choice of build and deploy workflows
      Data Science Additional Needs
      ● Access to GPUs and TPUs
      ● Access to Curated Data
      ● Automated pipelines
      ● Collaboration with the Business
      ● Access to specific data science languages and toolsets
      They don’t want to have to worry about the infrastructure.
  • 9. YOUR DIFFERENTIATION DEPENDS ON YOUR ABILITY TO DELIVER INTELLIGENT APPS FASTER
      CONTAINERS, KUBERNETES, DEVOPS & DATAOPS ARE KEY INGREDIENTS
      ● Innovation Culture
      ● Cloud-native Applications
      ● AI & Machine Learning
      ● Internet of Things
      ● Virtual GPU
  • 10. WHY DO CONTAINERS NEED KUBERNETES?
      ● CONTAINERIZED APPLICATIONS
      ● MANAGE CONTAINERS SECURELY
      ● MANAGE CONTAINERS AT SCALE
      ● INTEGRATE IT OPERATIONS
      ● ENABLE HYBRID CLOUD
  • 11. REFERENCE ARCHITECTURE FOR ENTERPRISE KUBERNETES
      ● CaaS: Best IT Ops Experience
      ● PaaS: Best Developer Experience
      ● Application Services: Middleware, Service Mesh, Functions, ISV
      ● Cluster Services: Metrics, Chargeback, Registry, Logging
      ● Developer Services: Dev Tools, Automated Builds, CI/CD, IDE
      ● Automated Operations* (*coming soon)
      ● Kubernetes
      ● Red Hat Enterprise Linux or Red Hat CoreOS
  • 12. MODERN DATA ANALYTICS PIPELINE KEY TERMINOLOGY
      Stages: DATA GENERATION; INGEST; DATA SCIENCE; MACHINE LEARNING; STREAM PROCESSING; TRANSFORM, MERGE, JOIN; DATA ANALYTICS
      Data sources and tools named on the slide, in slide order:
      ● IoT Telemetry, G&G - Well Logs, Transactions, Production
      ● NiFi, Kafka, MQTT
      ● Presto, Impala, SparkSQL
      ● Notebooks, TensorFlow, PyTorch, Keras, scikit-learn
      ● Kafka, Hadoop, Spark, Pandas, Apache Arrow, Spark, Hadoop
  • 13. OSS DATA SCIENCE PROJECTS
      ● Kubeflow
          ○ TensorFlow
          ○ Seldon
          ○ JupyterHub
          ○ PyTorch
      ● radanalytics.io
          ○ Oshinko - Apache Spark Cluster
          ○ source-to-image (S2I)
      ● KAML-D - Early Stage JupyterLab Plugin
          ○ Data Explorer
          ○ Containerized CURLable data
              ■ Dotmesh, Minio, Ceph
          ○ Data Versioning and Metadata
  • 14. HOW CAN I GET STARTED?
      ● OpenShift Self Service Education - https://learn.openshift.com
      ● Install Minishift - https://docs.okd.io/latest/minishift/getting-started/installing.html
          ○ macOS - brew cask install minishift
          ○ Manual - https://github.com/minishift/minishift/releases
      ● Install Jupyter and JupyterHub OpenShift templates
          ○ https://github.com/jupyter-on-openshift/jupyterhub-quickstart
      ● Review the projects at https://radanalytics.io
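
Slide 5 contrasts a hand-built local PC setup with a container image that already carries the required libraries. A minimal sketch of how one might verify that an environment (local or containerized) really provides those libraries is shown here. The mapping from the slide's display names to importable module names is an assumption, and the function names are hypothetical; nothing in the talk says the team ran such a check.

```python
from importlib.util import find_spec

# Display names come from slide 5; the module names on the right are an
# assumption about how each library is imported in Python.
SLIDE_LIBRARIES = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "IPyWidgets": "ipywidgets",
    "SciPy": "scipy",
    "Lmfit": "lmfit",
    "Seaborn": "seaborn",
    "Plotly": "plotly",
}


def check_environment(libraries=SLIDE_LIBRARIES):
    """Return {display name: bool} telling whether each module can be found."""
    return {
        name: find_spec(module) is not None
        for name, module in libraries.items()
    }


def missing_libraries(libraries=SLIDE_LIBRARIES):
    """Return the sorted display names of libraries that cannot be found."""
    status = check_environment(libraries)
    return sorted(name for name, found in status.items() if not found)


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All libraries present")
```

Running the same script on a laptop and inside the notebook container gives a quick, comparable answer to "is this environment reproducible?" without importing (and paying the start-up cost of) every package.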
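
Slide 7 describes deploying a common Jupyter Notebook / Anaconda image in seconds and sharing NVIDIA GPUs across teams. The talk does not show how the deployment is defined (slide 14 points to the JupyterHub OpenShift templates), so the following is only a sketch of the general idea: a plain Kubernetes `apps/v1` Deployment built as a Python dictionary. The function name, the deployment name `team-notebook`, the image `jupyter/scipy-notebook:latest`, and port 8888 are assumptions chosen for illustration, not details from the slides.

```python
import json


def notebook_deployment(name, image, port=8888, gpus=0):
    """Build a Kubernetes apps/v1 Deployment manifest for one notebook pod."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
    }
    if gpus > 0:
        # "nvidia.com/gpu" is the extended resource advertised by the NVIDIA
        # device plugin; this only works on clusters where it is installed.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}

    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; substitute a team-approved image.
    manifest = notebook_deployment("team-notebook", "jupyter/scipy-notebook:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f` or `kubectl apply -f`, both of which accept JSON manifests. Keeping the image name as the only per-team input is what lets every data scientist start from the same environment.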