SlideShare a Scribd company logo
1 of 82
© 2017 MapR Technologies 1
Progress for big data in
Kubernetes
© 2017 MapR Technologies 2
kubernetes is coming!
© 2017 MapR Technologies 3
why?
© 2017 MapR Technologies 4
Source: Shippable.com http://blog.shippable.com/why-the-adoption-of-kubernetes-will-explode-in-2018
kubernetes = major community support
© 2017 MapR Technologies 5
every cloud supports kubernetes
https://www.sinax.be/en/aws/
https://www.westconcomstor.com/za/en/vendors/wc-vendors/microsoft-azure-EN-UK.html
https://www.g2crowd.com/products/google-kubernetes-engine-gke/details
© 2017 MapR Technologies 6
massive customer adoption rate
© 2017 MapR Technologies 7
© 2017 MapR Technologies 8
what is kubernetes?
© 2017 MapR Technologies 9
kubernetes (n.) - greek word for pilot or
helm
© 2017 MapR Technologies 10
kubernetes started life as a
successor to google’s borg
project...
https://cloud.google.com/security/encryption-in-transit/
© 2017 MapR Technologies 11
kubernetes is an ecosystem...
Source: Redmonk - http://redmonk.com/sogrady/2017/09/22/cloud-native-license-choices/
© 2017 MapR Technologies 12
container and resource orchestration engine...
© 2017 MapR Technologies 13
Source: Shippable.com http://blog.shippable.com/why-the-adoption-of-kubernetes-will-explode-in-2018
kubernetes won the container orchestration war...
© 2017 MapR Technologies 14
what is kubernetes?
© 2017 MapR Technologies 15
it runs containers
© 2017 MapR Technologies 16
what is a container?
© 2017 MapR Technologies 17
not a vm
© 2017 MapR Technologies 18
hardware
vm vs container
os
hypervisor
vm
os
libs
app
vm
os
libs
app
hardware
os
container
libs
app
container
libs
app
container
libs
app
© 2017 MapR Technologies 19
pets vs cattle
https://fwallpapers.com/view/cat-jeans
http://www.clipartpanda.com/clipart_images/free-clip-art-1083418
© 2017 MapR Technologies 20
pets vs cattle
- long lived
- name them
- care for them
- ephemeral
- brand them with #’s
- well..vets are expensive
© 2017 MapR Technologies 21
© 2017 MapR Technologies 22
isolation
cgroups
● cpu
● memory
● network
● etc.
namespaces
● pids
● mnts
● etc.
Chroot (filesystem)
© 2017 MapR Technologies 23
File File Read-only Layer
container images
© 2017 MapR Technologies 24
File File
File
Read-only Layer
Read-only Layer
container images
© 2017 MapR Technologies 25
File File
File
Read-only Layer
Read-only Layer
Writable Layer
container images
© 2017 MapR Technologies 26
File File File Container Image
chroot
container = image + isolation
cgroups
● cpu
● memory
● network
● etc.
namespaces
● pids
● mnts
● etc.
© 2017 MapR Technologies 27
containers have a problem
© 2017 MapR Technologies 28
you can never get away from
pets unless:
- you handle the problem of
container state
- you need an environment to
support cattle
MapR and kubernetes are the
solution
© 2017 MapR Technologies 29
Things docker can’t (or won’t) do...
• solve port mapping hell
• monitor running containers
• handle dead containers
• move containers so utilization improves
• autoscale container instances to handle load
© 2017 MapR Technologies 30
Magical View of Kubernetes
© 2017 MapR Technologies 31
App 1
Kubernetes
Magical View of Kubernetes
Kubernetes starts application
containers “somewhere”
© 2017 MapR Technologies 32
Magical View of Kubernetes
App 1 App 3
Kubernetes
Later containers may be started
elsewhere due to “affinities”
© 2017 MapR Technologies 33
Magical View of Kubernetes
App 1 App 2 App 3
Kubernetes
Kubernetes provides super fast
naming via DNS so containers
can find each other
© 2017 MapR Technologies 34
Note that you don’t think about
which machine at all
© 2017 MapR Technologies 35
You don’t think about which
machine at all
No more names from The Hobbit
Just cattle
© 2017 MapR Technologies 36
The Impact of Kubernetes
• Software engineering can be viewed as freezing bits
• Initially, everything is possible, nothing is actual
• We freeze the source
Then the binary
Then the package
Then the environment
Ultimately the system
© 2017 MapR Technologies 37
© 2017 MapR Technologies 38
© 2017 MapR Technologies 39
© 2017 MapR Technologies 40
git cc/ld
java/jar
Build
docker build
resources
config libraries
Package
helm package
Construct
© 2017 MapR Technologies 41
git cc/ld
java/jar
Build
docker build
resources
config libraries
Package
helm package
Construct
helm install/scale
Load balancer
Deploy
© 2017 MapR Technologies 42
This is glorious
© 2017 MapR Technologies 43
but we still have a problem
© 2017 MapR Technologies 44
state
© 2017 MapR Technologies 45
Not Done Yet
Load balancer
© 2017 MapR Technologies 46
Not Done Yet
Load balancer
Here’s the problem
© 2017 MapR Technologies 47
Not Really Ready at All
• State in containers messes
things up
• Restarts lose the state
• Replicating state makes
services complex
• Application developers just
aren’t systems developers
• State life-cycle doesn’t
match app life-cycle
Load balancer
Here’s the problem
© 2017 MapR Technologies 48
What is a Service Anyway?
Load balancer
RPC in
© 2017 MapR Technologies 49
But … Not Entirely
• Synchronous RPC-based services only serve one need
• In a synchronous service it’s common to do some, defer some
• But deferring work is hard in a synchronous world … we have
to give up the return call in some sense
• This is the germ of streaming architecture
© 2017 MapR Technologies 50
Deferred
What is a Service Anyway?
Load balancer
RPC in
© 2017 MapR Technologies 51
Isolation is The Defining Characteristic
• If I can hide details of who and where, I have a service
• If I can hide details of deployment, I have a micro-service
• If I can hide details of when, I have a streaming micro-service
© 2017 MapR Technologies 52
Temporal and Geo Isolation
Producer Consumer isn’t even running
© 2017 MapR Technologies 53
Temporal and Geo Isolation
Producer Consumer
© 2017 MapR Technologies 54
Producer
Consumer
Temporal and Geo Isolation
Consumer could be an ocean away
© 2017 MapR Technologies 55
We Need Multiple Forms of Persistence
• Files are important
– Config files, image files, archival data data
– Legacy applications like machine learning, web
• Tables are important
– Critical to have random update for some applications
– Should scale transparently without dedicated cluster
• Streams are important
– Should be co-equal form of persistence
© 2017 MapR Technologies 56
App 1 App 2 App 3
© 2017 MapR Technologies 57
App 1 App 2 App 3
stream
LogFile
© 2017 MapR Technologies 58
App 1 App 2 App 3
stream
LogFile
© 2017 MapR Technologies 59
App 1 App 2 App 3
© 2017 MapR Technologies 60
What Does This Data Platform Need to Have?
• Global namespace across entire Kubernetes cluster
– Between clusters as well if possible
• All three forms of primitive persistence
– Files, streams, tables
• Inherently scalable
– Performance, cardinality, locality
• Uniform access and control
– Path names for all objects, identical permission scheme
© 2017 MapR Technologies 61
What Does This Data Platform Need to Have?
• Global namespace across entire Kubernetes cluster
– Between clusters as well if possible
• All three forms of primitive persistence
– Files, streams, tables
• Inherently scalable
– Performance, cardinality, locality
• Uniform access and control
– Path names for all objects, identical permission scheme
• Oh…. got that already. Just need to wire it up to Kubernetes
© 2017 MapR Technologies 62
© 2017 MapR Technologies 63
© 2017 MapR Technologies 64
kubelet
docker
pod
Normally pods interact
directly with node resources
© 2017 MapR Technologies 65
kubelet
docker
pod
plugin We can install a volume
plugin (recently introduced)
© 2017 MapR Technologies 66
mapr-
fs
kubelet
docker
pod
plugin
fuse
This allows uniform access
to files, tables and streams
© 2017 MapR Technologies 67
Where does that take us?
© 2017 MapR Technologies 68
Consequences
• Installation of plugin is K8S level operation
– No per-node attention required
• Use of plugin is overlay operation
– No change needed for an container
– Any Helm chart can use the plugin for conventional file access
• Can share storage/compute or isolate or scale independently
© 2017 MapR Technologies 69
More Consequences
• State is no longer a dirty word for Kubernetes
• HPC can run on K8S
• Boring things can run on K8S without storage appliances
• Previously crazy ideas can now be valuable
• Complexity is largely not visible
© 2017 MapR Technologies 70
Cloud as-is: No unified data access or security concepts
Public Cloud
Application
✓
API
API Connector
Single cloud vendor strategy:
• Vendor lock in
• No failover in case of global outage
• Limited Edge capabilities
AWS Services:
• Kinesis & Elastic MapReduce
• Redshift & DynamoDB
• S3 & Glacier
© 2017 MapR Technologies 71
Cloud as-is: No unified data access or security concepts
Public Cloud
Application
✓
API
API Connector
AWS Services:
• Kinesis & Elastic MapReduce
• Redshift & DynamoDB
• S3 & Glacier
API
Azure Services:
• HD Insight
• SQL Server & CosmosDB
• Blob & DataLakeStore
Multi cloud strategy:
• Complex data movement between clouds
• On any other cloud:
• Different API‘s: application breaks
• Different Security concept
© 2017 MapR Technologies 72
Cloud as-is: No unified data access or security concepts
Edge Private Cloud
On Premise
Public Cloud Public Cloud Public Cloud
API
Application
✓
API API API API
API Connector
Multi cloud strategy:
• Complex data movement between clouds
• On any other cloud:
• Different API‘s, application breaks
• Different Security concept
© 2017 MapR Technologies 73
Open APIs
How a “Media Company” is using MapR
Application
• Unified Security Model
• Data access decoupled from physical
storage location. Globally.
• No lock-in to proprietary APIs
• Full openness
• Data made portable
API Connector
✓
GLOBAL DATA MANAGEMENT
Edge Private Cloud
On Premise
Public Cloud Public Cloud Public Cloud
Uniform computing
environment everywhere
© 2017 MapR Technologies 74
Open APIs
How “Manufacturing Company” is using MapR
Application
• Unified Security Model
• Data access decoupled from physical
storage location. Globally.
• No lock-in to proprietary APIs
• Full openness
• Data made portable
API Connector
✓
GLOBAL DATA MANAGEMENT
Edge Private Cloud
On Premise
Public Cloud Public Cloud Public Cloud
Platform level
data replication
© 2017 MapR Technologies 75
Tier 1 Bank #1 Creating a Global Filesystem
/mapr/edge1
/mapr/edge2
/mapr/edge3
/mapr/newyork
/mapr/amsterdam
/mapr/azure /mapr/gcp
/mapr/aws-eu-west
NFS POSIX HDFSREST
HOT
WARM
COLD
/mapr
Kafka JSON HBASE SQL S3
Application
✓
Global access
to local data
© 2017 MapR Technologies 76
Tier 1 Bank #2: Creating an “Ubernetes” Platform with MapR
Application
GLOBAL DATA MANAGEMENT
Edge Private Cloud
On Premise
Public Cloud Public Cloud Public Cloud
PodPod Pod Image Classification using
Tensorflow in a Docker container
Classic ETL
Scheduling & Scaling
MapR Kubernetes Volume Driver
Single pane of glass to
control jobs anywhere
© 2017 MapR Technologies 77
Additional Resources
O’Reilly report by Ted Dunning & Ellen Friedman © September 2017
Read free courtesy of MapR:
https://mapr.com/ebook/machine-learning-logistics/
O’Reilly book by Ted Dunning & Ellen Friedman
© March 2016
Read free courtesy of MapR:
https://mapr.com/streaming-architecture-using-
apache-kafka-mapr-streams/
Ted Dunning & Ellen Friedman
Model Management in theReal World
MachineLearning
Logistics
© 2017 MapR Technologies 78
Additional Resources
O’Reilly book by Ted Dunning & Ellen Friedman
© June 2014
Read free courtesy of MapR:
https://mapr.com/practical-machine-learning-new-
look-anomaly-detection/
O’Reilly book by Ellen Friedman & Ted Dunning
© February 2014
Read free courtesy of MapR:
https://mapr.com/practical-machine-learning/
© 2017 MapR Technologies 79
Additional Resources
by Ellen Friedman 8 Aug 2017 on MapR blog:
https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/
by Ted Dunning 13 Sept 2017 in
InfoWorld:
https://www.infoworld.com/article/3223
688/machine-learning/machine-
learning-skills-for-software-
engineers.html
© 2017 MapR Technologies 80
New Book!
We will be signing this book at the MapR booth
later today.
Detailed schedule at the booth.
© 2017 MapR Technologies 81
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015#womenintech #datawomen
© 2017 MapR Technologies 82
Q&A
@mapr
tdunning@mapr.com
ENGAGE WITH US
@ Ted_Dunning

More Related Content

What's hot

Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNHadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
DataWorks Summit
 
A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013
Nathan Bijnens
 
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
DataWorks Summit
 

What's hot (20)

In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software Engineers
 
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
 
Webinar: Déployez facilement Kubernetes & vos containers
Webinar: Déployez facilement Kubernetes & vos containersWebinar: Déployez facilement Kubernetes & vos containers
Webinar: Déployez facilement Kubernetes & vos containers
 
In-Memory Computing Essentials
In-Memory Computing EssentialsIn-Memory Computing Essentials
In-Memory Computing Essentials
 
Save 60% of Kubernetes storage costs on AWS & others with OpenEBS
Save 60% of Kubernetes storage costs on AWS & others with OpenEBSSave 60% of Kubernetes storage costs on AWS & others with OpenEBS
Save 60% of Kubernetes storage costs on AWS & others with OpenEBS
 
GKE Tip Series how do i choose between gke standard, autopilot and cloud run
GKE Tip Series   how do i choose between gke standard, autopilot and cloud run GKE Tip Series   how do i choose between gke standard, autopilot and cloud run
GKE Tip Series how do i choose between gke standard, autopilot and cloud run
 
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNHadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
 
Webinar: Using Litmus Chaos Engineering and AI for auto incident detection
Webinar: Using Litmus Chaos Engineering and AI for auto incident detectionWebinar: Using Litmus Chaos Engineering and AI for auto incident detection
Webinar: Using Litmus Chaos Engineering and AI for auto incident detection
 
Google Cloud - Stand Out Features
Google Cloud - Stand Out FeaturesGoogle Cloud - Stand Out Features
Google Cloud - Stand Out Features
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache Ignite
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Cloudera DataTalks 2019 Bangalore - YuniKorn A next generation scheduler for ...
Cloudera DataTalks 2019 Bangalore - YuniKorn A next generation scheduler for ...Cloudera DataTalks 2019 Bangalore - YuniKorn A next generation scheduler for ...
Cloudera DataTalks 2019 Bangalore - YuniKorn A next generation scheduler for ...
 
A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013
 
JFall 2018 k8s patterns
JFall 2018 k8s patternsJFall 2018 k8s patterns
JFall 2018 k8s patterns
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learning
 
Integrating Vert.x
Integrating Vert.xIntegrating Vert.x
Integrating Vert.x
 
"Embedded Lucas-Kanade Tracking: How it Works, How to Implement It, and How t...
"Embedded Lucas-Kanade Tracking: How it Works, How to Implement It, and How t..."Embedded Lucas-Kanade Tracking: How it Works, How to Implement It, and How t...
"Embedded Lucas-Kanade Tracking: How it Works, How to Implement It, and How t...
 
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environ...
 
Post quantum cryptography in vault (hashi talks 2020)
Post quantum cryptography in vault (hashi talks 2020)Post quantum cryptography in vault (hashi talks 2020)
Post quantum cryptography in vault (hashi talks 2020)
 

Similar to Progress for big data in Kubernetes

Similar to Progress for big data in Kubernetes (20)

Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
Containers and Kubernetes without limits
Containers and Kubernetes without limitsContainers and Kubernetes without limits
Containers and Kubernetes without limits
 
Mesos swam-kubernetes-vds-02062017
Mesos swam-kubernetes-vds-02062017Mesos swam-kubernetes-vds-02062017
Mesos swam-kubernetes-vds-02062017
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Containers and Kubernetes
Containers and KubernetesContainers and Kubernetes
Containers and Kubernetes
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond Kubernetes
 
Running and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStackRunning and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStack
 
Completing the Microservices Puzzle: Kubernetes, Prometheus and FreshTracks.io
Completing the Microservices Puzzle: Kubernetes, Prometheus and FreshTracks.ioCompleting the Microservices Puzzle: Kubernetes, Prometheus and FreshTracks.io
Completing the Microservices Puzzle: Kubernetes, Prometheus and FreshTracks.io
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures
 
Episode 2: Deploying Kubernetes at Scale
Episode 2: Deploying Kubernetes at ScaleEpisode 2: Deploying Kubernetes at Scale
Episode 2: Deploying Kubernetes at Scale
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
 
Storage for containers and cloud-native deployments - Rancher Online Meetup -...
Storage for containers and cloud-native deployments - Rancher Online Meetup -...Storage for containers and cloud-native deployments - Rancher Online Meetup -...
Storage for containers and cloud-native deployments - Rancher Online Meetup -...
 
A Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native StackA Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native Stack
 
A Hitchhiker’s Guide to the Cloud Native Stack. #DevoxxPL
A Hitchhiker’s Guide to the Cloud Native Stack. #DevoxxPLA Hitchhiker’s Guide to the Cloud Native Stack. #DevoxxPL
A Hitchhiker’s Guide to the Cloud Native Stack. #DevoxxPL
 
Graph Day 2017 Spring Boot
Graph Day 2017 Spring BootGraph Day 2017 Spring Boot
Graph Day 2017 Spring Boot
 
Hybrid cloud openstack meetup
Hybrid cloud openstack meetupHybrid cloud openstack meetup
Hybrid cloud openstack meetup
 
Operating Kubernetes at Scale (Australia Presentation)
Operating Kubernetes at Scale (Australia Presentation)Operating Kubernetes at Scale (Australia Presentation)
Operating Kubernetes at Scale (Australia Presentation)
 
Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"
 

More from Ted Dunning

Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
Ted Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
Ted Dunning
 

More from Ted Dunning (20)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
T digest-update
T digest-updateT digest-update
T digest-update
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 

Recently uploaded

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 

Recently uploaded (20)

%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 

Progress for big data in Kubernetes

  • 1. © 2017 MapR Technologies 1 Progress for big data in Kubernetes
  • 2. © 2017 MapR Technologies 2 kubernetes is coming!
  • 3. © 2017 MapR Technologies 3 why?
  • 4. © 2017 MapR Technologies 4 Source: Shippable.com http://blog.shippable.com/why-the-adoption-of-kubernetes-will-explode-in-2018 kubernetes = major community support
  • 5. © 2017 MapR Technologies 5 every cloud supports kubernetes https://www.sinax.be/en/aws/ https://www.westconcomstor.com/za/en/vendors/wc-vendors/microsoft-azure-EN-UK.html https://www.g2crowd.com/products/google-kubernetes-engine-gke/details
  • 6. © 2017 MapR Technologies 6 massive customer adoption rate
  • 7. © 2017 MapR Technologies 7
  • 8. © 2017 MapR Technologies 8 what is kubernetes?
  • 9. © 2017 MapR Technologies 9 kubernetes (n.) - greek word for pilot or helm
  • 10. © 2017 MapR Technologies 10 kubernetes started life as a successor to google’s borg project... https://cloud.google.com/security/encryption-in-transit/
  • 11. © 2017 MapR Technologies 11 kubernetes is an ecosystem... Source: Redmonk - http://redmonk.com/sogrady/2017/09/22/cloud-native-license-choices/
  • 12. © 2017 MapR Technologies 12 container and resource orchestration engine...
  • 13. © 2017 MapR Technologies 13 Source: Shippable.com http://blog.shippable.com/why-the-adoption-of-kubernetes-will-explode-in-2018 kubernetes won the container orchestration war...
  • 14. © 2017 MapR Technologies 14 what is kubernetes?
  • 15. © 2017 MapR Technologies 15 it runs containers
  • 16. © 2017 MapR Technologies 16 what is a container?
  • 17. © 2017 MapR Technologies 17 not a vm
  • 18. © 2017 MapR Technologies 18 hardware vm vs container os hypervisor vm os libs app vm os libs app hardware os container libs app container libs app container libs app
  • 19. © 2017 MapR Technologies 19 pets vs cattle https://fwallpapers.com/view/cat-jeans http://www.clipartpanda.com/clipart_images/free-clip-art-1083418
  • 20. © 2017 MapR Technologies 20 pets vs cattle - long lived - name them - care for them - ephemeral - brand them with #’s - well..vets are expensive
  • 21. © 2017 MapR Technologies 21
  • 22. © 2017 MapR Technologies 22 isolation cgroups ● cpu ● memory ● network ● etc. namespaces ● pids ● mnts ● etc. Chroot (filesystem)
  • 23. © 2017 MapR Technologies 23 File File Read-only Layer container images
  • 24. © 2017 MapR Technologies 24 File File File Read-only Layer Read-only Layer container images
  • 25. © 2017 MapR Technologies 25 File File File Read-only Layer Read-only Layer Writable Layer container images
  • 26. © 2017 MapR Technologies 26 File File File Container Image chroot container = image + isolation cgroups ● cpu ● memory ● network ● etc. namespaces ● pids ● mnts ● etc.
  • 27. © 2017 MapR Technologies 27 containers have a problem
  • 28. © 2017 MapR Technologies 28 you can never get away from pets unless: - you handle the problem of container state - you need an environment to support cattle MapR and kubernetes are the solution
  • 29. © 2017 MapR Technologies 29 Things docker can’t (or won’t) do... • solve port mapping hell • monitor running containers • handle dead containers • move containers so utilization improves • autoscale container instances to handle load
  • 30. © 2017 MapR Technologies 30 Magical View of Kubernetes
  • 31. © 2017 MapR Technologies 31 App 1 Kubernetes Magical View of Kubernetes Kubernetes starts application containers “somewhere”
  • 32. © 2017 MapR Technologies 32 Magical View of Kubernetes App 1 App 3 Kubernetes Later containers may be started elsewhere due to “affinities”
  • 33. © 2017 MapR Technologies 33 Magical View of Kubernetes App 1 App 2 App 3 Kubernetes Kubernetes provides super fast naming via DNS so containers can find each other
  • 34. © 2017 MapR Technologies 34 Note that you don’t think about which machine at all
  • 35. © 2017 MapR Technologies 35 You don’t think about which machine at all No more names from The Hobbit Just cattle
  • 36. © 2017 MapR Technologies 36 The Impact of Kubernetes • Software engineering can be viewed as freezing bits • Initially, everything is possible, nothing is actual • We freeze the source Then the binary Then the package Then the environment Ultimately the system
  • 37. © 2017 MapR Technologies 37
  • 38. © 2017 MapR Technologies 38
  • 39. © 2017 MapR Technologies 39
  • 40. © 2017 MapR Technologies 40 git cc/ld java/jar Build docker build resources config libraries Package helm package Construct
  • 41. © 2017 MapR Technologies 41 git cc/ld java/jar Build docker build resources config libraries Package helm package Construct helm install/scale Load balancer Deploy
  • 42. © 2017 MapR Technologies 42 This is glorious
  • 43. © 2017 MapR Technologies 43 but we still have a problem
  • 44. © 2017 MapR Technologies 44 state
  • 45. © 2017 MapR Technologies 45 Not Done Yet Load balancer
  • 46. © 2017 MapR Technologies 46 Not Done Yet Load balancer Here’s the problem
  • 47. © 2017 MapR Technologies 47 Not Really Ready at All • State in containers messes things up • Restarts lose the state • Replicating state makes services complex • Application developers just aren’t systems developers • State life-cycle doesn’t match app life-cycle Load balancer Here’s the problem
  • 48. © 2017 MapR Technologies 48 What is a Service Anyway? Load balancer RPC in
  • 49. © 2017 MapR Technologies 49 But … Not Entirely • Synchronous RPC-based services only serve one need • In a synchronous service it’s common to do some, defer some • But deferring work is hard in a synchronous world … we have to give up the return call in some sense • This is the germ of streaming architecture
  • 50. © 2017 MapR Technologies 50 Deferred What is a Service Anyway? Load balancer RPC in
  • 51. © 2017 MapR Technologies 51 Isolation is The Defining Characteristic • If I can hide details of who and where, I have a service • If I can hide details of deployment, I have a micro-service • If I can hide details of when, I have a streaming micro-service
  • 52. © 2017 MapR Technologies 52 Temporal and Geo Isolation Producer Consumer isn’t even running
  • 53. © 2017 MapR Technologies 53 Temporal and Geo Isolation Producer Consumer
  • 54. © 2017 MapR Technologies 54 Producer Consumer Temporal and Geo Isolation Consumer could be an ocean away
  • 55. © 2017 MapR Technologies 55 We Need Multiple Forms of Persistence • Files are important – Config files, image files, archival data data – Legacy applications like machine learning, web • Tables are important – Critical to have random update for some applications – Should scale transparently without dedicated cluster • Streams are important – Should be co-equal form of persistence
  • 56. © 2017 MapR Technologies 56 App 1 App 2 App 3
  • 57. © 2017 MapR Technologies 57 App 1 App 2 App 3 stream LogFile
  • 58. © 2017 MapR Technologies 58 App 1 App 2 App 3 stream LogFile
  • 59. © 2017 MapR Technologies 59 App 1 App 2 App 3
  • 60. © 2017 MapR Technologies 60 What Does This Data Platform Need to Have? • Global namespace across entire Kubernetes cluster – Between clusters as well if possible • All three forms of primitive persistence – Files, streams, tables • Inherently scalable – Performance, cardinality, locality • Uniform access and control – Path names for all objects, identical permission scheme
  • 61. © 2017 MapR Technologies 61 What Does This Data Platform Need to Have? • Global namespace across entire Kubernetes cluster – Between clusters as well if possible • All three forms of primitive persistence – Files, streams, tables • Inherently scalable – Performance, cardinality, locality • Uniform access and control – Path names for all objects, identical permission scheme • Oh…. got that already. Just need to wire it up to Kubernetes
  • 62. © 2017 MapR Technologies 62
  • 63. © 2017 MapR Technologies 63
  • 64. © 2017 MapR Technologies 64 kubelet docker pod Normally pods interact directly with node resources
  • 65. © 2017 MapR Technologies 65 kubelet docker pod plugin We can install a volume plugin (recently introduced)
  • 66. © 2017 MapR Technologies 66 mapr- fs kubelet docker pod plugin fuse This allows uniform access to files, tables and streams
  • 67. © 2017 MapR Technologies 67 Where does that take us?
  • 68. © 2017 MapR Technologies 68 Consequences • Installation of plugin is K8S level operation – No per-node attention required • Use of plugin is overlay operation – No change needed for an container – Any Helm chart can use the plugin for conventional file access • Can share storage/compute or isolate or scale independently
  • 69. © 2017 MapR Technologies 69 More Consequences • State is no longer a dirty word for Kubernetes • HPC can run on K8S • Boring things can run on K8S without storage appliances • Previously crazy ideas can now be valuable • Complexity is largely not visible
  • 70. © 2017 MapR Technologies 70 Cloud as-is: No unified data access or security concepts Public Cloud Application ✓ API API Connector Single cloud vendor strategy: • Vendor lock in • No failover in case of global outage • Limited Edge capabilities AWS Services: • Kinesis & Elastic MapReduce • Redshift & DynamoDB • S3 & Glacier
  • 71. © 2017 MapR Technologies 71 Cloud as-is: No unified data access or security concepts Public Cloud Application ✓ API API Connector AWS Services: • Kinesis & Elastic MapReduce • Redshift & DynamoDB • S3 & Glacier API Azure Services: • HD Insight • SQL Server & CosmosDB • Blob & DataLakeStore Multi cloud strategy: • Complex data movement between clouds • On any other cloud: • Different API‘s: application breaks • Different Security concept
  • 72. © 2017 MapR Technologies 72 Cloud as-is: No unified data access or security concepts Edge Private Cloud On Premise Public Cloud Public Cloud Public Cloud API Application ✓ API API API API API Connector Multi cloud strategy: • Complex data movement between clouds • On any other cloud: • Different API‘s, application breaks • Different Security concept
  • 73. © 2017 MapR Technologies 73 Open APIs How a “Media Company” is using MapR Application • Unified Security Model • Data access decoupled from physical storage location. Globally. • No lock-in to proprietary APIs • Full openness • Data made portable API Connector ✓ GLOBAL DATA MANAGEMENT Edge Private Cloud On Premise Public Cloud Public Cloud Public Cloud Uniform computing environment everywhere
  • 74. © 2017 MapR Technologies 74 Open APIs How “Manufacturing Company” is using MapR Application • Unified Security Model • Data access decoupled from physical storage location. Globally. • No lock-in to proprietary APIs • Full openness • Data made portable API Connector ✓ GLOBAL DATA MANAGEMENT Edge Private Cloud On Premise Public Cloud Public Cloud Public Cloud Platform level data replication
  • 75. © 2017 MapR Technologies 75 Tier 1 Bank #1 Creating a Global Filesystem /mapr/edge1 /mapr/edge2 /mapr/edge3 /mapr/newyork /mapr/amsterdam /mapr/azure /mapr/gcp /mapr/aws-eu-west NFS POSIX HDFSREST HOT WARM COLD /mapr Kafka JSON HBASE SQL S3 Application ✓ Global access to local data
  • 76. © 2017 MapR Technologies 76 Tier 1 Bank #2: Creating an “Ubernetes” Platform with MapR Application GLOBAL DATA MANAGEMENT Edge Private Cloud On Premise Public Cloud Public Cloud Public Cloud PodPod Pod Image Classification using Tensorflow in a Docker container Classic ETL Scheduling & Scaling MapR Kubernetes Volume Driver Single pane of glass to control jobs anywhere
  • 77. © 2017 MapR Technologies 77 Additional Resources O’Reilly report by Ted Dunning & Ellen Friedman © September 2017 Read free courtesy of MapR: https://mapr.com/ebook/machine-learning-logistics/ O’Reilly book by Ted Dunning & Ellen Friedman © March 2016 Read free courtesy of MapR: https://mapr.com/streaming-architecture-using- apache-kafka-mapr-streams/ Ted Dunning & Ellen Friedman Model Management in theReal World MachineLearning Logistics
  • 78. © 2017 MapR Technologies 78 Additional Resources O’Reilly book by Ted Dunning & Ellen Friedman © June 2014 Read free courtesy of MapR: https://mapr.com/practical-machine-learning-new- look-anomaly-detection/ O’Reilly book by Ellen Friedman & Ted Dunning © February 2014 Read free courtesy of MapR: https://mapr.com/practical-machine-learning/
  • 79. © 2017 MapR Technologies 79 Additional Resources by Ellen Friedman 8 Aug 2017 on MapR blog: https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/ by Ted Dunning 13 Sept 2017 in InfoWorld: https://www.infoworld.com/article/3223 688/machine-learning/machine- learning-skills-for-software- engineers.html
  • 80. © 2017 MapR Technologies 80 New Book! We will be signing this book at the MapR booth later today. Detailed schedule at the booth.
  • 81. © 2017 MapR Technologies 81 Please support women in tech – help build girls’ dreams of what they can accomplish © Ellen Friedman 2015#womenintech #datawomen
  • 82. © 2017 MapR Technologies 82 Q&A @mapr tdunning@mapr.com ENGAGE WITH US @ Ted_Dunning