© 2017 MapR Technologies, MapR Confidential
Self-Service Data Science for Leveraging ML & AI on All of Your Data:
Introducing the MapR Data Science Refinery
Rachel Silver
Product Manager – Data Science & Analytics
11/16/17
Summary
• Why Companies Invest In ML/AI
• Winning With a Data First Approach
• Introducing the MapR Data Science Refinery
• Deep Dive & Demos
– Ease of Deployment
– Data Exploration
– Extensibility & Collaboration
Why Companies Invest In ML/AI
Where AI Creates Value In The Value Chain
• Project: Smarter R&D and forecasting
• Produce: Optimized Production & Maintenance
• Promote: Targeted Sales & Marketing
• Provide: Rich, personal, and convenient user experiences
Source: McKinsey Global Institute – Artificial Intelligence / The Next Digital Frontier? (2017)
Project Where The Next Threat Will Come From
Deep security analytics and advanced persistent threat (APT) detection
Retail Bank
• Centralization and visibility of all data from an information security perspective
• Reduced risk of data breaches from DDoS and APT attacks
• Real-time insights into what is happening within the environment
OBJECTIVE
• Early detection of data breaches and suspicious activity
• Aggregate and retain all security-related data in a single central store, then build statistical models to detect abnormal activity within the environment
• Get insights into what insiders are doing within the environment
CHALLENGES
• Existing SIEM solution could not scale
• Current solutions do not work well for “unknown” threats
SOLUTION
• Leverage MapR-DB for fast data ingestion and query performance
• MapR provided the deep storage and machine learning algorithms
• NFS enabled easy integration with the IT ecosystem
Produce More Efficiently
ML aggregation and processing at the edge optimizes production
Oil & Gas company
[Diagram: Before Edge vs. After Edge]
• Before Edge: Sources 1 to 1000 send data to the MapR core cluster in Houston. Manual process. Time to insight: 48 hrs.
• After Edge: 1000s of oil & drill sources do pre-processing locally and at the core (custom app + down sampling) before the MapR core cluster in Houston. Automated process. Time to insight: <2 hrs.
Promote personalized offers in real-time
Targeting credit card customers using Recommendation Engine
Leading Credit Card Company
A global financial services company wanted to offer real-time, localized, and personalized recommendations to its credit card holders using ML/AI.
OBJECTIVE
• Increase revenue and customer loyalty through real-time personalized offers generated by a recommendation engine
CHALLENGES
• To be accurate, data had to be updated in real time
• As a global company, its platform has to be consistent and 100% available 24x7, with no downtime
• Must be able to simultaneously ingest (stream) and update data in the same cluster
SOLUTION
• MapR was the only distribution that met the mission-critical needs of the customer and also provided the capability to ingest data continuously into the cluster
• Direct NFS allows data to be continuously ingested directly into their cluster
• MapR-XD’s self-healing capability allowed them to go into production safely
Provide Customers With a Customized Experience
Provide customers with a personalized and convenient experience
Using ML/AI to bring customer understanding to the center of business processes
OBJECTIVE
• Use full knowledge of the customer relationship to inform online interactions
CHALLENGES
• Need to store 20 trillion records
• Training sample size is 400 million records
• The decision trees contained 2 million possible pathways
• Every combination must be evaluated every time a model is used (~15 billion combinations)
SOLUTION
• The MapR Converged Data Platform centralizes analytics and operational apps on one platform, allowing Quantium to make one large infrastructure investment instead of many small siloed ones. The current cluster has 50TB of memory and 5000 CPUs to process and store 5PB of data
A Winning Approach: Data First
Entry Points in the Data Science Journey
Adopters: 20%
Experimenters: 41%
Contemplators: 40%
• Gartner estimates they solve between 10-100 business problems in three to five years.
• Gartner estimates they solve between 3-20 business problems in three to five years.
• Uncertain about the benefits of Data Science. Desire easy entry.
Source: McKinsey Global Institute – Artificial Intelligence / The Next Digital Frontier? (2017)
Source: Gartner – Magic Quadrant for Data Science Platforms (2017)
Contemplators and Experimenters together: about 80%!
• AI adoption outside of the tech sector is stuck here and many firms report they are uncertain of the ROI
• Investment in AI is growing at a high rate, but adoption in 2017 remains low
• AI is only deployed into production 12% of the time
Source: McKinsey Global Institute – Artificial Intelligence / The Next Digital Frontier? (2017)
Key Traits Of A Successful Data Science Approach
• Seamless Data Access
• Technical Capabilities (a strong digital foundation)
• Leadership From The Top
Seamless Data Access
If it is ALL about the data, then it better be about ALL your data.
ML Models Improve when Trained on Larger Datasets
Instead of relying on assumptions and weak correlations, more data results in better and more accurate models.
Source: A Survey of Applications of AI Algorithms in Eco-environmental modelling (2009)
Data Growth Puts A Premium on Efficient Leverage
The amount of data is predicted to double every three years.
[Chart: data growth] 1986: 3 Exabytes of Data; 2011: 300 Exabytes of Data; 2016: 2 Zettabytes of Data; 2019: 4 Zettabytes of Data
Data Diversity: Emails, Call Detail Records, Click stream, CSV, Documents, PDF, Billing Data, Meta Data, JSON, Network Data, Mobile Data, XML, Product Catalog, Medical Records, Text Files, Video, Text Messages, Merchant Listings, Sensor Data, Server Logs, Set Top Box, Social Media, Audio
Source: McKinsey Global Institute: “The Age of Analytics”, Dec. 2016
Hadoop + Vendor Approach to Data Science
Requires yet another cluster
[Diagram: On Premises: Batch Cluster, Streaming Cluster, NoSQL Cluster, Data Science cluster]
A Capable Platform With a Strong Digital Foundation
[Diagram: MAPR CONVERGED DATA PLATFORM (ON-PREMISES, MULTI-CLOUD, IoT EDGE)]
Diagram labels: NFS, POSIX, REST, HDFS, JSON, HBASE, KAFKA, SQL; FILE STORE, CONTAINER STORE, METADATA MANAGEMENT, OPERATIONAL DATA HUB, CDC; CUSTOM FILE APPS, HADOOP & SPARK APPS, REAL-TIME BI APPS, STREAMING APPS, IoT/EDGE; CONTEXTUAL USER EXPERIENCES, CORE BUSINESS APPS, SINGLE VIEW, IOT
Real-time Machine Learning Pipelines
A Robust Microservices Framework
Event Streams
• Persistent
• Infinitely replicable
• Re-playable
Compare model results live!
[Diagram: Model A, Model B, Persistent Client & Application Containers]
Advice For Leadership
Avoid
• Creating new silos
• Looking for a one-trick pony
• Adopting tools that have unwieldy install, integration, and configuration processes
• Tools that don’t scale to broader enterprise use
Important
• Ensure secure role-based access to all data
• Adopt tools that meet the needs of a broad range of Data Science Teams
• Encourage adoption by making things easy, secure, and complete
Data Science @ MapR
The MapR Data Science Vision
A Holistic Approach To Self-Service Data Science
• MAPR DATA SCIENCE REFINERY: An easy-to-deploy, secure, and extensible data science offering that leverages all existing platform assets
• REFINERY DATA SCIENTISTS: Data Scientist-led product-and-services offerings including Quick Start Solutions (QSS) & Training
• REFINERY PARTNERSHIPS: Expand on what we offer in-product to meet the needs of all data science teams
[Diagram base: MAPR CONVERGED DATA PLATFORM]
MapR Data Science Refinery
An Enterprise-ready Data Science Notebook
Provides the ability to work across many engines in one visual space
• Apache Spark: Spark Streaming, SparkSQL, SparkR, and PySpark
• Apache Hive
• Apache Pig
• Apache Drill
• Python
• Shell access to MapR-FS
• Programmatic access to MapR-DB and MapR-ES in Spark
Pluggable Visualization Available via Helium!
[Diagram: MAPR POSIX CLIENT FOR CONTAINERS, MAPR CONVERGED CLIENT FOR CONTAINERS]
MapR Data Science Refinery Benefits
Leverage Locally, On-premises, or in Cloud
Easy to Deploy
• A Docker image includes all the necessary bits, no more and no less, required to leverage MapR as a persistent data store for your data science output.
• Available on DockerHub
Secure
• Authentication occurs at the container level to ensure containerized applications only have access to data for which they are authorized.
• Communications are encrypted to ensure privacy when accessing data in MapR.
Extensible
• A Dockerfile is also available on GitHub, allowing you to customize the image further to support your specific application needs.
• The Helium Framework enables pluggable visualization
[Diagram: MAPR CONVERGED DATA PLATFORM: MAPR-XD (cloud-scale data store), MAPR-DB (operational database), MAPR-ES (event streaming); High Availability, Real-time, Unified Security, Multi-Tenancy, Disaster Recovery, Global Namespace]
Partner Integration: An Example
We’re enabling our partners to integrate with and use this product
[Diagram labels: DataScience.com Platform, Services, MapR DSR, Zeppelin, Livy, JDBC, MapR Clients]
Demo: Ease of Deployment & Data Exploration
Demo: Ease of Deployment
What’s in the command
docker run --rm -it \
  --cap-add SYS_ADMIN --cap-add SYS_RESOURCE \
  --device /dev/fuse \
  --memory 0 \
  -e MAPR_CLUSTER=my.cluster.com \
  -e MAPR_MEMORY=0 \
  -e MAPR_MOUNT_PATH=/mapr \
  -e MAPR_TZ=America/Los_Angeles \
  -e MAPR_CONTAINER_USER=mapr -e MAPR_CONTAINER_UID=5000 \
  -e MAPR_CONTAINER_GROUP=mapr -e MAPR_CONTAINER_GID=5000 \
  -e MAPR_CLDB_HOSTS=172.24.8.195,172.24.11.200,172.24.10.4 \
  -e MAPR_TICKETFILE_LOCATION=/tmp/maprticket_5000 \
  -e ZEPPELIN_SSL_PORT=9995 \
  -e HOST_IP=172.24.11.62 \
  -e MAPR_HS_HOST=172.24.8.195 \
  -p 9995:9995 -p 10000-10010:10000-10010 \
  -v /tmp/maprticket_5000:/tmp/maprticket_5000:ro \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7
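The command is easier to follow when its flags are grouped by purpose. The following is a dry-run sketch that only prints the command and never starts a container. The grouping and the comments are an interpretation of the slide, not MapR documentation, and the function name `dsr_command` is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: rebuild the docker run command from the slide in labelled groups.
# Nothing is executed; the command is only printed. Group comments are assumptions.

dsr_command() {
  set -- docker run --rm -it

  # Capabilities and device for the FUSE mount inside the container (assumed purpose).
  set -- "$@" --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse --memory 0

  # Cluster coordinates: name, CLDB nodes, and the MAPR_HS_HOST value from the slide.
  set -- "$@" -e MAPR_CLUSTER=my.cluster.com \
              -e MAPR_CLDB_HOSTS=172.24.8.195,172.24.11.200,172.24.10.4 \
              -e MAPR_HS_HOST=172.24.8.195

  # Container-side settings: memory, mount point, time zone.
  set -- "$@" -e MAPR_MEMORY=0 -e MAPR_MOUNT_PATH=/mapr -e MAPR_TZ=America/Los_Angeles

  # Identity the container runs as, plus the ticket file it uses.
  set -- "$@" -e MAPR_CONTAINER_USER=mapr -e MAPR_CONTAINER_UID=5000 \
              -e MAPR_CONTAINER_GROUP=mapr -e MAPR_CONTAINER_GID=5000 \
              -e MAPR_TICKETFILE_LOCATION=/tmp/maprticket_5000 \
              -v /tmp/maprticket_5000:/tmp/maprticket_5000:ro

  # Notebook endpoint: Zeppelin SSL port, host address, and published ports.
  set -- "$@" -e ZEPPELIN_SSL_PORT=9995 -e HOST_IP=172.24.11.62 \
              -p 9995:9995 -p 10000-10010:10000-10010

  # cgroup mount and the image tag, both copied from the slide.
  set -- "$@" -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
              maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7

  printf '%s\n' "$*"
}

dsr_command
```

The flags appear in a different order than on the slide. Docker accepts options in any order as long as they come before the image name, so the printed command should be equivalent.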
Demo: Ease of Deployment
How is Security Handled?
$ maprlogin password
[Password for user 'jane' at cluster 'my.cluster.com': ]
MapR credentials of user 'jane' for cluster 'my.cluster.com' are written to '/tmp/janes_ticket'
Job submits as 'jane'
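The ticket flow connects to the `docker run` command from the deployment slides: the ticket file written by `maprlogin` is mounted read-only into the container and referenced through `MAPR_TICKETFILE_LOCATION`. Below is a minimal sketch of that wiring, assuming the caller supplies the ticket path. The helper name `ticket_flags` is hypothetical, and nothing here calls MapR tooling.

```shell
#!/bin/sh
# Sketch: turn a ticket file path into the docker flags used in the deck.
# ticket_flags is a hypothetical helper; it only checks the file and prints flags.

ticket_flags() {
  ticket=$1
  if [ ! -r "$ticket" ]; then
    echo "ticket file not readable: $ticket" >&2
    return 1
  fi
  # Same path inside and outside the container, mounted read-only as on the slide.
  printf '%s\n' "-e MAPR_TICKETFILE_LOCATION=$ticket -v $ticket:$ticket:ro"
}

# Example: use the path from the maprlogin output if it exists on this host.
ticket_flags /tmp/janes_ticket || echo "run maprlogin password first" >&2
```

Because the container only sees the ticket it is given, jobs submitted from the notebook would run as the ticket's owner, which matches the "Job submits as 'jane'" note on the slide.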
Demo: Ease of Deployment
Why Livy?
Advantages over native Spark Interpreter:
• Jobs are submitted in YARN cluster mode
• Spark context can be shared
• Support for Spark Dynamic Resource Allocation
[Diagram: HTTP (RPC) to the MAPR CONVERGED DATA PLATFORM: MAPR-XD (cloud-scale data store), MAPR-DB (operational database), MAPR-ES (event streaming)]
Demo: Extensibility & Collaboration
Collaboration
[Diagram: MAPR POSIX CLIENT FOR CONTAINERS and the MAPR CONVERGED DATA PLATFORM: MAPR-XD (cloud-scale data store), MAPR-DB (operational database), MAPR-ES (event streaming)]
Demo: Extensibility & Collaboration
What’s in the command
docker run --rm -it \
  --cap-add SYS_ADMIN --cap-add SYS_RESOURCE \
  --device /dev/fuse \
  --memory 0 \
  -e MAPR_CLUSTER=my.cluster.com \
  -e MAPR_MEMORY=0 \
  -e MAPR_MOUNT_PATH=/mapr \
  -e ZEPPELIN_NOTEBOOK_DIR=/mapr/my.cluster.com/user/mapr/zeppelin/shared-notebooks/ \
  -e MAPR_TZ=America/Los_Angeles \
  -e MAPR_CONTAINER_USER=mapr -e MAPR_CONTAINER_UID=5000 \
  -e MAPR_CONTAINER_GROUP=mapr -e MAPR_CONTAINER_GID=5000 \
  -e MAPR_CLDB_HOSTS=172.24.8.195,172.24.11.200,172.24.10.4 \
  -e MAPR_TICKETFILE_LOCATION=/tmp/maprticket_5000 \
  -e ZEPPELIN_SSL_PORT=9995 \
  -e HOST_IP=172.24.11.62 \
  -e MAPR_HS_HOST=172.24.8.195 \
  -p 9995:9995 -p 10000-10010:10000-10010 \
  -v /tmp/maprticket_5000:/tmp/maprticket_5000:ro \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7
Demo: Extensibility
Adding Deep Learning libraries to the container
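One way to add deep learning libraries is to layer them on top of the published image. The sketch below writes a small Dockerfile and prints the build command without running it. It assumes `pip` is available inside the image and uses `tensorflow` purely as an example package; the tag `my-dsr-dl` and the helper name `write_dl_dockerfile` are hypothetical. The deck itself points to customizing the Dockerfile published on GitHub, which this sketch does not reproduce.

```shell
#!/bin/sh
# Sketch: generate a Dockerfile that extends the Data Science Refinery image.
# Assumptions: pip exists in the base image; tensorflow is only an example package.

write_dl_dockerfile() {
  dir=$1
  mkdir -p "$dir" || return 1
  cat > "$dir/Dockerfile" <<'EOF'
FROM maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7
# Example library; swap in whatever your notebooks need.
RUN pip install tensorflow
EOF
  # Print (do not run) the build command; my-dsr-dl is a hypothetical tag.
  printf '%s\n' "docker build -t my-dsr-dl $dir"
}

write_dl_dockerfile "${TMPDIR:-/tmp}/dsr-dl"
```

The resulting image could then be started with the same `docker run` options shown earlier in the demo, with only the image name changed.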
[Diagram: Compute and Persistent Storage; MAPR CONVERGED DATA PLATFORM: MAPR-XD (cloud-scale data store), MAPR-DB (operational database), MAPR-ES (event streaming)]
What if this was a box of GPUs?
A Final Comparison
Traditional Hadoop Vendor
[Diagram: On Premises: Batch Cluster, Streaming Cluster, NoSQL Cluster, Data Science cluster]
Q&A
ENGAGE WITH US
@mapr
rsilver@mapr.com

More Related Content

What's hot

Achieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataAchieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataInside Analysis
 
Deep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseDeep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseGanesan Narayanasamy
 
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...ervogler
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017Ray Bugg
 
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive AnalyticsInfochimps, a CSC Big Data Business
 
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web ServicesDickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web ServicesPrecisely
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
IMGS Geospatial User Group 2014: Hexagon Geospatial Vision, Mission and Strategy
IMGS Geospatial User Group 2014: Hexagon Geospatial Vision, Mission and StrategyIMGS Geospatial User Group 2014: Hexagon Geospatial Vision, Mission and Strategy
IMGS Geospatial User Group 2014: Hexagon Geospatial Vision, Mission and StrategyIMGS
 
Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0SnapLogic
 
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dataconomy Media
 
7 inspiring Big Data factories in AWS
7 inspiring Big Data factories in AWS7 inspiring Big Data factories in AWS
7 inspiring Big Data factories in AWSSebastien BONNOTTE
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Dataconomy Media
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey ResultsCarole Gunst
 
Integrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environmentIntegrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environmentMapR Technologies
 

What's hot (20)

Achieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataAchieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate Data
 
Deep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseDeep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the Enterprise
 
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
Big Data Hadoop Briefing Hosted by Cisco, WWT and MapR: MapR Overview Present...
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Hadoop dev 01
Hadoop dev 01Hadoop dev 01
Hadoop dev 01
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017
 
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
 
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web ServicesDickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
Dickey's Barbecue Pit Heats Up Analytics with Amazon Web Services
 
Ibm big data
Ibm big dataIbm big data
Ibm big data
 
Using Hadoop for Cognitive Analytics
Using Hadoop for Cognitive AnalyticsUsing Hadoop for Cognitive Analytics
Using Hadoop for Cognitive Analytics
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
IMGS Geospatial User Group 2014: Hexagon Geospatial Vision, Mission and Strategy
IMGS Geospatial User Group 2014: Hexagon Geospatial Vision, Mission and StrategyIMGS Geospatial User Group 2014: Hexagon Geospatial Vision, Mission and Strategy
IMGS Geospatial User Group 2014: Hexagon Geospatial Vision, Mission and Strategy
 
Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0Hadoop for Humans: Introducing SnapReduce 2.0
Hadoop for Humans: Introducing SnapReduce 2.0
 
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
 
7 inspiring Big Data factories in AWS
7 inspiring Big Data factories in AWS7 inspiring Big Data factories in AWS
7 inspiring Big Data factories in AWS
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
Smart App@Pivotal by Dat Tran
Smart App@Pivotal by Dat TranSmart App@Pivotal by Dat Tran
Smart App@Pivotal by Dat Tran
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results
 
Integrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environmentIntegrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environment
 
Hadoop for the Masses
Hadoop for the MassesHadoop for the Masses
Hadoop for the Masses
 

Viewers also liked

Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Technologies
 
Processing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaProcessing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaMatthew Howlett
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeSlim Baltagi
 
MapR Data Analyst
MapR Data AnalystMapR Data Analyst
MapR Data Analystselvaraaju
 

Viewers also liked (6)

Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Processing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaProcessing IoT Data with Apache Kafka
Processing IoT Data with Apache Kafka
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
 
MapR Data Analyst
MapR Data AnalystMapR Data Analyst
MapR Data Analyst
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 

Similar to Self-Service Data Science for Leveraging ML & AI on All of Your Data

Big Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsBig Data LDN 2017: How to leverage the cloud for Business Solutions
Big Data LDN 2017: How to leverage the cloud for Business SolutionsMatt Stubbs
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning PrimerMathieu Dumoulin
 
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...
Big Data LDN 2017: The Intelligent Edge: What Data-driven Means in the Age of...Matt Stubbs
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise WeAreEsynergy
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data

  • 1. © 2017 MapR Technologies. MapR Confidential. Self-Service Data Science for Leveraging ML & AI on All of Your Data: Introducing the MapR Data Science Refinery. Rachel Silver, Product Manager, Data Science & Analytics. 11/16/17
  • 2. Summary: Why Companies Invest In ML/AI; Winning With a Data First Approach; Introducing the MapR Data Science Refinery; Deep Dive & Demos (Ease of Deployment, Data Exploration, Extensibility & Collaboration)
  • 3. Why Companies Invest In ML/AI
  • 4. Where AI Creates Value In The Value Chain. Produce: Optimized Production & Maintenance. Provide: rich, personal, and convenient user experiences. Project: Smarter R&D and forecasting. Promote: Targeted Sales & Marketing. Source: McKinsey Global Institute, Artificial Intelligence / The Next Digital Frontier? (2017)
  • 5. Project Where The Next Threat Will Come From: deep security analytics and advanced persistent threat (APT) detection. Customer: Retail Bank. Benefits: centralization and visibility of all data from an information security perspective; reduced risk of data breaches from DDoS and APT attacks; real-time insights into what is happening within the environment. OBJECTIVE: Early detection of data breaches and suspicious activity; aggregate and retain all security-related data in a single central store and then build statistical models to detect abnormal activity within the environment; get insights into what insiders are doing within the environment. CHALLENGES: Existing SIEM solution could not scale; current solutions do not work well for “unknown” threats. SOLUTION: Leverage MapR-DB for fast data ingestion and query performance; MapR provided the deep storage and machine learning algorithms; NFS enabled easy integration with the IT ecosystem.
  • 6. Produce More Efficiently: ML aggregation and processing at the edge optimizes production. Customer: Oil & Gas company. Before Edge: Source 1, Source 2, ... Source 1000 feed the Houston MapR core cluster through a manual process; time to insight is 48 hrs. After Edge: 1000s of oil & drill sources do pre-processing locally and at the core (custom app + down sampling) and feed the Houston MapR core cluster through an automated process; time to insight is <2 hrs.
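The "down sampling" step mentioned on slide 6 can be pictured with a minimal sketch. The slide does not say how the custom app reduces the data, so the fixed-window averaging below is an assumption chosen for illustration, and the function name `downsample` and its parameters are hypothetical.

```python
def downsample(readings, window):
    """Reduce a list of sensor readings by averaging fixed-size windows.

    Assumption: mean-per-window is only one possible down-sampling scheme;
    the deck does not specify which one the edge application uses.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averaged = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        # The final chunk may be shorter than the window; average what is there.
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Hypothetical readings collected at an edge site before shipping to the core.
edge_readings = [1.0, 3.0, 5.0, 7.0, 9.0]
reduced = downsample(edge_readings, window=2)
```

Sending the reduced series instead of every raw reading is one way the volume shipped from each source to the core cluster could shrink.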
  • 7. Promote personalized offers in real-time: targeting credit card customers using a Recommendation Engine. Customer: Leading Credit Card Company. A Global Financial Services company wanted to offer real-time localized & personalized recommendations to their credit card holders using ML/AI. OBJECTIVE: Increase revenue and customer loyalty through real-time personalized offers generated by a recommendation engine. CHALLENGES: To be accurate, data had to be updated in real time; being a global company, their platform has to be consistent and 100% available 24x7 with no downtime; must be able to simultaneously ingest (stream) and update data in the same cluster. SOLUTION: MapR was the only distribution that met the mission-critical needs of the customer and also provided the capability to ingest data continuously into the cluster; Direct NFS allows data to be continuously ingested directly into their cluster; MapR-XD’s self-healing capability allowed them to go into production safely.
  • 8. Provide Customers With a Customized Experience: provide customers with a personalized and convenient experience, using ML/AI to bring customer understanding to the center of business processes. OBJECTIVE: Use full knowledge of the customer relationship to inform online interactions. CHALLENGES: Need to store 20 trillion records; training sample size is 400 million records; the decision trees contained 2 million possible pathways; every combination must be evaluated every time a model is used (~15 billion combinations). SOLUTION: The MapR Converged Data Platform centralizes analytics and operational apps on one platform, allowing Quantium to make one large infrastructure investment instead of many small siloed ones. The current cluster has 50TB of memory and 5000 CPUs to process and store 5PB of data.
  • 9. A Winning Approach: Data First
  • 10. Entry Points in the Data Science Journey. Adopters (20%): Gartner estimates they solve between 10-100 business problems in three to five years. Experimenters (41%): Gartner estimates they solve between 3-20 business problems in three to five years. Contemplators (40%): Uncertain about the benefits of Data Science; desire easy entry. Sources: McKinsey Global Institute, Artificial Intelligence / The Next Digital Frontier? (2017); Gartner, Magic Quadrant for Data Science Platforms (2017)
  • 11. Entry Points in the Data Science Journey. Adopters (20%): Gartner estimates they solve between 10-100 business problems in three to five years. Experimenters (41%): Gartner estimates they solve between 3-20 business problems in three to five years. Contemplators (40%): Uncertain about the benefits of Data Science; desire easy entry. Callout: 80%! Sources: McKinsey Global Institute, Artificial Intelligence / The Next Digital Frontier? (2017); Gartner, Magic Quadrant for Data Science Platforms (2017)
  • 12. Entry Points in the Data Science Journey. Adopters (20%): Gartner estimates they solve between 10-100 business problems in three to five years. Experimenters (41%): Gartner estimates they solve between 3-20 business problems in three to five years. Contemplators (40%): Uncertain about the benefits of Data Science; desire easy entry. AI adoption outside of the tech sector is stuck here and many firms report they are uncertain of the ROI. Investment in AI is growing at a high rate, but adoption in 2017 remains low. AI is only deployed into production 12% of the time. Source: McKinsey Global Institute, Artificial Intelligence / The Next Digital Frontier? (2017)
  • 13. Entry Points in the Data Science Journey. Adopters (20%): Gartner estimates they solve between 10-100 business problems in three to five years. Experimenters (41%): Gartner estimates they solve between 3-20 business problems in three to five years. Contemplators (40%): Uncertain about the benefits of Data Science; desire easy entry. Key Traits Of A Successful Data Science Approach: Seamless Data Access; Technical Capabilities (a strong digital foundation); Leadership From The Top. Source: McKinsey Global Institute, Artificial Intelligence / The Next Digital Frontier? (2017)
  • 14. Seamless Data Access: If it is ALL about the data, then it better be about ALL your data.
  • 15. ML Models Improve when Trained on Larger Datasets. Instead of relying on assumptions and weak correlations, the presence of more data results in better and more accurate models. Source: A Survey of Applications of AI Algorithms in Eco-environmental modelling (2009)
  • 16. Data Growth Puts A Premium on Efficient Leverage. The amount of data is predicted to double every three years. Timeline: 1986, 3 Exabytes of Data; 2011, 300 Exabytes of Data; 2016, 2 Zettabytes of Data; 2019, 4 Zettabytes of Data. Data Diversity: Emails, Call Detail Records, Click stream, CSV, Documents, Data, PDF, Billing Data, Meta Data, JSON, Network Data, Mobile Data, XML, Product Catalog, Medical Records, Text Files, Video, Text Messages, Merchant Listings, Sensor Data, Server Logs, Set Top Box, Social Media, Audio. Source: McKinsey Global Institute: “The Age of Analytics”, Dec. 2016
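The "double every three years" prediction on slide 16 is a simple exponential rule, volume = start × 2^(years / 3). The sketch below works it through; the function name `projected_volume` and its parameters are hypothetical, and the rule is an extrapolation of the slide's statement rather than a forecast taken from the cited source.

```python
def projected_volume(start_volume, years_elapsed, doubling_period_years=3.0):
    """Project data volume under a fixed doubling period.

    start_volume is in any unit (here zettabytes); the result uses the same unit.
    """
    return start_volume * 2 ** (years_elapsed / doubling_period_years)


# Starting from the 2 zettabytes shown on the slide, three years of growth
# at this rate gives 4 zettabytes, matching the other figure on the slide.
after_three_years = projected_volume(2.0, 3)
after_six_years = projected_volume(2.0, 6)
```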
  • 17. Hadoop + Vendor Approach to Data Science: Requires yet another cluster. On Premises: Batch Cluster, Streaming Cluster, NoSQL Cluster, Data Science cluster.
  • 19. A Capable Platform With a Strong Digital Foundation. Diagram labels: NFS, POSIX, REST, HDFS; MAPR CONVERGED DATA PLATFORM; ON-PREMISES, MULTI-CLOUD, IoT EDGE; FILE STORE, CONTAINER STORE, CUSTOM FILE APPS, METADATA MANAGEMENT; JSON, HBASE, KAFKA; HADOOP & SPARK APPS, REAL-TIME BI APPS, STREAMING APPS, IoT/EDGE, SQL; OPERATIONAL DATA HUB, CDC, CONTEXTUAL USER EXPERIENCES, CORE BUSINESS APPS, SINGLE VIEW, IOT.
  • 20. Real-time Machine Learning Pipelines: A Robust Microservices Framework. Event Streams: Persistent; Infinitely replicable; Re-playable. Compare model results live (Model A, Model B). Persistent Client & Application Containers.
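The idea on slide 20, replaying one persistent event stream through two models and comparing their results, can be sketched in plain Python. Everything here is illustrative: `EventLog` is a hypothetical in-memory stand-in and not the MapR-ES or Kafka API, and `model_a` and `model_b` are toy rules invented for the example.

```python
class EventLog:
    """Append-only, in-memory stand-in for a persistent, re-playable stream."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, from_offset=0):
        # Reading never consumes events, so any number of consumers
        # can replay the same history independently.
        return iter(self._events[from_offset:])


def model_a(event):
    # Toy rule: flag large transactions.
    return event["amount"] > 100


def model_b(event):
    # Toy rule: flag large transactions or ones made away from home.
    return event["amount"] > 100 or event["country"] != event["home_country"]


def compare_models(log, models):
    """Replay the full log through each model and count flagged events."""
    return {
        name: sum(1 for event in log.replay() if model(event))
        for name, model in models.items()
    }


log = EventLog()
log.append({"amount": 50, "country": "US", "home_country": "US"})
log.append({"amount": 150, "country": "US", "home_country": "US"})
log.append({"amount": 20, "country": "FR", "home_country": "US"})

results = compare_models(log, {"model_a": model_a, "model_b": model_b})
```

Because both models read the identical event history, any difference in their counts comes from the models themselves rather than from differences in input.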
  • 21. Advice For Leadership. Avoid: • Creating new silos • Looking for a one-trick pony • Adopting tools that have unwieldy install, integration, and configuration processes • Tools that don’t scale to broader enterprise use. Important: • Ensure secure role-based access to all data • Adopt tools that meet the needs of a broad range of Data Science Teams • Encourage adoption by making things easy, secure, and complete
  • 22. Data Science @ MapR
  • 23. The MapR Data Science Vision: A Holistic Approach To Self-Service Data Science. MAPR DATA SCIENCE REFINERY: An easy-to-deploy, secure, and extensible data science offering that leverages all existing platform assets. REFINERY DATA SCIENTISTS: Data Scientist led product-and-services offerings including Quick Start Solutions (QSS) & Training. REFINERY PARTNERSHIPS: Expand on what we offer in-product to meet the needs of all data science teams. MAPR CONVERGED DATA PLATFORM
  • 24. MapR Data Science Refinery: An Enterprise-ready Data Science Notebook. Provides the ability to work across many engines in one visual space: • Apache Spark: Spark Streaming, SparkSQL, SparkR, and PySpark • Apache Hive • Apache Pig • Apache Drill • Python • Shell access to MapR-FS • Programmatic access to MapR-DB and MapR-ES in Spark. Pluggable Visualization Available via Helium! Diagram labels: MAPR POSIX CLIENT FOR CONTAINERS; MAPR CONVERGED CLIENT FOR CONTAINERS
  • 25. MapR Data Science Refinery Benefits. Easy to Deploy: • A Docker Image includes all the necessary bits - no more, no less - required to leverage MapR as a persistent data store for your data science output. • Available on DockerHub. Secure: • Authentication occurs at a container level to ensure containerized applications only have access to data for which they are authorized. • Communications are encrypted to ensure privacy when accessing data in MapR. Extensible: • A Dockerfile is also available on GitHub, allowing you to further customize the image as needed to support your specific application needs. • The Helium Framework enables pluggable visualization. Leverage Locally, On-premises, or in Cloud. MAPR CONVERGED DATA PLATFORM: CLOUD-SCALE DATA STORE (MAPR-XD), OPERATIONAL DATABASE (MAPR-DB), EVENT STREAMING (MAPR-ES); High Availability, Real-time, Unified Security, Multi-Tenancy, Disaster Recovery, Global Namespace
  • 26. Partner Integration: An Example. We’re enabling our partners to integrate with and use this product. Diagram labels: DataScience.com Platform Services; MapR DSR; Zeppelin; Livy; JDBC; MapR Clients
  • 28. Demo: Ease of Deployment & Data Exploration
  • 29. Demo: Ease of Deployment. What’s in the command: docker run --rm -it --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse --memory 0 -e MAPR_CLUSTER=my.cluster.com -e MAPR_MEMORY=0 -e MAPR_MOUNT_PATH=/mapr -e MAPR_TZ=America/Los_Angeles -e MAPR_CONTAINER_USER=mapr -e MAPR_CONTAINER_UID=5000 -e MAPR_CONTAINER_GROUP=mapr -e MAPR_CONTAINER_GID=5000 -e MAPR_CLDB_HOSTS=172.24.8.195,172.24.11.200,172.24.10.4 -e MAPR_TICKETFILE_LOCATION=/tmp/maprticket_5000 -e ZEPPELIN_SSL_PORT=9995 -e HOST_IP=172.24.11.62 -e MAPR_HS_HOST=172.24.8.195 -p 9995:9995 -p 10000-10010:10000-10010 -v /tmp/maprticket_5000:/tmp/maprticket_5000:ro -v /sys/fs/cgroup:/sys/fs/cgroup:ro maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7
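The single-line command above is easier to review when its flags are grouped by purpose. The following is a minimal sketch that only prints the same command, one group per line, without running Docker. The grouping and the comments are an interpretation of the slide, not MapR documentation, and the function name `print_run_command` is hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the slide's docker run command, one flag group per line.
# Nothing is executed. The grouping comments are interpretation, not vendor docs.

# Values copied from the slide; substitute your own cluster's settings.
MAPR_CLUSTER=my.cluster.com
MAPR_CLDB_HOSTS=172.24.8.195,172.24.11.200,172.24.10.4
HOST_IP=172.24.11.62
MAPR_HS_HOST=172.24.8.195
TICKET=/tmp/maprticket_5000
IMAGE=maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7

print_run_command() {
  # Lifecycle: remove the container on exit and attach an interactive terminal.
  printf 'docker run --rm -it \\\n'
  # Capabilities and the FUSE device (assumed to be what the POSIX mount needs).
  printf '  --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse \\\n'
  # Memory settings exactly as on the slide.
  printf '  --memory 0 -e MAPR_MEMORY=0 \\\n'
  # Cluster identity and the mount path used inside the container.
  printf '  -e MAPR_CLUSTER=%s -e MAPR_CLDB_HOSTS=%s -e MAPR_MOUNT_PATH=/mapr \\\n' \
    "$MAPR_CLUSTER" "$MAPR_CLDB_HOSTS"
  # Time zone and the user/group identity for the container.
  printf '  -e MAPR_TZ=America/Los_Angeles \\\n'
  printf '  -e MAPR_CONTAINER_USER=mapr -e MAPR_CONTAINER_UID=5000 \\\n'
  printf '  -e MAPR_CONTAINER_GROUP=mapr -e MAPR_CONTAINER_GID=5000 \\\n'
  # Security ticket: path inside the container plus a read-only bind mount.
  printf '  -e MAPR_TICKETFILE_LOCATION=%s -v %s:%s:ro \\\n' "$TICKET" "$TICKET" "$TICKET"
  # Zeppelin SSL port, host address, and MAPR_HS_HOST, then the published ports.
  printf '  -e ZEPPELIN_SSL_PORT=9995 -e HOST_IP=%s -e MAPR_HS_HOST=%s \\\n' \
    "$HOST_IP" "$MAPR_HS_HOST"
  printf '  -p 9995:9995 -p 10000-10010:10000-10010 \\\n'
  # cgroup mount and the image tag, as on the slide.
  printf '  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \\\n'
  printf '  %s\n' "$IMAGE"
}

print_run_command
```

Option order differs slightly from the slide so that related flags sit together; Docker accepts options in any order as long as they precede the image name.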
  • 34. Demo: Ease of Deployment. How is Security Handled? $ maprlogin password [Password for user 'jane' at cluster 'my.cluster.com': ] MapR credentials of user 'jane' for cluster 'my.cluster.com' are written to '/tmp/janes_ticket'. Job submits as 'jane'
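The ticket written by maprlogin reaches the container through two pieces of the docker run command: the MAPR_TICKETFILE_LOCATION variable and a read-only bind mount of the same path. The sketch below derives both flags from a ticket path and refuses to continue when the file is not readable. The helper name `ticket_flags` is hypothetical, and the assumption that the host path and container path are identical simply mirrors the command on the slides.

```shell
#!/bin/sh
# Sketch: derive the ticket-related docker flags from a ticket file path.
# ticket_flags is a hypothetical helper, not part of any MapR tooling.
ticket_flags() {
  ticket=$1
  # Fail early if the ticket is missing or unreadable by the current user.
  if [ ! -r "$ticket" ]; then
    echo "ticket not readable: $ticket" >&2
    return 1
  fi
  # Same path on host and in container, mounted read-only as on the slides.
  printf '%s\n' "-e MAPR_TICKETFILE_LOCATION=$ticket -v $ticket:$ticket:ro"
}

# Example use (path taken from the slide):
#   ticket_flags /tmp/janes_ticket
```

Because the ticket belongs to one user, a container started with jane's ticket submits jobs as jane, which matches the behaviour the slide describes.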
  • 35. Demo: Ease of Deployment. Why Livy? Advantages over native Spark Interpreter: • Jobs are submitted in YARN cluster mode • Spark context can be shared • Support for Spark Dynamic Resource Allocation. Diagram labels: HTTP (RPC); MAPR CONVERGED DATA PLATFORM: CLOUD-SCALE DATA STORE (MAPR-XD), OPERATIONAL DATABASE (MAPR-DB), EVENT STREAMING (MAPR-ES)
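Livy sits between the notebook and Spark as an HTTP service, which is what the "HTTP (RPC)" label on the slide refers to. As a rough illustration of that interaction, the sketch below builds the JSON bodies for Livy's REST endpoints (`POST /sessions` and `POST /sessions/{id}/statements`) and wraps them in curl calls. The URL, the default port 8998, and the helper names are assumptions; a secured cluster would also need authentication and TLS options that are not shown here.

```shell
#!/bin/sh
# Sketch of talking to a Livy server over HTTP. Helper names are hypothetical.
# Assumption: Livy is reachable at LIVY_URL without authentication.
LIVY_URL=${LIVY_URL:-http://localhost:8998}

# JSON body that asks Livy for a session of the given kind (e.g. pyspark).
session_payload() {
  printf '{"kind": "%s"}\n' "$1"
}

# JSON body that submits one code snippet.
# The caller must pass code that is already JSON-escaped (no raw quotes).
statement_payload() {
  printf '{"code": "%s"}\n' "$1"
}

# POST /sessions: start a new Spark session.
create_session() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(session_payload "$1")" "$LIVY_URL/sessions"
}

# POST /sessions/<id>/statements: run code in an existing session.
run_statement() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(statement_payload "$2")" "$LIVY_URL/sessions/$1/statements"
}

# Example use against a running server:
#   create_session pyspark
#   run_statement 0 'sc.parallelize(range(10)).sum()'
```

Because every request goes to a long-lived session on the server, several notebook paragraphs or users can address the same session id, which is one way to read the "Spark context can be shared" point above.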
  • 36. Demo: Extensibility & Collaboration
  • 37. Demo: Extensibility & Collaboration. Collaboration. Diagram labels: MAPR CONVERGED DATA PLATFORM: CLOUD-SCALE DATA STORE (MAPR-XD), OPERATIONAL DATABASE (MAPR-DB), EVENT STREAMING (MAPR-ES)
  • 38. Demo: Extensibility & Collaboration. Collaboration. Diagram labels: MAPR CONVERGED DATA PLATFORM: CLOUD-SCALE DATA STORE (MAPR-XD), OPERATIONAL DATABASE (MAPR-DB), EVENT STREAMING (MAPR-ES); MAPR POSIX CLIENT FOR CONTAINERS
  • 39. Demo: Extensibility & Collaboration. What’s in the command: docker run --rm -it --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse --memory 0 -e MAPR_CLUSTER=my.cluster.com -e MAPR_MEMORY=0 -e MAPR_MOUNT_PATH=/mapr -e ZEPPELIN_NOTEBOOK_DIR=/mapr/my.cluster.com/user/mapr/zeppelin/shared-notebooks/ -e MAPR_TZ=America/Los_Angeles -e MAPR_CONTAINER_USER=mapr -e MAPR_CONTAINER_UID=5000 -e MAPR_CONTAINER_GROUP=mapr -e MAPR_CONTAINER_GID=5000 -e MAPR_CLDB_HOSTS=172.24.8.195,172.24.11.200,172.24.10.4 -e MAPR_TICKETFILE_LOCATION=/tmp/maprticket_5000 -e ZEPPELIN_SSL_PORT=9995 -e HOST_IP=172.24.11.62 -e MAPR_HS_HOST=172.24.8.195 -p 9995:9995 -p 10000-10010:10000-10010 -v /tmp/maprticket_5000:/tmp/maprticket_5000:ro -v /sys/fs/cgroup:/sys/fs/cgroup:ro maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7
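The only difference from the earlier deployment command is ZEPPELIN_NOTEBOOK_DIR, which points Zeppelin at a directory under the cluster mount so that notebooks live in the cluster rather than in the container. The sketch below shows how that path is composed from the mount path, cluster name, and user seen in the command. The helper name `notebook_dir` is hypothetical, and the `zeppelin/shared-notebooks/` suffix is simply the one used on the slide.

```shell
#!/bin/sh
# Sketch: compose the shared notebook directory from the values in the command.
# notebook_dir is a hypothetical helper name.
notebook_dir() {
  mount_path=$1   # value of MAPR_MOUNT_PATH, e.g. /mapr
  cluster=$2      # value of MAPR_CLUSTER, e.g. my.cluster.com
  user=$3         # home directory owner in the cluster, e.g. mapr
  printf '%s/%s/user/%s/zeppelin/shared-notebooks/\n' "$mount_path" "$cluster" "$user"
}

# Prints /mapr/my.cluster.com/user/mapr/zeppelin/shared-notebooks/
notebook_dir /mapr my.cluster.com mapr
```

Any container started with the same ZEPPELIN_NOTEBOOK_DIR value sees the same notebooks, subject to the file permissions granted to the user behind its ticket.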
  • 40. Demo: Extensibility. Adding Deep Learning libraries to the container
  • 41. Demo: Extensibility. Adding Deep Learning libraries to the container. Diagram labels: Compute; Persistent Storage; MAPR CONVERGED DATA PLATFORM: CLOUD-SCALE DATA STORE (MAPR-XD), OPERATIONAL DATABASE (MAPR-DB), EVENT STREAMING (MAPR-ES)
  • 42. Demo: Extensibility. Adding Deep Learning libraries to the container. Diagram labels: Compute; Persistent Storage; MAPR CONVERGED DATA PLATFORM: CLOUD-SCALE DATA STORE (MAPR-XD), OPERATIONAL DATABASE (MAPR-DB), EVENT STREAMING (MAPR-ES). What if this was a box of GPUs?
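One way to add a deep learning library is to build a derived image on top of the published one. The slides do not show the exact steps used in the demo, so the following is only a sketch under stated assumptions: that `pip` is available in the base image, that the build host can reach a package index, and that TensorFlow is the library wanted. The function name `write_dockerfile` and the image tag in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: emit a Dockerfile that extends the Data Science Refinery image.
# write_dockerfile is a hypothetical helper; TensorFlow is an example choice.
write_dockerfile() {
  cat <<'EOF'
# Start from the image tag shown on the slides.
FROM maprtech/data-science-refinery:v1.0_6.0.0_4.0.0_centos7
# Assumption: pip is on the PATH in the base image.
RUN pip install tensorflow
EOF
}

# Example use (the tag dsr-tensorflow is made up for illustration):
#   write_dockerfile > Dockerfile
#   docker build -t dsr-tensorflow .
```

The derived image is then started with the same docker run flags as before, with only the image name changed, so compute in the container stays separate from persistent storage in the cluster.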
  • 43. A Final Comparison. Traditional Hadoop Vendor, On Premises: Batch Cluster, Streaming Cluster, NoSQL Cluster, Data Science cluster
  • 44. Q&A. ENGAGE WITH US: @mapr, rsilver@mapr.com