Modernize Your Data Analytics Architecture with a Unified Approach to Data + AI
Anand Venugopal
Global Leader - Industry Solutions (Migrations)
Databricks
Topics
• Why migrate from Hadoop to Databricks?
• Success stories: technical and business benefits
• How can you migrate fast, with low cost and low risk?
Legacy On-Prem Analytics Architectures Are Not Keeping Up
• Hadoop costs are rising just when costs need to be cut
• Innovation hinges on ML and predictive insights
• Business agility requires real-time data
This is preventing teams from driving high-impact business outcomes.
Why Migrate to Databricks?
A Forrester study finds 417% ROI for companies switching to Databricks:
• 47% cost savings from retiring legacy infrastructure
• 5% increase in revenue
• 25% increase in data team productivity
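ROI is conventionally defined as (benefits − costs) / costs. Assuming that conventional definition (the slide does not spell out the study's exact method), a 417% ROI means benefits of roughly 5.17 times costs. A minimal sketch with costs normalized to 1.0:

```python
def roi(benefits: float, costs: float) -> float:
    """Return ROI as a fraction: (benefits - costs) / costs."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs


def benefits_for_roi(target_roi: float, costs: float) -> float:
    """Invert roi(): the benefits needed to reach target_roi at a given cost."""
    return costs * (1.0 + target_roi)


# Normalize costs to 1.0 so benefits read directly as a multiple of cost.
multiple = benefits_for_roi(4.17, 1.0)
print(f"417% ROI means benefits of about {multiple:.2f}x costs")
```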
Hadoop is Costly, Complex and Ineffective
• DevOps intensive: the Hadoop ecosystem is complex, hard to manage and prone to failures. Result: low productivity
• Rigid and inelastic: 24/7 HDFS clusters must be built for peak use and are costly to upgrade. Result: cost prohibitive
• Lacks AI capabilities: Hadoop has no out-of-the-box support for ML/AI, and data and AI run in separate environments. Result: slow innovation
Enterprises Need a Modern Data Analytics Architecture
Critical requirements:
• Cost-effective scale and performance in the cloud
• Easy to manage and highly reliable for diverse data
• Predictive and real-time insights to drive innovation
Outcomes: enhanced productivity, lower cost at scale, new insights faster
Building a Modern Cloud Analytics Architecture with Databricks
Data Science Workspace
• Easy to manage: a managed cloud platform that can reliably handle all types of data
• Massive scale: on-demand, elastic autoscaling clusters with optimized Apache Spark
• AI-enabled innovation: unified, collaborative notebooks with built-in ML capabilities
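Elastic autoscaling on Databricks is configured per cluster rather than by sizing a fixed cluster for peak load. The sketch below builds a cluster specification as a plain Python dict using field names from the Databricks Clusters API (`autoscale.min_workers`, `autoscale.max_workers`, `autotermination_minutes`). The cluster name, runtime version string and node type are placeholder assumptions; valid values depend on your workspace and cloud, so check the current Clusters API reference before submitting such a payload.

```python
import json


def autoscaling_cluster_spec(name: str, min_workers: int, max_workers: int,
                             idle_minutes: int = 30) -> dict:
    """Build a Clusters API-style payload for an autoscaling cluster.

    spark_version and node_type_id are placeholders (assumptions).
    """
    if not 0 <= min_workers <= max_workers:
        raise ValueError("require 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "i3.xlarge",          # placeholder AWS node type
        # Worker count moves between these bounds with load,
        # instead of being provisioned permanently for peak use.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = autoscaling_cluster_spec("etl-nightly", min_workers=2, max_workers=20)
print(json.dumps(spec, indent=2))
```

Keeping the spec as data (rather than clicking through a UI) makes it easy to version-control and review alongside the migrated jobs.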
Databricks Unified Data Analytics
• One platform for every use case: streaming analytics, BI, data science, machine learning
• Delta Engine: high-performance query engine
• Structured transactional layer
• Data lake for all your data: structured, semi-structured and unstructured
Powering Innovation with Modern Data Analytics
Customers that migrated from Hadoop
Business value: What did they do with us?
“The Un-carrier strategy is an approach that
seeks to listen to the customer, address their
pain points, bring innovation to the industry
and improve the wireless experience for all.”
Situation
○ Every network interaction (call, website load, text, app) is logged in a 1,600-node HDP data lake (30 PB).
○ 4-5 “large-scale” pipelines, with hundreds of downstream pipelines feeding the business:
  ● PCMD (network measurement data), CDR (call records), EDR (DNS/website), LSR (location)
○ Call data is processed to get critical network insights: call-failure reasons and network outages.
○ PCMD (Per Call Measurement Data):
  ● Provides insights on call failures at a granular level
  ● Best source to determine outage cause and effect
  ● Provides rich information about Sprint customers roaming on the T-Mobile network
Solution: holistic transformation instead of ‘lift & shift’
Overview
● Migration and transformation of streaming data analytics from Apache Storm and Hive on Hortonworks to Azure Databricks
● Data streamed in at an average of 2M records per second: 375 GB per batch and 23 TB per day (uncompressed)
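The ingest figures can be cross-checked with simple arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and a uniform rate over 24 hours; neither is stated on the slide, so the derived values (average record size of roughly 133 bytes, a batch about every 23 minutes) are rough estimates only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_profile(records_per_s, bytes_per_day, bytes_per_batch):
    """Derive average throughput figures, assuming a uniform daily rate."""
    bytes_per_s = bytes_per_day / SECONDS_PER_DAY
    return {
        "records_per_day": records_per_s * SECONDS_PER_DAY,
        "mb_per_s": bytes_per_s / 1e6,
        "avg_record_bytes": bytes_per_s / records_per_s,  # uncompressed
        "batches_per_day": bytes_per_day / bytes_per_batch,
        "minutes_per_batch": (SECONDS_PER_DAY / 60) * bytes_per_batch / bytes_per_day,
    }


# Slide figures: 2M records/s, 23 TB/day, 375 GB/batch (decimal units assumed).
profile = ingest_profile(2_000_000, 23e12, 375e9)
for name, value in profile.items():
    print(f"{name}: {value:,.1f}")
```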
Results
Accelerating key insights, e.g. hourly dashboards that protect revenue and reduce customer churn.
78.5x performance gain versus on-prem operation
● BEFORE (Hive on Tez): 47 minutes on 15k cores to do the job
● AFTER: 35 minutes on 256 cores to do the same job
KPI computations took a quarter of the time, enabling new hourly dashboards (without optimizations such as warm pools, which were still in progress)
40% reduction in use of the 1,600-node on-prem cluster
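The slide does not say how the 78.5x figure is computed. One interpretation, assumed here, is compute efficiency measured in core-minutes (cores × wall-clock minutes). Under that assumption the before/after numbers give roughly 79x, close to the reported figure, while wall-clock time alone improved only about 1.3x.

```python
def core_minutes(cores: int, minutes: int) -> int:
    """Compute consumed by a job, as cores x wall-clock minutes."""
    return cores * minutes


before = core_minutes(15_000, 47)  # Hive on Tez, on-prem
after = core_minutes(256, 35)      # Azure Databricks
efficiency_gain = before / after
wall_clock_speedup = 47 / 35

print(f"Core-minutes: {before:,} before vs {after:,} after")
print(f"Efficiency gain: {efficiency_gain:.1f}x; wall-clock speedup: {wall_clock_speedup:.2f}x")
```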
Supply chain decisions: apply ML to data from 5,000+ stores
Impact
• 70% reduction in operational costs
• Accelerated business growth
Demand forecasting: 500K stores, 2 TB, 250 pipelines
Impact
• 10x more capacity
• 2x faster data pipelines
Predict bakery food spoilage: 10+ large Hadoop clusters
Impact
• $100M in fresh food spoilage saved
• $900K in cost reduction; run time 7 hr → 40 min
Optimize programming: could not process 90 days of data with a large Hadoop cluster
Impact
• 26% increase in team productivity
• More data, lower costs, low DevOps overhead
Databricks Drives New Business Value at 3 Levels
Databricks Value Framework: value grows from the data platform ($, less value) to business outcomes ($$$, more value)
$$$ Business-impacting use cases: Databricks accelerates and expands the realization of value from business-oriented use cases that use net-new capabilities vs. Hadoop
$$ Productivity: higher productivity among data scientists and data engineers by eliminating manual tasks
$ Infrastructure: reduced infrastructure spend through the performance of the Databricks runtime
$12.8M in value delivered with Databricks
Value of Databricks
■ Removed Cloudera licensing
■ No need to add expensive new hardware for additional capacity
■ Avoided data center costs
■ Avoided Hadoop administration costs
Cloudera costs vs. Databricks value & investment (units: $, cumulative PV over 3 years)
■ Chart series: potential value with Databricks; Cloudera cost of inaction; investment (Databricks, migration & cloud); net impact (includes the cost of both solutions during migration)
■ Chart values, in slide order: $13.8M, -$18.7M, -$4.9M, $12.8M
Cloudera costs
■ Data center, Hadoop administration, new hardware, licensing
Databricks investment
■ Databricks usage & support
■ Migration
■ Cloud compute
Databricks customer example: large U.S. telco, 156-node cluster
Source: Databricks value model
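Value here is reported as cumulative present value (PV) over 3 years. A minimal sketch of that calculation discounts each year's cash flow by (1 + r)^t and sums the results, assuming end-of-year cash flows. The discount rate and yearly cash flows below are hypothetical placeholders, not figures from the Databricks value model.

```python
def cumulative_pv(cash_flows, rate):
    """Cumulative present value of end-of-year cash flows.

    cash_flows[0] arrives at the end of year 1, so it is discounted once.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


# Hypothetical inputs in $M per year (placeholders, not slide data).
avoided_costs = [5.0, 6.0, 7.0]    # e.g. licensing, hardware, admin avoided
investment = [-2.0, -1.5, -1.5]    # e.g. platform usage, migration, cloud
rate = 0.08                        # hypothetical discount rate

net = [a + i for a, i in zip(avoided_costs, investment)]
print(f"PV avoided costs: {cumulative_pv(avoided_costs, rate):.2f}")
print(f"PV investment:    {cumulative_pv(investment, rate):.2f}")
print(f"PV net impact:    {cumulative_pv(net, rate):.2f}")
```

Because discounting is linear, the PV of the net cash flows equals the PV of avoided costs plus the PV of the investment, which is why such charts can show the components and the net side by side.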
Work with Us for a Tailored Value Case for Your Migration
Tailored Financial Analysis
A tailored business case is produced by answering 4 core questions:
1. How many nodes are in your Hadoop environment?
2. How many people support your Hadoop environment?
3. When is your Cloudera renewal?
4. How do you expect your data needs to grow over time?
Proven Migration Strategy: Reduce Risk and Costs
Delivered by:
• Databricks expert team
• System integrators
• Tools and ISV partners
• Cloud partners
Using automation, tools and proven methodologies.
Components to migrate for a successful migration:
• Data + metadata
• Workloads/jobs
• Security & governance
• Other tools and integrations
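For the security & governance component, one piece that lends itself to automation is re-expressing Hive-style table privileges as Databricks SQL `GRANT <privilege> ON TABLE <table> TO <principal>` statements. The sketch below assumes a hypothetical policy export of (privilege, table, group) tuples and a hypothetical privilege mapping (for example INSERT → MODIFY). Real Sentry or Ranger policies carry more detail (column masks, row filters, URIs) that it does not handle.

```python
# Hypothetical mapping from Sentry-style Hive privileges to Databricks SQL
# privileges; review it against your own security model before relying on it.
PRIVILEGE_MAP = {
    "SELECT": "SELECT",
    "INSERT": "MODIFY",
    "ALL": "ALL PRIVILEGES",
}


def grant_statements(rules):
    """Translate (privilege, table, group) tuples into GRANT statements.

    Unknown privileges raise KeyError so nothing is silently dropped.
    """
    statements = []
    for privilege, table, group in rules:
        target = PRIVILEGE_MAP[privilege.upper()]
        # Backticks quote the principal name, which may contain spaces.
        statements.append(f"GRANT {target} ON TABLE {table} TO `{group}`")
    return statements


# Invented example rules, as if exported from a legacy policy store.
rules = [("select", "sales.orders", "analysts"), ("ALL", "sales.orders", "etl team")]
for stmt in grant_statements(rules):
    print(stmt)
```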
Strategy options: lift & shift (faster, automatable) or transformation (higher impact)
Automated conversion is available for most workload types:
• Data migration: HDFS → Azure ADLS Gen 2, AWS S3
• Metastore migration: Hive databases/tables/views → Databricks tables; Impala databases/tables/views → Databricks tables
• SQL migration: Hive queries → Spark SQL in Databricks notebooks; Spark queries → Databricks notebooks
• Security: Sentry permissions/Ranger policies → Databricks permissions; HDFS access permissions → AWS IAM, ADLS ACLs
• Scheduled data pulls: Sqoop statements → Databricks-compatible PySpark code
• Orchestration: Oozie jobs → Airflow DAGs & Databricks Jobs
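The orchestration mapping (Oozie jobs → Databricks Jobs) is the kind of conversion that can be partly automated. The sketch below is a hypothetical, deliberately minimal converter, not Databricks' migration tooling: it reads a linear Oozie workflow with Python's standard `xml.etree.ElementTree` and emits task dicts shaped like Databricks Jobs API 2.1 tasks (`task_key`, `depends_on`, `notebook_task.notebook_path`). The `/Migrated/...` notebook paths are placeholders, and fork, join and decision nodes are rejected rather than guessed at.

```python
import json
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop an XML namespace, e.g. '{uri:oozie:workflow:0.5}ok' -> 'ok'."""
    return tag.rsplit("}", 1)[-1]


def oozie_to_job_tasks(workflow_xml):
    """Convert a linear Oozie workflow into Jobs-style task dicts."""
    root = ET.fromstring(workflow_xml.strip())
    actions = []    # action names in document order
    next_node = {}  # action name -> target of its <ok to="..."/>
    for node in root:
        kind = _local(node.tag)
        if kind in ("fork", "join", "decision"):
            raise NotImplementedError(f"unsupported Oozie node type: {kind}")
        if kind != "action":
            continue  # <start>, <kill> and <end> carry no work of their own
        name = node.get("name")
        actions.append(name)
        for child in node:
            if _local(child.tag) == "ok":
                next_node[name] = child.get("to")
    upstream = {name: [] for name in actions}
    for src, dst in next_node.items():
        if dst in upstream:  # a transition to <end> adds no dependency
            upstream[dst].append(src)
    return [
        {
            "task_key": name,
            "depends_on": [{"task_key": u} for u in upstream[name]],
            # Hypothetical location of the converted notebook.
            "notebook_task": {"notebook_path": f"/Migrated/{name}"},
        }
        for name in actions
    ]


SAMPLE_WORKFLOW = """
<workflow-app name="daily" xmlns="uri:oozie:workflow:0.5">
  <start to="ingest"/>
  <action name="ingest">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2"/>
    <ok to="transform"/>
    <error to="fail"/>
  </action>
  <action name="transform">
    <hive xmlns="uri:oozie:hive-action:0.5"/>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
"""

print(json.dumps(oozie_to_job_tasks(SAMPLE_WORKFLOW), indent=2))
```

Error transitions to `<kill>` are dropped on purpose: in a Jobs-style definition, failure handling (retries, notifications) is configured on the job rather than as extra graph nodes.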
Typically, customers save 55-66% in costs and cut timelines by 2-3x by using automation tools.
Manual migration (17-20 weeks): assessment & design → data migration → workloads migration, validation → cutover → operations
Using automation (8 weeks): accelerated assessment & design → accelerated data & workloads migration, validation → cutover → operations
* Typical implementation scenario: ~4 PB of data and 3,000 jobs with mixed workloads
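As a quick consistency check on the quoted "2-3x", dividing the manual timeline (17-20 weeks) by the automated one (8 weeks) gives a reduction of roughly 2.1x to 2.5x, inside the stated range. This uses only the week counts on the slide and says nothing about the 55-66% cost figure.

```python
def timeline_reduction(manual_weeks, automated_weeks):
    """Return (low, high) reduction factors for a manual-duration range."""
    low_weeks, high_weeks = manual_weeks
    return low_weeks / automated_weeks, high_weeks / automated_weeks


low, high = timeline_reduction((17, 20), 8)
print(f"Timeline reduction: {low:.1f}x to {high:.1f}x")
```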
Our Partner Ecosystem Will Accelerate Migrations
• ISV partners and migration tools
• Security
• Governance
• Consulting & SI partners
• Databricks Migration SWAT team + CS packaged services for migration
• Cloud
Customized Hadoop Migration Success Plan with a Free Expert-led Assessment
1. Pre-questionnaire + discovery and education workshops led by experts
▪ Learn how Databricks works and how your current workloads, tools and processes map to and transform in the future cloud state
2. Technical, use-case and business value analysis
▪ High-level current and future state architecture; discuss and prioritize use cases; understand how $$ value is driven by the migration
3. Proposal and recommendations for the path forward
▪ The expert team summarizes all the findings and walks through the proposed costs, business value summary and recommended migration plan
Databricks Experts Know Hadoop
▪ More than 100 years of combined experience in Hadoop
▪ Practitioners, architects, engineers and consultants; open source contributors and committers
▪ Expertise in all Hadoop ecosystem components and distributions
Hadoop migration to Databricks - recap
Why - Costs, Productivity, Innovation → Business Impact
Your competitors and market leaders are doing it NOW
Databricks experts and automation strategy can help you
migrate faster, with much lower cost and risk
Thank you
Please visit databricks.com/migration

FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 

Recently uploaded (20)

Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 

The Hidden Value of Hadoop Migration

  • 1. Modernize Your Data Analytics Architecture with a Unified Approach to Data + AI. Anand Venugopal, Global Leader - Industry Solutions (Migrations), Databricks
  • 2. Topics: Why migrate from Hadoop to Databricks? Success stories, technical and business benefits. How can you migrate quickly, at low cost and low risk?
  • 3. Legacy On-Prem Analytics Architectures Are Not Keeping Up: Hadoop costs are rising when costs need to be cut; innovation hinges on ML and predictive insights; business agility requires real-time data. This is preventing teams from driving high-impact business outcomes.
  • 4. Why Migrate to Databricks? A Forrester study finds 417% ROI for companies switching to Databricks: 47% cost savings from retiring legacy infrastructure; 5% increase in revenue; 25% increase in data team productivity.
  • 5. Hadoop is Costly, Complex and Ineffective. DevOps intensive (low productivity): the Hadoop ecosystem is complex, hard to manage and prone to failures. Rigid and inelastic (cost prohibitive): 24/7 HDFS clusters must be built for peak use and are costly to upgrade. Lacks AI capabilities (slow innovation): no out-of-the-box Hadoop support for ML/AI, and separate environments for data and AI.
  • 6. Enterprises Need a Modern Data Analytics Architecture. Critical requirements: cost-effective scale and performance in the cloud; easy to manage and highly reliable for diverse data; predictive and real-time insights to drive innovation.
  • 7. Building a Modern Cloud Analytics Architecture with Databricks (Data Science Workspace). Easy to manage: a managed cloud platform that can reliably handle all types of data. Massive scale: on-demand, elastic autoscaling clusters with optimized Apache Spark. AI-enabled innovation: unified, collaborative notebooks with built-in ML capabilities. Outcomes: enhanced productivity, lower cost at scale, new insights faster.
  • 8. Databricks Unified Data Analytics: one platform for every use case (streaming analytics, BI, data science, machine learning); Delta Engine, a high-performance query engine; a structured transactional layer; a data lake for all your data (structured, semi-structured and unstructured).
  • 9. Powering Innovation with Modern Data Analytics: customers that migrated from Hadoop
  • 10. Business value: What did they do with us? “The Un-carrier strategy is an approach that seeks to listen to the customer, address their pain points, bring innovation to the industry and improve the wireless experience for all.” Situation: ○ Every network interaction (call, website load, text, app) is logged in a 1,600-node HDP data lake (30 PB). ○ 4-5 “large-scale” pipelines, with hundreds of downstream pipelines feeding the business ● PCMD (network measurement data), CDR (call records), EDR (DNS (website)), LSR (location) ○ Process call data to get critical network insights: call-failure reasons and network outages. ○ PCMD (Per Call Measurement Data) ● Provides insights on call failures at a granular level ● Best source to determine outage cause and effect ● Provides rich information about Sprint customers roaming on the T-Mobile network
  • 11. Solution: Holistic transformation instead of ‘lift & shift’. Overview: ● Migration and transformation of streaming data analytics from Apache Storm and Hive on Hortonworks to Azure Databricks ● Data streamed in at an average of 2M records per second, 375 GB per batch, 23 TB per day (uncompressed). Results: accelerated key insights, e.g. hourly dashboards that help protect revenue and reduce customer churn. 78.5x performance gain versus on-prem operation. BEFORE (with Hive on Tez): 47 mins on 15k cores to do the job. AFTER: 35 mins on 256 cores to do the same job. KPI computations took a quarter of the time, enabling new hourly dashboards (without optimizations such as warm pools, which were still in progress). 40% reduction in use of the 1,600-node on-prem cluster.
  • 12. Supply chain decisions: apply ML to data from 5,000+ stores. Impact: 70% reduction in operational costs; accelerated business growth. Demand forecasting: 500K stores, 2 TB, 250 pipelines. Impact: 10x more capacity; 2x faster data pipelines. Predicting bakery food spoilage: 10+ large Hadoop clusters. Impact: $100M in fresh food spoilage saved; $900K in costs cut; time: 7 hr → 40 min. Optimizing programming: could not process 90 days of data with a large Hadoop cluster. Impact: 26% team productivity increase; more data, lower costs, low DevOps overhead.
  • 13. Databricks Drives New Business Value at 3 Levels. Databricks Value Framework, from the data platform (less value) to business outcomes (more value): Business-impacting use cases ($$$): Databricks accelerates and expands the realization of value from business-oriented use cases that use net-new capabilities vs. Hadoop. Productivity ($$): higher productivity among data scientists and data engineers by eliminating manual tasks. Infrastructure ($): reduced infrastructure spend thanks to the performance of the Databricks runtime.
  • 14. $12.8M in value delivered with Databricks. Databricks customer example: large U.S. telco, 156-node cluster. Cloudera costs vs. Databricks value & investment (units: $, cumulative PV over 3 years; includes the cost of both solutions during migration). Chart categories: potential value with Databricks; Cloudera cost of inaction; investment in Databricks, migration & cloud; net impact. Figures shown: $13.8M, -$18.7M, -$4.9M, $12.8M. Value of Databricks: ■ Removed Cloudera licensing ■ No need to add expensive new hardware for additional capacity ■ Avoided data center costs ■ Avoided Hadoop administration costs. Cloudera costs: ■ Data center, Hadoop administration, new hardware, licensing. Databricks investment: ■ Databricks usage & support ■ Migration ■ Cloud compute. Source: Databricks value model.
  • 15. Work with us for a Tailored Value Case for Your Migration. Tailored financial analysis: a tailored business case is produced by answering 4 core questions: 1. How many nodes are in your Hadoop environment? 2. How many people support your Hadoop environment? 3. When is your Cloudera renewal? 4. How do you expect your data needs to grow over time?
  • 16. Proven Migration Strategy: Reduce Risk and Costs. Automation, tools and proven methodologies from the Databricks expert team, system integrators, tools and ISV partners, and cloud partners. Components to migrate for a successful migration: data + metadata; workloads/jobs; security & governance; other tools and integrations. Strategy options: lift & shift (faster, automatable) or transformation (higher impact).
  • 17. Automated conversion for most workload types. Data migration: HDFS → Azure ADLS Gen2, AWS S3. Metastore migration: Hive databases/tables/views → Databricks tables; Impala databases/tables/views → Databricks tables. SQL migration: HDFS, Hive queries, Spark queries → Spark SQL, Databricks notebooks. Security: Sentry permissions/Ranger policies → Databricks permissions; HDFS access permissions → AWS IAM, ADLS ACLs. Scheduled data pulls: Sqoop statements → Databricks-compatible PySpark code. Orchestration: Oozie jobs → Airflow DAGs & Databricks Jobs.
  • 18. Typically, customers save 55-66% in costs and see 2-3x shorter timelines by using automation tools. Manual migration (17-20 weeks): assessment & design; data migration; workloads migration, validation; cutover; operations. Using automation (8 weeks): accelerated assessment & design; accelerated data & workloads migration, validation; cutover; operations. *Typical implementation scenario: ~4 PB of data and 3,000 jobs with mixed workloads.
  • 19. Our Partner Ecosystem will Accelerate Migrations: ISV partners and migration tools (security, governance); consulting & SI partners; Databricks Migration SWAT team + CS; packaged services for migration; cloud.
  • 20. Customized Hadoop Migration Success Plan with a Free Expert-led Assessment: 1. Pre-questionnaire + discovery and education workshops led by experts ▪ Learn how Databricks works and how your current workloads, tools and processes map to and transform in the future cloud state. 2. Technical, use-case and business value analysis ▪ High-level current- and future-state architecture; discuss and prioritize use cases; understand how the migration drives $$ value. 3. Proposal and recommendations for the path forward ▪ The expert team summarizes all the findings and walks through the proposed costs, business value summary and recommended migration plan.
  • 21. Databricks Experts Know Hadoop ▪ More than 100 years of combined experience in Hadoop ▪ Practitioners, architects, engineers and consultants, open source contributors and committers ▪ Expertise with all Hadoop ecosystem components and distributions
  • 22. Hadoop migration to Databricks - recap. Why: costs, productivity, innovation → business impact. Your competitors and market leaders are doing it NOW. Databricks experts and an automation strategy can help you migrate faster, with much lower cost and risk.
  • 23. Thank you. Please visit databricks.com/migration
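Slide 11's 78.5x figure compares 47 minutes on 15,000 cores with 35 minutes on 256 cores, and slide 18 compares 17-20 weeks of manual migration with 8 weeks using automation. The minimal sketch below re-derives both ratios from the slides' own numbers. It assumes the 78.5x gain is measured in core-minutes (wall-clock minutes multiplied by cores); the deck does not state the metric, and the helper names are illustrative only.

```python
# Re-deriving the ratios quoted on slides 11 and 18 from the slides' numbers.
# Assumption: the 78.5x "performance gain" is a ratio of core-minutes.

BEFORE_MINUTES, BEFORE_CORES = 47, 15_000   # Hive on Tez, on-prem
AFTER_MINUTES, AFTER_CORES = 35, 256        # same job after migration


def core_minutes(minutes: int, cores: int) -> int:
    """Compute consumed by a job: wall-clock minutes multiplied by cores."""
    return minutes * cores


def efficiency_gain() -> float:
    """Core-minutes before migration divided by core-minutes after."""
    return core_minutes(BEFORE_MINUTES, BEFORE_CORES) / core_minutes(AFTER_MINUTES, AFTER_CORES)


def timeline_speedup(manual_weeks: float, automated_weeks: float) -> float:
    """How many times shorter the automated migration timeline is."""
    return manual_weeks / automated_weeks


if __name__ == "__main__":
    print(f"before: {core_minutes(BEFORE_MINUTES, BEFORE_CORES):,} core-minutes")  # 705,000
    print(f"after:  {core_minutes(AFTER_MINUTES, AFTER_CORES):,} core-minutes")    # 8,960
    print(f"gain:   {efficiency_gain():.1f}x")                                      # 78.7x
    for manual_weeks in (17, 20):
        # prints 2.12x for 17 weeks, then 2.50x for 20 weeks
        print(f"{manual_weeks} -> 8 weeks: {timeline_speedup(manual_weeks, 8):.2f}x")
```

Under that assumption the core-minute ratio comes out at roughly 78.7x, close to the quoted 78.5x. The timeline ratios of about 2.1x and 2.5x sit at the lower end of the 2-3x range claimed on slide 18.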
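Slide 17 lists Sqoop statements being converted into Databricks-compatible PySpark code, but the deck does not show how. As a hedged illustration, the hypothetical helper `sqoop_to_jdbc_options` below maps a few common `sqoop import` flags (`--connect`, `--table`, `--split-by`, `--num-mappers`/`-m`, `--username`, `--password`, `--target-dir`) onto the option names of Spark's JDBC data source (`url`, `dbtable`, `partitionColumn`, `numPartitions`, `user`, `password`). It only understands the space-separated `--flag value` form and returns everything else for manual review. It is a sketch under those assumptions, not the automation tooling the slide refers to.

```python
from __future__ import annotations

import shlex

# Hypothetical mapping from common `sqoop import` flags to the option names
# accepted by Spark's JDBC data source. Not exhaustive.
SQOOP_TO_SPARK_JDBC = {
    "--connect": "url",
    "--table": "dbtable",
    "--split-by": "partitionColumn",
    "--num-mappers": "numPartitions",
    "-m": "numPartitions",
    "--username": "user",
    "--password": "password",
}


def sqoop_to_jdbc_options(command: str) -> tuple[dict[str, str], str | None, list[str]]:
    """Translate a `sqoop import ...` command line into Spark JDBC options.

    Returns (options, target_dir, unhandled_tokens). Only the space-separated
    `--flag value` form is understood; unrecognised tokens are returned
    verbatim in unhandled_tokens for manual review.
    """
    tokens = shlex.split(command)
    if tokens[:2] != ["sqoop", "import"]:
        raise ValueError("expected a 'sqoop import' command")

    options: dict[str, str] = {}
    target_dir: str | None = None
    unhandled: list[str] = []
    i = 2
    while i < len(tokens):
        flag = tokens[i]
        has_value = i + 1 < len(tokens)
        if flag in SQOOP_TO_SPARK_JDBC and has_value:
            options[SQOOP_TO_SPARK_JDBC[flag]] = tokens[i + 1]
            i += 2
        elif flag == "--target-dir" and has_value:
            # HDFS output path; on Databricks this would become a table or cloud path.
            target_dir = tokens[i + 1]
            i += 2
        else:
            unhandled.append(flag)
            i += 1
    return options, target_dir, unhandled


if __name__ == "__main__":
    opts, target, rest = sqoop_to_jdbc_options(
        "sqoop import --connect jdbc:mysql://db.example.com/sales --table orders "
        "--split-by order_id -m 4 --target-dir /data/raw/orders"
    )
    print(opts)    # url, dbtable, partitionColumn and numPartitions
    print(target)  # /data/raw/orders
    print(rest)    # []
```

The resulting dictionary could be passed to `spark.read.format("jdbc").options(**opts)`. Spark additionally requires `lowerBound` and `upperBound` whenever `partitionColumn` is set, whereas Sqoop derives its split boundaries itself by querying the minimum and maximum of the split column, so a real conversion would need to compute those bounds too. Credentials are better pulled from a secret store than copied from the command line.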
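Slide 17 also maps Oozie jobs to Airflow DAGs and Databricks Jobs. A first step in any such conversion is recovering the execution order of a workflow's actions. The sketch below uses only Python's standard library and assumes a purely linear workflow (no `fork`, `join` or `decision` nodes). It follows the `start` node's `to` attribute and each action's `ok` transition until it reaches an `end` node. The function name is hypothetical, and `SAMPLE_WORKFLOW` is trimmed to the elements the function reads; it is not a complete, deployable Oozie definition.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Illustrative, trimmed workflow: two actions chained by <ok> transitions.
SAMPLE_WORKFLOW = """
<workflow-app name="daily-kpis" xmlns="uri:oozie:workflow:0.5">
  <start to="ingest"/>
  <action name="ingest">
    <shell xmlns="uri:oozie:shell-action:0.3"><exec>ingest.sh</exec></shell>
    <ok to="aggregate"/>
    <error to="fail"/>
  </action>
  <action name="aggregate">
    <hive xmlns="uri:oozie:hive-action:0.5"><script>kpis.hql</script></hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>
"""


def oozie_action_chain(workflow_xml: str) -> list[str]:
    """Return action names of a linear Oozie workflow in execution order."""
    root = ET.fromstring(workflow_xml.strip())
    # Oozie elements live in versioned namespaces (e.g. uri:oozie:workflow:0.5);
    # strip them so lookups work regardless of the schema version.
    for elem in root.iter():
        if "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
    if root.find("fork") is not None or root.find("decision") is not None:
        raise NotImplementedError("only linear workflows are supported")

    actions = {a.get("name"): a for a in root.findall("action")}
    end_names = {e.get("name") for e in root.findall("end")}
    chain: list[str] = []
    node = root.find("start").get("to")
    while node not in end_names:
        # Reject transitions to unknown nodes and revisits (cycles).
        if node not in actions or node in chain:
            raise ValueError(f"unexpected transition to {node!r}")
        chain.append(node)
        node = actions[node].find("ok").get("to")
    return chain


if __name__ == "__main__":
    print(oozie_action_chain(SAMPLE_WORKFLOW))  # ['ingest', 'aggregate']
```

The returned names map naturally onto Airflow's dependency syntax, e.g. `ingest >> aggregate`, with each action body rewritten as a Databricks notebook or job task. Branching workflows would need a proper graph traversal instead of this single chain.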