1© Cloudera, Inc. All rights reserved.
More Data in Less Time
Deploying an Operational Data Store with Cloudera
Trends in the Market
Trends Driving Change
Internet of Things: 16 billion connected devices generating more data (Source: Forbes)
Data Storage Costs: “It will soon be technically feasible & affordable to record & store everything…” (Source: New York Times)
Resource Intensive ELT: ELT drives up to 80% of database capacity (Source: Syncsort)
Customers are augmenting their
traditional architectures for
modern business needs.
Operational Data Store (ODS):
Ingesting, storing, and preparing data for
both operational and analytical use.
(AKA: Operational Data Warehouse, RDBMS, Storage)
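As a rough illustration of the three stages named in the definition above (ingest, store, prepare), the sketch below uses plain Python with in-memory stand-ins. The function names, record fields, and the dict-based table are hypothetical and are not taken from the deck or from any Cloudera product.

```python
import json
from typing import Iterable, Iterator


def ingest(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Ingest: parse raw JSON lines, skipping lines that fail to parse."""
    for line in raw_lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # a real system would likely route bad records elsewhere


def prepare(records: Iterable[dict]) -> Iterator[dict]:
    """Prepare: normalize field types so operational and analytical users agree."""
    for rec in records:
        yield {
            "id": str(rec["id"]),
            "amount": float(rec.get("amount", 0.0)),
            "source": rec.get("source", "unknown").lower(),
        }


def store(records: Iterable[dict], table: dict) -> int:
    """Store: upsert prepared records into a keyed table; return the count written."""
    count = 0
    for rec in records:
        table[rec["id"]] = rec
        count += 1
    return count


# Hypothetical input: two valid records and one malformed line.
raw = ['{"id": 1, "amount": "12.50", "source": "POS"}', "not json", '{"id": 2}']
ods_table: dict = {}
written = store(prepare(ingest(raw)), ods_table)
print(written, ods_table["1"], ods_table["2"])
```

Chaining generators keeps each stage independent, which mirrors the idea that ingest, preparation, and storage can each be scaled or replaced separately.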
ODS Use Cases
ETL Offload: Offload resource-intensive ETL workloads from systems
EDW Optimization: Migrate old data and ELT workloads off of the EDW
Active Archive: Store old data online so analysts can access historic data
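The active archive use case above can be sketched as a simple age-based split in which old rows move to an online tier rather than being deleted. This is a minimal illustration in plain Python; the 90-day retention window, the field names, and both helper functions are assumptions made for the example.

```python
from datetime import date, timedelta


def split_by_age(rows, today, keep_days):
    """Return (hot, archive): recent rows stay in the warehouse tier,
    older rows move to the online archive instead of being deleted."""
    cutoff = today - timedelta(days=keep_days)
    hot = [r for r in rows if r["event_date"] >= cutoff]
    archive = [r for r in rows if r["event_date"] < cutoff]
    return hot, archive


def query(hot, archive, predicate):
    """Query both tiers together so historic data stays reachable."""
    return [r for r in hot + archive if predicate(r)]


# Hypothetical rows; dates and payer codes are made up for illustration.
today = date(2016, 1, 1)
rows = [
    {"id": 1, "event_date": date(2015, 12, 20), "payer": "A"},
    {"id": 2, "event_date": date(2013, 6, 1), "payer": "A"},
    {"id": 3, "event_date": date(2015, 11, 30), "payer": "B"},
]
hot, archive = split_by_age(rows, today, keep_days=90)
payer_a_ids = [r["id"] for r in query(hot, archive, lambda r: r["payer"] == "A")]
print(len(hot), len(archive), payer_a_ids)
```

The point of the sketch is only that archived rows remain queryable alongside current ones, which is what distinguishes an active archive from offline storage.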
Goals of an Operational Data Store
Ingest Data, Store Data, Prepare Data
[Diagram: Traditional Architecture. Labels: Data Sources (Structured, Unstructured); Ingest; Operational Data Store (Storage #1, Storage #2, Storage N); ETL (Ingest, Process, Load); Enterprise Data Warehouse (ELT, Serve, Archive); Applications (BI System, Modeling, Reporting)]
Challenges with a Traditional Architecture
1) Limited Data Ingest
2) Inefficient Data Processing
3) Data Archived
A New Way Forward
1) Ingest More Data
2) Optimize Data Processing
3) Automated Secure Archive
[Diagram: Modern Architecture. Labels: Data Sources (Structured, Unstructured); Operational Data Store; EDH; Ingest; ETL; Active; Archive; Enterprise Data Warehouse; Structured Data; Load; ELT; Serve; Applications (BI System, Modeling, Reporting)]
RelayHealth Customer Story
About RelayHealth (A McKesson Business)
What does RelayHealth do?
RelayHealth is a financial solution of McKesson used to automate 2.4 billion financial transactions per year
200K Physicians, 2K Hospitals, 1.9K Payers/Health Plans
Who is McKesson?
Largest healthcare solution company in the world with $103+ billion in revenue
Headquarters in San Francisco and established in 1833
32K employees
RelayHealth’s Objectives
ETL Offload: Offload resource-intensive ETL workloads from systems
EDW Optimization: Migrate old data and ELT workloads off of the EDW
Active Archive: Store old data online so analysts can access historic data
The Pre-Hadoop Environment
Challenges
1) Deleted & archived information
2) Batch wasn’t cutting it
3) Application & report latency
[Diagram: RelayHealth Transaction BATCH Processing System. Labels: Claim Submitters; OLTP; Various Applications; RDBMS; EDW; Reports; Archive]
RelayHealth’s Modern Hadoop Architecture
Improvements
1) Active archive on Hadoop
2) Stream & batch processing
3) Prepared for future use cases
[Diagram: RelayHealth Transaction Processing System. Labels: Traditional BATCH Processing; Hadoop STREAM Processing; Claim Submitters; Ingest (Kafka); Process (Spark Streaming); Store (HBase); Access (Search, Spark); Payer; Application; Reports; Modeling]
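The ingest, process, and store flow named on this slide can be sketched with in-memory stand-ins. Nothing below uses the Kafka, Spark Streaming, or HBase APIs: the `Topic` class, the `process` rule (dropping claims with a non-positive amount), and the dict used as a table are all hypothetical, chosen only to show how small batches drain continuously instead of waiting for an end-of-day job.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a message topic (not the Kafka API)."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def poll(self, max_messages):
        """Return up to max_messages in arrival order; empty list when drained."""
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch


def process(batch):
    """Stand-in for the stream-processing step: drop invalid claims, tag the rest."""
    accepted = []
    for claim in batch:
        if claim.get("amount", 0) <= 0:
            continue  # hypothetical validation rule
        accepted.append({**claim, "status": "accepted"})
    return accepted


def run_micro_batches(topic, table, batch_size):
    """Drain the topic in small batches, writing results keyed by claim id."""
    batches = 0
    while True:
        batch = topic.poll(batch_size)
        if not batch:
            break
        for row in process(batch):
            table[row["claim_id"]] = row  # keyed write, like a row-key store
        batches += 1
    return batches


topic = Topic()
for i, amount in enumerate([120.0, 0.0, 75.5, 310.0, 42.0]):
    topic.publish({"claim_id": f"c{i}", "amount": amount})

claims_table = {}
batches = run_micro_batches(topic, claims_table, batch_size=2)
print(batches, sorted(claims_table))
```

Keying the stored rows by claim id means a reprocessed claim overwrites its earlier row, which is one common way to keep a serving table consistent when messages are replayed.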
Business and Technical ROI
Technology ROI
1) Active archive and Navigator for HIPAA compliance
2) Prepared for future use cases
3) Data ingest goes from end of day to near real-time
Business ROI
1) Transactions processed in 20 ms vs. 1 hour previously
2) $250k in licensing and hardware savings per year
3) Greater flexibility with data ingest
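To put the figures quoted in this deck in proportion, the short calculation below derives the implied speedup (1 hour down to 20 ms) and the average transaction rate implied by 2.4 billion transactions per year. It assumes a 365-day year and a perfectly even load, which real traffic would not follow, so the rate is only a rough average.

```python
MS_PER_HOUR = 60 * 60 * 1000           # 3,600,000 ms in one hour

before_ms = 1 * MS_PER_HOUR            # processing time before: 1 hour
after_ms = 20                          # processing time after: 20 ms
speedup = before_ms / after_ms         # how many times faster

tx_per_year = 2_400_000_000            # transactions per year quoted in the deck
seconds_per_year = 365 * 24 * 60 * 60  # assumes a 365-day year
avg_tx_per_second = tx_per_year / seconds_per_year

print(f"speedup: {speedup:,.0f}x")
print(f"average rate: {avg_tx_per_second:.1f} transactions/second")
```

The speedup works out to 180,000 times, and the average rate to roughly 76 transactions per second.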
Key Learnings
Crawl, walk, run
It takes time, start now
Lean on experts in the community
Thank you
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Recently uploaded (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Breakout: Hadoop and the Operational Data Store

  • 1. 1© Cloudera, Inc. All rights reserved. More Data in Less Time Deploying an Operational Data Store with Cloudera
  • 2. Trends in the Market: Trends Driving Change. Internet of Things: 16 billion connected devices generating more data (Source: Forbes). Data Storage Costs: “It will soon be technically feasible & affordable to record & store everything…” (Source: New York Times). Resource Intensive ELT: ELT drives up to 80% of database capacity (Source: Syncsort).
  • 3. Customers are augmenting their traditional architectures for modern business needs.
  • 4. Operational Data Store (ODS): Ingesting, storing, and preparing data for both operational and analytical use. (AKA: Operational Data Warehouse, RDBMS, Storage)
  • 5. ODS Use Cases. ETL Offload: Offload resource-intensive ETL workloads from systems. EDW Optimization: Migrate old data and ELT workloads off of the EDW. Active Archive: Store old data online so analysts can access historic data.
  • 6. Goals of an Operational Data Store: Ingest Data, Store Data, Prepare Data. Diagram (Traditional Architecture) labels: Data Sources (Structured, Unstructured), Ingest, Operational Data Store, ETL, Storage #1, Storage #2, Storage N, Ingest, Process, Load, Enterprise Data Warehouse, ELT, Archive, Serve, Applications, BI System, Modeling, Reporting.
  • 7. Challenges with a Traditional Architecture: 1) Limited Data Ingest. (Same Traditional Architecture diagram, callout 1.)
  • 8. Challenges with a Traditional Architecture: 1) Limited Data Ingest 2) Inefficient Data Processing. (Same Traditional Architecture diagram, callouts 1 and 2.)
  • 9. Challenges with a Traditional Architecture: 1) Limited Data Ingest 2) Inefficient Data Processing 3) Data Archived. (Same Traditional Architecture diagram, callouts 1, 2, and 3.)
  • 10. A New Way Forward: 1) Ingest More Data. Diagram (Modern Architecture) labels: Data Sources (Structured, Unstructured), Ingest, Operational Data Store, EDH, ETL, Archive, Load, Active Structured Data, Enterprise Data Warehouse, ELT, Serve, Serve, Applications, BI System, Modeling, Reporting. Callout 1.
  • 11. A New Way Forward: 1) Ingest More Data 2) Optimize Data Processing. (Same Modern Architecture diagram, callouts 1 and 2.)
  • 12. A New Way Forward: 1) Ingest More Data 2) Optimize Data Processing 3) Automated Secure Archive. (Same Modern Architecture diagram, callouts 1, 2, and 3.)
  • 13. RelayHealth Customer Story
  • 14. About RelayHealth (A McKesson Business). What does RelayHealth do: RelayHealth is a financial solution of McKesson used to automate 2.4 billion financial transactions per year; 200K physicians, 2K hospitals, 1.9K payers/health plans. Who is McKesson: Largest healthcare solution company in the world with $103+ billion in revenue; headquartered in San Francisco and established in 1833; 32K employees.
  • 15. RelayHealth’s Objectives. ETL Offload: Offload resource-intensive ETL workloads from systems. EDW Optimization: Migrate old data and ELT workloads off of the EDW. Active Archive: Store old data online so analysts can access historic data.
  • 16. The Pre-Hadoop Environment. Challenges: 1) Deleted & archived information. Diagram labels: Claim Submitters, RelayHealth Transaction Processing System, BATCH, OLTP, Various Applications, RDBMS, EDW, Reports, Archive. Callout 1.
  • 17. The Pre-Hadoop Environment. Challenges: 1) Deleted & archived information 2) Batch wasn’t cutting it. (Same diagram, callouts 1 and 2.)
  • 18. The Pre-Hadoop Environment. Challenges: 1) Deleted & archived information 2) Batch wasn’t cutting it 3) Application & report latency. (Same diagram, callouts 1, 2, and 3.)
  • 19. RelayHealth’s Modern Hadoop Architecture. Improvements: 1) Active archive on Hadoop. Diagram labels: Traditional BATCH Processing, Hadoop STREAM Processing, Claim Submitters, RelayHealth Transaction Processing System, Ingest, Process, Store, Access, Kafka, Spark Streaming, HBase, Search, Spark, Payer Application, Reports, Modeling. Callout 1.
  • 20. RelayHealth’s Modern Hadoop Architecture. Improvements: 1) Active archive on Hadoop 2) Stream & batch processing. (Same diagram, callouts 1 and 2.)
  • 21. RelayHealth’s Modern Hadoop Architecture. Improvements: 1) Active archive on Hadoop 2) Stream & batch processing 3) Prepared for future use cases. (Same diagram, callouts 1, 2, and 3.)
  • 22. Business and Technical ROI. Technology ROI: 1) Active archive and Navigator for HIPAA compliance 2) Prepared for future use cases 3) Data ingest goes from end of day to near real-time. Business ROI: 1) Transaction processed in 20 ms vs. 1 hour prior 2) $250k in licensing and hardware savings per year 3) Greater flexibility with data ingest.
  • 23. Key Learnings: Crawl, walk, run. It takes time, start now. Lean on experts in the community.
  • 24. INSERT PARTNER SLIDES
  • 25. Thank you
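
The ROI slide quotes a transaction processing time of 20 ms versus 1 hour previously, and data ingest moving from end of day to near real-time. As a rough illustration of what those figures imply, the sketch below computes the ratio between the two latencies and how long a record would wait for the next run under a periodic batch. The function names and the fixed-period batch model are assumptions made for illustration; they are not taken from the deck or from RelayHealth's system.

```python
# Figures quoted on the ROI slide, expressed in integer milliseconds.
PRIOR_LATENCY_MS = 60 * 60 * 1000   # "1 hour prior"
NEW_LATENCY_MS = 20                 # "20ms"


def speedup_factor(before_ms: int, after_ms: int) -> int:
    """How many times shorter the new latency is than the old one."""
    return before_ms // after_ms


def batch_wait_ms(arrival_ms: int, period_ms: int) -> int:
    """Time a record waits for the next batch run.

    Hypothetical model: batches fire at every multiple of period_ms, and a
    record arriving exactly on a boundary is picked up immediately.
    """
    return (period_ms - arrival_ms % period_ms) % period_ms


if __name__ == "__main__":
    # Ratio between the two latencies quoted on the slide.
    print(speedup_factor(PRIOR_LATENCY_MS, NEW_LATENCY_MS))
    # A record arriving 10 minutes into an assumed hourly batch cycle.
    print(batch_wait_ms(10 * 60 * 1000, PRIOR_LATENCY_MS))
```

Under these assumptions the wait depends entirely on where in the cycle a record arrives, which is the delay a streaming path is meant to remove.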

Editor's Notes

  1. Data storage costs: http://thecaucus.blogs.nytimes.com/2012/08/14/advances-in-data-storage-have-implications-for-government-surveillance/ IoT: http://www.forbes.com/sites/gilpress/2014/08/22/internet-of-things-by-the-numbers-market-estimates-and-forecasts/ Resource Intensive ELT: http://www.syncsort.com/getattachment/45696aa9-1e40-43cb-8905-b9fc7e2519f7/Syncsort-Data-Warehouse-Offload-Solution.aspx
  2. An Operational Data Store provides a staging environment to ingest, store, and process data in preparation for operational and analytical use. Depending on whether this data is structured or unstructured, different systems can be used to optimize data pipelines. The challenge is that as your organization asks for larger volumes of diverse data, traditional systems run into problems.
  3. These challenges arise specifically around data storage and processing. The first challenge is limited data ingest. Collecting and ingesting a wide variety of diverse data is not a simple task and usually results in additional systems or capacity being added to the architecture. As the business continues to ask for more data, this puts a growing strain on IT. To avoid these challenges, only the most valuable data is brought in, limiting the business's access to data that could be extremely valuable. The second challenge that we see organizations try to overcome is processing data volumes. These organizations have already collected and operationalized large volumes of data and need to process this data efficiently in order to meet SLAs. If data doesn’t reach employees in a timely manner, they carry on without the most recent information. The third and final challenge is archiving data. When systems reach capacity as larger volumes of diverse data are used within an organization, IT professionals archive or delete data that has been deemed low-value. When data is moved offline to an archive, the return on that data drops significantly, which can hurt the business. This data can be extremely important as analysts attempt to find patterns in historic data, but they can’t access it because it’s offline. However, as the external and internal data environment has changed over the years, so has the data management space.
  6. We have been working closely with leading organizations to create a platform that complements their current architecture in order to avoid these common challenges and prepare for future data growth within their organizations. Ingest More Data: Cloudera allows you to collect and ingest any data type or volume of data, in full fidelity, giving your current systems and end users complete data access. This has allowed organizations to collect and access more diverse data, opening up the possibilities of what data can do for the business, without compromising system performance or straining existing resources. Efficiently Process & Store Data Volumes: By offloading heavy processing workloads to Cloudera, organizations can use parallel processing to significantly reduce processing time on large volumes of data. Cloudera's scalability also helps the platform keep performing as the amount of stored data grows. Automated Secure Archive: Using Cloudera as an ODS and as a centralized staging environment for new data automatically creates a secure archive. Because the platform scales, there is never a reason to move your data offline. Historic data can remain on the platform, giving analysts complete access without degrading system performance. Smaller volumes of already defined active data can flow directly into the right systems, while outdated data is offloaded to Cloudera. Leading data organizations have already seen these benefits.
  9. Arrow from batch to stream processing
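
The active-archive idea in the notes above (active data flows to the systems that need it, older data is offloaded but stays online for analysts) can be sketched as a simple age-based routing rule. This is a minimal illustration only: the one-year cutoff, the record layout, and the function names are assumptions, and plain Python lists stand in for the warehouse and the Hadoop-based archive.

```python
from datetime import date, timedelta

# Assumed cutoff: records newer than this count as "active" (hypothetical).
ACTIVE_WINDOW = timedelta(days=365)


def route(records, today, active_window=ACTIVE_WINDOW):
    """Split records into an active tier and an online archive tier by age."""
    active, archive = [], []
    for rec in records:
        tier = active if today - rec["date"] <= active_window else archive
        tier.append(rec)
    return active, archive


def query(active, archive, predicate):
    """Query both tiers together; archived records stay reachable."""
    return [rec for rec in active + archive if predicate(rec)]


if __name__ == "__main__":
    claims = [
        {"id": 1, "date": date(2023, 6, 1), "amount": 120},
        {"id": 2, "date": date(2020, 1, 1), "amount": 75},
    ]
    active, archive = route(claims, today=date(2024, 1, 1))
    print([c["id"] for c in active], [c["id"] for c in archive])
    print([c["id"] for c in query(active, archive, lambda c: c["amount"] > 50)])
```

The point of the design is that the query function never has to know which tier a record landed in, which is what distinguishes an online archive from data moved offline.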