SlideShare a Scribd company logo
1 of 25
Download to read offline
mycervello.com
How to select a modern cloud data
warehouse and get the most out of it
18th June 2019
Slim Baltagi
Director, Big Data & ML
NYC Advanced Analytics Meetup
Jim Leavitt
Vice President
co-CEO & Founder of Cervello
© 2019 Cervello, an A.T. Kearney company
Agenda
PART I (Slim Baltagi – 35 minutes)
1. Key Terms
2. Traditional Data Warehouses vs. Modern Data Warehouses
3. How to select a cloud data warehouse?
PART II (Jim Leavitt – 20 minutes)
1. Data era
2. Business use cases
3. How to get/make the most of a modern data warehouse?
PART III ( Discussion with attendees – 30 minutes)
2
© 2019 Cervello, an A.T. Kearney company
PART I – MODERN
DATA WAREHOUSE
3
© 2019 Cervello, an A.T. Kearney company
1. Key Terms: Cloud
• What is exactly the cloud? According to Gartner, ‘A style of computing in which scalable and
elastic IT-enabled capabilities are delivered as a service using Internet technologies’
• What are the models within the cloud and how do they apply to data warehousing?
4
Hardware:
Datacenter
Software: Data warehouse Management:
Optimizing, tuning,
maintenance
Kitchen, Cook/Order,
Serve, Eat, Clean
On-premises You You You At home
Infrastructure IaaS
Example: EC2
Vendor You You At a vacation
home
Platform PaaS
Example: Amazon
Redshift, EMR
Vendor Vendor You At a fast food
restaurant
Software SaaS
Example: Snowflake
Vendor Vendor Vendor At a full service
restaurant
© 2019 Cervello, an A.T. Kearney company
1. Key Terms: Data Warehouse
• What is a data warehouse? ‘A subject-oriented, integrated, time-variant, non-volatile collection of data in
support of management’s decision-making process.’ Source: Building the data warehouse, a book by Bill
Inmon, who is considered to be the father of data warehousing
• Since the beginning, data warehousing was about the business making better and quicker decisions
• Over the last 30 years, data warehouses evolved from traditional DBMS to Specialized Analytics DBMS to
Out-of-the-box data warehouse appliances to Hadoop/Spark-based data warehouses to ‘cloudified’, to finally
built-for-the cloud data warehouses.
• A ’cloudified’ data warehouse is a cloud-based representation of a traditional data warehouse (such as
Amazon Redshift and Microsoft SQL Data warehouse) while a data warehouse ‘built for the cloud’ (such as
Google BigQuery and Snowflake) is a data warehouse designed from the ground up to run natively on the
cloud and fully deliver on the elasticity of the cloud.
• A key factor driving the evolution of data warehousing is the cloud. Why?
• In this modern era of cloud, social networks and mobile, the ‘original’ definition of a data warehouse might
need an update! How about additional attributes such as elastic, scalable (All data, All users), agile, shareable,
conductive to exploratory analytics. What do you think?
5
© 2019 Cervello, an A.T. Kearney company
Aspect Traditional Data warehouse Modern Data Warehouse
Offering mode Product Service
Deployment mode On-premise, cloudified Built for the Cloud
Data age Historical Historical, Live
Data sources Traditional such as ERP, CRM, … Additional such as social media, sensor data, …
Storage & compute Tightly coupled Decoupled
Data processing Batch Batch, Micro-batch, Streaming
Data format Structured data Structured, Semi-Structured & Unstructured data
Data coverage Only your most critical data Possibly all of your available data
Data transformation ETL ETL, ELT
Usage Primarily BI BI, Advanced Analytics, Service
Cost Model Pay upfront Pay-as-you-go
Usability More complex Easier
Management Burdens on the DBA Fully managed
Time to market Usually longer Usually shorter
Provisioning Usual delays for infrastructure Almost instant
Time to insight Usually Longer Usually Shorter
Agility Schema-on-write Schema-on-read, Schema-on-write
Flexibility Static resources On-demand resources
1. Key Terms: Modern
6
© 2019 Cervello, an A.T. Kearney company
2. Traditional Data Warehouses: Key Challenges
1. Scalability: growth in data volume, number of users and applications
2. Concurrency: as number of users increase, they can not operate simultaneously
3. Performance: slow running queries, …
4. Resilience: data backup/retention and node failure protection
5. Complexity: from initial implementation to ongoing maintenance
6. Lack of native support for semi-structured data: JSON and XML formats are popular for
data exchange. In traditional data warehouses, data needs to be transformed first and
schema needs to be defined before loading
7. High maintenance overhead in the form of constant indexing, tuning, sorting
8. Handling workload fluctuation: sizing servers for workload peaks and valleys
9. Waste of resources: expensive computing resources are wasted during off peak usage
10. Lack of support for processing streaming data
11. Upfront costs: hardware, software licenses, staffing, …
12. Project delays due to provisioning infrastructure
Sure, you are familiar with real world scenarios such as End of Month Reporting, Intensive Load
Process, Demanding Executive Dashboard Users, …
7
© 2019 Cervello, an A.T. Kearney company
2. Modern Data Warehouses: How they overcome above challenges?
Major modern data warehouses are cloud based. Most of them do overcome the key challenges
of the traditional data warehouses. Unfortunately, overcoming these challenges is happening at
different degrees and it is not always a simple check of a yes or no!!
1. Scalability: Scale up, down, or off quickly without delay. Yes for Snowflake. Low for Redshift,
Yes for Azure Data Warehouse, Yes for Big Query ( but with limits)
2. Concurrency: Lots of users can operate simultaneously
3. Performance: processing bottlenecks and delays, slow running queries, …
4. Resilience: data backup/retention and node failure protection
5. Complexity: Cloud data warehouses are easier to use than on-premise data warehouses
6. Lack of native support of semi-structured data: Although cloud data warehouse do support
semi-structured data such as JSON. This support varies from one data warehouse to
another.
oConcurrent throughput for JSON: Yes for Snowflake, No for Redshift, No for Azure Data
Warehouse, Moderate for Big Query.
oHigh performance for JSON scans: Yes for Snowflake, No for Redshift unless you use the
additional Amazon Spectrum tool, No for Azure Data Warehouse, Moderate for Big Query.
8
© 2019 Cervello, an A.T. Kearney company
2. Modern Data Warehouses
7. High maintenance overhead is not a major issue with managed infrastructure for cloud data
warehouses, but the level of maintenance varies from one data warehouse to another. This
fluctuates to creating indices and distribution keys to near zero management
8. Handling workload fluctuation: With elasticity, cloud data warehouses adapt to workload peaks
and valleys
9. Waste of resources: No more wasted resources during off peak usage! Auto suspend, Auto
resume?
10. Lack of support for streaming data: For example, loading and processing is possible with
Snowpipe, a serverless compute service from Snowflake data warehouse
11. Upfront costs: No upfront costs as the model of cloud data warehouse is pay as you go
12. Project Delays due to provisioning infrastructure: Not Applicable for cloud data warehouse.
With instant provisioning, projects don’t need to wait for infrastructure
9
© 2019 Cervello, an A.T. Kearney company
3. How to select a Cloud Data Warehouse?
• You’ll come across some marketing fluff when you are researching and selecting cloud data
warehouses. Example: One cloud data warehouse claims to be 100X faster than the other.
Another ‘modern’ one claims 100,000% cost savings compared to a legacy ‘one’!
• Often cloud data warehouses are compared against a handful set of criteria only and sometimes
even a single criteria: performance, using a workload derived from a so called an industry
standard benchmark.
• A fit for purpose comparison across key categories and evaluation criteria with a focus on your
own use cases would be more practical. Example of a category: Data & Service Availability.
Related criteria: Failure recovery, Disaster recovery, Data protection, Service monitoring &
altering.
• At Cervello, we came with a 4-Step Tool Evaluation Framework that integrates input from
analysts, vendors, our own perspective and the client current requirements and future state.
• Be aware that the most popular cloud data warehouse might not be the best fit for your
organization!
• Selecting a cloud data warehouse for your organization is all about figuring out what is the best
fit for your business, your budget and your employees.
10
© 2019 Cervello, an A.T. Kearney company
Cervello Tool Evaluation Framework
11
Prioritized and weighted selection
criterion based on the desired future
state
Implementation and in-
depth expertise
Vendor ecosystem
perspective
Analyst reports
and research
CLIENT70%
CERVELLO15%
ECOSYSTEM10 %
MARKET5%
Client prioritizes and weights
key criterion and scores the
vendors based on the
criterion
Cervello provides our
perspective based on
experience and intelligence of
the vendors in the market place
Further refine the list based on
vendor responses to key criterion
as well as information from the
vendor ecosystem
Leverage Gartner, Forrester, and
other market analysts to establish a
baseline list of vendors to evaluate
weightingfactorfordrivingconsensusonadecision
START
FINISH
© 2019 Cervello, an A.T. Kearney company
Service Architecture Database
Mono-Cloud or Multi-Cloud Separation of Compute and Storage Support of ANSI SQL
PaaS or SaaS? Elasticity Support of Semi-structured data
‘Cloudified’ or ‘built for the
cloud’
Concurrency Support of multiple file format
Integration with cloud provider
services & tools
Performance Support of diverse data types
Cost Model & Transparency Scalability Handling of variable schema
Usability Support for parallel data upload Support of window functions
Management Overhead Support for streaming data Support of stored procedures
High Availability Security & Regulations Query Optimization
Service Maintenance Built-In Optimization Metadata & Statistics Maintenance
Configuration Pause/Resume mechanism Tuning Where Clauses
Service Limits Data Ingestion Tuning Joins
Initialization Time Support of mixed workloads Support of advanced analytics functions
Service updates without
disruption
Integration with tools such us dataflow ones, BI, ML, … Support for materialized views
Service Monitoring & Alerting Continuous Data protection Mechanisms for extensibility
Some Evaluation Criteria of a Cloud Data Warehouse
12
© 2019 Cervello, an A.T. Kearney company
Example of a Modern Data Warehouse: Snowflake Data Warehouse
• Snowflake is based on three key innovations:
1. Unique architecture: a unique architecture designed for the cloud and able to provide
complete elasticity for all your concurrent users and applications
2. Database engine that natively handles all your data both semi structured and structured
data without sacrificing performance nor flexibility
3. Technology that eliminates the need for manual data warehouse management and tuning.
No indexing, tuning, partitioning or vacuuming after loading data => effortless management
• Snowflake architecture consists of three layers, each one is physically decoupled from the
other layer and scales independently:
1. Data Storage layer uses cloud storage to store all data loaded into snowflake in a scalable
and inexpensive way
2. Compute layer comprises of virtual warehouses compute resources that execute data
processing tasks required for queries. The virtual warehouses have access to all of the
data in the storage layer
3. Cloud Services layer coordinates the entire system managing security, optimization and
metadata
13
© 2019 Cervello, an A.T. Kearney company
Snowflake’s multi-cluster, shared data architecture
14
© 2019 Cervello, an A.T. Kearney company 15
• Data Sources/Data Providers: On-premise or Cloud, Structured or Semi-Structured Data
• Dataflow Tools: Streaming Data Platforms, ETL and ELT platforms. E: Extract, L: Load, T: Transform
• Data Storage: S3/Azure Storage (missing in the architecture diagram!)
• Analytics Services: BI, UI, Data Science, …
Snowflake reference architecture
© 2019 Cervello, an A.T. Kearney company
Trying Snowflake data warehouse
• A couple unique features:
• Multi-cloud: Snowflake is the only cloud data warehouse that is offered on Amazon AWS, Microsoft Azure
and soon on Google Cloud Platform.
• Zero copy cloning: A technique used to quickly replicate a database, without any physical data copy, to
build a fully populated test environment. This can ease the burden of DEVOPs, as terabytes of data can be
cloned within seconds with subsequent inserts and updates allowed on the new data-set.
• Time Travel: Time Travel allows you to track the change of data over time. This feature is available to all
accounts and enabled by default to all. It allows you to access the historical data of a Table. One can find
out how the table looked at any point in time within the last 90 days.
• Fail-Safe: Fail-Safe ensures historical data is protected in event of disasters such as disk failures or any
other hardware failures. Snowflake provides 7 days of Fail-Safe protection of data which can be recovered
only by Snowflake in event of a disaster.
• Data Sharing: Provides access to both compute and data resources to external partners or subsidiaries on
a read-only basis. This avoids the need to build multiple Extract Transform and Load (ETL) pipelines to
external users, and avoids the need for Change Data Capture (CDC) routines when the warehouse data is
updated, as users always view the latest data.
• Try it yourself: most cloud data warehouses offer a free trial. Example:
• Snowflake Free trial! https://bit.ly/2WMVKp3
• You can build a short term POC in either AWS or Azure while using US $400 credit for 30 days for both
compute and storage. Snowflake on Google Cloud Platform (GCP) is being offered soon.
16
© 2019 Cervello, an A.T. Kearney company
Part II – Getting Value
17
© 2019 Cervello, an A.T. Kearney company
1. Data Era: Challenges + Opportunities
© 2019 Cervello, an A.T. Kearney company
Great Analogy
19
© 2019 Cervello, an A.T. Kearney company
2. Next Generation of Business Use Cases
§ Data as a Service
§ Self-Service Analytics
§ Advanced Analytics Lab
§ Real-Time Insight
§ Single View of Customer
20
Data Consumers
§ Partners, Suppliers, and Vendors
§ Executives, Managers, and Analysts
§ Data Scientists
§ Embedded Applications
§ Many Business Users
New Use Cases
© 2019 Cervello, an A.T. Kearney company
Medical Device Client: Data + Analytics Overview
21
© 2019 Cervello, an A.T. Kearney company
Medical Device Client: Data + Analytics Benefits
22
Cost
• $500k IT expense reduction
from cloud and Snowflake
migration and consolidated BI
toolset
• Continuous process
simplification
• $1M annual savings from freight
consolidation analytics
• 500 annual hours shifted to
higher value activities because
of end-to-end automating
reporting (e.g., fill rates, open
orders)
Revenue
• $800K in annual revenue
recovery through contract
enforcement analytics
• Improved patient + physician
experience (in software product)
with embedded reporting
Quality
• Data validation checks built into
daily processing
Future
• Smart medical devices (IoT)
© 2019 Cervello, an A.T. Kearney company
Technical + business
approach
• Business
engagement
• Best
practices/advisory
• System integration
Data asset
• Think Big (but
organized)
• Internal + external
• Structured/
unstructured,
Governed/loosely
governed
3. How To Drive Value
23
Modern technology
stack
• Flexible and modular
• Scalable on demand
• Machine Learning +
Artificial Intelligence
Process + organization
• Supply (Data Engineers) +
consumption
(Scientists/Analysts)
• Automated insights =
decision-making
opportunity
• Organizational opportunity
© 2019 Cervello, an A.T. Kearney company
Cervello Profile
We are a professional services firm focused on helping organizations win with data. We have a global presence with an ability to service customers across
industries. We are organized around three practices that are under pinned by our expertise in modern data architectures.
24
PERFORMANCE
MANAGEMENT
SUPPLIER + CUSTOMER
RELATIONSHIPS
DATA MONETIZATION +
PRODUCTS
DATA MANAGEMENT
& MACHINE LEARNING
We embed data management
processes and use machine
learning to improve data quality
Using leading cloud technology
platforms we develop and
design analytics-ready
connected data solutions
MODERN DATA
ARCHITECTURE & PLATFORMS
ANALYTICS & DATA SCIENCE
MODELING & PLANNING
MOBILE EXPERIENCE
We provide different methods
to consume data to facilitate
insights and actions inside and
outside the organization
BUSINESS DRIVEN
INSIGHTS & ACTIONS
HOWWEDELIVER
WHATWEDELIVER
STRATEGY
BUILD
SERVICES
RUN & COE
SERVICES
Our teams are located in Boston, New York, Dallas, London and Bangalore
Boston | New York | Dallas | London
Get in touch with contributors:
Thank You!
Learn more about Cervello at
mycervello.com
© 2019 Cervello, an A.T. Kearney company
Slim Baltagi
sbaltagi@gmail.com
https://www.linkedin.com/in/slimbaltagi/
@SlimBaltagi
25
Jim Leavitt
jleavitt@mycervello.com
https://www.linkedin.com/in/jimleavitt/

More Related Content

What's hot

(BI Advanced) Hiram Fleitas - SQL Server Machine Learning Predict Sentiment O...
(BI Advanced) Hiram Fleitas - SQL Server Machine Learning Predict Sentiment O...(BI Advanced) Hiram Fleitas - SQL Server Machine Learning Predict Sentiment O...
(BI Advanced) Hiram Fleitas - SQL Server Machine Learning Predict Sentiment O...Hiram Fleitas León
 
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQLDataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQLDataStax
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseCloudera, Inc.
 
Data Warehouse in Cloud
Data Warehouse in CloudData Warehouse in Cloud
Data Warehouse in CloudPawan Bhargava
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big dataJack (Yaakov) Bezalel
 
BarbaraZigmanResume 2016
BarbaraZigmanResume 2016BarbaraZigmanResume 2016
BarbaraZigmanResume 2016bzigman
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InSnapLogic
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lakepunedevscom
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopDavid Yahalom
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraCloudera, Inc.
 
Microsoft Power BI: AI Powered Analytics
Microsoft Power BI: AI Powered AnalyticsMicrosoft Power BI: AI Powered Analytics
Microsoft Power BI: AI Powered AnalyticsJuan Alvarado
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]shuwutong
 

What's hot (20)

SQL Server Disaster Recovery Implementation
SQL Server Disaster Recovery ImplementationSQL Server Disaster Recovery Implementation
SQL Server Disaster Recovery Implementation
 
Data lake
Data lakeData lake
Data lake
 
(BI Advanced) Hiram Fleitas - SQL Server Machine Learning Predict Sentiment O...
(BI Advanced) Hiram Fleitas - SQL Server Machine Learning Predict Sentiment O...(BI Advanced) Hiram Fleitas - SQL Server Machine Learning Predict Sentiment O...
(BI Advanced) Hiram Fleitas - SQL Server Machine Learning Predict Sentiment O...
 
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQLDataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
 
Disaster Recovery Site Implementation with MySQL
Disaster Recovery Site Implementation with MySQLDisaster Recovery Site Implementation with MySQL
Disaster Recovery Site Implementation with MySQL
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
 
Data Warehouse in Cloud
Data Warehouse in CloudData Warehouse in Cloud
Data Warehouse in Cloud
 
Piranha vs. mammoth predator appliances that chew up big data
Piranha vs. mammoth   predator appliances that chew up big dataPiranha vs. mammoth   predator appliances that chew up big data
Piranha vs. mammoth predator appliances that chew up big data
 
BarbaraZigmanResume 2016
BarbaraZigmanResume 2016BarbaraZigmanResume 2016
BarbaraZigmanResume 2016
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
 
Data Federation
Data FederationData Federation
Data Federation
 
2022 02 Integration Bootcamp
2022 02 Integration Bootcamp2022 02 Integration Bootcamp
2022 02 Integration Bootcamp
 
Better Together: The New Data Management Orchestra
Better Together: The New Data Management OrchestraBetter Together: The New Data Management Orchestra
Better Together: The New Data Management Orchestra
 
Microsoft Power BI: AI Powered Analytics
Microsoft Power BI: AI Powered AnalyticsMicrosoft Power BI: AI Powered Analytics
Microsoft Power BI: AI Powered Analytics
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
 

Similar to How to select a modern data warehouse and get the most out of it?

The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationDATAVERSITY
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeDATAVERSITY
 
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCOCloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCOStorage Switzerland
 
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostHow Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostDana Gardner
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
 
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | QuboleVasu S
 
Govern and Protect Your End User Information
Govern and Protect Your End User InformationGovern and Protect Your End User Information
Govern and Protect Your End User InformationDenodo
 
E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2Anil Vasudeva
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data CentersGina Buck
 
oracle-adw-melts snowflake-report.pdf
oracle-adw-melts snowflake-report.pdforacle-adw-melts snowflake-report.pdf
oracle-adw-melts snowflake-report.pdfssuserf8f9b2
 
The Storage Side of Private Clouds
The Storage Side of Private CloudsThe Storage Side of Private Clouds
The Storage Side of Private CloudsDataCore Software
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItDenodo
 
Does it only have to be ML + AI?
Does it only have to be ML + AI?Does it only have to be ML + AI?
Does it only have to be ML + AI?Harald Erb
 

Similar to How to select a modern data warehouse and get the most out of it? (20)

The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
 
Big data rmoug
Big data rmougBig data rmoug
Big data rmoug
 
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCOCloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
Cloudian Webinar - 7 Key Reasons why Object Storage lowers Storage TCO
 
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostHow Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
 
Cloud Storage Benefits
Cloud Storage BenefitsCloud Storage Benefits
Cloud Storage Benefits
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
 
Govern and Protect Your End User Information
Govern and Protect Your End User InformationGovern and Protect Your End User Information
Govern and Protect Your End User Information
 
E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
The new EDW
The new EDWThe new EDW
The new EDW
 
Data Analytics.pptx
Data Analytics.pptxData Analytics.pptx
Data Analytics.pptx
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data Centers
 
oracle-adw-melts snowflake-report.pdf
oracle-adw-melts snowflake-report.pdforacle-adw-melts snowflake-report.pdf
oracle-adw-melts snowflake-report.pdf
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
Cloud Storage for all
Cloud Storage for allCloud Storage for all
Cloud Storage for all
 
The Storage Side of Private Clouds
The Storage Side of Private CloudsThe Storage Side of Private Clouds
The Storage Side of Private Clouds
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
 
Does it only have to be ML + AI?
Does it only have to be ML + AI?Does it only have to be ML + AI?
Does it only have to be ML + AI?
 

More from Slim Baltagi

Modern big data and machine learning in the era of cloud, docker and kubernetes
Modern big data and machine learning in the era of cloud, docker and kubernetesModern big data and machine learning in the era of cloud, docker and kubernetes
Modern big data and machine learning in the era of cloud, docker and kubernetesSlim Baltagi
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeSlim Baltagi
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitSlim Baltagi
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming AnalyticsSlim Baltagi
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksSlim Baltagi
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Apache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiApache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiSlim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiSlim Baltagi
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Slim Baltagi
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkSlim Baltagi
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuSlim Baltagi
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities Slim Baltagi
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 

More from Slim Baltagi (20)

Modern big data and machine learning in the era of cloud, docker and kubernetes
Modern big data and machine learning in the era of cloud, docker and kubernetesModern big data and machine learning in the era of cloud, docker and kubernetes
Modern big data and machine learning in the era of cloud, docker and kubernetes
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Apache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiApache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache Flink
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 

Recently uploaded

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 

Recently uploaded (20)

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 

How to select a modern data warehouse and get the most out of it?

  • 1. mycervello.com How to select a modern cloud data warehouse and get the most out of it 18th June 2019 Slim Baltagi Director, Big Data & ML NYC Advanced Analytics Meetup Jim Leavitt Vice President co-CEO & Founder of Cervello
  • 2. © 2019 Cervello, an A.T. Kearney company Agenda PART I (Slim Baltagi – 35 minutes) 1. Key Terms 2. Traditional Data Warehouses vs. Modern Data Warehouses 3. How to select a cloud data warehouse? PART II (Jim Leavitt – 20 minutes) 1. Data era 2. Business use cases 3. How to get/make the most of a modern data warehouse? PART III ( Discussion with attendees – 30 minutes) 2
  • 3. © 2019 Cervello, an A.T. Kearney company PART I – MODERN DATA WAREHOUSE 3
  • 4. © 2019 Cervello, an A.T. Kearney company 1. Key Terms: Cloud • What is exactly the cloud? According to Gartner, ‘A style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service using Internet technologies’ • What are the models within the cloud and how do they apply to data warehousing? 4 Hardware: Datacenter Software: Data warehouse Management: Optimizing, tuning, maintenance Kitchen, Cook/Order, Serve, Eat, Clean On-premises You You You At home Infrastructure IaaS Example: EC2 Vendor You You At a vacation home Platform PaaS Example: Amazon Redshift, EMR Vendor Vendor You At a fast food restaurant Software SaaS Example: Snowflake Vendor Vendor Vendor At a full service restaurant
  • 5. © 2019 Cervello, an A.T. Kearney company 1. Key Terms: Data Warehouse • What is a data warehouse? ‘A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process.’ Source: Building the data warehouse, a book by Bill Inmon, who is considered to be the father of data warehousing • Since the beginning, data warehousing was about the business making better and quicker decisions • Over the last 30 years, data warehouses evolved from traditional DBMS to Specialized Analytics DBMS to Out-of-the-box data warehouse appliances to Hadoop/Spark-based data warehouses to ‘cloudified’, to finally built-for-the cloud data warehouses. • A ’cloudified’ data warehouse is a cloud-based representation of a traditional data warehouse (such as Amazon Redshift and Microsoft SQL Data warehouse) while a data warehouse ‘built for the cloud’ (such as Google BigQuery and Snowflake) is a data warehouse designed from the ground up to run natively on the cloud and fully deliver on the elasticity of the cloud. • A key factor driving the evolution of data warehousing is the cloud. Why? • In this modern era of cloud, social networks and mobile, the ‘original’ definition of a data warehouse might need an update! How about additional attributes such as elastic, scalable (All data, All users), agile, shareable, conductive to exploratory analytics. What do you think? 5
  • 6. © 2019 Cervello, an A.T. Kearney company Aspect Traditional Data warehouse Modern Data Warehouse Offering mode Product Service Deployment mode On-premise, cloudified Built for the Cloud Data age Historical Historical, Live Data sources Traditional such as ERP, CRM, … Additional such as social media, sensor data, … Storage & compute Tightly coupled Decoupled Data processing Batch Batch, Micro-batch, Streaming Data format Structured data Structured, Semi-Structured & Unstructured data Data coverage Only your most critical data Possibly all of your available data Data transformation ETL ETL, ELT Usage Primarily BI BI, Advanced Analytics, Service Cost Model Pay upfront Pay-as-you-go Usability More complex Easier Management Burdens on the DBA Fully managed Time to market Usually longer Usually shorter Provisioning Usual delays for infrastructure Almost instant Time to insight Usually Longer Usually Shorter Agility Schema-on-write Schema-on-read, Schema-on-write Flexibility Static resources On-demand resources 1. Key Terms: Modern 6
  • 7. © 2019 Cervello, an A.T. Kearney company 2. Traditional Data Warehouses: Key Challenges 1. Scalability: growth in data volume, number of users and applications 2. Concurrency: as number of users increase, they can not operate simultaneously 3. Performance: slow running queries, … 4. Resilience: data backup/retention and node failure protection 5. Complexity: from initial implementation to ongoing maintenance 6. Lack of native support for semi-structured data: JSON and XML formats are popular for data exchange. In traditional data warehouses, data needs to be transformed first and schema needs to be defined before loading 7. High maintenance overhead in the form of constant indexing, tuning, sorting 8. Handling workload fluctuation: sizing servers for workload peaks and valleys 9. Waste of resources: expensive computing resources are wasted during off peak usage 10. Lack of support for processing streaming data 11. Upfront costs: hardware, software licenses, staffing, … 12. Project delays due to provisioning infrastructure Sure, you are familiar with real world scenarios such as End of Month Reporting, Intensive Load Process, Demanding Executive Dashboard Users, … 7
  • 8. © 2019 Cervello, an A.T. Kearney company 2. Modern Data Warehouses: How they overcome above challenges? Major modern data warehouses are cloud based. Most of them do overcome the key challenges of the traditional data warehouses. Unfortunately, overcoming these challenges is happening at different degrees and it is not always a simple check of a yes or no!! 1. Scalability: Scale up, down, or off quickly without delay. Yes for Snowflake. Low for Redshift, Yes for Azure Data Warehouse, Yes for Big Query ( but with limits) 2. Concurrency: Lots of users can operate simultaneously 3. Performance: processing bottlenecks and delays, slow running queries, … 4. Resilience: data backup/retention and node failure protection 5. Complexity: Cloud data warehouses are easier to use than on-premise data warehouses 6. Lack of native support of semi-structured data: Although cloud data warehouse do support semi-structured data such as JSON. This support varies from one data warehouse to another. oConcurrent throughput for JSON: Yes for Snowflake, No for Redshift, No for Azure Data Warehouse, Moderate for Big Query. oHigh performance for JSON scans: Yes for Snowflake, No for Redshift unless you use the additional Amazon Spectrum tool, No for Azure Data Warehouse, Moderate for Big Query. 8
  • 9. © 2019 Cervello, an A.T. Kearney company 2. Modern Data Warehouses 7. High maintenance overhead is not a major issue with managed infrastructure for cloud data warehouses, but the level of maintenance varies from one data warehouse to another. This fluctuates to creating indices and distribution keys to near zero management 8. Handling workload fluctuation: With elasticity, cloud data warehouses adapt to workload peaks and valleys 9. Waste of resources: No more wasted resources during off peak usage! Auto suspend, Auto resume? 10. Lack of support for streaming data: For example, loading and processing is possible with Snowpipe, a serverless compute service from Snowflake data warehouse 11. Upfront costs: No upfront costs as the model of cloud data warehouse is pay as you go 12. Project Delays due to provisioning infrastructure: Not Applicable for cloud data warehouse. With instant provisioning, projects don’t need to wait for infrastructure 9
  • 10. © 2019 Cervello, an A.T. Kearney company 3. How to select a Cloud Data Warehouse? • You’ll come across some marketing fluff when you are researching and selecting cloud data warehouses. Example: One cloud data warehouse claims to be 100X faster than the other. Another ‘modern’ one claims 100,000% cost savings compared to a legacy ‘one’! • Often cloud data warehouses are compared against a handful set of criteria only and sometimes even a single criteria: performance, using a workload derived from a so called an industry standard benchmark. • A fit for purpose comparison across key categories and evaluation criteria with a focus on your own use cases would be more practical. Example of a category: Data & Service Availability. Related criteria: Failure recovery, Disaster recovery, Data protection, Service monitoring & altering. • At Cervello, we came with a 4-Step Tool Evaluation Framework that integrates input from analysts, vendors, our own perspective and the client current requirements and future state. • Be aware that the most popular cloud data warehouse might not be the best fit for your organization! • Selecting a cloud data warehouse for your organization is all about figuring out what is the best fit for your business, your budget and your employees. 10
  • 11. © 2019 Cervello, an A.T. Kearney company Cervello Tool Evaluation Framework 11 Prioritized and weighted selection criterion based on the desired future state Implementation and in- depth expertise Vendor ecosystem perspective Analyst reports and research CLIENT70% CERVELLO15% ECOSYSTEM10 % MARKET5% Client prioritizes and weights key criterion and scores the vendors based on the criterion Cervello provides our perspective based on experience and intelligence of the vendors in the market place Further refine the list based on vendor responses to key criterion as well as information from the vendor ecosystem Leverage Gartner, Forrester, and other market analysts to establish a baseline list of vendors to evaluate weightingfactorfordrivingconsensusonadecision START FINISH
  • 12. © 2019 Cervello, an A.T. Kearney company Service Architecture Database Mono-Cloud or Multi-Cloud Separation of Compute and Storage Support of ANSI SQL PaaS or SaaS? Elasticity Support of Semi-structured data ‘Cloudified’ or ‘built for the cloud’ Concurrency Support of multiple file format Integration with cloud provider services & tools Performance Support of diverse data types Cost Model & Transparency Scalability Handling of variable schema Usability Support for parallel data upload Support of window functions Management Overhead Support for streaming data Support of stored procedures High Availability Security & Regulations Query Optimization Service Maintenance Built-In Optimization Metadata & Statistics Maintenance Configuration Pause/Resume mechanism Tuning Where Clauses Service Limits Data Ingestion Tuning Joins Initialization Time Support of mixed workloads Support of advanced analytics functions Service updates without disruption Integration with tools such us dataflow ones, BI, ML, … Support for materialized views Service Monitoring & Alerting Continuous Data protection Mechanisms for extensibility Some Evaluation Criteria of a Cloud Data Warehouse 12
  • 13. © 2019 Cervello, an A.T. Kearney company Example of a Modern Data Warehouse: Snowflake Data Warehouse • Snowflake is based on three key innovations: 1. Unique architecture: a unique architecture designed for the cloud and able to provide complete elasticity for all your concurrent users and applications 2. Database engine that natively handles all your data both semi structured and structured data without sacrificing performance nor flexibility 3. Technology that eliminates the need for manual data warehouse management and tuning. No indexing, tuning, partitioning or vacuuming after loading data => effortless management • Snowflake architecture consists of three layers, each one is physically decoupled from the other layer and scales independently: 1. Data Storage layer uses cloud storage to store all data loaded into snowflake in a scalable and inexpensive way 2. Compute layer comprises of virtual warehouses compute resources that execute data processing tasks required for queries. The virtual warehouses have access to all of the data in the storage layer 3. Cloud Services layer coordinates the entire system managing security, optimization and metadata 13
  • 14. © 2019 Cervello, an A.T. Kearney company Snowflake’s multi-cluster, shared data architecture 14
  • 15. © 2019 Cervello, an A.T. Kearney company 15 • Data Sources/Data Providers: On-premise or Cloud, Structured or Semi-Structured Data • Dataflow Tools: Streaming Data Platforms, ETL and ELT platforms. E: Extract, L: Load, T: Transform • Data Storage: S3/Azure Storage (missing in the architecture diagram!) • Analytics Services: BI, UI, Data Science, … Snowflake reference architecture
  • 16. © 2019 Cervello, an A.T. Kearney company Trying Snowflake data warehouse • A couple unique features: • Multi-cloud: Snowflake is the only cloud data warehouse that is offered on Amazon AWS, Microsoft Azure and soon on Google Cloud Platform. • Zero copy cloning: A technique used to quickly replicate a database, without any physical data copy, to build a fully populated test environment. This can ease the burden of DEVOPs, as terabytes of data can be cloned within seconds with subsequent inserts and updates allowed on the new data-set. • Time Travel: Time Travel allows you to track the change of data over time. This feature is available to all accounts and enabled by default to all. It allows you to access the historical data of a Table. One can find out how the table looked at any point in time within the last 90 days. • Fail-Safe: Fail-Safe ensures historical data is protected in event of disasters such as disk failures or any other hardware failures. Snowflake provides 7 days of Fail-Safe protection of data which can be recovered only by Snowflake in event of a disaster. • Data Sharing: Provides access to both compute and data resources to external partners or subsidiaries on a read-only basis. This avoids the need to build multiple Extract Transform and Load (ETL) pipelines to external users, and avoids the need for Change Data Capture (CDC) routines when the warehouse data is updated, as users always view the latest data. • Try it yourself: most cloud data warehouses offer a free trial. Example: • Snowflake Free trial! https://bit.ly/2WMVKp3 • You can build a short term POC in either AWS or Azure while using US $400 credit for 30 days for both compute and storage. Snowflake on Google Cloud Platform (GCP) is being offered soon. 16
  • 17. © 2019 Cervello, an A.T. Kearney company Part II – Getting Value 17
  • 18. © 2019 Cervello, an A.T. Kearney company 1. Data Era: Challenges + Opportunities
  • 19. © 2019 Cervello, an A.T. Kearney company Great Analogy 19
  • 20. © 2019 Cervello, an A.T. Kearney company 2. Next Generation of Business Use Cases § Data as a Service § Self-Service Analytics § Advanced Analytics Lab § Real-Time Insight § Single View of Customer 20 Data Consumers § Partners, Suppliers, and Vendors § Executives, Managers, and Analysts § Data Scientists § Embedded Applications § Many Business Users New Use Cases
  • 21. © 2019 Cervello, an A.T. Kearney company Medical Device Client: Data + Analytics Overview 21
  • 22. © 2019 Cervello, an A.T. Kearney company Medical Device Client: Data + Analytics Benefits 22 Cost • $500k IT expense reduction from cloud and Snowflake migration and consolidated BI toolset • Continuous process simplification • $1M annual savings from freight consolidation analytics • 500 annual hours shifted to higher value activities because of end-to-end automating reporting (e.g., fill rates, open orders) Revenue • $800K in annual revenue recovery through contract enforcement analytics • Improved patient + physician experience (in software product) with embedded reporting Quality • Data validation checks built into daily processing Future • Smart medical devices (IoT)
  • 23. © 2019 Cervello, an A.T. Kearney company Technical + business approach • Business engagement • Best practices/advisory • System integration Data asset • Think Big (but organized) • Internal + external • Structured/ unstructured, Governed/loosely governed 3. How To Drive Value 23 Modern technology stack • Flexible and modular • Scalable on demand • Machine Learning + Artificial Intelligence Process + organization • Supply (Data Engineers) + consumption (Scientists/Analysts) • Automated insights = decision-making opportunity • Organizational opportunity
  • 24. © 2019 Cervello, an A.T. Kearney company Cervello Profile We are a professional services firm focused on helping organizations win with data. We have a global presence with an ability to service customers across industries. We are organized around three practices that are under pinned by our expertise in modern data architectures. 24 PERFORMANCE MANAGEMENT SUPPLIER + CUSTOMER RELATIONSHIPS DATA MONETIZATION + PRODUCTS DATA MANAGEMENT & MACHINE LEARNING We embed data management processes and use machine learning to improve data quality Using leading cloud technology platforms we develop and design analytics-ready connected data solutions MODERN DATA ARCHITECTURE & PLATFORMS ANALYTICS & DATA SCIENCE MODELING & PLANNING MOBILE EXPERIENCE We provide different methods to consume data to facilitate insights and actions inside and outside the organization BUSINESS DRIVEN INSIGHTS & ACTIONS HOWWEDELIVER WHATWEDELIVER STRATEGY BUILD SERVICES RUN & COE SERVICES Our teams are located in Boston, New York, Dallas, London and Bangalore
  • 25. Boston | New York | Dallas | London Get in touch with contributors: Thank You! Learn more about Cervello at mycervello.com © 2019 Cervello, an A.T. Kearney company Slim Baltagi sbaltagi@gmail.com https://www.linkedin.com/in/slimbaltagi/ @SlimBaltagi 25 Jim Leavitt jleavitt@mycervello.com https://www.linkedin.com/in/jimleavitt/