
The new big data

Slides from Scott Shaw's September 2020 presentation to the St. Louis Big Data IDEA meetup on Big Data in 2020


  1. The New Big Data, Scott Shaw
  2. © 2020 Cloudera, Inc. All rights reserved. DATA MANAGEMENT IS SPREAD ALL OVER. Chart of where data management runs: on-premises, single cloud, multi cloud, hybrid cloud and private cloud (values shown: 47%, 21%, 24%, 26%, 32%). Gartner recently warned that “Data and analytics leaders must prepare for the complexities of multi cloud and intercloud deployments to avoid potential performance issues… unplanned cost overruns and ... difficulties with integration efforts.” (HBR, June 2019)
  3. “Enterprise IT doesn’t operate at the speed of business. Your IT group needs to perform better than shadow IT.” Chart: shadow IT as a % of overall IT spend (CIO Magazine).
  4. HOW TIMES HAVE CHANGED. 2008: SCALE 1 JOB TO 1000s OF SERVERS. 2020: SCALE 1 PLATFORM TO 1000s OF USERS.
  5. CLOUDERA - THE ENTERPRISE DATA CLOUD COMPANY. Data lifecycle stages: 01 Collect, 02 Curate, 03 Report, 04 Serve, 05 Predict. Services: Data Engineering, Streaming & Data Flow, Data Warehouse, Operational Database, Machine Learning & AI. Across every stage: Security | Governance | Lineage | Management | Automation. Manage and secure the data lifecycle in any cloud or datacenter.
  6. BUSINESS USE CASES REQUIRE THE DATA LIFECYCLE. An integrated lifecycle is easier to use, manage and secure. Diagram labels: data lifecycle use cases (Supply Chain Optimization, Computer Vision for QA, Predictive Maintenance, Process Monitoring Dashboards, Throughput Optimization); enterprise use cases (Connected Products, Connected Production, Connected Supply Chain, Connected Consumer); real-time & transactional enterprise data; the enterprise data cloud; Security | Governance | Lineage | Management | Automation.
  7. CLOUDERA DATA PLATFORM
  8. COMPONENT ARCHITECTURE
  9. THE ENTERPRISE DATA CLOUD COMPONENTS. Traditional platform consumption: • Data Hub clusters. New analytic experiences: • Data Warehouse • Machine Learning • Data Engineering • Operational Database • More to come. Control Plane services: • Workload Manager • Replication Manager • Data Catalog • Management Console.
  10. KEY CONCEPTS & COMPONENTS: ENVIRONMENTS. Environment: • 1 template • 1 region • 1 VPC • Multiple roles/buckets. Data Lake (1:1 with its environment): • SDX: Atlas, Ranger, Knox, IdBroker, CM • Associated with groups/users. Data Hub clusters / experiences (1:N per environment): • DH templates • ML environment • DW database catalogs / virtual compute.
  11. KEY CONCEPTS & COMPONENTS: TYPICAL USER FLOW. Actors and layers: Enterprise IT, the CDP Control Plane (Management Console) and enterprise cloud resources (IAM, network, VMs, buckets, etc.). Step 1: The user connects to CDP with their enterprise identity. Step 2: They create an environment and a data lake for their enterprise (data lake services: Atlas, Ranger, Knox, IdBroker, FreeIPA, CM, HMS). Step 3: They create Data Hub clusters for traditional workloads (for example, a BI team cluster and an ETL team cluster, each with nodes 1 to 3). Step 4: They create access points for containerized analytic experiences (Data Warehouse Experience, Machine Learning Experience).
  12. ENVIRONMENT. What is an environment? The definition of where CDP creates resources in a customer's cloud account. A long-running, permanent cluster called a Data Lake is created there.
  13. DATA LAKE. What is a Data Lake? A common set of services (SDX) within an environment that are shared across multiple clusters/experiences. These include services for: • Security • Auditing • Governance • Data Discovery
  14. DATA HUB CLUSTERS AND EXPERIENCES. What are the consumption options? A Data Hub cluster is a customizable environment that runs like a traditional Hadoop cluster but is designed to use cloud storage. An Experience is a container-based compute environment for a specific purpose: ML (Machine Learning), DW (Data Warehouse), DE (Data Engineering), OD (Operational Database) or DF (Data Flow).
  15. CONTROL PLANE. What is the Control Plane? The Control Plane is the common set of tools for management, workload analysis, data movement and data discovery across multiple environments.
  16. PRODUCT WALKTHROUGH
  17. HYBRID ARCHITECTURE
  18. TARGET ARCHITECTURE: THE ENTERPRISE DATA CLOUD. Diagram labels: CDP Public Cloud (platform-as-a-service) on AWS, Azure and GCP; CDP On-Prem (installable software) as CDP Private Cloud and CDP Datacenter; Cloudera Runtime; Control Plane. Each side offers Data Hub, virtual private clusters and self-serve experiences (DW, ML, DE, …).
  19. OpenShift 101. ➔ OpenShift → Kubernetes++ ➔ K8s → system to deploy, scale and manage apps ➔ Applications → exposed through services ➔ Service → collection of Pods ➔ Pods → collection of containers ➔ Containers → runtime environment. Diagram: master nodes and three worker nodes; each worker node runs a Kubelet and hosts pods of containers that use the node's CPU, RAM and disk.
  20. THANK YOU
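The relationships on slide 10 (one data lake per environment, many Data Hub clusters or experiences per environment) can be sketched as a small data model. This is a minimal illustration that assumes nothing about CDP's real API: the classes `Environment`, `DataLake` and `Workload`, the method `add_workload`, and any region or bucket values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataLake:
    # Shared SDX services named on the slide (hypothetical representation).
    services: List[str] = field(
        default_factory=lambda: ["Atlas", "Ranger", "Knox", "IdBroker", "CM"]
    )
    groups: List[str] = field(default_factory=list)


@dataclass
class Workload:
    name: str
    kind: str  # "data_hub" or "experience" in this sketch


@dataclass
class Environment:
    template: str
    region: str
    vpc: str
    buckets: List[str]
    # 1:1 - exactly one data lake per environment.
    data_lake: DataLake = field(default_factory=DataLake)
    # 1:N - any number of Data Hub clusters or experiences.
    workloads: List[Workload] = field(default_factory=list)

    def add_workload(self, name: str, kind: str) -> Workload:
        # Every workload is attached to this environment (and so to its data lake).
        workload = Workload(name, kind)
        self.workloads.append(workload)
        return workload
```

For example, `Environment("default", "us-east-1", "vpc-example", ["data-bucket"])` followed by two `add_workload(..., "data_hub")` calls gives one data lake and two clusters. Holding the data lake as a single field and the workloads as a list is what encodes the 1:1 and 1:N cardinalities.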
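The four-step user flow on slide 11 implies an ordering: the environment and its data lake must exist before any Data Hub cluster or experience access point is created. A minimal sketch of that constraint follows. `UserFlow` and its methods are hypothetical names, not CDP CLI or API calls, and resources are represented as plain strings.

```python
from typing import List, Optional


class UserFlow:
    """Hypothetical model of the slide's four steps; not a CDP API."""

    def __init__(self) -> None:
        self.identity: Optional[str] = None
        self.environment: Optional[str] = None
        self.data_lake: Optional[str] = None
        self.data_hubs: List[str] = []
        self.experiences: List[str] = []

    def connect(self, identity: str) -> None:
        # Step 1: the user signs in with an enterprise identity.
        self.identity = identity

    def create_environment(self, name: str) -> None:
        # Step 2: the environment and its data lake are created together.
        if self.identity is None:
            raise RuntimeError("connect with an enterprise identity first")
        self.environment = name
        self.data_lake = f"{name}-datalake"

    def create_data_hub(self, name: str) -> None:
        # Step 3: Data Hub clusters for traditional workloads.
        self._require_environment()
        self.data_hubs.append(name)

    def create_experience(self, kind: str) -> None:
        # Step 4: access points for containerized analytic experiences.
        self._require_environment()
        self.experiences.append(kind)

    def _require_environment(self) -> None:
        if self.environment is None:
            raise RuntimeError("create an environment and data lake first")
```

In this sketch, steps 3 and 4 each depend only on step 2, so an experience can be created without any Data Hub cluster; that is an assumption, since the slide only shows the steps in sequence.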
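Slide 13's point that data lake services are shared, rather than copied into each cluster, can be illustrated with a hypothetical policy store that two clusters hold by reference. `SharedServices`, `Cluster`, `grant` and `can_read` are invented for this sketch and do not reflect how Ranger or Atlas actually store policies or audit events.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SharedServices:
    """Hypothetical stand-in for SDX: one policy store and audit log per environment."""
    # Maps a principal (user or group) to the tables it may read.
    read_grants: Dict[str, Set[str]] = field(default_factory=dict)
    # Every access check is recorded as (principal, table, allowed).
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def grant(self, principal: str, table: str) -> None:
        self.read_grants.setdefault(principal, set()).add(table)

    def can_read(self, principal: str, table: str) -> bool:
        allowed = table in self.read_grants.get(principal, set())
        self.audit_log.append((principal, table, allowed))
        return allowed


@dataclass
class Cluster:
    name: str
    sdx: SharedServices  # held by reference, so all clusters see the same policies

    def query(self, principal: str, table: str) -> bool:
        # Each cluster delegates the access decision to the shared services.
        return self.sdx.can_read(principal, table)
```

With `sdx = SharedServices()` passed to both `Cluster("bi-team", sdx)` and `Cluster("etl-team", sdx)`, a single `sdx.grant("analysts", "sales")` is enforced, and audited, in both clusters.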
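The "Service → collection of Pods" line on slide 19 works through label selectors in Kubernetes: a Service targets every Pod whose labels include all of the Service's selector key/value pairs. The sketch below models only that matching rule. The classes are hypothetical, and real Kubernetes also considers namespaces and pod readiness, which this ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    name: str
    image: str


@dataclass
class Pod:
    name: str
    labels: Dict[str, str]
    containers: List[Container] = field(default_factory=list)


@dataclass
class Service:
    name: str
    selector: Dict[str, str]

    def endpoints(self, pods: List[Pod]) -> List[str]:
        # A Service without a selector does not pick up Pods automatically.
        if not self.selector:
            return []
        # A Pod matches when every selector key/value pair appears in its labels.
        return [
            pod.name
            for pod in pods
            if all(pod.labels.get(key) == value for key, value in self.selector.items())
        ]
```

For instance, a Service with selector `{"app": "web"}` matches Pods labelled `{"app": "web"}` or `{"app": "web", "tier": "frontend"}`, but not one labelled `{"app": "db"}`. Because matching is by labels rather than by name, Pods can be added or replaced without changing the Service.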
