STL Meetup: Cloudera Platform, January 2020

Slides from the January meeting of the St. Louis Big Data IDEA. The subject of the meeting was where Cloudera stands in 2020.

  1. CLOUDERA DATA PLATFORM, January 2020
  2. DATA MANAGEMENT IS SPREAD ALL OVER. Where organizations manage data: 47% on-premises, 32% private cloud, 26% hybrid cloud, 26% multi-cloud, 26% single cloud. Source: Harvard Business Review Analytic Services Survey, June 2019. © 2019 Cloudera, Inc. All rights reserved.
  3. CIO Magazine
  4. Up to 40%: shadow IT as a percentage of overall IT spend
  5. CIO Magazine
  6. Speed & agility / security & control
  7. DATA TEAMS ARE HIGHLY SPECIALIZED: App Developers, Data Engineers, Compliance Mgrs., Data Architects, BI Analysts, Data Scientists, Infrastructure Mgrs.
  8. SPECIALIZATION CREATES A DIVERSITY OF NEEDS. Across the same seven teams, the needs include: continuous availability and custom tooling • capacity guarantees to enable consistent SLAs • capacity on demand to support bursty workloads • the latest tools and hardware, with unpredictable capacity • a single source of truth • privacy and verifiable audit • reliability, cost, and scale
  9. “ONE-SIZE-FITS-ALL” PITS THE BUSINESS VS. IT
  10. “SHADOW IT” POINT SOLUTIONS LEAD TO CHAOS
  11. ANSWER: CENTRALIZE CONTROL + CUSTOMIZE ENVIRONMENTS. Diagram: centralized data, security, governance, and management beneath four custom environments, serving App Developers, Data Architects, Compliance Mgrs., Infrastructure Mgrs., Data Engineers, BI Analysts, and Data Scientists.
  12. A DATA PLATFORM OPTIMIZED FOR THE BEST OF BOTH. Diagram: the same layout, with Cloudera Data Platform and SDX providing the centralized data, security, governance, and management layer.
  13. Cloudera Data Platform
  14. WHAT DOES ENTERPRISE IT NEED TO “SAY YES” TO THE BUSINESS?
  15. CLOUD EXPERIENCE; ARCHITECTURE & TECHNOLOGY REQUIREMENTS: compute & storage, Kubernetes & containers, streaming & ML/AI
  16. CLOUDERA DATA PLATFORM
  17. 3 INITIAL CDP CLOUD SERVICES: Data Hub, Data Warehouse, Machine Learning. Empowering enterprise IT to deliver at the speed of business.
  18. CDP USER INTERFACE
  19. CLOUDERA DATA PLATFORM: THE POWER OF “AND”
      • On-premises and public cloud
      • Multi-cloud and multi-function
      • Simple to use and secure by design
      • Manual and automated
      • Open and extensible
      • For data engineers and data scientists
  20. NEWS
      Recent news:
      • Open-source licensing
      • Streams management
      • CDP availability on AWS
      • CDP availability on Azure
      • CDP Data Center Edition
      Coming soon:
      • Workload XM on-prem
      • CDP Data Hub adding Flow and Stream clusters
  21. ARCHITECTURE & ROADMAP
  22. ONE PLATFORM, TWO FORM FACTORS: CDP Public Cloud (platform-as-a-service) and CDP On-Prem (installable software)
  23. CDP PUBLIC CLOUD ARCHITECTURE. Diagram: the Management Console, alongside Data Catalog, Workload Manager, and Replication Manager, manages one or more environments; each environment contains SDX plus Data Hub, DW (CDW), and ML (CML) clusters.
      • Management Console: a single pane of glass to manage one or more environments and the services that run within each environment
      • Environment: a logical encapsulation of a customer network (such as an Azure virtual network) and the services that run within that network
      • Cluster: a distributed computing service that runs on VMs (Data Hub) or Kubernetes (the experiences) and has access to the shared data lake
      • SDX: the data access control layer that sits on top of the backend object store and provides coherent data security and governance for all the applications running within the environment
  24. CDP ON-PREM ARCHITECTURE. Diagram: CDP Data Center (storage, SDX, and traditional workloads on servers) and CDP Private Cloud (a container cloud running Data Hub, CDW, CML, ...), alongside the Management Console, Workload Manager, Data Catalog, and Replication Manager.
  25. BURST TO CLOUD
      • Workload Manager identifies burstable workloads
      • Replication Manager replicates targeted datasets to the cloud (data, schema, policies, & lineage)
      Diagram: an existing CDH / HDP / CDP cluster (existing apps, data, and hardware) bursting to a CDP cloud environment (SDX plus Data Hub, DW, and ML clusters), coordinated through the Management Console, Data Catalog, Workload Manager, and Replication Manager.
  26. CLOUDERA DATA PLATFORM: UNIQUE CAPABILITIES (ENTERPRISE DATA CLOUD)
      • Cloud optimized for IT & LoB (hybrid, multi-function, SDX, open, container-based cloud experiences)
      • Cloud burst (supplement on-prem capacity)
      • Intelligent replication (data, users, workloads)
      • Best of CDH & HDP (Cloudera Runtime)
  27. CDP ON PREM
  28. CDP ON PREM: CDP DATA CENTER is the first step to private cloud. Diagram: CDP Data Center (storage, SDX, and traditional workloads on servers) and CDP Private Cloud (containers running Data Hub, CDW, CML, ...), with the Management Console, Workload Manager, Data Catalog, Replication Manager, ... shown across operations and compute layers.
  29. NEW FEATURES IN CDP DATA CENTER
      New features for CDH 6 customers:
      • Ranger 2.0: dynamic row filtering & column masking; attribute-based access control; SparkSQL fine-grained access control
      • Atlas 2.0: advanced data discovery; improved performance and scalability
      • Hive 3: Hive-on-Tez for better ETL performance; ACID transactions
      • Ozone (Preview): 10x the scalability of HDFS
      • Knox*: gateway-based SSO
      • Druid*: low-latency data mart for real-time and aggregate data
      • Spark on Docker*: simplified dependency management
      New features for HDP 3 customers:
      • Cloudera Manager: virtual private clusters; automated wire encryption setup; fine-grained RBAC for administrators; streamlined maintenance workflows
      • Atlas 2.0: advanced data lineage; faceted search
      • Solr 7: relevance-based text search over unstructured data (text, PDF, .jpg, ...)
      • Impala: better fit for data mart migration use cases (interactive, BI-style queries)
      • Hue: built-in SQL editor
      • Kudu: better performance for fast-changing / updatable data
      • Better at-rest encryption: Key Trustee Server, NavEncrypt*
      (*) In a future release
  30. CDP DATA CENTER ROADMAP
      CDP Data Center 7.0 (2H 2019): • Cloudera Manager 7.0 • Hadoop 3.1 • Spark 2.4 • Hive 3.1 • Impala 3.2 • Oozie 5.1 • Hue 4.5 • Ranger 2.0 • Atlas 2.0 • Solr 7.4 • Tez 0.9 • HBase 2.2 • Phoenix 5.0 • Kudu 1.11 • Sqoop 1.4.7 • Parquet 1.10 • Avro 1.8 • ORC 1.5 • ZooKeeper 3.5 • Kafka 2.3 • Key Trustee Server 7 • Ozone (Tech Preview)
      1H 2020: • Livy • Druid • Ranger KMS • Key HSM • Navigator Encrypt • Zeppelin • Knox • Accumulo
  31. UPGRADING AN EXISTING CLUSTER: OPTION A
      Step 1: Upgrade an existing cluster to CDP Data Center, creating an SDX environment based on existing data.
      Step 2: Install CDP Private Cloud and use the Experiences to build new applications.
      Step 3: Use Workload Manager to intelligently migrate key workloads from the CDP Data Center cluster to the CDP Private Cloud Experiences.
      Diagram: existing CDH 5 / HDP 2 and CDH 6 / HDP 3 clusters (existing apps, data, and hardware) upgraded in place to CDP Data Center (SDX environment), alongside CDP Private Cloud (Data Hub, CDW, CML, Management Console).
  32. UPGRADING AN EXISTING CLUSTER: OPTION B
      Step 1: Install CDP Data Center on new hardware and use Replication Manager to replicate data, metadata, and policies from an existing cluster to create the SDX environment.
      Step 2: Install CDP Private Cloud and use the Experiences to build new applications.
      Step 3: Use Workload Manager to intelligently migrate key workloads from the CDH / HDP cluster to the CDP Private Cloud Experiences.
      Diagram: an existing CDH / HDP cluster (existing apps, data, and hardware) feeds CDP Data Center (SDX environment; new data, new hardware, no bare-metal apps) through intelligent replication of data, metadata, and policies, and feeds CDP Private Cloud (Data Hub, CDW, CML, Management Console) through intelligent replication of workloads.
  33. THANK YOU
  34. BACKUP
  35. OUR CUSTOMERS ARE ASKING FOR AN ENTERPRISE DATA CLOUD
      Hybrid, multi-cloud:
      • Move data and applications without rewriting and retraining
      • Separate data management strategy from infrastructure strategy
      • Manage all environments from a single pane of glass
      Multi-function & open:
      • Deploy one platform to address current and future workload needs
      • Connect disparate workload types to develop Edge2AI applications on one platform
      • Open source and open APIs
      Secure & governed:
      • Manage data security and governance centrally
      • Automate application security at all layers
      • Reduce time to value with enterprise-grade productivity tools
      Cloud experience:
      • Easy to use with self-serve capabilities
      • Elasticity and agility to meet the changing demands of workloads and the company
      • Simple to manage and maintain environments and applications
  36. CDP PLATFORM DEMOS
  37. 1) NOW EACH TEAM CAN CUSTOMIZE THEIR ENVIRONMENT
      Business users can:
      • Upgrade software on their own schedule
      • Customize software in isolation
      • Control performance in isolation
      • Scale resources dynamically to simplify capacity planning
      • Pause and resume their environments without losing work
      • Do all of this without losing the ability to collaborate with other teams
      IT users can:
      • Spin up custom clusters in 30 minutes
        – without recreating the data lake
        – without reconfiguring access rules
        – without reconfiguring users
        – without reconfiguring security
      • Tune cluster internals for advanced use cases
  38. 2) NOW WE CAN RUN COST-EFFECTIVELY IN THE CLOUD
      Business users can:
      • Save money by unbundling infrastructure into thin servers + object storage
      • Save money by paying only for what they use
      • Do all of this without diminishing the operational support provided by IT
      IT users can:
      • Automate the cluster lifecycle to support ad-hoc and seasonal demand
      • Troubleshoot cluster internals when things go wrong
      • Manage a global footprint of hundreds of clusters without scaling support staff
  39. 3) NOW WE CAN SAFELY ONBOARD 10X MORE USERS
      Business users can:
      • Access the platform via corporate SSO to simplify the login process
      • Access only the data they require for their work
      • Leverage automated data profilers to detect sensitive data
      IT users can:
      • Federate authenticated users and groups from the corporate identity provider
      • Stop worrying about Kerberos and LDAP
      • Comply with data privacy standards
        – deny access by default
        – control access at any granularity
        – configure once for all CDP services
      • Comply with regulatory standards even as clusters come and go
      • Troubleshoot workloads even as clusters come and go
  40. 4) NOW WE CAN MIGRATE TO THE CLOUD WITHOUT STARTING OVER
      Business users can:
      • Migrate to the cloud without retraining and rewriting
      • Burst targeted workloads to elastic infrastructure without waiting for a full migration
      IT users can:
      • Support deployments across any environment without retraining or rewriting
      • Manage global deployments from a single console
  41. HOW TIMES HAVE CHANGED: 2008, SCALE 1 JOB TO 1000s OF SERVERS; 2019, SCALE 1 PLATFORM TO 1000s OF USERS
  42. CDP HOME: a single login to access the full platform, documentation, and support, all controlled through corporate SSO
  43. DATA HUB: a familiar and highly customizable cluster service optimized for the separation of storage and compute
  44. DATA WAREHOUSE: a data warehousing service optimized for concurrency, caching, and isolation
  45. MACHINE LEARNING: a machine learning workspace service to connect teams of data scientists to enterprise data
  46. MANAGEMENT CONSOLE: a single pane of glass to manage hundreds of clusters, each with its own lifecycle, across multiple environments
  47. DATA CATALOG: a centralized data stewardship tool for searching, organizing, securing, and governing data across environments
  48. WORKLOAD MANAGER: a centralized management tool for analyzing and optimizing workloads within and across environments
  49. REPLICATION MANAGER: a centralized management tool for replicating and migrating data, metadata, and policies between environments
  50. CLOUDERA RUNTIME
  51. CDH VS. CLOUDERA RUNTIME (1 of 2)

      | Component | CDH 5.16 | CDH 6.2 | Runtime 7.x |
      |---|---|---|---|
      | Apache Accumulo | 1.7.2 | 1.9.0 | [Roadmap] |
      | Apache Avro | 1.7.6 | 1.8.2 | 1.8.2 |
      | Apache Flume | 1.6.0 | 1.9.0 | [Removed] |
      | Apache Hadoop | 2.6.0 | 3.0.0 | 3.1 |
      | Apache HBase | 1.2.0 | 2.1.1 | 2.2 |
      | HBase Indexer | 1.5.0 | 1.5.0 | 1.5.0 |
      | Apache Hive | 1.1.0 | 2.1.1 | 3.1 |
      | Hue | 3.9.0 | 4.3.0 | 4.3 |
      | Apache Impala | 2.12.0 | 3.2.0 | 3.2 |
      | Kite SDK | 1.0.0 | 1.0.0 | 1.0.0 |

  52. CDH VS. CLOUDERA RUNTIME (2 of 2)

      | Component | CDH 5.16 | CDH 6.2 | Runtime 7.0 |
      |---|---|---|---|
      | Apache Kudu | 1.7.0 | 1.9.0 | 1.11 |
      | Navigator | 2.15 | 6.2 | [Integrated into Atlas] |
      | Apache Oozie | 4.1.0 | 5.1.0 | 5.1 |
      | Apache Parquet | 1.5.0 | 1.9.0 | 1.10 |
      | Parquet-format | 2.1.0 | 2.3.1 | 2.3.1 |
      | Apache Pig | 0.12 | 0.17.0 | [Removed] |
      | Apache Sentry | 1.5.1 | 2.1.0 | [Replaced by Ranger] |
      | Apache Solr | 4.10.3 | 7.4.0 | 7.4 |
      | Apache Spark | 1.6.0 | 2.4.0 | 2.4 |
      | Apache Sqoop | 1.4.6 | 1.4.7 | 1.4.7 |
      | Apache ZooKeeper | 3.4.5 | 3.4.5 | 3.4.6 |

  53. HDP VS. CLOUDERA RUNTIME (1 of 2)

      | Component | HDP 2.6.5 | HDP 3.1.4 | Runtime 7.x |
      |---|---|---|---|
      | Apache Accumulo | 1.7.0 | 1.7.0 | [Roadmap] |
      | Apache Atlas | 0.8.0 | 1.1.0 | 2.0.0 |
      | Apache Flume | 1.5.2 | [Removed] | [Removed] |
      | Apache Hadoop | 2.7.3 | 3.1.1 | 3.1 |
      | Apache HBase | 1.1.2 | 2.0.2 | 2.2 |
      | Apache Hive | 1.2.1 / 2.1.0 | 3.1.0 | 3.1 |
      | Apache Knox | 0.12 | 1.0.0 | 1.3 |
      | Apache Livy | - | 0.5.0 | 0.5 |
      | Apache Oozie | 4.2.0 | 4.3.1 | 5.1 |
      | Apache Phoenix | 4.7.0 | 5.0.0 | 5.0 |

  54. HDP VS. CLOUDERA RUNTIME (2 of 2)

      | Component | HDP 2.6.5 | HDP 3.1.4 | Runtime 7.0 |
      |---|---|---|---|
      | Apache Pig | 0.16 | 0.16 | [Removed] |
      | Apache Ranger | 0.7.0 | 1.2.0 | 2.0 |
      | Apache Spark | 1.6.3 / 2.3.2 | 2.3 | 2.4 |
      | Apache Sqoop | 1.4.6 | 1.4.7 | 1.4.7 |
      | Apache Storm | 1.1.0 | 1.2.1 | [Removed] |
      | Apache TEZ | 0.7.0 | 0.9.1 | 0.9 |
      | Apache Zeppelin | 0.7.3 | 0.8.0 | 0.8 |
      | Apache ZooKeeper | 3.4.6 | 3.4.6 | 3.4.6 |

  55. THANK YOU
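
The definitions on slide 23 describe a containment hierarchy: the Management Console manages environments, and every cluster in an environment shares that environment's SDX layer. The following Python sketch models that hierarchy with plain dataclasses. It is an illustration under assumptions, not a Cloudera API: the class names, the `object_store` and `network` fields, and the example values (`s3://example-lake`, `vnet-prod`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SDX:
    """Shared security/governance layer over one environment's object store."""
    object_store: str  # hypothetical location, e.g. a bucket URI
    policies: dict[str, str] = field(default_factory=dict)


@dataclass
class Cluster:
    name: str
    kind: str     # "Data Hub", "CDW", or "CML"
    runtime: str  # per slide 23: VMs for Data Hub, Kubernetes for the experiences
    sdx: SDX      # a reference to the environment's shared SDX, not a copy


@dataclass
class Environment:
    name: str
    network: str  # hypothetical customer network id, e.g. an Azure VNet name
    sdx: SDX
    clusters: list[Cluster] = field(default_factory=list)

    def add_cluster(self, name: str, kind: str) -> Cluster:
        """Create a cluster wired to this environment's existing data lake."""
        runtime = "VMs" if kind == "Data Hub" else "Kubernetes"
        cluster = Cluster(name, kind, runtime, self.sdx)
        self.clusters.append(cluster)
        return cluster


@dataclass
class ManagementConsole:
    environments: dict[str, Environment] = field(default_factory=dict)

    def register(self, env: Environment) -> None:
        self.environments[env.name] = env

    def inventory(self) -> list[tuple[str, str, str]]:
        """Single pane of glass: (environment, cluster, kind) for every cluster."""
        return [
            (env.name, c.name, c.kind)
            for env in self.environments.values()
            for c in env.clusters
        ]


console = ManagementConsole()
prod = Environment("prod", "vnet-prod", SDX("s3://example-lake"))
console.register(prod)
etl = prod.add_cluster("etl", "Data Hub")
bi = prod.add_cluster("bi", "CDW")
assert etl.sdx is bi.sdx  # both clusters see one data lake and one policy set
print(console.inventory())  # [('prod', 'etl', 'Data Hub'), ('prod', 'bi', 'CDW')]
```

Giving each new cluster a reference to the environment's existing `SDX` object, rather than building a new one, mirrors slide 37's claim that custom clusters can be spun up without recreating the data lake or reconfiguring access rules.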
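
Slide 25 splits bursting into two jobs: decide which workloads to move, then replicate what they need (data, schema, policies, and lineage). The sketch below shows that split. The greedy largest-first selection, the `peak_vcores` metric, and all workload and table names are hypothetical assumptions; the slides do not say how Workload Manager actually decides what is burstable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    peak_vcores: int           # hypothetical demand metric
    datasets: tuple[str, ...]  # tables the workload reads


# Slide 25: replication carries more than raw files.
REPLICATED_ARTIFACTS = ("data", "schema", "policies", "lineage")


def pick_burst_candidates(workloads: list[Workload], onprem_vcores: int) -> list[Workload]:
    """Assumed greedy heuristic (not Workload Manager's algorithm):
    move the largest workloads to the cloud until the rest fit on-prem."""
    remaining = sorted(workloads, key=lambda w: w.peak_vcores, reverse=True)
    burst: list[Workload] = []
    while remaining and sum(w.peak_vcores for w in remaining) > onprem_vcores:
        burst.append(remaining.pop(0))
    return burst


def replication_plan(burst: list[Workload]) -> dict[str, tuple[str, ...]]:
    """Every dataset a burst workload touches must arrive with all four artifacts."""
    return {ds: REPLICATED_ARTIFACTS for w in burst for ds in w.datasets}


jobs = [
    Workload("nightly_etl", 400, ("sales.raw", "sales.clean")),
    Workload("quarter_close", 900, ("finance.ledger",)),
    Workload("dashboards", 150, ("sales.clean",)),
]
burst = pick_burst_candidates(jobs, onprem_vcores=600)
print([w.name for w in burst])          # ['quarter_close']
print(sorted(replication_plan(burst)))  # ['finance.ledger']
```

Replicating policies and lineage along with the data is what lets the burst workload run under the same governance in the cloud; moving only files would leave the cloud copy unprotected.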
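
Slide 29 lists dynamic row filtering, column masking, and attribute-based access control among the Ranger 2.0 features. The sketch below illustrates what those three ideas do to a query result, in plain Python. It is not Ranger's policy model or API: the role names, the `region` attribute, the masking rule, and the data are hypothetical.

```python
from typing import Callable

Row = dict[str, object]
User = dict[str, str]


def mask_show_last4(value: object) -> str:
    """Replace all but the last four characters with 'x'."""
    text = str(value)
    return "x" * max(len(text) - 4, 0) + text[-4:]


# Hypothetical policies for one table. The analyst filter keys on a user
# attribute (region) rather than a user name: the attribute-based idea.
ROW_FILTERS: dict[str, Callable[[Row, User], bool]] = {
    "admin": lambda row, user: True,
    "analyst": lambda row, user: row["region"] == user["region"],
}
COLUMN_MASKS: dict[str, dict[str, Callable[[object], object]]] = {
    "admin": {},
    "analyst": {"card_number": mask_show_last4},
}


def secure_select(rows: list[Row], user: User) -> list[Row]:
    """Filter rows, then mask columns, according to the caller's role."""
    role = user["role"]
    if role not in ROW_FILTERS:  # no policy means no access
        raise PermissionError(f"no policy for role {role!r}")
    keep, masks = ROW_FILTERS[role], COLUMN_MASKS[role]
    return [
        {col: masks[col](val) if col in masks else val for col, val in row.items()}
        for row in rows
        if keep(row, user)
    ]


payments = [
    {"id": 1, "region": "MO", "card_number": "4111111111111111"},
    {"id": 2, "region": "IL", "card_number": "5500000000000004"},
]
# An analyst in MO sees only the MO row, with the card number masked
# down to its last four digits; an admin sees both rows unmasked.
print(secure_select(payments, {"role": "analyst", "region": "MO"}))
```

Because the filter reads the user's attributes at query time, one policy serves every region; a name-based approach would need a separate rule per analyst.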
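
The privacy bullets on slide 39 (deny access by default, control access at any granularity, configure once for all CDP services) can be illustrated with a tiny policy check. In this hypothetical sketch, resources are dotted `database.table.column` paths, a grant on a parent covers its children, group membership is assumed to arrive already federated from the identity provider, and one shared `GRANTS` set stands in for policy configured once for every service. None of these names come from CDP.

```python
# Hypothetical grants: (group, resource, access). A resource is a dotted path
# "database[.table[.column]]"; a grant on a parent covers everything below it.
GRANTS: set[tuple[str, str, str]] = {
    ("analysts", "sales", "select"),              # whole database
    ("hr", "people.employees.salary", "select"),  # single column
}


def covers(granted: str, requested: str) -> bool:
    """True if `granted` is `requested` itself or one of its ancestors."""
    # The trailing "." keeps "sales" from matching an unrelated "salesforce".
    return requested == granted or requested.startswith(granted + ".")


def is_allowed(groups: set[str], resource: str, access: str) -> bool:
    """Deny by default: allow only when an explicit grant matches."""
    return any(
        group in groups and action == access and covers(res, resource)
        for group, res, action in GRANTS
    )


# In this sketch every service would call the same check against the same
# GRANTS, which is one way to read "configure once for all CDP services".
print(is_allowed({"analysts"}, "sales.orders.amount", "select"))      # True
print(is_allowed({"analysts"}, "people.employees.salary", "select"))  # False
print(is_allowed({"hr"}, "people.employees.name", "select"))          # False
```

Returning `False` whenever no grant matches is what "deny by default" means in practice: forgetting to write a policy fails closed rather than open.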
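
The component tables on slides 51 to 54 can be turned into a quick upgrade-impact check. The sketch below copies the CDH 6.2, HDP 3.1.4, and Runtime columns from those slides and labels each component as upgraded, on the same release line, or carrying a status note such as removed or replaced. One assumption matters: the slides abbreviate some versions (the HDP 3.1.4 column lists Hadoop 3.1.1 while the Runtime column says 3.1), so two versions count as the same release line when they agree on every component both strings give.

```python
# (source version, Runtime entry) copied from slides 51-54.
# Bracketed entries are status notes from the slides, not versions.
CDH62 = {
    "Apache Accumulo": ("1.9.0", "[Roadmap]"),
    "Apache Avro": ("1.8.2", "1.8.2"),
    "Apache Flume": ("1.9.0", "[Removed]"),
    "Apache Hadoop": ("3.0.0", "3.1"),
    "Apache HBase": ("2.1.1", "2.2"),
    "HBase Indexer": ("1.5.0", "1.5.0"),
    "Apache Hive": ("2.1.1", "3.1"),
    "Hue": ("4.3.0", "4.3"),
    "Apache Impala": ("3.2.0", "3.2"),
    "Kite SDK": ("1.0.0", "1.0.0"),
    "Apache Kudu": ("1.9.0", "1.11"),
    "Navigator": ("6.2", "[Integrated into Atlas]"),
    "Apache Oozie": ("5.1.0", "5.1"),
    "Apache Parquet": ("1.9.0", "1.10"),
    "Parquet-format": ("2.3.1", "2.3.1"),
    "Apache Pig": ("0.17.0", "[Removed]"),
    "Apache Sentry": ("2.1.0", "[Replaced by Ranger]"),
    "Apache Solr": ("7.4.0", "7.4"),
    "Apache Spark": ("2.4.0", "2.4"),
    "Apache Sqoop": ("1.4.7", "1.4.7"),
    "Apache ZooKeeper": ("3.4.5", "3.4.6"),
}
HDP314 = {
    "Apache Accumulo": ("1.7.0", "[Roadmap]"),
    "Apache Atlas": ("1.1.0", "2.0.0"),
    "Apache Flume": ("[Removed]", "[Removed]"),
    "Apache Hadoop": ("3.1.1", "3.1"),
    "Apache HBase": ("2.0.2", "2.2"),
    "Apache Hive": ("3.1.0", "3.1"),
    "Apache Knox": ("1.0.0", "1.3"),
    "Apache Livy": ("0.5.0", "0.5"),
    "Apache Oozie": ("4.3.1", "5.1"),
    "Apache Phoenix": ("5.0.0", "5.0"),
    "Apache Pig": ("0.16", "[Removed]"),
    "Apache Ranger": ("1.2.0", "2.0"),
    "Apache Spark": ("2.3", "2.4"),
    "Apache Sqoop": ("1.4.7", "1.4.7"),
    "Apache Storm": ("1.2.1", "[Removed]"),
    "Apache TEZ": ("0.9.1", "0.9"),
    "Apache Zeppelin": ("0.8.0", "0.8"),
    "Apache ZooKeeper": ("3.4.6", "3.4.6"),
}


def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def classify(source: str, runtime: str) -> str:
    """Label one component's move from the source release to Runtime."""
    if runtime.startswith("["):
        return runtime.strip("[]")  # e.g. "Removed", "Replaced by Ranger"
    old, new = parse(source), parse(runtime)
    n = min(len(old), len(new))  # compare only the parts both strings give
    if new[:n] > old[:n]:
        return "upgraded"
    if new[:n] == old[:n]:
        return "same release line"
    return "older"


def report(table: dict[str, tuple[str, str]]) -> dict[str, str]:
    return {name: classify(src, rt) for name, (src, rt) in table.items()}


for label, table in (("CDH 6.2", CDH62), ("HDP 3.1.4", HDP314)):
    gone = sorted(n for n, s in report(table).items() if s == "Removed")
    print(label, "removed:", gone)
# CDH 6.2 removed: ['Apache Flume', 'Apache Pig']
# HDP 3.1.4 removed: ['Apache Flume', 'Apache Pig', 'Apache Storm']
```

Comparing version parts as integers, not as strings, is what makes Kudu 1.11 rank above 1.9.0; a plain string comparison would get that case backwards.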
