SlideShare a Scribd company logo
1 of 50
Download to read offline
Cloudera Operational DB
(powered by Apache HBase and
Apache Phoenix)
Beyond the Tyranny of the Schema
December 2019
Timothy Spann
© 2019 Cloudera, Inc. All rights reserved. 2
Welcome to Future of Data - Princeton
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
© 2019 Cloudera, Inc. All rights reserved. 3
Who Am I? Timothy Spann
Data in Motion Field Engineer
@PaasDev
DZone Zone Leader and Big Data MVB;
Princeton NJ Future of Data Meetup;
ex-Pivotal Field Engineer;
Author of Apache Kafka RefCard
https://github.com/tspannhw
https://www.datainmotion.dev/
© 2019 Cloudera, Inc. All rights reserved. 4
This Meetup Made Possible Thanks To:
Paul Vidal from https://www.meetup.com/futureofdata-philadelphia/ for CDP HBase
Environment and Cloud Magic
Josh Elser and Josiah Goodson for OpDB Slides and HBase Guidance
Milind Pandit from https://www.meetup.com/TechnologySolutionsHub
Mehul Shah from https://www.meetup.com/TechnologySolutionsHub
Vijay Garg from https://pga.fund/
Madhavi from https://www.nuwaysolutions.com/
https://www.slideshare.net/bunkertor/tracking-crime-as-it-occurs-with-apache-phoenix-apache-hbase-and-apache-nifi
© 2019 Cloudera, Inc. All rights reserved. 5
WHAT HAVE Apache HBase 2.0 & Apache Phoenix ENABLED?
• Operationalizing ML / AI to revolutionize
healthcare, public utilities, etc
• Serving real-time content at webscale
• Empowering big data analytics for operational
and offline uses
• Acting as a resilient store of record
CLOUDERA OPERATIONAL DB (powered by Apache HBase & Apache Phoenix)
Operational DB is DBMS used to manage dynamic and changing data in real time and enable
applications that drive the business
6© Cloudera, Inc. All rights reserved.
APACHE HBASE FAST FACTS
Largest Database
14 Petabytes
Best Known App
Siri
Fastest
Ingestion
20M Events/s
Users
750+
7© Cloudera, Inc. All rights reserved.
HBASE ARCHITECTURE
HMaster
Orchestration layer
ZooKeeper
Region
Server
DataNode
Data plane
ColFam ColFam
Col Col Col Col
R Val Val Val Val
R Val Val Val Val
ColFam ColFam
Col Col Col Col
R Val Val Val Val
R Val Val Val Val
Region
ColFam ColFam
Col Col Col Col
R Val Val Val Val
R Val Val Val Val
• Regions are table
segments
• Read and write path
are in the data plane
• DDL operations
• Region assignment
• Recovery orchestration
• Heartbeat
• Server
state
• Services client reads &
writes
• Maximizes in-memory
operations for low-latency
operations
• Provides data resiliency
8© Cloudera, Inc. All rights reserved.
SCHEMA-LESS DATA MODEL
• Column families defined at time of table creation
• Columns created as required (at time of data insertion)
• No limits to number of columns
• Tables can grow in two dimensions – columns and rows
• Compression & encoding applied at column family level
• No declaration of data types (i.e., a column can contain multiple data types)
Column Family Column Family
Column Column Column Column
RowKey Cell Cell Cell Cell
RowKey Cell Cell Cell Cell
© 2019 Cloudera, Inc. All rights reserved. 9
HIGHLY AVAILABLE OUT OF THE BOX (<1 MINUTE RECOVERY)
Region Server 1
HDFS (3 copies of data)
Region Server 2 Region Server 3
What happens when a region crashes
1. Region server crashes
2. Writes and reads time-out for regions
in impacted region server
3. Regions are redistributed to other
region servers
4. WAL is replayed in other region
servers
5. Reads & writes are able to continue to
impacted regions
Typical recovery period < 1 minute (for impacted regions only)
No manual intervention
10© Cloudera, Inc. All rights reserved.
SECURITY MODEL
Authentication • Kerberos
Role Based
Access control
• Permissions & Scope enable flexible role based access control
• Scope: Global, Namespace, Table, Column Family, Cell
DB security &
encryption
• Transparent encryption of data on the wire and data on disk
(HFile for data at rest, secure WAL for data in motion within
HBase)
• Logging & auditability: configurable & fixed event
11
stmt.executeUpdate(“UPSERT INTO TABLE_NAME
VALUES(rowKey, GREETINGS) ");
stmt.execute();
Phoenix
What Phoenix adds to HBase
Pros:
Cons:
• Maximally flexible & customizable
• SQL only for data remediation
• Unfamiliar to SQL developers
• Requires non-traditional data architecture
• Programmatic ANSI SQL support
• RDBMS-like data architecture
• Auto-applies performance best practices
• Can co-exist with HBase apps
• Reduced flexibility vis-à-vis vanilla HBase
• Phoenix specific data format means you
can’t use HBase APIs directly
Put put = new Put(Bytes.toBytes(rowKey));
put.addColumn(COLUMN_FAMILY_NAME, COLUMN_NAME,
Bytes.toBytes(GREETINGS));
table.put(put);
HBase
RDBMS-like, scale-out databaseFlexible, scale-out, no-sql database
12
Key Phoenix capabilities
• ANSI SQL including joins
• Flexible Schemas / Dynamic Columns
• Secondary Indexes
• Aggregation pushdowns
• Cross-language client support
• Query logging
• Security through Ranger (supports RBAC, ABAC,
etc)
• JDBC/ODBC connectivity for operational reporting
• Plugs in to any JDBC/ODBC-compatible BI tool
to enable self-service analytics and insight
Phoenix
Applications
13
ANSI SQL 92 Support
Supported today Roadmap
Standard SQL Data Types UNION
SELECT, UPSERT, DELETE Windowing Functions
JOINs: Inner and Outer Transactions
Subqueries Cross Joins
Secondary Indexes Authorization
GROUP BY, ORDER BY, HAVING Replication Management
AVG, COUNT, MIN, MAX, SUM Column Constraints and Defaults
Primary Keys, Constraints UDFs
CASE, COALESCE
VIEWs
Flexible Schema
UNION ALL
© 2019 Cloudera, Inc. All rights reserved. 14
CLOUD OPTIMIZED : HBASE backed by both HDFS and S3
Cloudera provides HBase backed by Amazon’s S3
● Cloudera Data Platform (CDP) provides an out-of-the-box solution that allows Apache HBase
deployments to use Amazon Simple Storage Service (S3) as its main persistence layer for saving
table data
● Amazon’s Simple Storage Service (S3) is an eventually consistent object store, and HBase requires a
consistent and atomic filesystem which means that it cannot directly use S3. Let's look at the topology.
© 2019 Cloudera, Inc. All rights reserved. 15
CLOUD OPTIMIZED : Cloudera HBASE backed by both HDFS and S3
Cloudera with CDP has built a solution where when you launch an Operational Database (HBase)
cluster on CDP, HBase StoreFiles (the backing files for HBase tables) are stored in S3 and HBase
write-ahead-logs (WAL) are stored in an HDFS instance run alongside HBase per usual.
© 2019 Cloudera, Inc. All rights reserved. 16
CLOUD OPTIMIZED : HBASE backed by both HDFS and S3
● Configuring HBase to use S3 for its StoreFiles has many benefits to our users.
● One such benefit is that users can decouple their storage and compute.
● If there are times in which no access to HBase is necessary, HBase can be cleanly
shut down and all compute resources reclaimed to eliminate any cost of compute.
● When HBase access is needed again, the HBase cluster can be recreated, pointing
to the same data in S3. Upon startup, HBase can re-initialize itself solely from the
data in S3.
© 2019 Cloudera, Inc. All rights reserved. 17
WHERE IS APACHE HBASE TODAY
• Large ecosystem (Nifi, Spark, Hive, Impala,
SOLR, Ranger, Atlas, etc)
• Supports NoSQL, SQL, Geospatial, Graph,
TimeSeries, Key Value and other modes1
1. In conjunction with other open source projects built on top of HBase
© 2019 Cloudera, Inc. All rights reserved. 18
Cloudera and Apache HBase
● The upstream community is pretty huge and very active with contributions coming
from multiple developers from Cloudera, Microsoft, Amazon, Alibaba, Apple Salesforce
and Xiaomi etc.
● Cloudera is a very active contributor to upstream HBase along with Apache Phoenix.
○ Currently > 8 PMCs and > 2 committers.
● CDP is based off latest HBase v2 and Phoenix v5.
© 2019 Cloudera, Inc. All rights reserved. 19
New features in HBase 2+
● Operational simplicity
○ Assignment Manager V2 (using Procedure Framework 2)
○ Offline compaction tool (outside regionservers to save I/O thrasing)
○ Replication: namespace & serial and for bulk-loads
● Performance
○ Off-heap cache improvements (Uses DirectByteByffers to manage buckets outside
of the JVM heap to eliminate impact of gc to get better read perf)
● Space Quotas (to support multi tenancy)
● S3 support
● Spark 2 integration
● Async Client
© 2019 Cloudera, Inc. All rights reserved. 20
• Provides familiar & easy interface for
developers
• Advanced multi-tenancy capabilities
• Support near 100% availability for mission
critical applications & many traditional
transactional apps
• Scale to billions of rows and millions of
columns
• Easily combine data sources that use a wide
variety of different structures and schemas
Storage for business apps that require big-data
Ingest Store Primary Use
Query &
Remediate
NO(T ONLY)SQL PHOENIX
© 2019 Cloudera, Inc. All rights reserved. 21
© 2019 Cloudera, Inc. All rights reserved. 22
© 2019 Cloudera, Inc. All rights reserved. 23
None of this command line mess:
24© Cloudera, Inc. All rights reserved.
HUE FOR SQL & DATA BROWSING FOR REMEDIATION
Supports SQL based insert, update,
delete query for data in HBase
Supports search, insert, update,
delete, DDL for HBase
SQL interface using Impala or Hive for query processing GUI based data browser
© 2019 Cloudera, Inc. All rights reserved. 25
Visualization
© 2019 Cloudera, Inc. All rights reserved. 26
HBase CRUD
27© Cloudera, Inc. All rights reserved.
ENABLING DATA-DRIVEN APPS
Fast
• Real time model serving w/ <5ms latency
• Limitless concurrency (>100M updates/sec)
Easy
• Stream and bulk ingest & processing
• Process automation
• Consolidate multiple databases
• Schema flexibility
• SQL & NoSQL interface
Scalable
• Multi-petabyte scale
• Unlimited Tenants
Highly-available
• Automatic recovery from server failure
• Advanced replication & synchronization
topographies
• Multiple backup methodologies
Multi-tenant capable
• Resource isolation
• Throttling & Quotas
Secure
• Role based access control
• Fine-grained authorizations (e.g., tenant, table,
column family, cell)
© 2019 Cloudera, Inc. All rights reserved. 28
NiFi Integration
with HBase
• PutHBaseRecord
• PutHBaseJSON
• PutHBaseCell
• FetchHBaseRow
• GetHBase
• ScanHBase
• DeleteHBaseRow
• DeleteHBaseCells
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
ELT/ETL Lookup Services
• HBase_1_1_2_ListLookupService
• HBase_2_RecordLookupService
• HBase_2_ClientService
• HBase_2_ClientMapCacheService
https://community.cloudera.com/t5/Community-Articles/Reading-OpenData-JSON-and-Stori
ng-into-Phoenix-Tables/ta-p/247323
© 2019 Cloudera, Inc. All rights reserved. 29
© 2019 Cloudera, Inc. All rights reserved. 30
© 2019 Cloudera, Inc. All rights reserved. 31
© 2019 Cloudera, Inc. All rights reserved. 32
© 2019 Cloudera, Inc. All rights reserved. 33
© 2019 Cloudera, Inc. All rights reserved. 34
© 2019 Cloudera, Inc. All rights reserved. 35
© 2019 Cloudera, Inc. All rights reserved. 36
© 2019 Cloudera, Inc. All rights reserved. 37
© 2019 Cloudera, Inc. All rights reserved. 38
© 2019 Cloudera, Inc. All rights reserved. 39
© 2019 Cloudera, Inc. All rights reserved. 40
© 2019 Cloudera, Inc. All rights reserved. 41
© 2019 Cloudera, Inc. All rights reserved. 42
© 2019 Cloudera, Inc. All rights reserved. 43
© 2019 Cloudera, Inc. All rights reserved. 44
© 2019 Cloudera, Inc. All rights reserved. 45
© 2019 Cloudera, Inc. All rights reserved. 46
https://github.com/tspannhw/HBase2
© 2019 Cloudera, Inc. All rights reserved. 47
© 2019 Cloudera, Inc. All rights reserved. 48
SPRING BOOT APPLICATION TO PHOENIX
https://community.cloudera.com/t5/Community-Articles/Creating-a-Spring-Boot-Java-8-Microservice-To-Read-Apache/ta-p/247379
https://github.com/tspannhw/phillycrime-springboot-phoenix
© 2019 Cloudera, Inc. All rights reserved. 49
© 2019 Cloudera, Inc. All rights reserved. 50
TH N Y U

More Related Content

What's hot

Emerging trends in data analytics
Emerging trends in data analyticsEmerging trends in data analytics
Emerging trends in data analyticsWei-Chiu Chuang
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data InsightsDataWorks Summit
 
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...DataWorks Summit
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionCloudera, Inc.
 
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopAccelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopDataWorks Summit
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidDataWorks Summit
 
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...Spark Summit
 
Insight into Hyperconverged Infrastructure
Insight into Hyperconverged Infrastructure Insight into Hyperconverged Infrastructure
Insight into Hyperconverged Infrastructure HTS Hosting
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTCloudera, Inc.
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonDataWorks Summit/Hadoop Summit
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataDataWorks Summit
 
Spark + Flashblade: Spark Summit East talk by Brian Gold
Spark + Flashblade: Spark Summit East talk by Brian GoldSpark + Flashblade: Spark Summit East talk by Brian Gold
Spark + Flashblade: Spark Summit East talk by Brian GoldSpark Summit
 
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerDataWorks Summit
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsDataWorks Summit
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTDenis Magda
 

What's hot (20)

Emerging trends in data analytics
Emerging trends in data analyticsEmerging trends in data analytics
Emerging trends in data analytics
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and HadoopAccelerate Your Big Data Analytics Efforts with SAS and Hadoop
Accelerate Your Big Data Analytics Efforts with SAS and Hadoop
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and Druid
 
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Insight into Hyperconverged Infrastructure
Insight into Hyperconverged Infrastructure Insight into Hyperconverged Infrastructure
Insight into Hyperconverged Infrastructure
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BT
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizonHadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
Hadoop and Friends as Key Enabler of the IoE - Continental's Dynamic eHorizon
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
 
Spark + Flashblade: Spark Summit East talk by Brian Gold
Spark + Flashblade: Spark Summit East talk by Brian GoldSpark + Flashblade: Spark Summit East talk by Brian Gold
Spark + Flashblade: Spark Summit East talk by Brian Gold
 
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
 

Similar to Cloudera Operational DB (Apache HBase & Apache Phoenix)

Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Mladen Kovacevic
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data WarehousingApache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousingc-bslim
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)Todd Lipcon
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupCaserta
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batchboorad
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Datamichaelguia
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataMike Percy
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...HostedbyConfluent
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 

Similar to Cloudera Operational DB (Apache HBase & Apache Phoenix) (20)

Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
Introducing Apache Kudu (Incubating) - Montreal HUG May 2016
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data WarehousingApache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
 
Introducing Kudu
Introducing KuduIntroducing Kudu
Introducing Kudu
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 

More from Timothy Spann

April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-PipelinesTimothy Spann
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-ProfitsTimothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsTimothy Spann
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Timothy Spann
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...Timothy Spann
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101Timothy Spann
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC MeetupTimothy Spann
 

More from Timothy Spann (20)

April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
 

Recently uploaded

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 

Recently uploaded (20)

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 

Cloudera Operational DB (Apache HBase & Apache Phoenix)

  • 1. Cloudera Operational DB (powered by Apache HBase and Apache Phoenix) Beyond the Tyranny of the Schema December 2019 Timothy Spann
  • 2. © 2019 Cloudera, Inc. All rights reserved. 2 Welcome to Future of Data - Princeton @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  • 3. © 2019 Cloudera, Inc. All rights reserved. 3 Who Am I? Timothy Spann Data in Motion Field Engineer @PaasDev DZone Zone Leader and Big Data MVB; Princeton NJ Future of Data Meetup; ex-Pivotal Field Engineer; Author of Apache Kafka RefCard https://github.com/tspannhw https://www.datainmotion.dev/
  • 4. © 2019 Cloudera, Inc. All rights reserved. 4 This Meetup Made Possible Thanks To: Paul Vidal from https://www.meetup.com/futureofdata-philadelphia/ for CDP HBase Environment and Cloud Magic Josh Elser and Josiah Goodson for OpDB Slides and HBase Guidance Milind Pandit from https://www.meetup.com/TechnologySolutionsHub Mehul Shah from https://www.meetup.com/TechnologySolutionsHub Vijay Garg from https://pga.fund/ Madhavi from https://www.nuwaysolutions.com/ https://www.slideshare.net/bunkertor/tracking-crime-as-it-occurs-with-apache-phoenix-apache-hbase-and-apache-nifi
  • 5. © 2019 Cloudera, Inc. All rights reserved. 5 WHAT HAVE Apache HBase 2.0 & Apache Phoenix ENABLED? • Operationalizing ML / AI to revolutionize healthcare, public utilities, etc • Serving real-time content at webscale • Empowering big data analytics for operational and offline uses • Acting as a resilient store of record CLOUDERA OPERATIONAL DB (powered by Apache HBase & Apache Phoenix) Operational DB is DBMS used to manage dynamic and changing data in real time and enable applications that drive the business
  • 6. 6© Cloudera, Inc. All rights reserved. APACHE HBASE FAST FACTS Largest Database 14 Petabytes Best Known App Siri Fastest Ingestion 20M Events/s Users 750+
  • 7. 7© Cloudera, Inc. All rights reserved. HBASE ARCHITECTURE HMaster Orchestration layer ZooKeeper Region Server DataNode Data plane ColFam ColFam Col Col Col Col R Val Val Val Val R Val Val Val Val ColFam ColFam Col Col Col Col R Val Val Val Val R Val Val Val Val Region ColFam ColFam Col Col Col Col R Val Val Val Val R Val Val Val Val • Regions are table segments • Read and write path are in the data plane • DDL operations • Region assignment • Recovery orchestration • Heartbeat • Server state • Services client reads & writes • Maximizes in-memory operations for low-latency operations • Provides data resiliency
  • 8. 8© Cloudera, Inc. All rights reserved. SCHEMA-LESS DATA MODEL • Column families defined at time of table creation • Columns created as required (at time of data insertion) • No limits to number of columns • Tables can grow in two dimensions – columns and rows • Compression & encoding applied at column family level • No declaration of data types (i.e., a column can contain multiple data types) Column Family Column Family Column Column Column Column RowKey Cell Cell Cell Cell RowKey Cell Cell Cell Cell
  • 9. © 2019 Cloudera, Inc. All rights reserved. 9 HIGHLY AVAILABLE OUT OF THE BOX (<1 MINUTE RECOVERY) Region Server 1 HDFS (3 copies of data) Region Server 2 Region Server 3 What happens when a region crashes 1. Region server crashes 2. Writes and reads time-out for regions in impacted region server 3. Regions are redistributed to other region servers 4. WAL is replayed in other region servers 5. Reads & writes are able to continue to impacted regions Typical recovery period < 1 minute (for impacted regions only) No manual intervention
  • 10. 10© Cloudera, Inc. All rights reserved. SECURITY MODEL Authentication • Kerberos Role Based Access control • Permissions & Scope enable flexible role based access control • Scope: Global, Namespace, Table, Column Family, Cell DB security & encryption • Transparent encryption of data on the wire and data on disk (HFile for data at rest, secure WAL for data in motion within HBase) • Logging & auditability: configurable & fixed event
  • 11. 11 stmt.executeUpdate(“UPSERT INTO TABLE_NAME VALUES(rowKey, GREETINGS) "); stmt.execute(); Phoenix What Phoenix adds to HBase Pros: Cons: • Maximally flexible & customizable • SQL only for data remediation • Unfamiliar to SQL developers • Requires non-traditional data architecture • Programmatic ANSI SQL support • RDBMS-like data architecture • Auto-applies performance best practices • Can co-exist with HBase apps • Reduced flexibility vis-à-vis vanilla HBase • Phoenix specific data format means you can’t use HBase APIs directly Put put = new Put(Bytes.toBytes(rowKey)); put.addColumn(COLUMN_FAMILY_NAME, COLUMN_NAME, Bytes.toBytes(GREETINGS)); table.put(put); HBase RDBMS-like, scale-out databaseFlexible, scale-out, no-sql database
  • 12. 12 Key Phoenix capabilities • ANSI SQL including joins • Flexible Schemas / Dynamic Columns • Secondary Indexes • Aggregation pushdowns • Cross-language client support • Query logging • Security through Ranger (supports RBAC, ABAC, etc) • JDBC/ODBC connectivity for operational reporting • Plugs in to any JDBC/ODBC-compatible BI tool to enable self-service analytics and insight Phoenix Applications
  • 13. 13 ANSI SQL 92 Support Supported today Roadmap Standard SQL Data Types UNION SELECT, UPSERT, DELETE Windowing Functions JOINs: Inner and Outer Transactions Subqueries Cross Joins Secondary Indexes Authorization GROUP BY, ORDER BY, HAVING Replication Management AVG, COUNT, MIN, MAX, SUM Column Constraints and Defaults Primary Keys, Constraints UDFs CASE, COALESCE VIEWs Flexible Schema UNION ALL
  • 14. © 2019 Cloudera, Inc. All rights reserved. 14 CLOUD OPTIMIZED : HBASE backed by both HDFS and S3 Cloudera provides HBase backed by Amazon’s S3 ● Cloudera Data Platform (CDP) provides an out-of-the-box solution that allows Apache HBase deployments to use Amazon Simple Storage Service (S3) as its main persistence layer for saving table data ● Amazon’s Simple Storage Service (S3) is an eventually consistent object store, and HBase requires a consistent and atomic filesystem which means that it cannot directly use S3. Let's look at the topology.
  • 15. © 2019 Cloudera, Inc. All rights reserved. 15 CLOUD OPTIMIZED : Cloudera HBASE backed by both HDFS and S3 Cloudera with CDP has built a solution where when you launch an Operational Database (HBase) cluster on CDP, HBase StoreFiles (the backing files for HBase tables) are stored in S3 and HBase write-ahead-logs (WAL) are stored in an HDFS instance run alongside HBase per usual.
  • 16. © 2019 Cloudera, Inc. All rights reserved. 16 CLOUD OPTIMIZED : HBASE backed by both HDFS and S3 ● Configuring HBase to use S3 for its StoreFiles has many benefits to our users. ● One such benefit is that users can decouple their storage and compute. ● If there are times in which no access to HBase is necessary, HBase can be cleanly shut down and all compute resources reclaimed to eliminate any cost of compute. ● When HBase access is needed again, the HBase cluster can be recreated, pointing to the same data in S3. Upon startup, HBase can re-initialize itself solely from the data in S3.
  • 17. © 2019 Cloudera, Inc. All rights reserved. 17 WHERE IS APACHE HBASE TODAY • Large ecosystem (Nifi, Spark, Hive, Impala, SOLR, Ranger, Atlas, etc) • Supports NoSQL, SQL, Geospatial, Graph, TimeSeries, Key Value and other modes1 1. In conjunction with other open source projects built on top of HBase
  • 18. © 2019 Cloudera, Inc. All rights reserved. 18 Cloudera and Apache HBase ● The upstream community is pretty huge and very active with contributions coming from multiple developers from Cloudera, Microsoft, Amazon, Alibaba, Apple Salesforce and Xiaomi etc. ● Cloudera is a very active contributor to upstream HBase along with Apache Phoenix. ○ Currently > 8 PMCs and > 2 committers. ● CDP is based off latest HBase v2 and Phoenix v5.
  • 19. © 2019 Cloudera, Inc. All rights reserved. 19 New features in HBase 2+ ● Operational simplicity ○ Assignment Manager V2 (using Procedure Framework 2) ○ Offline compaction tool (outside regionservers to save I/O thrasing) ○ Replication: namespace & serial and for bulk-loads ● Performance ○ Off-heap cache improvements (Uses DirectByteByffers to manage buckets outside of the JVM heap to eliminate impact of gc to get better read perf) ● Space Quotas (to support multi tenancy) ● S3 support ● Spark 2 integration ● Async Client
  • 20. © 2019 Cloudera, Inc. All rights reserved. 20 • Provides familiar & easy interface for developers • Advanced multi-tenancy capabilities • Support near 100% availability for mission critical applications & many traditional transactional apps • Scale to billions of rows and millions of columns • Easily combine data sources that use a wide variety of different structures and schemas Storage for business apps that require big-data Ingest Store Primary Use Query & Remediate NO(T ONLY)SQL PHOENIX
  • 21. © 2019 Cloudera, Inc. All rights reserved. 21
  • 22. © 2019 Cloudera, Inc. All rights reserved. 22
  • 23. © 2019 Cloudera, Inc. All rights reserved. 23 None of this command line mess:
  • 24. 24© Cloudera, Inc. All rights reserved. HUE FOR SQL & DATA BROWSING FOR REMEDIATION Supports SQL based insert, update, delete query for data in HBase Supports search, insert, update, delete, DDL for HBase SQL interface using Impala or Hive for query processing GUI based data browser
  • 25. © 2019 Cloudera, Inc. All rights reserved. 25 Visualization
  • 26. © 2019 Cloudera, Inc. All rights reserved. 26 HBase CRUD
  • 27. 27© Cloudera, Inc. All rights reserved. ENABLING DATA-DRIVEN APPS Fast • Real time model serving w/ <5ms latency • Limitless concurrency (>100M updates/sec) Easy • Stream and bulk ingest & processing • Process automation • Consolidate multiple databases • Schema flexibility • SQL & NoSQL interface Scalable • Multi-petabyte scale • Unlimited Tenants Highly-available • Automatic recovery from server failure • Advanced replication & synchronization topographies • Multiple backup methodologies Multi-tenant capable • Resource isolation • Throttling & Quotas Secure • Role based access control • Fine-grained authorizations (e.g., tenant, table, column family, cell)
  • 28. © 2019 Cloudera, Inc. All rights reserved. 28 NiFi Integration with HBase • PutHBaseRecord • PutHBaseJSON • PutHBaseCell • FetchHBaseRow • GetHBase • ScanHBase • DeleteHBaseRow • DeleteHBaseCells https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html ELT/ETL Lookup Services • HBase_1_1_2_ListLookupService • HBase_2_RecordLookupService • HBase_2_ClientService • HBase_2_ClientMapCacheService https://community.cloudera.com/t5/Community-Articles/Reading-OpenData-JSON-and-Stori ng-into-Phoenix-Tables/ta-p/247323
  • 29. © 2019 Cloudera, Inc. All rights reserved. 29
  • 30. © 2019 Cloudera, Inc. All rights reserved. 30
  • 31. © 2019 Cloudera, Inc. All rights reserved. 31
  • 32. © 2019 Cloudera, Inc. All rights reserved. 32
  • 33. © 2019 Cloudera, Inc. All rights reserved. 33
  • 34. © 2019 Cloudera, Inc. All rights reserved. 34
  • 35. © 2019 Cloudera, Inc. All rights reserved. 35
  • 36. © 2019 Cloudera, Inc. All rights reserved. 36
  • 37. © 2019 Cloudera, Inc. All rights reserved. 37
  • 38. © 2019 Cloudera, Inc. All rights reserved. 38
  • 39. © 2019 Cloudera, Inc. All rights reserved. 39
  • 40. © 2019 Cloudera, Inc. All rights reserved. 40
  • 41. © 2019 Cloudera, Inc. All rights reserved. 41
  • 42. © 2019 Cloudera, Inc. All rights reserved. 42
  • 43. © 2019 Cloudera, Inc. All rights reserved. 43
  • 44. © 2019 Cloudera, Inc. All rights reserved. 44
  • 45. © 2019 Cloudera, Inc. All rights reserved. 45
  • 46. © 2019 Cloudera, Inc. All rights reserved. 46 https://github.com/tspannhw/HBase2
  • 47. © 2019 Cloudera, Inc. All rights reserved. 47
  • 48. © 2019 Cloudera, Inc. All rights reserved. 48 SPRING BOOT APPLICATION TO PHOENIX https://community.cloudera.com/t5/Community-Articles/Creating-a-Spring-Boot-Java-8-Microservice-To-Read-Apache/ta-p/247379 https://github.com/tspannhw/phillycrime-springboot-phoenix
  • 49. © 2019 Cloudera, Inc. All rights reserved. 49
  • 50. © 2019 Cloudera, Inc. All rights reserved. 50 TH N Y U