© Cloudera, Inc. All rights reserved.
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Hadoop Summit EU, 16 Apr 2015
Jonathan Hsieh | HBase Tech Lead @ Cloudera, Apache HBase PMC
Dima Spivak | HBase QE Lead @ Cloudera
Who are we?
• Jonathan Hsieh
  • Tech Lead, HBase Team @ Cloudera
  • Apache HBase PMC Member
  • Apache Flume founder
  • Contact: jon@cloudera.com, @jmhsieh
• Dima Spivak
  • QE Lead, HBase Team @ Cloudera
  • Contact: dspivak@cloudera.com, @dimaspivak
16 Apr 2015. Hadoop Summit EU '15. Hsieh and Spivak
What is Apache HBase?
Apache HBase is a consistent, low-latency, random-access, non-relational database built on Apache Hadoop.
Some HBase Contributors, Users, and Providers
Challenges as usage increases
• How does one:
• Isolate different application workloads.
• Share datasets between different workloads.
• Prepare for geographic redundancy and availability.
• Manage cluster migrations.
• Test and prototype (multi-)cluster deployments.
• There are multiple solutions!
Multiple Multi- Solutions
• Multi-Cluster: using more than one cluster for an application.
• Multi-Tenant: using one cluster for more than one application.
• Multi-Container: using one machine to run [one or more] multi-node clusters.
Multi-Cluster
Safety in numbers
Multi-Cluster Deployments
• Deploy multiple HBase cluster instances.
• Motivation:
• Isolating different workloads from each other.
• Geographic disaster recovery, redundancy, and availability.
Isolation
• Isolation is usually done where many apps share one data center.
• Two different workloads on the same dataset.
• Perform latency-sensitive workloads on the same set of data as an analytic MR workload.
• Two disjoint application workloads and datasets.
• Deploy OpenTSDB on HBase in the same data center, but as a separate cluster to monitor the production HBase cluster.
Isolation: Operational with Analytical access pattern
[Diagram: two clusters linked by HBase Replication. Labels: HBase Client (Get, Scan), low latency, isolated from full scans; HBase Client (Put, Incr, Append); Bulk Import; MapReduce with HBase Scanner, high throughput.]
Geographic Recovery, Redundancy, and Availability
• Run multiple HBase clusters in multiple data centers.
• Often using “Podding” schemes.
• Primarily for backups of data in case of data center outages.
• Locality for Performance.
• Locality for Compliance.
• Availability while a datacenter is down.
• Deploy with:
• HBase replication: master-master or master-slave.
• Multicluster clients.
Master-Master Replication
[Diagram: clusters shipping logs to each other.]
Replicating data reduces chances of data loss.
HBase Multi-Cluster Client
• High availability with eventual consistency when using replication.
• Simple implementation.
• Hedged operations: if the primary takes too long, go to the failover cluster.
• Same HConnection interface, just a different factory: HConnectionManagerMultiClusterWrapper.getConnection(conf)
• HBase.MCC to be available in Cloudera Labs.
Work by Ted Malaska (Cloudera Solution Architect)
https://github.com/tmalaska/HBase.MCC
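The hedged-operation idea can be sketched in a few lines of Python. This is not the HBase.MCC code: `hedged_call`, `primary`, `failover`, and `hedge_after_s` are hypothetical names, and the two callables stand in for the same read issued against two replicated clusters. Because replication is eventually consistent, a result from the failover may be stale.

```python
import concurrent.futures as cf


def hedged_call(primary, failover, hedge_after_s):
    """Call primary(); if it is slow or fails, also call failover().

    Returns (source, result) for the first call that succeeds, where
    source is "primary" or "failover".
    """
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        futures = {pool.submit(primary): "primary"}
        done, _ = cf.wait(list(futures), timeout=hedge_after_s)
        if not done or next(iter(done)).exception() is not None:
            # Primary is too slow or has failed: hedge with the failover cluster.
            futures[pool.submit(failover)] = "failover"
        for fut in cf.as_completed(list(futures)):
            if fut.exception() is None:
                return futures[fut], fut.result()
        raise RuntimeError("both clusters failed")
    finally:
        # Do not block on the slower call; its result is simply discarded.
        pool.shutdown(wait=False)
```

The hedge delay is the main tuning knob in this sketch: too short and the failover cluster sees most of the traffic, too long and tail latency is not improved.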
Multi-Tenant
We’re all in this together
Multi-tenant deployments
• Deploy multiple workloads on one cluster.
• Motivation:
• Better Resource utilization.
• Cost efficiency.
• Simpler operations.
• Shared data.
• Multiple services on one cluster.
• Running HBase, Spark, Impala and MR on the same cluster.
Security and namespaces
• Challenges:
• Resource management, prioritizing and fairness.
• Authentication and Authorization.
• Mechanisms:
• HBase Security – Authentication, Authorization for commands via ACLs.
• Namespaces – Isolate administrative domains for ACLs.
• Proxy Impersonation – Thrift proxy doAs, and REST proxy doAs.
Request Throttling
• Idea: some tables or users get a limited budget of ops or throughput, while others do not.
• Multiple workloads on one dataset.
• Production/real-time user: unthrottled.
• Analytic/ad hoc workload user: throttled.
• Caveat: if all users are throttled, we may not use all machine resources.
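One common way to give a user a limited budget of operations is a token bucket. The sketch below illustrates the idea only and is not HBase's implementation; the class names, the `set_quota` and `allow` methods, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow about `rate` operations per second, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.last = clock()

    def try_acquire(self, cost=1):
        # Refill in proportion to the time elapsed, capped at the burst size.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Throttle:
    """Per-user budgets; users without a quota are never throttled."""

    def __init__(self):
        self.buckets = {}

    def set_quota(self, user, rate, burst, clock=time.monotonic):
        self.buckets[user] = TokenBucket(rate, burst, clock)

    def allow(self, user):
        bucket = self.buckets.get(user)
        return True if bucket is None else bucket.try_acquire()
```

In this sketch the production user simply has no bucket, which mirrors the "unthrottled" case above. It also shows the caveat: a throttled user is refused even when the machines are otherwise idle.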
Request Scheduling
• Idea: gets should have high priority, while scans should be deprioritized the more they are used (HBASE-10994).
• Multiple workloads on one dataset.
• Production real-time gets: scheduled immediately.
• Analytic scan workloads: scheduled with a delay.
• All resources are used.
• Caveat: requires manual tuning.
[Diagram: two request queues. In the first, requests are delayed by long scan requests; in the second, requests are rescheduled so new requests get priority.]
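The scheduling idea can be modelled as a deadline queue: gets are due immediately, while each further call on the same scanner is pushed a little later. The sketch below is a toy model, not the HBase scheduler; the square-root delay formula, `weight_ms`, `max_delay_ms`, and all class and method names are assumptions chosen for illustration.

```python
import heapq
import itertools
import math


class DeadlineScheduler:
    """Dispatch requests in deadline order; heavy scanners get later deadlines."""

    def __init__(self, weight_ms=10.0, max_delay_ms=100.0):
        self.weight_ms = weight_ms
        self.max_delay_ms = max_delay_ms
        self.scan_calls = {}          # scanner id -> calls seen so far
        self.heap = []
        self.seq = itertools.count()  # tie-breaker: FIFO among equal deadlines

    def submit(self, now_ms, kind, request_id, scanner_id=None):
        delay = 0.0
        if kind == "scan":
            calls = self.scan_calls.get(scanner_id, 0)
            self.scan_calls[scanner_id] = calls + 1
            # The delay grows with how much this scanner has been used, up to a cap.
            delay = min(self.max_delay_ms, self.weight_ms * math.sqrt(calls))
        heapq.heappush(self.heap, (now_ms + delay, next(self.seq), request_id))

    def next_request(self):
        return heapq.heappop(self.heap)[2] if self.heap else None
```

Nothing is rejected in this model, so all capacity is still used. The weight and the cap are the values that would need the manual tuning mentioned in the caveat.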
Performance Isolation inside a cluster
• Region Server Groups (under review).
• Limit the performance impact that load on one table has on others (HBASE-6721).
• Multiple workloads on multiple datasets on one HBase cluster.
• Two separate apps on one cluster.
[Diagram: a mixed workload versus an isolated workload.]
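The grouping idea amounts to restricting where a table's regions may be placed. The sketch below is a simplified model rather than the HBASE-6721 design: `GroupedAssigner`, the group names, and the round-robin placement are assumptions made for the example.

```python
class GroupedAssigner:
    """Place each table's regions only on servers in that table's group."""

    def __init__(self, groups, table_groups, default_group="default"):
        self.groups = groups              # group name -> list of region servers
        self.table_groups = table_groups  # table name -> group name
        self.default_group = default_group
        self.next_index = {name: 0 for name in groups}

    def assign(self, table):
        # Tables without an explicit group share the default group.
        group = self.table_groups.get(table, self.default_group)
        servers = self.groups[group]
        i = self.next_index[group]
        self.next_index[group] = (i + 1) % len(servers)
        return servers[i]
```

With a latency-sensitive table pinned to its own group, a heavy scan on any other table cannot land on the same region servers.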
Service Isolation
• Today, the easiest strategy for isolating a latency-sensitive HBase deployment from other services is static partitioning.
• Future:
• Improve IO isolation via YARN/Slider/Mesos.
• Separate HBase actions into separate processes.
• e.g. externalize compaction for better resource management.
[Diagram: in the multi-service deployment, every node runs Yarn NM/MR, HBase RS, impalad, and HDFS DN. In the statically partitioned service deployment, some nodes run only HBase RS and HDFS DN, while the others run Yarn NM/MR, impalad, and HDFS DN.]
Multi-Container
My name is Jonah
Multi-container deployments
• Run a distributed HBase cluster on a single host.
• Testing applications.
• Use cases requiring quick cluster stand-up.
Linux containers
• cgroups (2.6.24+).
• Isolating resources (memory, CPU, networking).
• Namespace isolation (filesystems, process trees).
Virtual Machines vs Linux Containers
[Diagram: Virtual Machines stack: Host Operating System, Hypervisor, then four Guest OSes, each with its own Libraries and User processes. Containers stack: Host Operating System, shared Libraries, and four sets of User processes.]
Docker
• User front-end for containers.
• Container management (start, stop, pause).
  • docker run
• Images (templates for containers).
  • docker commit
• Registries (repositories for images).
  • docker push
Integration testing
• Automate long-running tests from hbase-it module.
• $ hbase org.apache.hadoop.hbase.IntegrationTest…
• Integration with fault injection framework (Chaos Monkey).
Starting container cluster
[Diagram: a DNS server container, dnsserver (10.0.0.2), and four node containers: node-1 (10.0.0.3), node-2 (10.0.0.4), node-3 (10.0.0.5), and node-4 (10.0.0.6). After "Start cluster", the nodes are labelled as one Master and three Slaves.]
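The container layout above can be described as data before any container is started. The sketch below only plans names, addresses, and roles; it does not call Docker. The function names, the sequential addressing, and the choice of node-1 as the master are assumptions for illustration.

```python
import ipaddress


def plan_cluster(num_nodes, first_ip="10.0.0.2"):
    """Return one record per container: a DNS server followed by the nodes."""
    base = ipaddress.IPv4Address(first_ip)
    plan = [{"name": "dnsserver", "ip": str(base), "role": "dns"}]
    for i in range(1, num_nodes + 1):
        # Assumption: the first node is the master, the rest are slaves.
        role = "master" if i == 1 else "slave"
        plan.append({"name": f"node-{i}", "ip": str(base + i), "role": role})
    return plan


def host_records(plan):
    """Render 'ip name' lines that the DNS container could serve."""
    return "\n".join(f"{c['ip']} {c['name']}" for c in plan)
```

Calling `plan_cluster(4)` reproduces the addresses shown in the diagram, from dnsserver at 10.0.0.2 through node-4 at 10.0.0.6.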
Automation
• Replace fragile infrastructure.
• Set up a distributed cluster as part of test execution.
In progress
• Extend this workflow to upstream Apache HBase (HBASE-12721)
• Upstream integration testing (builds.apache.org)
• Multi-cluster use cases (e.g. MCC, replication)
• Upgrades
Conclusions
Multi multi multi
Summary
• Fancy table that summarizes our talk
Goal | Multi-Cluster | Multi-Tenant | Multi-Container
Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
Multiple workloads on same dataset (real-time vs analytic workload) | Separate cluster per workload. | Request throttling, request scheduling. | Containers as “VMs” or microservices.
Reliability and Availability | Disaster recovery, master-master replication, multi-cluster client. | Multiple tables with Region Server Groups. | More realistic testing.
Cost Savings | Disaster recovery. | One cluster, multiple use cases. | One machine, multiple nodes.
Futures
• We are seeing more and more deployments that are multi-cluster and/or multi-tenant.
• Traditional workflows are giving way to hybrid ones.
• More knobs to turn to optimize for performance and value.
• Multi-container deployments are a way forward to make prototyping and testing these deployments easier.
Thank you!
HBaseCon 2015 is Coming!
Thurs., May 7, in San Francisco
Presentations from the world’s biggest HBase operators:
Bloomberg, Dropbox, eBay, Facebook, Google, Pinterest, Xiaomi, Yahoo!, more!
Seats are limited; register at hbasecon.com
Community Sponsor

More Related Content

What's hot

Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4jNeo4j
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks EDB
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overviewDataArt
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Event Hub & Azure Stream Analytics
Event Hub & Azure Stream AnalyticsEvent Hub & Azure Stream Analytics
Event Hub & Azure Stream AnalyticsDavide Mauri
 
Azure kubernetes service (aks)
Azure kubernetes service (aks)Azure kubernetes service (aks)
Azure kubernetes service (aks)Akash Agrawal
 
Snowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetJeno Yamma
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memoryJulian Hyde
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Introducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationIntroducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationScyllaDB
 
(Draft) Kubernetes - A Comprehensive Overview
(Draft) Kubernetes - A Comprehensive Overview(Draft) Kubernetes - A Comprehensive Overview
(Draft) Kubernetes - A Comprehensive OverviewBob Killen
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can helpChristian Tzolov
 
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...HostedbyConfluent
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...Red Hat Developers
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 

What's hot (20)

Introduction to Neo4j
Introduction to Neo4jIntroduction to Neo4j
Introduction to Neo4j
 
MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks MongoDB vs. Postgres Benchmarks
MongoDB vs. Postgres Benchmarks
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Event Hub & Azure Stream Analytics
Event Hub & Azure Stream AnalyticsEvent Hub & Azure Stream Analytics
Event Hub & Azure Stream Analytics
 
Azure kubernetes service (aks)
Azure kubernetes service (aks)Azure kubernetes service (aks)
Azure kubernetes service (aks)
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Snowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat Sheet
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
Docker architecture-04-1
Docker architecture-04-1Docker architecture-04-1
Docker architecture-04-1
 
Introducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task AutomationIntroducing Scylla Manager: Cluster Management and Task Automation
Introducing Scylla Manager: Cluster Management and Task Automation
 
(Draft) Kubernetes - A Comprehensive Overview
(Draft) Kubernetes - A Comprehensive Overview(Draft) Kubernetes - A Comprehensive Overview
(Draft) Kubernetes - A Comprehensive Overview
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 

Viewers also liked

HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon
 
Making Apache Tomcat Multi-tenant, Elastic and Metered
Making Apache Tomcat Multi-tenant, Elastic and MeteredMaking Apache Tomcat Multi-tenant, Elastic and Metered
Making Apache Tomcat Multi-tenant, Elastic and MeteredPaul Fremantle
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTCloudera, Inc.
 
Rigorous and Multi-tenant HBase Performance Measurement
Rigorous and Multi-tenant HBase Performance MeasurementRigorous and Multi-tenant HBase Performance Measurement
Rigorous and Multi-tenant HBase Performance MeasurementDataWorks Summit
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Kai Sasaki
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashAndrei Savu
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformSamsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformCloudera, Inc.
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...DataWorks Summit/Hadoop Summit
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSCloudera, Inc.
 
Configuring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseConfiguring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseCloudera, Inc.
 
A multi-tenant architecture for Apache Axis2
A multi-tenant architecture for Apache Axis2A multi-tenant architecture for Apache Axis2
A multi-tenant architecture for Apache Axis2Afkham Azeez
 
CFSummit: Data Science on Cloud Foundry
CFSummit: Data Science on Cloud FoundryCFSummit: Data Science on Cloud Foundry
CFSummit: Data Science on Cloud FoundryIan Huston
 
HBase Incremental Backup
HBase Incremental BackupHBase Incremental Backup
HBase Incremental BackupLee neal
 
Federated HDFS
Federated HDFSFederated HDFS
Federated HDFShuguk
 
Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human TimeDataWorks Summit
 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Impetus Technologies
 

Viewers also liked (20)

HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBase
 
Toward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFSToward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFS
 
Making Apache Tomcat Multi-tenant, Elastic and Metered
Making Apache Tomcat Multi-tenant, Elastic and MeteredMaking Apache Tomcat Multi-tenant, Elastic and Metered
Making Apache Tomcat Multi-tenant, Elastic and Metered
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BT
 
Rigorous and Multi-tenant HBase Performance Measurement
Rigorous and Multi-tenant HBase Performance MeasurementRigorous and Multi-tenant HBase Performance Measurement
Rigorous and Multi-tenant HBase Performance Measurement
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformSamsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
Configuring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the EnterpriseConfiguring a Secure, Multitenant Cluster for the Enterprise
Configuring a Secure, Multitenant Cluster for the Enterprise
 
Managing a Multi-Tenant Data Lake
Managing a Multi-Tenant Data LakeManaging a Multi-Tenant Data Lake
Managing a Multi-Tenant Data Lake
 
A multi-tenant architecture for Apache Axis2
A multi-tenant architecture for Apache Axis2A multi-tenant architecture for Apache Axis2
A multi-tenant architecture for Apache Axis2
 
CFSummit: Data Science on Cloud Foundry
CFSummit: Data Science on Cloud FoundryCFSummit: Data Science on Cloud Foundry
CFSummit: Data Science on Cloud Foundry
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
HBase Incremental Backup
HBase Incremental BackupHBase Incremental Backup
HBase Incremental Backup
 
Federated HDFS
Federated HDFSFederated HDFS
Federated HDFS
 
Interactive Analytics in Human Time
Interactive Analytics in Human TimeInteractive Analytics in Human Time
Interactive Analytics in Human Time
 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
 

Similar to Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformInMobi Technology
 
Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015Apekshit Sharma
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopYifeng Jiang
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Frank Munz
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadooplamont_lockwood
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kiteJoey Echeverria
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Apache Accumulo Overview
Apache Accumulo OverviewApache Accumulo Overview
Apache Accumulo OverviewBill Havanki
 
Webinar: The Future of Hadoop
Webinar: The Future of HadoopWebinar: The Future of Hadoop
Webinar: The Future of HadoopCloudera, Inc.
 
Trends in Supporting Production Apache HBase Clusters
Trends in Supporting Production Apache HBase ClustersTrends in Supporting Production Apache HBase Clusters
Trends in Supporting Production Apache HBase ClustersDataWorks Summit
 
Apache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's UpcomingApache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's Upcominghuguk
 

Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments

  • 1. Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments. Hadoop Summit EU, 16 Apr 2015. Jonathan Hsieh | HBase Tech Lead @ Cloudera, Apache HBase PMC. Dima Spivak | HBase QE Lead @ Cloudera. © Cloudera, Inc. All rights reserved.
  • 2. Who are we?
      • Jonathan Hsieh
          • Tech Lead, HBase Team @ Cloudera
          • Apache HBase PMC Member
          • Apache Flume founder
          • Contact: jon@cloudera.com, @jmhsieh
      • Dima Spivak
          • QE Lead, HBase Team @ Cloudera
          • Contact: dspivak@cloudera.com, @dimaspivak
      16 Apr 2015. Hadoop Summit EU '15. Hsieh and Spivak
  • 3. What is Apache HBase? Apache HBase is a consistent, low-latency, random-access, non-relational database built on Apache Hadoop.
  • 4. Some HBase Contributors, Users, and Providers
  • 5. Challenges as usage increases
      • How does one:
          • Isolate different application workloads.
          • Share datasets between different workloads.
          • Prepare for geographic redundancy and availability.
          • Manage cluster migrations.
          • Test and prototype (multi-)cluster deployments.
      • There are multiple solutions!
  • 6. Multiple Multi- Solutions
      • Multi-Cluster: using more than one cluster for an application.
      • Multi-Tenant: using one cluster for more than one application.
      • Multi-Container: using one machine to run [one or more] multi-node clusters.
  • 7. Multi-Cluster: Safety in numbers
  • 8. Multi-Cluster Deployments
      • Deploy multiple HBase cluster instances.
      • Motivation:
          • Isolating different workloads from each other.
          • Geographic disaster recovery, redundancy, and availability.
  • 9. Isolation
      • Isolation is usually done where many apps share one data center.
      • Two different workloads on the same dataset.
          • Run latency-sensitive workloads on the same set of data as an analytic MR workload.
      • Two disjoint application workloads and datasets.
          • Deploy OpenTSDB on HBase in the same data center, but as a separate cluster, to monitor the production HBase cluster.
  • 10. Isolation: Operational with Analytical access pattern
      [Diagram labels: HBase Client (Get, Scan); HBase Client (Put, Incr, Append); low latency, isolated from full scans; HBase Replication; MapReduce, HBase Scanner; Bulk Import; high throughput.]
  • 11. Geographic Recovery, Redundancy, and Availability
      • Run multiple HBase clusters in multiple data centers.
      • Often using “Podding” schemes.
      • Primarily for backups of data in case of data center outages.
      • Locality for performance.
      • Locality for compliance.
      • Availability while a data center is down.
      • Deploy with:
          • HBase replication: master-master, master-slave.
          • Multi-cluster clients.
  • 12. Master-Master Replication. Replicating data reduces chances of data loss. [Diagram labels: logs.]
  • 13. HBase Multi-Cluster Client
      • High availability with eventual consistency when using replication.
      • Simple implementation.
      • Hedged operations: if the primary takes too long, go to the failover cluster.
      • Same HConnection interface, just a different factory: HConnectionManagerMultiClusterWrapper.getConnection(conf)
      • HBase.MCC to be available in Cloudera Labs. Work by Ted Malaska (Cloudera Solution Architect): https://github.com/tmalaska/HBase.MCC
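The hedged operations on slide 13 can be illustrated with a minimal Python sketch. This is not the HBase.MCC implementation or its API: `hedged_call`, `primary`, `failover`, and `hedge_after_s` are hypothetical names, and the two callables stand in for the same read issued against the primary and the failover cluster. Because replication is asynchronous, an answer from the failover cluster may be stale, which is the eventual-consistency trade-off the slide mentions.

```python
import concurrent.futures as cf
import time


def hedged_call(primary, failover, hedge_after_s, pool):
    """Run `primary`; if it is slow or fails, also try `failover`.

    Returns (source, value) from whichever cluster answers first.
    All names are illustrative assumptions, not HBase.MCC API.
    """
    futures = {pool.submit(primary): "primary"}
    # Give the primary cluster a head start before hedging.
    done, _ = cf.wait(futures, timeout=hedge_after_s)
    if done:
        first = next(iter(done))
        if first.exception() is None:
            return "primary", first.result()
    # Primary is slow or failed: send the same request to the failover cluster.
    futures[pool.submit(failover)] = "failover"
    last_error = None
    for fut in cf.as_completed(futures):
        try:
            return futures[fut], fut.result()
        except Exception as exc:  # remember the failure, wait for the other call
            last_error = exc
    raise last_error
```

A caller would pass a shared `concurrent.futures.ThreadPoolExecutor` as `pool`. Choosing `hedge_after_s` is a tuning decision: too small doubles the load on the failover cluster, too large delays the fallback.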
  • 14. Multi-Tenant: We’re all in this together
  • 15. Multi-tenant deployments
      • Deploy multiple workloads on one cluster.
      • Motivation:
          • Better resource utilization.
          • Cost efficiency.
          • Simpler operations.
          • Shared data.
      • Multiple services on one cluster.
          • Running HBase, Spark, Impala and MR on the same cluster.
  • 16. Security and namespaces
      • Challenges:
          • Resource management, prioritizing and fairness.
          • Authentication and authorization.
      • Mechanisms:
          • HBase Security: authentication, and authorization for commands via ACLs.
          • Namespaces: isolate administrative domains for ACLs.
          • Proxy impersonation: Thrift proxy doAs and REST proxy doAs.
  • 17. Request Throttling
      • Idea: some tables or users get a limited budget of ops or throughput, while others do not.
      • Multiple workloads on one dataset.
          • Production/real-time user: unthrottled.
          • Analytic/ad hoc workload user: throttled.
      • Caveat: if all users are throttled, we may not use all machine resources.
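The throttling idea on slide 17 can be sketched with a token bucket per throttled user. This is a generic illustration, not how HBase implements quotas; `TokenBucket`, `Throttler`, and the user names are hypothetical. Users without a configured limit are never throttled, which mirrors the production versus analytic split on the slide.

```python
class TokenBucket:
    """Budget of `capacity` ops that refills at `rate_per_s` ops per second."""

    def __init__(self, rate_per_s, capacity, now=0.0):
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now, cost=1.0):
        # Refill for the time elapsed since the last request, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Throttler:
    """Per-user throttle; users with no limit configured are unthrottled."""

    def __init__(self):
        self.buckets = {}

    def set_limit(self, user, rate_per_s, burst):
        self.buckets[user] = TokenBucket(rate_per_s, burst)

    def allow(self, user, now):
        bucket = self.buckets.get(user)
        return True if bucket is None else bucket.allow(now)
```

The caveat on the slide is visible in this model: a rejected request is simply refused even if the servers are idle, so throttling every user can leave capacity unused.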
  • 18. Request Scheduling
      • Idea: gets should have high priority, while scans should get deprioritized the more they are used (HBASE-10994).
      • Multiple workloads on one dataset.
          • Production real-time gets: immediately scheduled.
          • Analytic scan workloads: delay scheduled.
      • All resources are used.
      • Caveat: requires manual tuning.
      [Diagram: two request queues. In the first, requests are delayed by long scan requests; in the second, requests are rescheduled so new requests get priority.]
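The scheduling idea on slide 18 can be sketched as a deadline-ordered queue in which each further request from the same scanner is pushed later. This is only an illustration of the idea, not the HBASE-10994 implementation; `DeprioritizingScheduler`, `delay_step`, and `max_delay` are hypothetical names, and the two parameters stand in for the manual tuning the slide warns about.

```python
import heapq
import itertools


class DeprioritizingScheduler:
    """Gets run at their arrival time; repeated scans are delayed more and more."""

    def __init__(self, delay_step=1.0, max_delay=5.0):
        self.delay_step = delay_step
        self.max_delay = max_delay
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that keeps FIFO order
        self._scan_calls = {}          # scanner id -> requests seen so far

    def submit(self, label, arrival, scanner_id=None):
        if scanner_id is None:
            deadline = arrival  # a get: schedule immediately
        else:
            used = self._scan_calls.get(scanner_id, 0)
            self._scan_calls[scanner_id] = used + 1
            # The more a scanner has been used, the later its next request runs.
            deadline = arrival + min(self.max_delay, used * self.delay_step)
        heapq.heappush(self._heap, (deadline, next(self._seq), label))

    def drain(self):
        """Return the labels in the order the requests would be served."""
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap)[2])
        return order
```

Unlike throttling, nothing is rejected here: every request is eventually served, so all resources stay in use and only the ordering changes.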
  • 19. Performance Isolation inside a cluster
      • Region Server Groups (under review).
          • Limit the performance impact that load on one table has on others (HBASE-6721).
      • Multiple workloads on multiple data sets on one HBase cluster.
          • Two separate apps on one cluster.
      [Diagram labels: mixed workload; isolated workload.]
  • 20. Service Isolation
      • Today, the easiest strategy for isolating a latency-sensitive HBase deployment from other services is static partitioning.
      • Future:
          • Improve IO isolation via YARN/Slider/Mesos.
          • Separate HBase actions into separate processes.
              • e.g. externalize compaction for better resource management.
      [Diagram: multi-service deployment (every node runs Yarn NM/MR, HBase RS, impalad, and HDFS DN) versus statically partitioned service deployment (some nodes run HBase RS and HDFS DN; the others run Yarn NM/MR, impalad, and HDFS DN).]
  • 21. Multi-Container: My name is Jonah
  • 22. Multi-container deployments
      • Run a distributed HBase cluster on a single host.
      • Testing applications.
      • Use cases requiring quick cluster stand-up.
  • 23. Linux containers
      • cgroups (Linux kernel 2.6.24+).
      • Isolating resources (memory, CPU, networking).
      • Namespace isolation (filesystems, process trees).
  • 24. Virtual Machines vs Linux Containers
      [Diagram: Virtual machines: host operating system, hypervisor, then a guest OS, libraries, and user processes per VM. Containers: host operating system, libraries, and user processes, with no hypervisor or guest OS.]
  • 25. Docker
      • User front-end for containers.
      • Container management (start, stop, pause).
          • docker run
      • Images (templates for containers).
          • docker commit
      • Registries (repository for images).
          • docker push
  • 26. Integration testing
      • Automate long-running tests from the hbase-it module.
          • $ hbase org.apache.hadoop.hbase.IntegrationTest…
      • Integration with fault injection framework (Chaos Monkey).
  • 27. Starting container cluster
      [Diagram: DNS server dnsserver (10.0.0.2) and nodes node-1 (10.0.0.3), node-2 (10.0.0.4), node-3 (10.0.0.5), node-4 (10.0.0.6). After "Start cluster", one node is the Master and three are Slaves.]
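The layout on slide 27 can be written down as a small Python sketch that derives the hostname and address plan for such a container cluster. `plan_cluster` and `hosts_file` are hypothetical helpers, not part of the tooling used in the talk, and making node-1 the master is an assumption read off the diagram order.

```python
import ipaddress


def plan_cluster(first_ip, node_count):
    """Return (hostname, ip, role) tuples: one DNS server, then the nodes.

    Assumes consecutive addresses and that node-1 is the master.
    """
    base = ipaddress.ip_address(first_ip)
    plan = [("dnsserver", str(base), "dns")]
    for i in range(1, node_count + 1):
        role = "master" if i == 1 else "slave"
        plan.append((f"node-{i}", str(base + i), role))
    return plan


def hosts_file(plan):
    """Render the plan as /etc/hosts-style lines for name resolution."""
    return "\n".join(f"{ip} {name}" for name, ip, _role in plan)
```

Calling `plan_cluster("10.0.0.2", 4)` reproduces the addresses shown in the diagram.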
  • 28. Automation
      • Replace fragile infrastructure.
      • Set up a distributed cluster as part of test execution.
  • 29. In progress
      • Extend this workflow to upstream Apache HBase (HBASE-12721).
      • Upstream integration testing (builds.apache.org).
      • Multi-cluster use cases (e.g. MCC, replication).
      • Upgrades.
  • 30. Conclusions: Multi multi multi
  • 34. Summary • Fancy table that summarizes our talk
      Goal | Multi-Cluster | Multi-Tenant | Multi-Container
      Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
      Multiple workloads on same dataset (real-time vs analytic workload) | Separate cluster per workload. | Request throttling, request scheduling. | Containers as "VMs" or microservices.
      Reliability and Availability | Disaster recovery, master-master replication, multi-cluster client. | Multiple tables with Region Server Groups. | More realistic testing.
      Cost Savings | Disaster recovery. | One cluster, multiple use cases. | One machine, multiple nodes.
  • 35. Futures • We are seeing more and more deployments that are multi-cluster and/or multi-tenant. • Traditional workflows are giving way to hybrid ones. • More knobs to turn to optimize for performance and value. • Multi-container deployments are a way forward to make prototyping and testing these deployments easier.
  • 36. Thank you!
  • 37. HBaseCon 2015 is Coming! • Thurs., May 7, in San Francisco. • Presentations from the world's biggest HBase operators: Bloomberg, Dropbox, eBay, Facebook, Google, Pinterest, Xiaomi, Yahoo!, and more. • Seats are limited; register at hbasecon.com.
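
The container cluster layout on slide 27 (a DNS container at 10.0.0.2, then node-1 through node-4 at consecutive addresses) can be sketched as a small address plan. This is only an illustration, not the tooling used in the talk: the function names `plan_cluster` and `hosts_file`, the `10.0.0.0/24` subnet, the reserved `.1` gateway address, and the choice of node-1 as master are all assumptions made for the example.

```python
import ipaddress


def plan_cluster(subnet, worker_count):
    """Return (hostname, ip, role) tuples: one DNS container, one master, N workers.

    Hypothetical helper for illustration; not part of HBase or Docker.
    """
    addresses = iter(ipaddress.ip_network(subnet).hosts())
    next(addresses)  # assumption: the first usable address is the bridge gateway
    plan = [("dnsserver", str(next(addresses)), "dns")]
    plan.append(("node-1", str(next(addresses)), "master"))
    for index in range(2, worker_count + 2):
        plan.append((f"node-{index}", str(next(addresses)), "slave"))
    return plan


def hosts_file(plan):
    """Render the plan as /etc/hosts-style lines that a DNS container could serve."""
    return "\n".join(f"{ip}\t{name}" for name, ip, _role in plan)


cluster = plan_cluster("10.0.0.0/24", worker_count=3)
print(hosts_file(cluster))
```

Keeping the plan as plain data makes it easy to regenerate for a second cluster on a different subnet, which is the kind of multi-cluster prototyping the slides describe.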

Editor's Notes

  1. HBase is a project that solves this problem. In a sentence, HBase is an open source, distributed, sorted map modeled after Google's Bigtable. Open source: Apache HBase is an open source project with an Apache 2.0 license. Distributed: HBase is designed to use multiple machines to store and serve data. Sorted map: HBase stores data as a map, and guarantees that adjacent keys will be stored next to each other on disk. HBase is modeled after Bigtable, a system that is used for hundreds of applications at Google.
  2. Given that HBase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell, which can be useful for maintaining high-performance counters or statistics. Lastly, it's possible to write MapReduce jobs that analyze the data in HBase.
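
The map-like operations described in these notes (get, put, scan over a key range, and increment) can be illustrated with a toy in-memory sorted map. This is a single-process sketch and not the HBase client API: the class name `SortedMapTable` and its method signatures are invented for the example, and the `increment` here is not atomic the way a server-side increment would be.

```python
import bisect


class SortedMapTable:
    """Toy sorted map: row keys stay ordered so range scans touch adjacent rows."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        # Return a copy so callers cannot mutate stored rows.
        return dict(self._rows.get(row, {}))

    def scan(self, start, stop):
        """Yield (row, columns) for start <= row < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])

    def increment(self, row, column, amount=1):
        # Read-modify-write; fine for one thread, unlike a real atomic counter.
        updated = self.get(row).get(column, 0) + amount
        self.put(row, column, updated)
        return updated


table = SortedMapTable()
table.put("user#002", "name", "bo")
table.put("video#001", "title", "intro")
table.put("user#001", "name", "al")
table.increment("user#001", "visits")
table.increment("user#001", "visits", 2)

user_rows = [key for key, _columns in table.scan("user#", "user$")]
print(user_rows, table.get("user#001"))
```

Because the keys are sorted, the scan only walks the contiguous slice of rows sharing the `user#` prefix, which mirrors why row-key design matters for range reads.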