Speakers: Dheeraj Kapur, Rajiv Chittajallu & Anish Mathew (Yahoo!)
In early 2013, Yahoo! introduced multi-tenancy to HBase to offer it as a platform service for all Hadoop users. A certain degree of customization per tenant (a user or a project) was achieved through RegionServer groups, namespaces, and customized configs for each tenant. This talk covers how to accommodate the diverse needs of individual tenants on the cluster, as well as operational tips and techniques that allow Yahoo! to automate the management of multi-tenant clusters at petabyte scale without errors.
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
1. Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
PRESENTED BY Dheeraj Kapur, Rajiv Chittajallu, Anish Mathew | May 5, 2014
2. Agenda
Topic | Speaker(s)
Overview of Hadoop stack and Grid Infrastructure at Yahoo | Rajiv Chittajallu
Application onboarding on Multi-Tenant HBase | Dheeraj Kapur
Automation for Compaction/Splits and Monitoring | Anish Mathew
Q&A | All Presenters
4. Hadoop Usage at Yahoo
HBaseCon 2014
[Diagram] Stages: Data Collection, Asynchronous Data Processing, Synchronous Serving
[Diagram] Sources: Browsers, Mobile Devices, Web Crawl, Knowledge Graph, 3rd Party
[Diagram] Feeds: User Events, WCC, Entity Feeds, Content Feeds
[Diagram] Processing: Yahoo Grid, Business Intelligence Tools (e.g. Tableau, MicroStrategy)
[Diagram] Note: Source of truth for data*
[Diagram] Serving Systems: Home, Run, Search, Mail, Mobile, Flickr, Media, Stream Ads, Native Ads, Display Ads, Content systems, Y! NoSql, …
5. Grid Infrastructure at Yahoo
A multi-tenant, secure, distributed compute and storage environment, based on the Hadoop stack, for large-scale data processing
10. HBase @ Yahoo
• 7 clusters, 1500 region servers, 6 PB of data
• Diverse use cases, 500+ Tables, 100k regions
• Rolling major compactions and splits, and group rebalancing
• RegionServer groups, namespaces, and a multi-region config system
11. Challenges
• Customer onboarding and provisioning
• Access management and Table provisioning
• Deployments
• Customizing group configs
• Rolling Major Compaction and Splits
• Group Balancing
16. Customer Onboarding & Provisioning
• Two identical environments (Prod and Non-Prod)
• Applications are onboarded to Non-Prod for performance/integration testing
• Once ready, they are provisioned on Prod
• Performance results help in production onboarding
17. Namespaces
• Allow tenants to create/drop/modify their own tables
• Previously, only the super admin could do this
• Quota Management
• Security administration
• Commands : alter_namespace, create_namespace, describe_namespace,
drop_namespace, list_namespace, list_namespace_tables
18. RegionServer Groups
• QoS is missing in HBase 0.94
• Isolation is required in a multi-tenant environment
• Different apps require different configs
• Commands : group_add, group_balance, group_get, group_list,
group_list_tables, group_list_transitions, group_move_servers,
group_move_tables, group_of_server, group_of_table, group_remove
19. Multi Region Configs
[Diagram] Components: SVN, Jenkins Build Farm, Master Repository, Slave Repository (Colo A), Slave Repository (Colo B), HBase Cluster A, HBase Cluster B
[Diagram] Steps: Fetch Group List; Generate Multi Configs; Merge Default Config & Push multi config; Sync Configs; Download Host Maps and Multi Region Config
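The "Merge Default Config" step in the diagram can be illustrated with a minimal Python sketch. It assumes each RegionServer group carries a small set of overrides layered on a shared default config; the function name `build_group_configs`, the group name `ads`, and the values shown are hypothetical and are not taken from the talk.

```python
from typing import Dict

Config = Dict[str, str]


def build_group_configs(default: Config,
                        overrides: Dict[str, Config]) -> Dict[str, Config]:
    """Return one full config per group: the defaults plus that group's overrides."""
    merged = {}
    for group, group_overrides in overrides.items():
        config = dict(default)          # copy, so the shared defaults are never mutated
        config.update(group_overrides)  # group-specific values win over defaults
        merged[group] = config
    return merged


# Hypothetical values, for illustration only.
DEFAULT = {
    "hbase.regionserver.handler.count": "30",
    "hbase.hregion.majorcompaction": "0",
}
OVERRIDES = {
    "default": {},
    "ads": {"hbase.regionserver.handler.count": "100"},
}

configs = build_group_configs(DEFAULT, OVERRIDES)
print(configs["ads"]["hbase.regionserver.handler.count"])
```

Keeping only the differences per group means a change to the default config reaches every group on the next generation run, while a group's overrides stay small enough to review.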
20. Compaction
• Two types: minor and major
• Minor picks up a few smaller files and rewrites them as one
• Major picks up all files and rewrites them as one, dropping deleted and expired cells
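The difference between the two compaction types can be shown with a toy Python model. This is a simplified sketch, not HBase's implementation: a store file is just a list of cells, "smaller" is judged by cell count, and the names `Cell`, `minor_compact`, and `major_compact` are hypothetical. Real file selection in HBase is more involved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int                  # write timestamp
    value: str = ""
    is_delete: bool = False  # True for a delete marker


StoreFile = List[Cell]


def minor_compact(files: List[StoreFile], pick: int = 2) -> List[StoreFile]:
    """Merge the `pick` smallest files into one; every cell is kept, markers included."""
    ordered = sorted(files, key=len)
    chosen, rest = ordered[:pick], ordered[pick:]
    merged = sorted((c for f in chosen for c in f), key=lambda c: (c.row, -c.ts))
    return rest + [merged]


def major_compact(files: List[StoreFile], now: int, ttl: int) -> List[StoreFile]:
    """Merge all files into one, dropping deleted and expired cells."""
    cells = [c for f in files for c in f]
    # The latest delete marker per row masks every cell at or before its timestamp.
    deleted_upto: Dict[str, int] = {}
    for c in cells:
        if c.is_delete:
            deleted_upto[c.row] = max(deleted_upto.get(c.row, -1), c.ts)
    kept = [c for c in cells
            if not c.is_delete
            and c.ts > deleted_upto.get(c.row, -1)
            and now - c.ts <= ttl]
    return [sorted(kept, key=lambda c: (c.row, -c.ts))]


files = [
    [Cell("a", 1, "v1"), Cell("b", 2, "v2"), Cell("c", 3, "v3")],
    [Cell("a", 5, is_delete=True)],
    [Cell("b", 90, "v4")],
]
print(len(minor_compact(files)))            # file count after a minor compaction
print(major_compact(files, now=100, ttl=50))
```

In this model a minor compaction only reduces the number of files, while a major compaction is the step that actually reclaims space, which is why it is the more expensive operation to schedule.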
24. Managed Compactions and Splits
• Flexible Scheduling
• Custom Logic per table and workload
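One way to picture flexible, per-table scheduling is a policy object per table that the scheduler consults before compacting a region. The Python sketch below is an assumption about what such logic might look like, not the scheduler described in the talk; `TablePolicy`, its fields, and the table names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TablePolicy:
    start_hour: int       # window start, inclusive (0-23)
    end_hour: int         # window end, exclusive; the window may wrap past midnight
    min_store_files: int  # compact only regions with at least this many store files


def in_window(policy: TablePolicy, hour: int) -> bool:
    """True if `hour` falls inside the table's allowed compaction window."""
    if policy.start_hour <= policy.end_hour:
        return policy.start_hour <= hour < policy.end_hour
    return hour >= policy.start_hour or hour < policy.end_hour


def regions_to_compact(policies: Dict[str, TablePolicy],
                       store_files: Dict[Tuple[str, str], int],
                       hour: int) -> List[Tuple[str, str]]:
    """store_files maps (table, region) to its current store file count."""
    due = []
    for (table, region), count in sorted(store_files.items()):
        policy = policies.get(table)  # tables without a policy are left alone
        if policy and in_window(policy, hour) and count >= policy.min_store_files:
            due.append((table, region))
    return due


policies = {
    "events": TablePolicy(start_hour=22, end_hour=4, min_store_files=3),
    "profiles": TablePolicy(start_hour=10, end_hour=12, min_store_files=5),
}
store_files = {("events", "r1"): 4, ("events", "r2"): 2, ("profiles", "r1"): 9}

print(regions_to_compact(policies, store_files, hour=23))
print(regions_to_compact(policies, store_files, hour=11))
```

Expressing the policy as data rather than code lets each tenant's window and threshold change without touching the scheduler itself.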
25. Compaction and Splits Scheduler
[Diagram] Components: Metrics, MySQL, HBaseCtl: Scheduler, HDFS, ZooKeeper (Coordination & Intermediate Store), HBase Cluster A, HBase Cluster B, HBase Cluster C
[Diagram] Labels: Metrics; Analyze; Region Specific Metrics; Server Metrics; Scheduling Parameters; Publish; Update Compaction/Split Statistics
26. Group Balancing
• Scheduled group balance followed by rolling major compaction
• Based on Data Locality
– Find data locality of each block of store files
– Move region to server where the maximum blocks are located
• Helps after cluster upgrades and restarts
• Also helps after config changes for a RegionServer group
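The locality rule above (move each region to the server holding the most blocks of its store files) can be sketched in Python. This is a minimal illustration under stated assumptions: block locations are given as plain host lists, candidates are limited to the servers of the region's group, and the names `best_server_for_region` and `plan_moves` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def best_server_for_region(block_hosts: List[List[str]],
                           group_servers: List[str]) -> str:
    """Pick the group server that holds replicas of the most blocks.

    block_hosts has one entry per HDFS block of the region's store files,
    listing the hosts that hold a replica of that block.
    """
    allowed = set(group_servers)
    counts = Counter()
    for hosts in block_hosts:
        for host in set(hosts):      # count each host once per block
            if host in allowed:
                counts[host] += 1
    # Ties go to the alphabetically first server so the result is deterministic.
    return max(sorted(group_servers), key=lambda server: counts[server])


def plan_moves(regions: Dict[str, Tuple[str, List[List[str]]]],
               group_servers: List[str]) -> Dict[str, str]:
    """Map region -> target server for every region not already on its best server."""
    moves = {}
    for region, (current, block_hosts) in regions.items():
        target = best_server_for_region(block_hosts, group_servers)
        if target != current:
            moves[region] = target
    return moves


blocks = [["rs1", "rs2", "rs3"], ["rs2", "rs3", "rs4"], ["rs2", "rs5", "rs1"]]
group = ["rs1", "rs2", "rs3"]

print(best_server_for_region(blocks, group))
print(plan_moves({"region-a": ("rs1", blocks)}, group))
```

A production balancer would also have to cap the number of regions per server so that locality does not pile load onto a few hosts; that constraint is left out of this sketch.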
30. Monitoring, cont. (Metrics for Customers)
[Diagram] Components: Simon System; Other Systems for Analysis & Reporting; Jenkins Job: Merges and Formats Metrics; HBase; HBase Master; HDFS; Master; Grid Snodes; Customer Dashboards
[Diagram] Steps: Upload data to HDFS; Memory Dump from Master; Region Server Metrics; Push compiled metrics to snodes; Fetch metrics
31. Monitoring, cont. (OpenTSDB)
• Evaluating
• Work is required to make it production-ready at Yahoo