Apache HBase Internals You Hoped You Never Needed to Understand
Josh Elser
Future of Data, NYC
2016/10/11
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Engineer at Hortonworks, Member of the Apache Software Foundation
Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™
ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™
These Apache project names are trademarks or registered
trademarks of the Apache Software Foundation.
Apache HBase for storing your data!
CC BY 3.0 US: http://hbase.apache.org/
What happens when things go wrong?
CC BY-ND 2.0: https://www.flickr.com/photos/widnr/6588151679
The BigTable Architecture
▪ BigTable’s architecture is simple
▪ Debugging a distributed system is not simple
▪ How can we break down a complex system?
▪ How do we write resilient software?
• Log-Structured Merge Tree
• Write-Ahead Logs
• Distributed Coordination
• Row-based, Auto-Sharding
• Strong Consistency
• Read Isolation
• Coprocessors
• Security (AuthN/AuthZ)
• Backups
Naming Conventions
▪ Servers
– Hostname, Port, and Timestamp
– RegionServer: r01n01.domain.com,16201,1475691463147
– Master: r02n01.domain.com,16000,1475691462616
▪ Regions
– Table, Start RowKey, Region ID (timestamp), Replica ID, Encoded name
– T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c.
– Or simply c04d94cd4ee9797da2fb906b4dcd2e3c
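The naming conventions above can be pulled apart mechanically when reading logs. The following is a minimal sketch, not HBase's own parser: the class and function names are hypothetical, and it assumes a region name with no replica ID suffix. Because a start row key may itself contain commas, the table name is taken up to the first comma and the region ID from after the last comma.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerName:
    hostname: str
    port: int
    start_time_ms: int


@dataclass(frozen=True)
class RegionName:
    table: str
    start_key: str
    region_id: int
    encoded_name: str


def parse_server_name(name: str) -> ServerName:
    """Split 'hostname,port,timestamp' into its three parts."""
    hostname, port, start_time = name.split(",")
    return ServerName(hostname, int(port), int(start_time))


def parse_region_name(name: str) -> RegionName:
    """Split 'table,startkey,regionid.encodedname.' (no replica ID assumed)."""
    table, rest = name.split(",", 1)          # table ends at the first comma
    start_key, tail = rest.rsplit(",", 1)     # region ID follows the last comma
    region_id, encoded = tail.rstrip(".").split(".", 1)
    return RegionName(table, start_key, int(region_id), encoded)


server = parse_server_name("r01n01.domain.com,16201,1475691463147")
region = parse_region_name(
    r"T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c."
)
print(server.port, region.table, region.encoded_name)
```

The timestamp in a server name changes each time the process restarts, which is what lets two incarnations of the same host and port be told apart in logs.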
Regions
▪ A sorted “shard” of a table
▪ At least one “column family”
– Physical partitions
▪ Each family can have zero to many files
▪ Hosted by at most one RegionServer
– Additional RegionServers can host read-only replicas
▪ In-memory locks for certain intra-row operations
Region Assignment
▪ Coordinated by the HBase Master
▪ A Region must be hosted by only one RegionServer
▪ State tracked in hbase:meta
– hbck to fix issues
▪ Region splits/merges make a hard problem even harder
▪ Moving towards ProcedureV2
Normal Region Assignment States: Closed, Offline, Opening, OpenPending, Open
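The assignment states can be read as a small state machine guarded by the single-host rule. This is a rough sketch only: the slide names the states but not the arrows between them, so the ordering below (Closed, Offline, OpenPending, Opening, Open) is an assumption, and `RegionAssignment` is a hypothetical class, not HBase code.

```python
from enum import Enum


class RegionState(Enum):
    CLOSED = "Closed"
    OFFLINE = "Offline"
    OPEN_PENDING = "OpenPending"
    OPENING = "Opening"
    OPEN = "Open"


# Assumed happy-path ordering; the slide does not spell out the transitions.
NEXT_STATE = {
    RegionState.CLOSED: RegionState.OFFLINE,
    RegionState.OFFLINE: RegionState.OPEN_PENDING,
    RegionState.OPEN_PENDING: RegionState.OPENING,
    RegionState.OPENING: RegionState.OPEN,
}


class RegionAssignment:
    """Tracks one region's state and the single server allowed to host it."""

    def __init__(self, region: str):
        self.region = region
        self.state = RegionState.CLOSED
        self.server = None

    def advance(self, server: str) -> RegionState:
        """Move one step towards Open on behalf of the given server."""
        if self.state is RegionState.OPEN:
            raise RuntimeError(f"{self.region} is already open on {self.server}")
        if self.server not in (None, server):
            raise RuntimeError(f"{self.region} is being opened by {self.server}")
        self.server = server
        self.state = NEXT_STATE[self.state]
        return self.state
```

Rejecting a second server at every step is the point of the sketch: a region that two servers both believe they host is the kind of inconsistency that hbck exists to repair.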
The File System
▪ HDFS “Compatible”
– Distributed, durable, “write leases”
▪ Physical storage of HBase Tables (HFiles)
▪ Write-ahead logs
▪ A parent directory in that FileSystem (hbase.rootdir)
The File System
Physical Separation by HBase Namespace
/hbase/data/
/hbase/data/default/<table1>
/hbase/data/default/<table1>/.tabledesc/.tableinfo…
/hbase/data/default/<table2>/<region_id1>
/hbase/data/default/<table2>/<region_id2>
/hbase/data/my_custom_ns/<table3>/…
/hbase/data/hbase/meta/…
/hbase/archive/…
/hbase/WALs/<regionserver_name>/…
/hbase/oldWALs/…
/hbase/corrupt/…
The File System for one Region
/hbase/data/default/<table2>/<region_id1>
…/.regioninfo
…/.tmp
…/<family1>/<hfile>
…/<family1>/<hfile>
…/<family2>/<hfile>
…/<family3>/<hfile>
…/recovered.edits/<number>.seqid
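When tracing a problem down to files, it helps to be able to build these paths from a namespace, table, region, and family. The helper names below are hypothetical, and `/hbase` stands in for the path portion of whatever `hbase.rootdir` is set to on a given cluster.

```python
from pathlib import PurePosixPath

ROOT_DIR = PurePosixPath("/hbase")  # assumed path portion of hbase.rootdir


def table_dir(namespace: str, table: str, root: PurePosixPath = ROOT_DIR) -> PurePosixPath:
    """Tables live under data/<namespace>/<table>."""
    return root / "data" / namespace / table


def region_dir(namespace: str, table: str, region_id: str,
               root: PurePosixPath = ROOT_DIR) -> PurePosixPath:
    """Each region gets its own directory inside the table directory."""
    return table_dir(namespace, table, root) / region_id


def hfile_path(namespace: str, table: str, region_id: str, family: str,
               hfile: str, root: PurePosixPath = ROOT_DIR) -> PurePosixPath:
    """HFiles sit in one directory per column family inside the region."""
    return region_dir(namespace, table, region_id, root) / family / hfile


def wal_dir(regionserver_name: str, root: PurePosixPath = ROOT_DIR) -> PurePosixPath:
    """Write-ahead logs are grouped by RegionServer name."""
    return root / "WALs" / regionserver_name


print(hfile_path("default", "table2", "c04d94cd4ee9797da2fb906b4dcd2e3c", "family1", "hfile1"))
print(wal_dir("r01n01.domain.com,16201,1475691463147"))
```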
Writes into HBase
▪ Mutations inserted into sorted in-memory structure and WAL
– Fast lookups of recent data
– Append-only log for durability and speed
▪ Mutations are collected by destination Region
▪ Beware of hot-spotting
▪ Data in memory is eventually flushed into sorted (H)Files
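The write path described above can be sketched in a few lines. This is a toy model under stated assumptions, not HBase's implementation: `RegionWriter` and `flush_threshold` are hypothetical names, a Python list stands in for the WAL, and each flushed file is a sorted list of key/value pairs.

```python
import bisect


class RegionWriter:
    """Toy write path: append to a log, keep a sorted in-memory map, flush when full."""

    def __init__(self, flush_threshold: int = 3):
        self.wal = []            # append-only stand-in for the write-ahead log
        self.memstore = {}       # latest value per key
        self.sorted_keys = []    # keys kept in sorted order for ordered flushes
        self.hfiles = []         # each flush produces one sorted list of (key, value)
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))          # log first, for durability
        if key not in self.memstore:
            bisect.insort(self.sorted_keys, key)
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the in-memory contents out as one sorted, immutable file."""
        if not self.memstore:
            return
        self.hfiles.append([(k, self.memstore[k]) for k in self.sorted_keys])
        self.memstore.clear()
        self.sorted_keys.clear()


writer = RegionWriter(flush_threshold=2)
writer.put("row-b", "1")
writer.put("row-a", "2")   # reaches the threshold and triggers a flush
print(writer.hfiles)
```

Hot-spotting shows up in this model as every `put` landing on the same `RegionWriter` while the writers for other regions stay idle.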
Compactions and Flushes
▪ Flush: Taking Key-Values from the In-Memory map and creating an HFile
▪ Minor Compaction: Rewriting a subset of HFiles for a Region into one HFile
▪ Major Compaction: Rewriting all HFiles for a Region into one HFile (per column family)
▪ Compactions balance improved query performance with cost of rewriting data
– Compactions are good!
– Must understand SLAs to properly tune compactions
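The difference between minor and major compaction can be illustrated with sorted lists standing in for HFiles. This is a simplified sketch: it keeps only the newest value per key and ignores versions, deletes, and file selection policy; the function names are hypothetical, and files are assumed to be ordered oldest first with unique keys inside each file.

```python
import heapq


def compact(hfiles):
    """Merge sorted files (oldest first) into one sorted file; newest value wins."""
    # Tag entries so that, for equal keys, entries from newer files sort first.
    streams = [
        [(key, -age, value) for key, value in hfile]
        for age, hfile in enumerate(hfiles)
    ]
    merged = []
    for key, _, value in heapq.merge(*streams):
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged


def minor_compact(hfiles, start, end):
    """Rewrite the contiguous subset hfiles[start:end] into a single file."""
    return hfiles[:start] + [compact(hfiles[start:end])] + hfiles[end:]


def major_compact(hfiles):
    """Rewrite every file into a single file."""
    return [compact(hfiles)]


files = [
    [("a", "old"), ("c", "1")],
    [("a", "new"), ("b", "2")],
    [("d", "3")],
]
print(minor_compact(files, 0, 2))
print(major_compact(files))
```

Fewer files means fewer streams to merge on every read, which is the query-performance side of the trade-off; the cost is that the same data gets rewritten each time.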
Reads into HBase
▪ Merge-Sort over multiple streams of data
– Memory
– Disk (many files)
▪ hbase:meta is the definitive source of where to find Regions
Diagram labels: RowKey, Region, hbase:meta, RegionServer, ZooKeeper
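Finding the Region for a row key amounts to a search over sorted region start keys. The sketch below models that lookup with a binary search; `MetaTable` is a hypothetical in-memory stand-in for hbase:meta, and it assumes the first region of a table has an empty start key.

```python
import bisect


class MetaTable:
    """In-memory stand-in for hbase:meta: region start key -> (region, server)."""

    def __init__(self, regions):
        # regions: iterable of (start_key, region_name, server_name)
        self._regions = sorted(regions)
        self._start_keys = [start for start, _, _ in self._regions]

    def locate(self, row_key: bytes):
        """Return the region whose start key is the greatest one <= row_key."""
        index = bisect.bisect_right(self._start_keys, row_key) - 1
        return self._regions[index]


meta = MetaTable([
    (b"", "region-1", "rs1"),     # first region starts at the empty key
    (b"g", "region-2", "rs2"),
    (b"p", "region-3", "rs1"),
])
print(meta.locate(b"hello"))
```

A client that caches these answers has to be ready to repeat the lookup when a region has since moved, split, or merged.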
Apache ZooKeeper™
▪ Distributed coordination is really hard
▪ Obvious use cases
– Service Discovery
– Cluster Membership
– “Root Table”
▪ Non-obvious use cases
– Assignment (sometimes)
– Region Recovery
– WAL Splitting
– Cluster Replication
– Distributed Procedures
– HBase Snapshots
Apache ZooKeeper is a trademark of the Apache Software Foundation
Apache ZooKeeper™
▪ Discovery/Leader ZNodes
– /hbase/rs/…
– /hbase/master/…
– /hbase/backup-masters/…
▪ Consensus
– /hbase/splitWAL/…
– /hbase/flush-table-proc/...
– /hbase/table-lock/...
– /hbase/region-in-transition/...
– /hbase/recovering-regions/...
Distributed Procedures
▪ Resiliency in an unreliable system
– How do we create a table?
▪ “Procedure V2”
– Resilient, finite state machine
▪ HBase operations represented as “procedures”
▪ Clients are agnostic of Master state
– Clients track procedure state
https://issues.apache.org/jira/secure/attachment/12679960/ProcedureV2.pdf
Distributed Procedures
▪ Procedures are durable via Write-Ahead Log
– /hbase/MasterProcWALs/…
▪ Procedures only executed by the active HBase Master
▪ Reusable framework for the future
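The idea of a durable, resumable procedure can be shown with a toy executor. This is a sketch of the general technique rather than the Procedure V2 code: `ProcedureLog` and `run_procedure` are hypothetical names, a list of JSON strings stands in for the procedure WAL, and the three "create table" steps are invented for illustration.

```python
import json


class ProcedureLog:
    """Append-only stand-in for a procedure write-ahead log."""

    def __init__(self):
        self.records = []

    def append(self, proc_id: int, next_step: int) -> None:
        self.records.append(json.dumps({"proc_id": proc_id, "next_step": next_step}))

    def recover(self) -> dict:
        """Replay the log; the last record per procedure wins."""
        state = {}
        for line in self.records:
            record = json.loads(line)
            state[record["proc_id"]] = record["next_step"]
        return state


def run_procedure(proc_id, steps, log, fail_before=None):
    """Run steps in order, resuming after the last step recorded in the log."""
    next_step = log.recover().get(proc_id, 0)
    while next_step < len(steps):
        if fail_before == next_step:
            raise RuntimeError("simulated Master crash")
        steps[next_step]()
        next_step += 1
        log.append(proc_id, next_step)   # record progress after each step


done = []
steps = [
    lambda: done.append("create directories"),   # hypothetical step
    lambda: done.append("add to hbase:meta"),    # hypothetical step
    lambda: done.append("assign regions"),       # hypothetical step
]
log = ProcedureLog()
try:
    run_procedure(1, steps, log, fail_before=2)  # first Master dies mid-way
except RuntimeError:
    pass
run_procedure(1, steps, log)                     # new active Master resumes
print(done)
```

In this sketch a crash between running a step and logging it would cause that step to run twice after recovery, so each step would need to be safe to repeat.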
HBase RPCs
Overview
▪ Internal and External HBase Communication
▪ Half-Sync/Half-Async Model
▪ Many knobs to tweak
Components
▪ Listener
▪ Readers
▪ Scheduler
▪ Call Queues
▪ Call Runners/Handlers
HBase RPCs
Request to Execution: Listener → Readers → Scheduler → Call Queues (Priority, Read, Write, Replication) → Handlers
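The queue-and-handler half of this pipeline can be sketched with standard-library queues and threads. This is only an illustration of the pattern: `RpcScheduler` is a hypothetical class, the caller decides which queue a call belongs to, and the Listener and Reader stages that would parse requests off the network are left out.

```python
import queue
import threading

QUEUE_NAMES = ("priority", "read", "write", "replication")


class RpcScheduler:
    """Routes calls into per-type queues, each drained by dedicated handler threads."""

    def __init__(self, handlers_per_queue: int = 1):
        self._handlers_per_queue = handlers_per_queue
        self.queues = {name: queue.Queue() for name in QUEUE_NAMES}
        self._results = queue.Queue()
        self._handlers = []
        for name in QUEUE_NAMES:
            for _ in range(handlers_per_queue):
                thread = threading.Thread(target=self._run_handler, args=(name,), daemon=True)
                thread.start()
                self._handlers.append(thread)

    def dispatch(self, queue_name: str, call) -> None:
        """Enqueue a zero-argument callable on the named call queue."""
        self.queues[queue_name].put(call)

    def _run_handler(self, queue_name: str) -> None:
        calls = self.queues[queue_name]
        while True:
            call = calls.get()
            if call is None:          # sentinel: stop this handler
                return
            self._results.put((queue_name, call()))

    def shutdown(self):
        """Stop all handlers after queued calls finish; return collected results."""
        for name in QUEUE_NAMES:
            for _ in range(self._handlers_per_queue):
                self.queues[name].put(None)
        for thread in self._handlers:
            thread.join()
        results = []
        while not self._results.empty():
            results.append(self._results.get())
        return results


scheduler = RpcScheduler()
scheduler.dispatch("read", lambda: "row contents")
scheduler.dispatch("write", lambda: "ack")
print(scheduler.shutdown())
```

Separate queues mean a backlog of slow reads cannot starve writes or priority calls of handlers, and the number of handlers per queue is one example of the kind of knob the overview refers to.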
Disaster Recovery
▪ Multiple tools to ensure copies of data in the face of catastrophic failure
▪ CopyTable
– MapReduce job which reads all data from a source, writing to destination
▪ Snapshots
– A collection of Regions, their HFiles, and metadata
▪ Backup & Restore
– HBASE-7912, currently targeted for HBase 2.0.0
– Incremental and full backup/restore
Kerberos
▪ Strong authentication for untrusted networks
▪ “Standard” across Apache Hadoop and friends
▪ Requirements:
– Forward/Reverse DNS
– Unlimited Strength Java Cryptography Extension
▪ SASL used to build RPC systems
▪ “Practical Kerberos with Apache HBase” https://goo.gl/y0d9ZO
Finding a Hypothesis
▪ Logs logs logs
▪ Application and System
▪ Metrics exposed by JMX
▪ Graphing solutions
– Ambari Metrics Server + Grafana
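Metrics can also be pulled out programmatically when no graphing stack is at hand. The sketch below assumes the JSON shape served by the `/jmx` servlet on a daemon's web UI port, namely a top-level `beans` array of objects that each carry a `name`. The helper names are hypothetical, and the sample payload with its values is made up for illustration.

```python
import json


def find_beans(jmx_json: str, name_contains: str):
    """Return every bean whose name contains the given substring."""
    beans = json.loads(jmx_json).get("beans", [])
    return [bean for bean in beans if name_contains in bean.get("name", "")]


def metric(jmx_json: str, bean_name_contains: str, key: str):
    """Return the first matching bean's value for key, or None if absent."""
    for bean in find_beans(jmx_json, bean_name_contains):
        if key in bean:
            return bean[key]
    return None


# Made-up payload in the assumed shape; real output has many more beans and keys.
SAMPLE = json.dumps({
    "beans": [
        {"name": "Hadoop:service=HBase,name=RegionServer,sub=Server",
         "regionCount": 12, "readRequestCount": 3400},
        {"name": "java.lang:type=Memory", "ObjectPendingFinalizationCount": 0},
    ]
})

print(metric(SAMPLE, "sub=Server", "regionCount"))
```

Sampling the same value twice and comparing the results is often enough to confirm or reject a hypothesis such as "this server is receiving all of the reads".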
Thank You
jelser@hortonworks.com / elserj@apache.org

Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 

Apache HBase Internals you hoped you Never Needed to Understand

  • 1. Apache HBase Internals you Hoped you Never Needed to Understand Josh Elser Future of Data, NYC 2016/10/11
  • 2. Engineer at Hortonworks, Member of the Apache Software Foundation
      – Top-Level Projects: Apache Accumulo®, Apache Calcite™, Apache Commons™, Apache HBase®, Apache Phoenix™
      – ASF Incubator: Apache Fluo™, Apache Gossip™, Apache Pirk™, Apache Rya™, Apache Slider™
      These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.
      © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 3. Apache HBase for storing your data! CC BY 3.0 US: http://hbase.apache.org/
  • 4. What happens when things go wrong? CC BY-ND 2.0: https://www.flickr.com/photos/widnr/6588151679
  • 5. The BigTable Architecture
      ▪ BigTable’s architecture is simple
      ▪ Debugging a distributed system is not simple
      ▪ How can we break down a complex system?
      ▪ How do we write resilient software?
      Building blocks: Log-Structured Merge Tree; Write-Ahead Logs; Distributed Coordination; Row-based, Auto-Sharding; Strong Consistency; Read Isolation; Coprocessors; Security (AuthN/AuthZ); Backups
  • 6. Naming Conventions
      ▪ Servers
        – Hostname, Port, and Timestamp
        – RegionServer: r01n01.domain.com,16201,1475691463147
        – Master: r02n01.domain.com,16000,1475691462616
      ▪ Regions
        – Table, Start RowKey, Region ID (timestamp), Replica ID, Encoded name
        – T1,x04x00x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c.
        – Or simply c04d94cd4ee9797da2fb906b4dcd2e3c
  • 7. Regions
      ▪ A sorted “shard” of a table
      ▪ At least one “column family”
        – Physical partitions
      ▪ Each family can have zero to many files
      ▪ Hosted by at most one RegionServer
        – Can have many hosting RS’s for reads
      ▪ In-memory locks for certain intra-row operations
  • 8. Region Assignment
      ▪ Coordinated by the HBase Master
      ▪ A Region must only be hosted by one RegionServer
      ▪ State tracked in hbase:meta
        – hbck to fix issues
      ▪ Region splits/merges make a hard problem even harder
      ▪ Moving towards ProcedureV2
      [Figure: Normal Region Assignment States. States shown: Closed, Offline, Opening, Open, Pending Open]
  • 9. The File System
      ▪ HDFS “Compatible”
        – Distributed, durable, “write leases”
      ▪ Physical storage of HBase Tables (HFiles)
      ▪ Write-ahead logs
      ▪ A parent directory in that FileSystem (hbase.rootdir)
  • 10. The File System: Physical Separation by HBase Namespace
      /hbase/data/
      /hbase/data/default/<table1>
      /hbase/data/default/.tabledesc/.tableinfo…
      /hbase/data/default/<table2>/<region_id1>
      /hbase/data/default/<table2>/<region_id2>
      /hbase/data/my_custom_ns/<table3>/…
      /hbase/data/hbase/meta/…
      /hbase/archive/…
      /hbase/WALs/<regionserver_name>/…
      /hbase/oldWALs/…
      /hbase/corrupt/…
  • 11. The File System for one Region
      /hbase/data/default/<table2>/<region_id1>
      …/.regioninfo
      …/.tmp
      …/<family1>/<hfile>
      …/<family1>/<hfile>
      …/<family2>/<hfile>
      …/<family3>/<hfile>
      …/recovered.edits/<number>.seqid
  • 12. Writes into HBase
      ▪ Mutations inserted into sorted in-memory structure and WAL
        – Fast lookups of recent data
        – Append-only log for durability and speed
      ▪ Mutations are collected by destination Region
      ▪ Beware of hot-spotting
      ▪ Data in memory eventually flush’ed into sorted (H)files
  • 13. Compactions and Flushes
      ▪ Flush: Taking Key-Values from the In-Memory map and creating an HFile
      ▪ Minor Compaction: Rewriting a subset of HFiles for a Region into one HFile
      ▪ Major Compaction: Rewriting all HFiles for a Region into one HFile
      ▪ Compactions balance improved query performance with cost of rewriting data
        – Compactions are good!
        – Must understand SLA’s to properly tune compactions
  • 14. Reads into HBase
      ▪ Merge-Sort over multiple streams of data
        – Memory
        – Disk (many files)
      ▪ hbase:meta is the definitive source of where to find Regions
      [Figure: lookup diagram with labels RowKey, Region, hbase:meta, RegionServer, ZooKeeper]
  • 15. Apache ZooKeeper™
      ▪ Distributed coordination is really hard
      ▪ Obvious use cases
        – Service Discovery
        – Cluster Membership
        – “Root Table”
      ▪ Non-obvious use cases
        – Assignment (sometimes)
        – Region Recovery
        – WAL Splitting
        – Cluster Replication
        – Distributed Procedures
        – HBase Snapshots
      Apache ZooKeeper is a trademark of the Apache Software Foundation
  • 16. Apache ZooKeeper™
      ▪ Discovery/Leader ZNodes
        – /hbase/rs/…
        – /hbase/master/…
        – /hbase/backup-masters/…
      ▪ Consensus
        – /hbase/splitWAL/…
        – /hbase/flush-table-proc/...
        – /hbase/table-lock/...
        – /hbase/region-in-transition/...
        – /hbase/recovering-regions/...
  • 17. Distributed Procedures
      ▪ Resiliency in an unreliable system
        – How do we create a table?
      ▪ “Procedure V2”
        – Resilient, finite state machine
      ▪ HBase operations represented as “procedures”
      ▪ Clients are agnostic of Master state
        – Clients track procedure state
      https://issues.apache.org/jira/secure/attachment/12679960/ProcedureV2.pdf
  • 18. Distributed Procedures
      ▪ Procedures are durable via Write-Ahead Log
        – /hbase/MasterProcWALs/…
      ▪ Procedures only executed by the active HBase Master
      ▪ Reusable framework for the future
  • 19. HBase RPCs
      ▪ Overview
        – Internal and External HBase Communication
        – Half-Sync/Half-Async Model
        – Many knobs to tweak
      ▪ Components
        – Listener
        – Readers
        – Scheduler
        – Call Queues
        – Call Runners/Handlers
  • 20. HBase RPCs
      [Figure: Request to Execution. Boxes shown: Listener; four Readers; Scheduler; Call Queues and Handlers for Priority, Read, Write, and Replication]
  • 21. Disaster Recovery
      ▪ Multiple tools to ensure copies of data in the face of catastrophic failure
      ▪ CopyTable
        – MapReduce job which reads all data from a source, writing to destination
      ▪ Snapshots
        – A collection of Regions, their HFiles, and metadata
      ▪ Backup & Restore
        – HBASE-7912, currently targeted for HBase-2.0.0
        – Incremental and full backup/restore
  • 22. Kerberos
      ▪ Strong authentication for untrusted networks
      ▪ “Standard” across Apache Hadoop and friends
      ▪ Requirements:
        – Forward/Reverse DNS
        – Unlimited Strength Java Cryptography Extension
      ▪ SASL used to build RPC systems
      ▪ “Practical Kerberos with Apache HBase” https://goo.gl/y0d9ZO
  • 23. Finding a Hypothesis
      ▪ Logs logs logs
      ▪ Application and System
      ▪ Metrics exposed by JMX
      ▪ Graphing solutions
        – Ambari Metrics Server + Grafana
  • 24. Thank You jelser@hortonworks.com / elserj@apache.org
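
The write, flush, compaction, and read slides (12 to 14) describe a log-structured merge design. The following is a minimal Python sketch of that idea, not HBase's actual implementation. `ToyRegion` and its method names are hypothetical, values are plain strings, and deletes, timestamps, column families, and WAL replay are all left out.

```python
import heapq


class ToyRegion:
    """Toy model of one region: a WAL, an in-memory map, and sorted files."""

    def __init__(self):
        self.wal = []       # append-only log of (key, seq, value)
        self.memstore = {}  # key -> (seq, value), the newest write per key
        self.hfiles = []    # each "file" is a list of (key, seq, value) sorted by key
        self.seq = 0        # increasing sequence number; higher means newer

    def put(self, key, value):
        # Every mutation goes to the log (durability) and the in-memory map.
        self.seq += 1
        self.wal.append((key, self.seq, value))
        self.memstore[key] = (self.seq, value)

    def flush(self):
        # Flush: turn the in-memory map into one new sorted, immutable file.
        if not self.memstore:
            return
        self.hfiles.append(self._sorted_memstore())
        self.memstore.clear()

    def scan(self):
        # Read: merge-sort over memory plus every file; newest version wins.
        streams = [self._sorted_memstore()] + self.hfiles
        for key, _seq, value in self._merge_newest(streams):
            yield key, value

    def major_compact(self):
        # Major compaction: rewrite all files for the region into one file.
        self.flush()
        self.hfiles = [list(self._merge_newest(self.hfiles))]

    def _sorted_memstore(self):
        return sorted((k, s, v) for k, (s, v) in self.memstore.items())

    @staticmethod
    def _merge_newest(streams):
        # Order by key, then by descending sequence number, so the first
        # entry seen for a key is its most recent version.
        merged = heapq.merge(*streams, key=lambda e: (e[0], -e[1]))
        last_key = None
        for entry in merged:
            if entry[0] != last_key:
                yield entry
                last_key = entry[0]


region = ToyRegion()
region.put("b", "1")
region.put("a", "1")
region.flush()
region.put("b", "2")
region.flush()
region.put("c", "3")
rows_before = list(region.scan())
files_before = len(region.hfiles)
region.major_compact()
rows_after = list(region.scan())
```

Reads return the same rows before and after compaction; what changes is the number of sorted streams a read has to merge, which is the trade-off the Compactions slide describes.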

Editor's Notes

  1. Architecture-wise, BigTable as a system is well understood and simple; it has been a decade since the paper. Distributed systems are complex! They are easier to reason about if we consider them as smaller units.
  2. Important to be able to grep! Know what to look for. DNS is important for consistent naming across all nodes.
  3. HBase needs a distributed and resilient filesystem (see also Azure tech). Data that is written and sync’ed must be present! Relies on one writer per file (HDFS leases). HBase Tables: not just Key-Values (HFiles) but also serialized table metadata. WAL durability is key here.
  4. /hbase/data = All table data; /hbase/archive = HFiles before deletion; /hbase/WALs = Write-ahead logs; /hbase/oldWALs = WALs before deletion; /hbase/corrupt = Corrupt WALs
  5. .regioninfo = metadata about this region; .tmp = general temporary space (compactions); recovered.edits = artifact of WAL recovery
  6. Compactions == fewer files, more efficient lookups
  7. “What happens when meta is unassigned?”
  8. ZooKeeper provides authentication and authorization as well (for HBase, no auth or Kerberos auth via SASL). ACLs are used to prevent users from changing sensitive data in ZK – only HBase nodes can change them.
  9. Resilience is hard. How do we make sure that an operation will succeed if servers fail? How do we distinguish between previous failed attempts and users trying to concurrently perform the same operation? Table creation: unique name, directories in HDFS, create initial region in HDFS, update meta, enable the table, etc.
  10. ProcV2 implementation is tricky/complicated, but provides an internal API to make operations easy to implement and reason about in the future. Easy to inspect state. Model is proven in Accumulo’s FATE
  11. Lots of knobs because we want to be able to optimize things like throughput, latency, and fairness, which are often mutually exclusive
  12. The Listener does the socket accept and dispatches to the Readers. Readers read a number of bytes off the wire (the Selector channel) and send the deserialized request to the Scheduler, which places it on a call queue that a handler will eventually process.
  13. Aka “you dun goofed up”. CopyTable: slow, requires source and destination to be up. Not really desirable. Snapshots: great for one-offs. Can grow DFS usage though. Requires coordination of a flush for a full backup. B&R: snapshots with the ability to track WALs for incremental backups since the last full backup.
  14. Brutally-sparse Kerberos talk
  15. JMX – JVisualVM, HBase web UIs, Hadoop Metrics2 sink (AMS)
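
Note 2 stresses knowing what to grep for. As a rough illustration, here is a Python sketch that splits the server and region names from the Naming Conventions slide into their parts. The function and type names are hypothetical, and the parser assumes only the exact forms shown on the slide (no replica ID suffix, and a trailing dot on the full region name).

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    hostname: str
    port: int
    start_time_ms: int


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int
    encoded_name: str


def parse_server_name(text: str) -> ServerName:
    # Form on the slide: <hostname>,<port>,<timestamp>
    hostname, port, start = text.rsplit(",", 2)
    return ServerName(hostname, int(port), int(start))


def parse_region_name(text: str) -> RegionName:
    # Form on the slide: <table>,<start rowkey>,<region id>.<encoded name>.
    body = text[:-1] if text.endswith(".") else text
    prefix, encoded = body.rsplit(".", 1)
    # The start key may itself contain commas, so peel the table off the
    # front and the region ID off the back.
    table, rest = prefix.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return RegionName(table, start_key, int(region_id), encoded)


server = parse_server_name("r01n01.domain.com,16201,1475691463147")
region = parse_region_name(
    "T1,x04x00x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c."
)
```

The encoded name alone (`region.encoded_name`) is usually the most convenient string to search logs for, since the slide notes a region can be referred to by it directly.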