Introduction to HBase
Anil Gupta
@bigdatanoob
Outline
What is NoSQL?
RDBMS vs NoSQL
HBase
HBase Components
Architecture
HBase Cluster
HBase Data Model
Key -> Value
Region
What is NoSQL?
NoSQL stands for "Not Only SQL". The term was coined in 1998.
NoSQL databases are non-relational and do not use SQL as their
primary query language.
NoSQL is not a replacement for relational databases.
NoSQL is designed for distributed data stores.
NoSQL is designed to store semi-structured and sparse data.
                         | NoSQL                                               | RDBMS
Hardware                 | Farm of commodity machines (up to several thousand) | 1-3 high-end or proprietary (costly) machines
Data Type                | Semi-structured and sparse                          | Structured and dense
Data Size                | Petabytes (10^15 bytes)                             | Terabytes (10^12 bytes)
Auto-Sharding            | Yes                                                 | No
Flexible Schema          | Yes                                                 | No
Referential Integrity    | No                                                  | Yes
Support for Joins        | No                                                  | Yes
Support for Aggregations | Basic                                               | Advanced
HBase is an open-source, distributed, versioned,
key-value database modeled after Google's Bigtable.
HBase offers real-time reads and writes (in milliseconds).
HBase is highly fault-tolerant (HA) and scalable.
[Figure: logo captions "... is optional for ..." and "... + random read/write access = ... + Apache ZooKeeper"; the logos themselves are not recoverable.]
Selling Points of HBase
Highly Scalable
Auto-sharding
Strongly Consistent
Out of the box support for Historical Data
Very high read throughput
Readily compatible with Hadoop
Highly Fault-tolerant (HA)
HBase Components
1. HBase Master (HMaster): HMaster is the master server.
• Monitors all RegionServers
• Performs load balancing of regions (a.k.a. sharding)
• Assigns regions to RegionServers
• Handles all metadata changes
• Periodically checks and cleans up the .META. table
• Multiple HMasters can run in a cluster, but only one is active at any time
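The region-assignment duty listed above can be pictured with a small sketch. This is only an illustration: it assumes a plain round-robin policy, and the names `assign_regions`, `rs1`..`rs3` and the region labels are hypothetical. The balancer HMaster actually uses is more sophisticated and is not described in these slides.

```python
from collections import defaultdict


def assign_regions(regions, servers):
    """Spread region names over server names in round-robin order.

    Illustrative policy only; not the algorithm HMaster uses.
    """
    assignment = defaultdict(list)
    for i, region in enumerate(sorted(regions)):
        assignment[servers[i % len(servers)]].append(region)
    return dict(assignment)


# Hypothetical region and RegionServer names.
regions = [f"txn_table,region-{n:02d}" for n in range(7)]
servers = ["rs1", "rs2", "rs3"]

plan = assign_regions(regions, servers)
for server, hosted in plan.items():
    print(server, len(hosted), hosted)
```

With seven regions and three servers, no server ends up with more than one region above any other, which is the basic goal of load balancing.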
HBase Components (cont.)
2. RegionServer (HRegionServer): HRegionServer is the implementation of the worker module.
• Runs as a Java service on worker nodes
• A machine running a RegionServer is considered a worker node
• Serves get/put/scan requests
• Responsible for splitting and compacting regions
• Runs on a DataNode
• Multiple RegionServers run in a cluster
ZooKeeper in HBase
ZooKeeper allows distributed processes to coordinate with each other
through a shared hierarchical namespace. It is a distributed and
highly reliable service.
In HBase it is responsible for the following:
• Providing the availability status of RegionServers
• Ensuring a single active HMaster in the cluster
• Providing the location of the “-ROOT-” table
• Selecting a new HMaster if the active HMaster fails
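How a coordination service can guarantee a single active master may be easier to see in code. The sketch below is a toy, in-memory stand-in and does not use the ZooKeeper API: `FakeCoordinator`, `create_ephemeral`, `session_expired`, `elect` and the master names are all hypothetical. It assumes the usual pattern in which whoever creates an exclusive node first becomes active, and the node disappears when its owner fails.

```python
class FakeCoordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self._nodes = {}  # path -> owner

    def create_ephemeral(self, path, owner):
        # Succeeds only if nobody holds the path yet.
        if path in self._nodes:
            return False
        self._nodes[path] = owner
        return True

    def session_expired(self, owner):
        # A failed owner loses every node it held.
        self._nodes = {p: o for p, o in self._nodes.items() if o != owner}

    def get(self, path):
        return self._nodes.get(path)


def elect(coordinator, candidates, path="/hbase/master"):
    """Let each candidate try to claim the path; return the current holder."""
    for candidate in candidates:
        if coordinator.create_ephemeral(path, candidate):
            break
    return coordinator.get(path)


zk = FakeCoordinator()
masters = ["hmaster-a", "hmaster-b"]  # hypothetical master names

first = elect(zk, masters)            # first candidate claims the node
again = elect(zk, masters)            # nothing changes while it is alive
zk.session_expired(first)             # the active master fails
second = elect(zk, [m for m in masters if m != first])
print(first, again, second)
```

The standby master only becomes active after the failed master's node is gone, so two masters are never active at the same time in this model.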
HBase Architecture
HBase Cluster
[Figure: cluster diagram. Labeled components: NameNode, HMaster and ZooKeeper, plus Worker Nodes that each run a DataNode and a RegionServer (HRegionServer); some Worker Nodes also run a TaskTracker.]
Column Family and Column Qualifier
Column Family: Column qualifiers in HBase are grouped
into column families.
The colon character (:) delimits the column family
from the column qualifier.
The combination <Column Family>:<Column Qualifier> is
equivalent to a column name.
Physically, all column qualifiers of a column family are stored
together on the file system.
• Column qualifiers within a family are sorted lexicographically and
stored together.
Example: txn:amt. Here “txn” is the column family and “amt” is
the column qualifier.
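The naming rule above can be shown with a few lines of Python. This is a sketch, not HBase client code: `split_column` is a hypothetical helper, and every column name other than `txn:amt` is made up for illustration. It splits on the first colon and relies on Python comparing `bytes` values lexicographically.

```python
def split_column(name: bytes):
    """Split b'family:qualifier' at the first colon."""
    family, _, qualifier = name.partition(b":")
    return family, qualifier


# Only txn:amt comes from the slides; the other names are invented.
columns = [b"txn:merchant", b"txn:amt", b"cust:name", b"txn:currency"]

by_family = {}
for column in columns:
    family, qualifier = split_column(column)
    by_family.setdefault(family, []).append(qualifier)

# Within a family, qualifiers are kept in lexicographic byte order.
for family in by_family:
    by_family[family].sort()

print(split_column(b"txn:amt"))
print(by_family)
```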
HBase Data Model
• A table maintains data in lexicographic order by RowKey.
• Everything except table names is stored as a byte array.
• Only column families are defined when a table is created.
    • Each family can have any number of columns (up to a few million).
    • Each row can have different columns in a column family.
    • Each column can hold any number of versions.
    • Columns exist only when inserted, because HBase does not store NULL values.
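A minimal in-memory sketch can make these data model rules concrete. It is not the HBase API: `SparseTable`, `put` and `scan` are hypothetical names, versions are left out, and plain dictionaries stand in for HBase's storage. It shows three points from the list: families are fixed at creation, each row holds only the columns that were inserted, and rows come back in lexicographic byte order.

```python
class SparseTable:
    """Toy model: row key -> family -> qualifier -> value (all bytes)."""

    def __init__(self, families):
        self.families = set(families)  # fixed when the table is created
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def scan(self):
        # Rows are returned in lexicographic order of the row key bytes.
        for row_key in sorted(self.rows):
            yield row_key, self.rows[row_key]


table = SparseTable([b"txn"])
table.put(b"row-10", b"txn", b"amt", b"25.00")
table.put(b"row-2", b"txn", b"amt", b"9.99")
table.put(b"row-2", b"txn", b"currency", b"USD")  # only this row has the column

for row_key, data in table.scan():
    print(row_key, data)
```

Note that `row-10` sorts before `row-2`, because byte order is not numeric order. This is a common reason to zero-pad numeric parts of a row key.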
Key -> Value in HBase
(RowKey, Column Family:Column Qualifier,
Timestamp) is a “Key” in HBase.
A “Value” is stored for each “Key”.
The timestamp supports storing historical data.
A table is always indexed on RowKey.
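The role of the timestamp in the key can be sketched as follows. This is an illustrative model only: `VersionedCells`, its methods, the `as_of` parameter and the integer timestamps are assumptions, not HBase client calls. Each (row key, column) pair maps to several timestamped values, and a read returns the newest version, optionally the newest one at or before a given time.

```python
class VersionedCells:
    """Toy model of (row key, family:qualifier, timestamp) -> value."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, as_of=None):
        """Return the newest value, or the newest one with timestamp <= as_of."""
        versions = self._cells.get((row, column), {})
        eligible = [ts for ts in versions if as_of is None or ts <= as_of]
        if not eligible:
            return None
        return versions[max(eligible)]


cells = VersionedCells()
cells.put(b"acct-1", b"txn:amt", 100, b"10.00")
cells.put(b"acct-1", b"txn:amt", 200, b"12.50")

latest = cells.get(b"acct-1", b"txn:amt")
historical = cells.get(b"acct-1", b"txn:amt", as_of=150)
too_early = cells.get(b"acct-1", b"txn:amt", as_of=50)
print(latest, historical, too_early)
```

Writing a new value does not overwrite the old one; it adds another version under a newer timestamp, which is what makes historical reads possible.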
Region
Tables in HBase are divided into multiple Regions.
1 Region = 1 partition of a table.
Regions are hosted by RegionServers.
One RegionServer can host hundreds of Regions.
A RegionServer can host Regions from multiple tables.
After a major compaction, every Region has one HFile
for each column family.
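Because a table is sorted by RowKey, each Region can be thought of as one contiguous range of row keys. The sketch below assumes that model: `RegionMap`, the split keys `g` and `p`, and the server names are hypothetical, and the round-robin hosting is only for illustration. A binary search over the sorted region start keys finds the Region, and therefore the RegionServer, for a given row.

```python
import bisect


class RegionMap:
    """Toy lookup from row key to (region index, hosting server)."""

    def __init__(self, split_keys, servers):
        # split_keys are the start keys of every region except the first.
        self.split_keys = sorted(split_keys)
        region_count = len(self.split_keys) + 1
        # Illustrative round-robin hosting; one server may host many regions.
        self.hosts = [servers[i % len(servers)] for i in range(region_count)]

    def locate(self, row_key):
        # bisect_right makes a region's start key belong to that region.
        index = bisect.bisect_right(self.split_keys, row_key)
        return index, self.hosts[index]


# Three regions: keys below b"g", from b"g" up to b"p", and b"p" onward.
region_map = RegionMap([b"g", b"p"], ["rs1", "rs2"])

for key in (b"apple", b"g", b"kiwi", b"zebra"):
    print(key, region_map.locate(key))
```

Here `rs1` hosts two of the three Regions, matching the point that one RegionServer can host many Regions.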
Random Facts About HBase
Data in HBase is stored in the HFile format.
Values are stored as byte arrays in HFiles.
HLog is the file format used for HBase's write-ahead log (WAL).
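The idea behind write-ahead logging can be sketched in a few lines. This is not the HLog format: the JSON-lines layout, the `TinyStore` class, the `replay` function and the key strings are all assumptions made for illustration. The point it shows is the ordering: every write is appended to the log before it is applied in memory, so the in-memory state can be rebuilt from the log after a crash.

```python
import io
import json


class TinyStore:
    """Toy store that logs each write before applying it in memory."""

    def __init__(self, log):
        self.log = log      # any text file-like object
        self.memory = {}    # in-memory state that would be lost in a crash

    def put(self, key, value):
        # 1. Append the change to the log first.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        # 2. Only then apply it to the in-memory state.
        self.memory[key] = value


def replay(log_text):
    """Rebuild the in-memory state by re-applying every logged write in order."""
    state = {}
    for line in log_text.splitlines():
        entry = json.loads(line)
        state[entry["key"]] = entry["value"]
    return state


log = io.StringIO()  # stands in for a durable log file
store = TinyStore(log)
store.put("row1/txn:amt", "10.00")
store.put("row1/txn:amt", "12.50")
store.put("row2/txn:amt", "7.00")

recovered = replay(log.getvalue())
print(recovered)
```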
References
http://hbase.apache.org/
https://hadoop.apache.org/
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
Questions?
