HBASE
THE SCALABLE DATA STORE
Sampath Rachakonda
Agenda
- Evolution of HBASE
- Overview
- Data Model
- Architecture
- HBASE and ZooKeeper
Evolution of HBASE
- File systems
  - Tapes → linear (sequential) access
  - Disks → random access
    - Seek time
    - Transfer rate
- DBMS
- RDBMS
- Now NoSQL
Hadoop
- Hadoop mainly comprises two things: HDFS and MapReduce.
- HDFS is a scalable, fault-tolerant, high-performance distributed file system (DFS) that can run on commodity hardware.
- MapReduce is a software framework for distributed computation.
- Both follow a master/slave architecture.
- Limitations
  - Batch processing
  - Sequential data look-up
  - Not intended for real-time querying
  - No support for random access
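To make the MapReduce model concrete, here is a minimal single-process sketch of its map, shuffle/sort and reduce phases, using word count as the example. The function names are hypothetical and this is not Hadoop's API: there is no distribution, HDFS I/O or fault tolerance, only the data flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit an intermediate (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all intermediate values by key, keys in sorted order.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine every value seen for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Every input line is read once, front to back: a batch job over the whole input.
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["HBase runs on HDFS", "HDFS runs on commodity hardware"]))
    # {'commodity': 1, 'hardware': 1, 'hbase': 1, 'hdfs': 2, 'on': 2, 'runs': 2}
```

Even a question about a single word means running the whole job over all of the input, which is exactly the batch, sequential-access limitation listed on this slide.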
NOSQL
- Massive data volumes
- Schema evolution
  - A fixed schema is almost impossible for a web-scale database.
  - With NoSQL, schema changes can be introduced into the system gradually.
- Extreme query load
  - The bottleneck is joins.
Why HBASE?
- Column-oriented (column-family) store
- Distributed: designed to serve large tables
- Horizontally scalable
- High performance and availability
- Storage system built on top of HDFS
- The base goal of HBase is billions of rows, millions of columns and thousands of versions.
- Supports random, real-time CRUD operations, unlike HDFS.
Who uses HBase?
- Facebook
- Adobe
- Twitter
- Yahoo
- Meetup
- Netflix
- Many more
When to use HBASE?
- Good for large amounts of data
  - Hundreds of millions or billions of rows
  - You need enough hardware to match
- Large numbers of client requests
  - Single random selects and range scans by key
- Great for variable schemas
- Analytical workloads
HBASE Data Model
- Data is stored in tables.
- Tables contain rows.
  - Rows are referenced by a unique key.
  - The key is an array of bytes, so anything can be a key.
- Rows are made of columns, which are grouped into column families.
  - Data is stored in cells, each identified by row × column family × column.
- Tables are sorted by row key in lexicographical (byte) order.
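One way to picture this data model is as a sorted map: row key → (column → value). The sketch below is a hypothetical in-memory model, not the HBase client API (`put` and `scan` are illustrative names, and it sorts at read time where HBase keeps rows sorted in storage). It shows why key design matters: row keys are compared byte by byte, so `row10` sorts between `row1` and `row2`.

```python
from typing import Dict, Iterator, Optional, Tuple

Row = Dict[str, bytes]    # "family:column" -> value
Table = Dict[bytes, Row]  # row key (any byte string) -> row


def put(table: Table, row_key: bytes, column: str, value: bytes) -> None:
    # A row is created on its first write; only columns actually written are stored.
    table.setdefault(row_key, {})[column] = value


def scan(table: Table, start: bytes = b"", stop: Optional[bytes] = None) -> Iterator[Tuple[bytes, Row]]:
    # Python compares bytes objects lexicographically, byte by byte (unsigned),
    # the same order HBase keeps rows in. start is inclusive, stop is exclusive.
    for row_key in sorted(table):
        if row_key < start:
            continue
        if stop is not None and row_key >= stop:
            break
        yield row_key, table[row_key]


if __name__ == "__main__":
    t: Table = {}
    for key in (b"row2", b"row10", b"row1"):
        put(t, key, "user:name", b"x")
    print([k for k, _ in scan(t)])  # [b'row1', b'row10', b'row2']
```

Fixed-width keys (for example zero-padded `row01`, `row02`, `row10`) restore numeric order when range scans need it.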
HBASE Families
- Columns are grouped into families.
  - Labeled as "family:column", e.g. "user:name".
  - Different features are applied per family:
    - Stored together (HFile/StoreFile)
    - Compression
- The table schema defines its column families.
  - Each family can consist of any number of columns, and each column can have multiple versions.
  - A column exists only once a value is inserted; NULLs are free (they take no storage).
  - Columns within a family are sorted and stored together.
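The "family:column" naming and per-family storage can be sketched as follows. `split_column` and `group_by_family` are hypothetical helpers: the first splits a column name at its first colon (a family name cannot contain a colon, but a qualifier may), and the second groups one row's cells per declared family, roughly the way each family ends up in its own StoreFiles. Columns that were never written simply do not appear, which is what "NULLs are free" means.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def split_column(name: str) -> Tuple[str, str]:
    # "family:qualifier" -> (family, qualifier), splitting on the first colon only.
    family, sep, qualifier = name.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:qualifier', got {name!r}")
    return family, qualifier


def group_by_family(row: Dict[str, bytes], families: Set[str]) -> Dict[str, Dict[str, bytes]]:
    # Families are fixed by the table schema; qualifiers are created on write.
    stores: Dict[str, Dict[str, bytes]] = defaultdict(dict)
    for column, value in row.items():
        family, qualifier = split_column(column)
        if family not in families:
            raise KeyError(f"column family {family!r} is not in the table schema")
        stores[family][qualifier] = value
    # One sorted "store" per family; a family with no cells takes no space at all.
    return {fam: dict(sorted(cols.items())) for fam, cols in sorted(stores.items())}


if __name__ == "__main__":
    row = {"user:name": b"Bob", "user:email": b"bob@example.com", "address:city": b"Paris"}
    print(group_by_family(row, {"user", "address", "stats"}))
```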
HBASE Timestamps
- Cell values are versioned; 3 versions are kept by default (in the HBase releases of that time; since HBase 0.96 the default is 1).
- Versions are stored in decreasing timestamp order.
- Reads return the latest version first, which is the current value.
- Each value is addressed by its full coordinates:
  (Table, RowKey, Family, Column, Timestamp) → Value
- These coordinates are always unique.
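Versioning per cell can be sketched as a list of (timestamp, value) pairs kept newest first and trimmed to a maximum count. `VersionedCell` is a hypothetical class, and `max_versions=3` mirrors the default quoted on the slide. This sketch drops surplus versions immediately on write; real HBase hides them from reads and removes them physically later, during compactions.

```python
from typing import List, Optional, Tuple


class VersionedCell:
    """Hypothetical model of one cell: a single (row, family, column) coordinate."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self._versions: List[Tuple[int, bytes]] = []  # (timestamp, value), newest first

    def put(self, timestamp: int, value: bytes) -> None:
        # Writing the same timestamp again replaces that version.
        self._versions = [(ts, v) for ts, v in self._versions if ts != timestamp]
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda tv: tv[0], reverse=True)  # decreasing timestamp
        del self._versions[self.max_versions:]  # forget versions beyond the limit

    def get(self, as_of: Optional[int] = None) -> Optional[bytes]:
        # A plain read returns the newest version; as_of reads the value as it was then.
        for ts, value in self._versions:
            if as_of is None or ts <= as_of:
                return value
        return None

    def versions(self) -> List[Tuple[int, bytes]]:
        return list(self._versions)


if __name__ == "__main__":
    cell = VersionedCell()
    for ts, val in [(1, b"a"), (5, b"b"), (10, b"c"), (15, b"d")]:
        cell.put(ts, val)
    print(cell.get(), cell.versions())  # b'd' [(15, b'd'), (10, b'c'), (5, b'b')]
```

After four writes only three versions remain, so reading "as of" timestamp 1 finds nothing.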
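HBASE Cells Example
- Example of how values are stored:

| Row Key | Timestamp | Name Family: first_name | Name Family: last_name | Address Family: number | Address Family: address |
|---------|-----------|-------------------------|------------------------|------------------------|-------------------------|
| row1    | t1        | Bob                     | Smith                  |                        |                         |
| row1    | t5        |                         |                        | 10                     | First Lane              |
| row1    | t10       |                         |                        | 30                     | Other Lane              |
| row1    | t15       |                         |                        | 7                      | Last Street             |
| row2    | t20       | Mary                    | Tompson                |                        |                         |
| row2    | t22       |                         |                        | 77                     | One Street              |
| row2    | t30       |                         | Thompson               |                        |                         |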
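Reading this example back can be sketched as below. It assumes the family names `name` and `address` for the two families and maps t1, t5, … to the integers 1, 5, …; `get_row` is a hypothetical helper that, for each column, returns the newest value at or before an optional `as_of` timestamp, mirroring the latest-version-first read described earlier. Each table row becomes one or more individual cells, and empty table positions simply have no cell.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, str, int, str]  # (row key, "family:column", timestamp, value)

CELLS: List[Cell] = [
    ("row1", "name:first_name", 1, "Bob"),
    ("row1", "name:last_name", 1, "Smith"),
    ("row1", "address:number", 5, "10"),
    ("row1", "address:address", 5, "First Lane"),
    ("row1", "address:number", 10, "30"),
    ("row1", "address:address", 10, "Other Lane"),
    ("row1", "address:number", 15, "7"),
    ("row1", "address:address", 15, "Last Street"),
    ("row2", "name:first_name", 20, "Mary"),
    ("row2", "name:last_name", 20, "Tompson"),
    ("row2", "address:number", 22, "77"),
    ("row2", "address:address", 22, "One Street"),
    ("row2", "name:last_name", 30, "Thompson"),
]


def get_row(cells: List[Cell], row: str, as_of: Optional[int] = None) -> Dict[str, str]:
    # Keep, per column, the newest cell whose timestamp is not after as_of.
    newest: Dict[str, Tuple[int, str]] = {}  # column -> (timestamp, value)
    for r, column, ts, value in cells:
        if r != row or (as_of is not None and ts > as_of):
            continue
        if column not in newest or ts > newest[column][0]:
            newest[column] = (ts, value)
    return {column: value for column, (_, value) in sorted(newest.items())}


if __name__ == "__main__":
    print(get_row(CELLS, "row1"))
    print(get_row(CELLS, "row2", as_of=25))
```

The t30 cell changes only `last_name` for row2; every other column of row2 keeps its t20 or t22 value.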
HBASE Architecture
- A table is made up of regions.
- A region is a contiguous range of rows, sorted and stored together.
  - Regions split dynamically as they become too big, and can be merged when they are too small.
- The master server is responsible for managing the HBase cluster (i.e., the region servers).
- HBase stores its data in HDFS, so it relies on HDFS for fault tolerance and high availability.
- ZooKeeper is used for distributed coordination.
HBASE Architecture (diagram)
[Figure: HBASE architecture diagram; image not reproduced]
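HBASE Regions
- A region is a range of keys, from a start key (inclusive) to an end key (exclusive).
- Initially there is a single region; once the data added to it exceeds the configured maximum size (256 MB by default in the releases of that time; current releases default to 10 GB), the region is split.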
- The number of regions per region server typically ranges from about 10 to 1,000, depending on the hardware.
- Splitting data into regions helps in several ways:
  - Fast recovery when a region server fails
  - Load balancing when a server is overloaded
  - Splitting is fast
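The region mechanics above (half-open key ranges, one initial region, split on size) can be sketched as a hypothetical in-memory model. `RegionedTable`, `locate` and `_split` are illustrative names, the size limit is a constructor parameter rather than HBase's configured maximum, and splitting at the middle row key is a simplifying assumption, not HBase's exact split-point logic. Merging regions and assigning them to servers are not modelled.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    start_key: bytes  # inclusive; b"" means "from the first possible key"
    end_key: bytes    # exclusive; b"" means "to the last possible key"
    rows: Dict[bytes, int] = field(default_factory=dict)  # row key -> stored bytes

    def size(self) -> int:
        return sum(self.rows.values())


class RegionedTable:
    def __init__(self, max_region_size: int) -> None:
        self.max_region_size = max_region_size
        self.regions: List[Region] = [Region(b"", b"")]  # a new table starts with one region

    def locate(self, row_key: bytes) -> Region:
        # Regions are sorted by start key and cover the key space without gaps,
        # so the owner is the last region whose start key is <= row_key.
        starts = [r.start_key for r in self.regions]
        return self.regions[bisect.bisect_right(starts, row_key) - 1]

    def put(self, row_key: bytes, nbytes: int) -> None:
        region = self.locate(row_key)
        region.rows[row_key] = nbytes
        if region.size() > self.max_region_size and len(region.rows) > 1:
            self._split(region)

    def _split(self, region: Region) -> None:
        # Hypothetical split point: the middle row key of the region.
        keys = sorted(region.rows)
        mid = keys[len(keys) // 2]
        lower = Region(region.start_key, mid, {k: v for k, v in region.rows.items() if k < mid})
        upper = Region(mid, region.end_key, {k: v for k, v in region.rows.items() if k >= mid})
        i = next(n for n, r in enumerate(self.regions) if r is region)
        self.regions[i:i + 1] = [lower, upper]  # the daughters replace the parent in place


if __name__ == "__main__":
    table = RegionedTable(max_region_size=100)
    for i in range(10):
        table.put(f"row{i:02d}".encode(), 30)
    print([r.start_key for r in table.regions])  # [b'', b'row02', b'row04', b'row06', b'row08']
```

Because a lookup only needs the sorted list of start keys, finding the region for any key is a binary search, which is the same idea clients use to route a request to the right region server.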
HBASE Data Storage
- When data is added, it is written to the WAL (write-ahead log) and also kept in memory (MemStore).
- When the MemStore exceeds its maximum size, its contents are flushed to a new HFile.
- The region server keeps serving reads and writes during the flush, writing new values to the WAL and MemStore.
- An HFile is essentially a sorted key-value map.
- Because HDFS doesn't support updates to an existing file, HFiles are immutable.
- A delete therefore saves a delete marker, which indicates that the record has been removed, instead of modifying the HFile.
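A hypothetical single-store sketch of this write path: every edit goes to an append-only WAL and to the MemStore, a full MemStore is flushed to a new sorted, immutable "HFile" (a Python tuple here), and deletes become markers. `Store`, `flush_threshold` and the use of `None` as the delete marker are assumptions for illustration. Real HBase typically shares one WAL across the regions of a region server, sizes flushes in bytes, and tracks timestamps and multiple column families, none of which is modelled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a value of None acts as the delete marker.
Entry = Tuple[bytes, Optional[bytes]]


class Store:
    def __init__(self, flush_threshold: int) -> None:
        self.flush_threshold = flush_threshold     # MemStore entries that trigger a flush
        self.wal: List[Entry] = []                 # append-only write-ahead log
        self.memstore: Dict[bytes, Optional[bytes]] = {}  # in memory, mutable
        self.hfiles: List[Tuple[Entry, ...]] = []  # immutable, oldest first

    def put(self, key: bytes, value: bytes) -> None:
        self._write(key, value)

    def delete(self, key: bytes) -> None:
        # HFiles cannot be modified, so a delete is just another write: a marker.
        self._write(key, None)

    def _write(self, key: bytes, value: Optional[bytes]) -> None:
        self.wal.append((key, value))  # 1) log the edit for durability
        self.memstore[key] = value     # 2) apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.memstore:
            return
        # Persist the MemStore as a new sorted, immutable file (a tuple here).
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}
        # Simplification: once flushed, these log entries are no longer needed for recovery.
        self.wal.clear()

    def get(self, key: bytes) -> Optional[bytes]:
        # Newest data wins: MemStore first, then HFiles from newest to oldest.
        # Returns None both for missing keys and for keys hidden by a delete marker.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None
```

Note that deleting `a` after it was flushed leaves the old HFile untouched; the marker in a newer location simply hides it until a compaction cleans it up.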
HBASE Data Storage (Contd.)
- Periodic compactions are performed to control the number of HFiles.
- Minor compaction:
  - Smaller HFiles are merged into larger HFiles.
  - Fast, as the data is already sorted.
  - Delete markers are not applied.
- Major compaction:
  - Scans all entries and applies deletes as necessary.
  - Merges all HFiles of a column family within a region into a single file.
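Both kinds of compaction can be sketched as k-way merges of already-sorted files, which is why they are cheap compared with a full sort. In this hypothetical model each cell is (row key, sequence id, value), a higher sequence id is newer, `None` is a delete marker, and the major compaction keeps only the newest cell per key (as if the family kept a single version). Real compactions also honour version limits, TTLs and timestamps, which are not shown.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

Cell = Tuple[bytes, int, Optional[bytes]]  # (row key, sequence id, value); None = delete marker


def sort_key(cell: Cell) -> Tuple[bytes, int]:
    # Files are sorted by row key, and within a key newest (highest sequence id) first.
    return cell[0], -cell[1]


def minor_compact(hfiles: Iterable[List[Cell]]) -> List[Cell]:
    # Merge sorted files into one sorted file. Nothing is dropped: delete markers
    # and the cells they hide both survive.
    return list(heapq.merge(*hfiles, key=sort_key))


def major_compact(hfiles: Iterable[List[Cell]]) -> List[Cell]:
    out: List[Cell] = []
    last_key: Optional[bytes] = None
    for key, seq, value in heapq.merge(*hfiles, key=sort_key):
        if key == last_key:
            continue  # an older cell for a key that is already decided
        last_key = key
        if value is not None:
            out.append((key, seq, value))  # a delete marker drops itself and all older cells
    return out
```

With `f1 = [(b"a", 1, b"v1"), (b"b", 2, b"v2")]` and `f2 = [(b"a", 3, None), (b"c", 4, b"v4")]`, the minor compaction keeps all four cells in key order, while the major compaction returns only `b` and `c`, because the newer marker on `a` removes both itself and `v1`.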
HBASE Master
- Manages regions and their locations:
  - Assigns regions
  - Balances workload
  - Recovers if any region server becomes unavailable
  - Uses ZooKeeper as a distributed coordination service
- Clients communicate directly with region servers for reads and writes.
- Performs schema management and changes:
  - Adding/removing tables and column families
HBASE and ZooKeeper
- HBase uses ZooKeeper for region assignment.
- ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
- It offers a file-system-like API that operates on directories and files (znodes).
- Clients connect to ZooKeeper with a session.
  - The session is maintained via heartbeats.
  - Clients watching for updates are notified of deleted nodes and new nodes.
HBASE and ZooKeeper (Contd.)
- Each region server creates an ephemeral node.
  - The master monitors these nodes to discover available region servers and to detect server failures.
- ZooKeeper is used to make sure that only one master is active.
- HBase cannot run in distributed mode without ZooKeeper.
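The ephemeral-node pattern behind region-server discovery and single-master election can be sketched with a small in-process fake. `FakeZooKeeper`, its methods, and the `/hbase/rs/...` and `/hbase/master` paths are illustrative assumptions, not the ZooKeeper client API. Real ZooKeeper watches are one-shot and sessions are expired by the server; here watchers are plain callbacks and time is passed in explicitly.

```python
from typing import Callable, Dict, List

Watcher = Callable[[str, str], None]  # (event, path)


class FakeZooKeeper:
    """Hypothetical in-process stand-in for ZooKeeper's ephemeral-node behaviour."""

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self.nodes: Dict[str, str] = {}             # znode path -> owning session id
        self.last_heartbeat: Dict[str, float] = {}  # session id -> time of last heartbeat
        self.watchers: List[Watcher] = []

    def connect(self, session_id: str, now: float) -> None:
        self.last_heartbeat[session_id] = now

    def heartbeat(self, session_id: str, now: float) -> None:
        if session_id in self.last_heartbeat:  # an expired session cannot be revived
            self.last_heartbeat[session_id] = now

    def create_ephemeral(self, path: str, session_id: str) -> bool:
        # Creation fails if the node already exists; this is what lets exactly
        # one master win the race for the master znode.
        if path in self.nodes:
            return False
        self.nodes[path] = session_id
        self._notify("created", path)
        return True

    def expire_sessions(self, now: float) -> None:
        # A session that missed heartbeats for too long expires, and every
        # ephemeral node it owned is deleted automatically.
        for session_id, seen in list(self.last_heartbeat.items()):
            if now - seen > self.session_timeout:
                del self.last_heartbeat[session_id]
                for path, owner in list(self.nodes.items()):
                    if owner == session_id:
                        del self.nodes[path]
                        self._notify("deleted", path)

    def children(self, prefix: str) -> List[str]:
        return sorted(p for p in self.nodes if p.startswith(prefix))

    def _notify(self, event: str, path: str) -> None:
        for watcher in self.watchers:
            watcher(event, path)
```

When a region server stops heartbeating, its node disappears and watchers (the master, in HBase's case) are told about it; when the active master's session expires, its node disappears too, and a standby master can then create it and take over.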
HBASE Access
- HBase shell
- Native Java API
  - The fastest and most capable option
- Avro server
  - Requires running an Avro server
- HBql
  - SQL-like syntax for HBase
