HBASE Overview

HBASE
THE SCALABLE DATA STORE
Sampath Rachakonda

Agenda
 Evolution of HBASE
 Overview
 Data Model
 Architecture
 HBASE and ZooKeeper

Evolution of HBASE
 File-Systems
 Tapes → Linear Access or Sequential Access.
 Disk → Random Access
Seek Time
Transfer Rate
 DBMS
 RDBMS
 Now NOSQL

Hadoop
 It mainly comprises two things: HDFS and MapReduce.
 HDFS is a scalable, fault-tolerant, high-performance distributed file system that runs on commodity hardware.
 MapReduce is a software framework for distributed computation.
 Master/Slave
 Limitations
Batch processing
Sequential data look-up
Not intended for real-time querying
No support for random access

NOSQL
 Massive Data Volumes
 Schema Evolution
A fixed schema is almost impossible to maintain for a web-scale database.
With NOSQL, schema changes can be introduced into systems gradually.
 Extreme Query Load
Joins are the bottleneck

Why HBASE ?
 Column-Oriented Store
 Distributed – Designed to serve large tables
 Horizontally Scalable
 High Performance & Availability
 Storage System
 The base goal of HBASE is billions of rows, millions of columns, and thousands of versions
 Supports random, real-time CRUD operations, unlike HDFS

Who uses Hbase ?
 Facebook
 Adobe
 Twitter
 Yahoo
 Meetup
 Netflix
 Many More..

When to use HBASE ?
 Good for large amounts of data
100s of millions or billions of rows
Have to have enough hardware
Large amounts of client requests
Single random selects and range scans by key
Great for variable schema
Analytical

HBASE Data Model
 Data is stored in tables
 Tables contain rows
Rows are referenced by a unique key
The key is an array of bytes, so anything can be a key.
 Rows are made of columns, which are grouped into column families
Data is stored in cells, each identified by row x column-family x column
 Tables are sorted by row key in lexicographical order.
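The effect of lexicographical ordering on byte keys can be sketched in a few lines. This is a plain in-memory illustration, not the HBASE client API; the `put` and `scan` helpers and the sample keys are made up for the example. Note that `row10` sorts before `row2` because keys are compared byte by byte.

```python
import bisect

# Toy table: cell coordinates kept sorted, plus a dict holding the values.
# Hypothetical helpers for illustration only; this is not the HBase API.
cells = {}          # (row_key, family, column) -> value
sorted_keys = []    # coordinates in lexicographic order

def put(row_key: bytes, family: bytes, column: bytes, value: bytes) -> None:
    coord = (row_key, family, column)
    if coord not in cells:
        bisect.insort(sorted_keys, coord)
    cells[coord] = value

def scan(start_row: bytes, stop_row: bytes):
    """Yield cells with start_row <= row key < stop_row, in key order."""
    i = bisect.bisect_left(sorted_keys, (start_row,))
    while i < len(sorted_keys) and sorted_keys[i][0] < stop_row:
        yield sorted_keys[i], cells[sorted_keys[i]]
        i += 1

put(b"row2", b"user", b"name", b"Mary")
put(b"row10", b"user", b"name", b"Bob")
put(b"row1", b"user", b"name", b"Ann")

row_order = [coord[0] for coord, _ in scan(b"row", b"rox")]
print(row_order)
```

Because rows are kept in key order, a range scan is just a walk over neighbouring entries, which is why row key design matters so much.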

HBASE Families
 Columns are grouped into families
Labeled as “family:column”
 Example: “user:name”
Different features are applied per family
 Stored together – HFile/StoreFile
 Compression
 Table schema defines its column families
Each family can consist of any number of columns and versions
A column exists only once it is inserted; NULLs are free.
Columns within a family are sorted and stored together.
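A small sketch of the "family:column" labeling and of why missing columns cost nothing. The nested-dictionary layout and the helper names are assumptions made for illustration; they do not reflect the on-disk format.

```python
from collections import defaultdict

def split_label(label: str):
    """Split a 'family:column' label into its two parts."""
    family, _, column = label.partition(":")
    return family, column

# One store per column family; each maps row key -> {column: value}.
# Hypothetical layout for illustration, not HBase's storage format.
stores = defaultdict(lambda: defaultdict(dict))

def put(row_key: str, label: str, value: str) -> None:
    family, column = split_label(label)
    stores[family][row_key][column] = value

def columns_of(row_key: str, family: str):
    """Return the sorted column names present for a row within one family."""
    return sorted(stores[family].get(row_key, {}))

put("row1", "user:name", "Bob")
put("row1", "address:street", "First Lane")
put("row2", "user:name", "Mary")   # row2 never gets an address column

print(columns_of("row1", "address"))
print(columns_of("row2", "address"))
```

Nothing is stored for `row2` in the `address` family, which is what "NULLs are free" means in practice.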

HBASE Timestamps
 Cell values are versioned; 3 versions are kept by default in older releases (the default is 1 since HBase 0.96).
 Versions are stored in decreasing timestamp order.
 Reads return the latest version first, which is the current value.
 A value is addressed by its full coordinates:
(Table, RowKey, Family, Column, Timestamp) → Value
 This index is always unique

HBASE Cells Example
 Example of how values are stored
Row Key  Timestamp  Name family              Address family
                    first_name  last_name    number  address
row1     t1         Bob         Smith
         t5                                  10      First Lane
         t10                                 30      Other Lane
         t15                                 7       Last Street
row2     t20        Mary        Tompson
         t22                                 77      One Street
         t30                    Thompson
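The versioning in the example above can be sketched as follows. This is a toy model, not the HBASE API: the `put` and `get` helpers are hypothetical, timestamps t5, t10 and so on are written as plain integers, the limit of three versions is the figure quoted in these slides, and the fourth address ("New Road") is made up to show the oldest version being dropped.

```python
MAX_VERSIONS = 3  # assumption: the "3 versions" figure quoted in the slides

# cell coordinate -> list of (timestamp, value), newest first
versions = {}

def put(row, family, column, timestamp, value):
    history = versions.setdefault((row, family, column), [])
    history.append((timestamp, value))
    history.sort(key=lambda tv: tv[0], reverse=True)  # decreasing timestamp
    del history[MAX_VERSIONS:]                        # drop versions beyond the limit

def get(row, family, column, timestamp=None):
    """Return the newest value, or the newest value at or before `timestamp`."""
    for ts, value in versions.get((row, family, column), []):
        if timestamp is None or ts <= timestamp:
            return value
    return None

# row2's last_name is corrected at t30.
put("row2", "name", "last_name", 20, "Tompson")
put("row2", "name", "last_name", 30, "Thompson")

# row1's address changes three times, then a made-up fourth time.
for ts, addr in [(5, "First Lane"), (10, "Other Lane"), (15, "Last Street")]:
    put("row1", "address", "address", ts, addr)
put("row1", "address", "address", 40, "New Road")

print(get("row2", "name", "last_name"))       # current value
print(get("row2", "name", "last_name", 25))   # value as of t25
```

A plain read returns the newest version, while passing an older timestamp returns what the cell held at that time, as long as that version has not been dropped.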

HBASE Architecture
 A table is made up of regions
 A region is a range of rows stored together
Regions split dynamically when they become too big and merge when they are too small
 The Master Server is responsible for managing the HBASE cluster (i.e., the Region Servers)
 HBASE stores its data in HDFS and relies on HDFS for high availability and fault tolerance.
 ZooKeeper is used for distributed coordination.

HBASE Architecture
 [Figure: HBASE architecture diagram]

HBASE Regions
 A region is a range of keys, from start key (inclusive) to end key (exclusive)
 Initially there is one region; when added data exceeds the configured maximum size (256 MB by default in early releases, larger in later ones), the region is split
 The number of regions per server varies from 10 to 10000, depending on the hardware of each region server.
 Splitting data into regions helps in different ways:
Fast recovery when a region fails
Load balancing when a server is overloaded
Splitting is fast
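Region lookup and splitting can be sketched with a sorted list of start keys. Everything here is a simplification for illustration: the `Region` class, the 100-byte threshold and the "split at the middle row key" rule are assumptions, not how a real region server picks its split point.

```python
import bisect

class Region:
    def __init__(self, start_key, end_key):
        self.start_key = start_key   # inclusive; "" means "from the beginning"
        self.end_key = end_key       # exclusive; None means "to the end"
        self.rows = {}               # row key -> size in bytes

    def size(self):
        return sum(self.rows.values())

MAX_REGION_BYTES = 100   # tiny made-up threshold so a split is easy to trigger

regions = [Region("", None)]   # a new table starts with a single region

def find_region(row_key):
    """Locate the region whose [start_key, end_key) range contains row_key."""
    starts = [r.start_key for r in regions]
    return regions[bisect.bisect_right(starts, row_key) - 1]

def split(region):
    """Split a region at its middle row key into two daughter regions."""
    keys = sorted(region.rows)
    mid = keys[len(keys) // 2]
    left, right = Region(region.start_key, mid), Region(mid, region.end_key)
    for key, size in region.rows.items():
        (left if key < mid else right).rows[key] = size
    i = regions.index(region)
    regions[i:i + 1] = [left, right]

def put(row_key, size):
    region = find_region(row_key)
    region.rows[row_key] = size
    if region.size() > MAX_REGION_BYTES and len(region.rows) > 1:
        split(region)

for n in range(6):
    put(f"row{n}", 30)

print([(r.start_key, r.end_key) for r in regions])
```

Because each region covers a contiguous key range, finding the right region for a key is a binary search over the start keys.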

HBASE Data Storage
 When data is added, it is written to the WAL (Write Ahead Log) and also to memory (Memstore)
 When the Memstore exceeds its configured maximum size, it is flushed to an HFile
 The RegionServer still serves reads and writes during the flush, writing new values to the WAL and Memstore.
 An HFile is essentially a sorted key-value map.
 HDFS doesn't support updates to an existing file, so HFiles are immutable.
 Because files cannot be modified, a delete writes a delete marker to indicate that the record has been removed.
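The write path described above can be sketched as a toy log-plus-buffer store. The names, the entry-count flush threshold and the clearing of the log on flush are simplifying assumptions; a real region server tracks sizes in bytes and rolls its log files rather than emptying a list.

```python
TOMBSTONE = object()   # stand-in for a delete marker
MEMSTORE_LIMIT = 3     # made-up flush threshold, counted in entries

wal = []        # append-only log of every mutation since the last flush
memstore = {}   # in-memory buffer: key -> value
hfiles = []     # flushed files, oldest first; each is an immutable sorted tuple

def flush():
    """Write the Memstore out as one immutable, sorted HFile and start afresh."""
    hfiles.append(tuple(sorted(memstore.items(), key=lambda kv: kv[0])))
    memstore.clear()
    wal.clear()   # simplification: logged edits are now durable in the HFile

def put(key, value):
    wal.append((key, value))   # 1. log first, for crash recovery
    memstore[key] = value      # 2. then buffer in memory
    if len(memstore) >= MEMSTORE_LIMIT:
        flush()

def delete(key):
    put(key, TOMBSTONE)        # a delete is just another write

def get(key):
    """Check the Memstore first, then HFiles from newest to oldest."""
    sources = [memstore] + [dict(f) for f in reversed(hfiles)]
    for source in sources:
        if key in source:
            value = source[key]
            return None if value is TOMBSTONE else value
    return None

for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    put(key, value)            # the third put triggers a flush
delete("a")                    # the marker lives in the Memstore for now

print(get("a"), get("b"))
```

The old value of `a` is still physically present in the flushed file; the marker only hides it from reads until a later cleanup removes both.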

HBASE Data Storage(Contd.)
 Periodic data compactions are performed to control the number of HFiles and to keep the cluster balanced
Minor Compaction:
 Smaller HFiles are merged into larger HFiles
Fast, as the data is already sorted
Delete markers are not applied
Major Compaction:
 Scans all entries and applies deletes as necessary
 Merges all HFiles of a region into a single file per column family
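The difference between the two kinds of compaction can be sketched with dictionaries standing in for HFiles. This is an illustration under assumptions: the helper names are made up, and picking "the two newest files" for a minor compaction is a simplification of the real file-selection policy.

```python
TOMBSTONE = "<deleted>"   # stand-in for a delete marker

def merge(files):
    """Merge HFiles (oldest first) into one sorted file; newer entries win."""
    merged = {}
    for f in files:              # later files overwrite earlier ones
        merged.update(f)
    return dict(sorted(merged.items(), key=lambda kv: kv[0]))

def minor_compact(files, count=2):
    """Merge the `count` newest files, keeping delete markers.

    Markers must survive because older files outside this merge
    may still hold the value being deleted.
    """
    return files[:-count] + [merge(files[-count:])]

def major_compact(files):
    """Merge every file and drop deleted entries together with their markers."""
    return [{k: v for k, v in merge(files).items() if v != TOMBSTONE}]

hfiles = [
    {"a": 1, "b": 2},           # oldest
    {"c": 3},
    {"a": TOMBSTONE, "d": 4},   # newest: "a" was deleted
]
after_minor = minor_compact(hfiles)
after_major = major_compact(hfiles)

print(after_minor)
print(after_major)
```

After the minor compaction the marker for `a` is still there alongside the old value in the oldest file; only the major compaction removes both.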

HBASE Master
 Manages regions and their locations
Assigns regions
Balances workload
Recovers regions when a region server becomes unavailable
Uses ZooKeeper as its distributed coordination service
 Clients communicate directly with Region Servers
 Performs schema management and changes
Adding/removing tables and column families

HBASE and ZooKeeper
 HBASE uses ZooKeeper for region assignments
 ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
 File-like API that performs operations on directories and files (znodes)
 Clients connect to ZooKeeper with a session
The session is maintained via heartbeats
Clients listening for updates are notified of deleted and newly created nodes.

HBASE and ZooKeeper(Contd.)
 Each region server creates an ephemeral node.
The Master monitors these nodes to discover available region servers and to detect server failures.
 ZooKeeper is used to make sure that only one master is registered
 HBASE cannot run in distributed mode without ZooKeeper.
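How ephemeral nodes give both failure detection and single-master registration can be sketched with an in-memory stand-in. `ToyZooKeeper` and its methods are hypothetical and are not the real ZooKeeper client API; the node paths and session names are illustrative, and a session expiring is triggered by hand instead of by missed heartbeats.

```python
class ToyZooKeeper:
    """In-memory stand-in for ZooKeeper; not the real client API."""

    def __init__(self):
        self.nodes = {}      # path -> owning session id
        self.watchers = []   # callbacks invoked as (event, path)

    def create_ephemeral(self, path, session):
        if path in self.nodes:
            return False     # another session already holds this node
        self.nodes[path] = session
        self._notify("created", path)
        return True

    def expire_session(self, session):
        """Simulate missed heartbeats: the session's ephemeral nodes vanish."""
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]
            self._notify("deleted", path)

    def _notify(self, event, path):
        for callback in self.watchers:
            callback(event, path)

zk = ToyZooKeeper()
events = []
zk.watchers.append(lambda event, path: events.append((event, path)))  # the master's watch

# Only one master can register: the second attempt fails.
first = zk.create_ephemeral("/hbase/master", "master-1")
second = zk.create_ephemeral("/hbase/master", "master-2")

# Region servers register themselves; rs-1 then stops heartbeating.
zk.create_ephemeral("/hbase/rs/rs-1", "rs-1")
zk.create_ephemeral("/hbase/rs/rs-2", "rs-2")
zk.expire_session("rs-1")

live_servers = sorted(p for p in zk.nodes if p.startswith("/hbase/rs/"))
print(first, second, live_servers)
```

The master never polls the region servers: a node disappearing is the failure signal, and the failed creation is what keeps a second master from registering.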

HBASE Access
 HBASE Shell
 Native Java API
Fastest and most capable option.
 Avro Server
Requires running the Avro server.
 HBql
SQL-like syntax for HBASE
