Big Data – 4 V’s
NoSQL 
• NoSQL is all about scalability 
• Scaling to size 
• Scaling to complexity 
• Delivering heavy read/write workloads
• Data duplication and denormalization are first-class citizens
RDBMS vs NoSQL
NoSQL Types
Database Chart
CAP Theorem
Recap
• What is the CAP theorem?
• Does NoSQL support transactions?
• What are the NoSQL types?
HBase 
• Scalable, distributed data store 
• Sorted map of maps / key-value store
• Open-source counterpart of Google’s Bigtable
• Sparse
• Multi-dimensional
• Tightly integrated with Hadoop
• Not an RDBMS
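The "sorted map of maps" idea above can be sketched with nested sorted maps. This is a toy in-memory model, not the HBase API: the class name SortedMapOfMaps and its methods are hypothetical, timestamps are left out for brevity, and the sample values mirror the shell examples in this deck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a sparse, sorted map of maps (hypothetical; not the HBase API). */
class SortedMapOfMaps {
    // rowKey -> columnFamily -> qualifier -> value; every level is kept sorted by key.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    /** Returns the cell value, or null when the cell was never written (the table is sparse). */
    String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, String>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    /** Row keys in [startRow, stopRow), in sorted order, like a range scan. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        SortedMapOfMaps table = new SortedMapOfMaps();
        // Rows are inserted out of order; the map keeps them sorted by row key.
        table.put("Device2", "BASIC_INFO", "IP_ADDR", "20.20.20.20");
        table.put("Device1", "BASIC_INFO", "IP_ADDR", "10.10.10.10");
        table.put("Device2", "CONTRACT_INFO", "CONTRACT_NUMBER", "22222222");
        System.out.println(table.scan("Device1", "Device3"));
        // Device1 has no CONTRACT_INFO cells, so nothing is stored for them.
        System.out.println(table.get("Device1", "CONTRACT_INFO", "CONTRACT_NUMBER"));
    }
}
```

Because nothing is stored for a missing cell, each row can carry a different set of columns at no extra cost, which is what "sparse" refers to.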
Architecture
• HDFS (DataNodes): storage
• ZooKeeper: membership management
• RegionServers: serve the regions
• HBase Masters: janitorial work
Column Oriented
Distributed
Variable number of columns
Important Terms
• Table: consists of rows and columns
• Row: has a bunch of columns; identified by a rowkey (the "primary" key)
• Column Qualifier: dynamic column name
• Column Family: column groups, logical and physical (similar access pattern)
• Cell: the actual element that contains the data for a row-column intersection
• Version: a cell can hold multiple versions, each identified by a timestamp
Logical & Physical Storage Footprint (Tall v/s Wide table)

Logical representation of an HBase table.
We'll look at what it means to Get() row r5 from this table.

      CF1             CF2
r1    c1:v1           c1:v9  c6:v2
r2    c1:v2  c3:v6
r3    c2:v3           c5:v6
r4    c2:v4
r5    c1:v1  c3:v5    c7:v8

Actual physical storage of the table

HFile for CF1
r1:CF1:c1:t1:v1
r2:CF1:c1:t2:v2
r2:CF1:c3:t3:v6
r3:CF1:c2:t1:v3
r4:CF1:c2:t1:v4
r5:CF1:c1:t2:v1
r5:CF1:c3:t3:v5

HFile for CF2
r1:CF2:c1:t1:v9
r1:CF2:c6:t4:v2
r3:CF2:c5:t4:v6
r5:CF2:c7:t3:v8

Result object returned for a Get() on row r5 (KeyValue objects)
r5:CF1:c1:t2:v1
r5:CF1:c3:t3:v5
r5:CF2:c7:t3:v8

Structure of a KeyValue object
Key: Row Key, Col Fam, Col Qual, Time Stamp
Value: Cell Value
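The storage layout above can be sketched in plain Java. This is a minimal model under stated assumptions, not HBase code: the names KeyValueSketch, put, get and sampleTable are hypothetical, one sorted set per column family stands in for that family's HFile, and get returns every stored version of the row rather than only the newest one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of per-family sorted KeyValue storage (hypothetical; not the HBase API). */
class KeyValueSketch {
    /** Key = (row, family, qualifier, timestamp); value = cell payload. */
    static final class KeyValue implements Comparable<KeyValue> {
        final String row, family, qualifier, value;
        final long timestamp;

        KeyValue(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row; this.family = family; this.qualifier = qualifier;
            this.timestamp = timestamp; this.value = value;
        }

        // Sort by row, family, qualifier ascending, then timestamp descending (newest first).
        @Override public int compareTo(KeyValue o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = family.compareTo(o.family);
            if (c == 0) c = qualifier.compareTo(o.qualifier);
            if (c == 0) c = Long.compare(o.timestamp, timestamp);
            return c;
        }

        @Override public String toString() {
            return row + ":" + family + ":" + qualifier + ":t" + timestamp + ":" + value;
        }
    }

    // One sorted store per column family, standing in for that family's HFile.
    private final Map<String, TreeSet<KeyValue>> filesByFamily = new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        filesByFamily.computeIfAbsent(family, f -> new TreeSet<>())
                     .add(new KeyValue(row, family, qualifier, timestamp, value));
    }

    /** Collects the row's KeyValues from every family store (linear filter, for clarity only). */
    List<String> get(String row) {
        List<String> result = new ArrayList<>();
        for (TreeSet<KeyValue> file : filesByFamily.values())
            for (KeyValue kv : file)
                if (kv.row.equals(row)) result.add(kv.toString());
        return result;
    }

    /** Loads the eleven cells of the example table, deliberately out of order. */
    static KeyValueSketch sampleTable() {
        KeyValueSketch t = new KeyValueSketch();
        t.put("r5", "CF2", "c7", 3, "v8"); t.put("r5", "CF1", "c3", 3, "v5");
        t.put("r1", "CF1", "c1", 1, "v1"); t.put("r2", "CF1", "c1", 2, "v2");
        t.put("r2", "CF1", "c3", 3, "v6"); t.put("r3", "CF1", "c2", 1, "v3");
        t.put("r4", "CF1", "c2", 1, "v4"); t.put("r5", "CF1", "c1", 2, "v1");
        t.put("r1", "CF2", "c1", 1, "v9"); t.put("r1", "CF2", "c6", 4, "v2");
        t.put("r3", "CF2", "c5", 4, "v6");
        return t;
    }

    public static void main(String[] args) {
        // prints [r5:CF1:c1:t2:v1, r5:CF1:c3:t3:v5, r5:CF2:c7:t3:v8]
        System.out.println(sampleTable().get("r5"));
    }
}
```

The printed list matches the Result shown for Get() on row r5: the row's cells are gathered from the CF1 store first, then from the CF2 store.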
(J)Ruby Shell Commands 
• General 
• DDL 
• Create 
• Describe 
• Namespace 
• DML 
• Put 
• Get 
• Scan 
• Delete 
• Tools 
• Replication 
• Snapshot 
• Security 
• Visibility 
Creating Table: 
create 'DEVICE_DETAIL','BASIC_INFO','CONTRACT_INFO' 
Data Generation : 
put 'DEVICE_DETAIL','Device1','BASIC_INFO:IP_ADDR','10.10.10.10' 
put 'DEVICE_DETAIL','Device2','BASIC_INFO:IP_ADDR','20.20.20.20' 
Describing Table:
describe 'DEVICE_DETAIL' 
Alter Table:
alter 'DEVICE_DETAIL',{NAME => 'CONTRACT_INFO',VERSIONS => 3 } 
Update Data: 
put 'DEVICE_DETAIL','Device2','CONTRACT_INFO:CONTRACT_NUMBER','22222222' 
Multi-Version Example:
get 'DEVICE_DETAIL','Device2', {COLUMN=>'CONTRACT_INFO:CONTRACT_NUMBER', VERSIONS=>2} 
Scan Info: 
scan 'DEVICE_DETAIL'
Scan with Filter : 
scan 'DEVICE_DETAIL' , { COLUMNS => 'CONTRACT_INFO:STATUS', LIMIT => 10, FILTER => 
"ValueFilter( =, 'binary:IN_ACTIVE' )" } 
Delete Info: 
delete 'DEVICE_DETAIL','Device2','CONTRACT_INFO:STATUS'
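The VERSIONS settings in the alter and get commands above can be illustrated with a small model of one cell. This is a sketch under assumptions, not HBase behavior in detail: the class VersionedCell is hypothetical, it trims old versions eagerly on every put for simplicity, and the contract numbers other than '22222222' are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one cell that retains a bounded number of versions (hypothetical). */
class VersionedCell {
    private final int maxVersions;
    // timestamp -> value, newest timestamp first
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // With newest-first ordering, the last entry is the oldest version.
        while (versions.size() > maxVersions) versions.pollLastEntry();
    }

    /** Returns up to n values, newest first. */
    List<String> get(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) break;
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell contractNumber = new VersionedCell(3); // like VERSIONS => 3
        contractNumber.put(1L, "11111111");
        contractNumber.put(2L, "22222222");
        contractNumber.put(3L, "33333333");
        contractNumber.put(4L, "44444444");
        // Like a get with VERSIONS => 2: the two newest values.
        System.out.println(contractNumber.get(2)); // prints [44444444, 33333333]
    }
}
```

After the fourth put only three versions remain, so a request for more versions than the limit returns at most three values.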
Java API 
• HTable 
• HBaseAdmin 
• HTablePool 
• Get 
• Put 
• Delete 
• Scan 
• Increment 
• HTableDescriptor 
• HTableInterface 
• Result 
• ResultScanner 
• KeyValue 
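// HTable-based client API, as listed above (deprecated in later HBase releases)
HTable table = new HTable(configuration, hbasetablename);
// Write one cell: row key, column family, qualifier (key), value
Put row = new Put(Bytes.toBytes(rowKey));
row.add(Bytes.toBytes(columnFamily), Bytes.toBytes(key),
Bytes.toBytes(value));
table.put(row);
// Read the same row back by its row key
Get getKey = new Get(Bytes.toBytes(rowKey));
Result result = table.get(getKey);
table.close();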
Spark HBase 
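import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
// create configuration
val config = HBaseConfiguration.create()
config.set("hbase.zookeeper.quorum", "localhost")
config.set("hbase.zookeeper.property.clientPort","2181")
config.set(TableInputFormat.INPUT_TABLE, "hbaseTableName")
// read data: the mapreduce TableInputFormat needs the new Hadoop API entry point
val hbaseData = sparkContext.newAPIHadoopRDD(config, classOf[TableInputFormat],
classOf[ImmutableBytesWritable], classOf[Result])
// count rows
println(hbaseData.count)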
HBase Architecture
Write & Read Logic
SQL
Recap
• What is a column family?
• What are the HBase components?
• Name a few shell commands.
• What is a version in HBase?
Reference Slides
Use Case
• Canonical use case: storing crawl data and indices for search
Web Search powered by Bigtable
Indexing the Internet
1 Crawlers constantly scour the Internet for new pages. Those pages are stored as individual records in Bigtable.
2 A MapReduce job runs over the entire table, generating search indexes for the Web Search application.
Searching the Internet
3 The user initiates a Web Search request.
4 The Web Search application queries the Search Indexes and retrieves matching documents directly from Bigtable.
5 Search results are presented to the user.
[Figure: Internets, Crawlers, Bigtable, MapReduce, Search Index, Web Search, You]
HBase Architecture
Replications
CAP Theorem

 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 

NoSQL & HBase overview

  • 2. Big Data – 4 V’s
  • 3. NoSQL • NoSQL is all about scalability • Scaling to size • Scaling to complexity • Deliver Heavy R/W workloads • Data duplication and denormalization are first-class citizens
  • 8. Re-check • What is the CAP theorem? • Does NoSQL support transactions? • What are the NoSQL types?
  • 9. HBase • Scalable, distributed data store • Sorted map of maps / key-value store • Open-source implementation modeled on Google’s Bigtable • Sparse • Multidimensional • Tightly integrated with Hadoop • Not an RDBMS
  • 11. Architecture • HDFS (DataNodes): storage • ZooKeeper: membership management • RegionServers: serve the regions • HBase Masters: janitorial work
  • 15. Important Terms • Table • Consists of rows and columns • Row • Has a bunch of columns • Identified by a rowkey (primary key) • Column Qualifier • Dynamic column name • Column Family • Column groups, both logical and physical (similar access pattern) • Cell • The actual element that contains the data for a row-column intersection • Version • Every cell can hold multiple timestamped versions
  • 16. Logical & Physical Structure (Tall v/s Wide tables: storage footprint)
    Logical representation of an HBase table. We'll look at what it means to Get() row r5 from this table.
        Row   CF1            CF2
        r1    c1:v1          c1:v9 c6:v2
        r2    c1:v2 c3:v6
        r3    c2:v3          c5:v6
        r4    c2:v4
        r5    c1:v1 c3:v5    c7:v8
    Actual physical storage of the table
        HFile for CF1
            r1:CF1:c1:t1:v1
            r2:CF1:c1:t2:v2
            r2:CF1:c3:t3:v6
            r3:CF1:c2:t1:v3
            r4:CF1:c2:t1:v4
            r5:CF1:c1:t2:v1
            r5:CF1:c3:t3:v5
        HFile for CF2
            r1:CF2:c1:t1:v9
            r1:CF2:c6:t4:v2
            r3:CF2:c5:t4:v6
            r5:CF2:c7:t3:v8
    Result object returned for a Get() on row r5 (KeyValue objects)
        r5:CF1:c1:t2:v1
        r5:CF1:c3:t3:v5
        r5:CF2:c7:t3:v8
    Structure of a KeyValue object
        Key: Row Key, Col Fam, Col Qual, Time Stamp
        Value: Cell Value
  • 17. (J)Ruby Shell Commands • General • DDL • Create • Describe • Namespace • DML • Put • Get • Scan • Delete • Tools • Replication • Snapshot • Security • Visibility
    Creating Table:
        create 'DEVICE_DETAIL','BASIC_INFO','CONTRACT_INFO'
    Data Generation:
        put 'DEVICE_DETAIL','Device1','BASIC_INFO:IP_ADDR','10.10.10.10'
        put 'DEVICE_DETAIL','Device2','BASIC_INFO:IP_ADDR','20.20.20.20'
    Describing Table:
        describe 'DEVICE_DETAIL'
    Alter Info:
        alter 'DEVICE_DETAIL',{NAME => 'CONTRACT_INFO',VERSIONS => 3 }
    Update Data:
        put 'DEVICE_DETAIL','Device2','CONTRACT_INFO:CONTRACT_NUMBER','22222222'
    Multi-Version Example:
        get 'DEVICE_DETAIL','Device2', {COLUMN=>'CONTRACT_INFO:CONTRACT_NUMBER', VERSIONS=>2}
    Scan Info:
        scan 'DEVICE_DETAIL'
    Scan with Filter:
        scan 'DEVICE_DETAIL' , { COLUMNS => 'CONTRACT_INFO:STATUS', LIMIT => 10, FILTER => "ValueFilter( =, 'binary:IN_ACTIVE' )" }
    Delete Info:
        delete 'DEVICE_DETAIL','Device2','CONTRACT_INFO:STATUS'
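  • 18. Java API • HTable • HBaseAdmin • HTablePool • Get • Put • Delete • Scan • Increment • HTableDescriptor • HTableInterface • Result • ResultScanner • KeyValue
    // Open the table (HTable is the client class listed on this slide)
    HTable table = new HTable(configuration, hbasetablename);
    // Write one cell: rowKey -> columnFamily:key = value
    Put row = new Put(Bytes.toBytes(rowKey));
    row.add(Bytes.toBytes(columnFamily), Bytes.toBytes(key), Bytes.toBytes(value));
    table.put(row);
    // Read the row back; a Get is addressed by row key, not by column name
    Get getKey = new Get(Bytes.toBytes(rowKey));
    Result result = table.get(getKey);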
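  • 19. Spark HBase
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    // create configuration
    val config = HBaseConfiguration.create()
    config.set("hbase.zookeeper.quorum", "localhost")
    config.set("hbase.zookeeper.property.clientPort", "2181")
    config.set("hbase.mapreduce.inputtable", "hbaseTableName")
    // read data (the mapreduce TableInputFormat is a new-API input format, so use newAPIHadoopRDD)
    val hbaseData = sparkContext.newAPIHadoopRDD(config, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    // count rows
    println(hbaseData.count)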
  • 21. Write & Read Logic
  • 22. SQL
  • 23. Re-check • What is a column family? • What are the HBase components? • Name a few shell commands. • What is a version in HBase?
  • 28. Use Case • Canonical use case: storing crawl data and indices for search
    Figure: Web Search powered by Bigtable (components: Internet, Crawlers, Bigtable, MapReduce, Search Index, Web Search, You)
    Indexing the Internet
        1. Crawlers constantly scour the Internet for new pages. Those pages are stored as individual records in Bigtable.
        2. A MapReduce job runs over the entire table, generating search indexes for the Web Search application.
    Searching the Internet
        3. The user initiates a Web Search request.
        4. The Web Search application queries the Search Indexes and retrieves matching documents directly from Bigtable.
        5. Search results are presented to the user.
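
The "sorted map of maps" description, the multi-version cells, and the Get() on row r5 from the slides can be tied together in a small in-memory model. The sketch below is only an illustration under stated assumptions: the class `SortedMapOfMapsSketch` and its methods `put`, `get`, and `scanRowKeys` are hypothetical names, the column family and qualifier are joined into a single "family:qualifier" key for brevity, and nothing here is the HBase client API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the HBase data model; this is not the HBase client API.
final class SortedMapOfMapsSketch {
    // rowkey -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Stores one version of one cell; older versions of the same cell are kept.
    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier,
                    c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns up to maxVersions cells per column of one row, as row:family:qualifier:tN:value.
    List<String> get(String row, int maxVersions) {
        List<String> cells = new ArrayList<>();
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return cells; // unknown row: empty result
        }
        for (Map.Entry<String, NavigableMap<Long, String>> column : columns.entrySet()) {
            int emitted = 0;
            for (Map.Entry<Long, String> version : column.getValue().entrySet()) {
                if (emitted++ >= maxVersions) {
                    break;
                }
                cells.add(row + ":" + column.getKey() + ":t" + version.getKey()
                        + ":" + version.getValue());
            }
        }
        return cells;
    }

    // Row keys in [startRow, stopRow), in sorted order, as a range scan would visit them.
    List<String> scanRowKeys(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        SortedMapOfMapsSketch table = new SortedMapOfMapsSketch();
        // Cells taken from the slide's example table, inserted out of order on purpose.
        table.put("r5", "CF2", "c7", 3, "v8");
        table.put("r5", "CF1", "c3", 3, "v5");
        table.put("r5", "CF1", "c1", 2, "v1");
        table.put("r3", "CF1", "c2", 1, "v3");
        table.put("r1", "CF1", "c1", 1, "v1");
        System.out.println(table.get("r5", 1));
        System.out.println(table.scanRowKeys("r1", "r4"));
    }
}
```

Because every level is a sorted map, `get("r5", 1)` returns the three r5 cells in the same order as the Result object shown on slide 16, regardless of insertion order, and a row-range scan is just a sub-map view over the sorted row keys.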

Editor's Notes

  1. Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner (though technically a NewSQL database) and FoundationDB, have made them central to their designs. Eventual consistency is a consistency model used in distributed computing to achieve high availability. It informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. Eventually consistent services are often classified as providing BASE (Basically Available, Soft state, Eventual consistency) semantics, in contrast to traditional ACID (Atomicity, Consistency, Isolation, Durability) guarantees.
  2. http://blog.monitis.com/2011/05/22/picking-the-right-nosql-database-tool/
  3. Eric Brewer’s CAP theorem says that if you want consistency, availability, and partition tolerance, you have to settle for two out of three. (For a distributed system, partition tolerance means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.) Consistency means that each client always has the same view of the data. Availability means that all clients can always read and write. Partition tolerance means that the system works well across physical network partitions.
  4. http://localhost:60010/master-status
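
The eventual-consistency guarantee described in note 1 can be illustrated with two replicas that accept writes independently and later exchange state. This is a minimal sketch under assumptions that are not in the notes: conflicts are resolved by last-write-wins on a caller-supplied timestamp, and the names `EventualConsistencySketch`, `Replica`, `write`, `read`, and `syncFrom` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical last-write-wins replicas; an illustration, not any particular database's protocol.
final class EventualConsistencySketch {

    // A value tagged with the logical time of the write that produced it.
    static final class Versioned {
        final long timestamp;
        final String value;

        Versioned(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static final class Replica {
        private final Map<String, Versioned> store = new HashMap<>();

        // Accepts a write locally without contacting any other replica.
        void write(String key, String value, long timestamp) {
            merge(key, new Versioned(timestamp, value));
        }

        // Returns the locally known value, or null if the key is unknown here.
        String read(String key) {
            Versioned current = store.get(key);
            return current == null ? null : current.value;
        }

        // Pulls every entry from a peer, keeping the newer write for each key.
        void syncFrom(Replica peer) {
            for (Map.Entry<String, Versioned> entry : peer.store.entrySet()) {
                merge(entry.getKey(), entry.getValue());
            }
        }

        private void merge(String key, Versioned incoming) {
            Versioned current = store.get(key);
            if (current == null || incoming.timestamp > current.timestamp) {
                store.put(key, incoming);
            }
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();
        a.write("Device2", "ACTIVE", 1);
        b.write("Device2", "IN_ACTIVE", 2);
        // Before synchronization the replicas disagree.
        System.out.println(a.read("Device2") + " / " + b.read("Device2"));
        a.syncFrom(b);
        b.syncFrom(a);
        // With no new updates, both replicas now return the last written value.
        System.out.println(a.read("Device2") + " / " + b.read("Device2"));
    }
}
```

Both replicas stay available for reads and writes while they are out of contact, which is the availability side of the trade-off; the price is the window in which they return different answers.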