SlideShare a Scribd company logo
1 of 29
Download to read offline
HBase In
Production
Hey we’re hiring!
Contents
● Bronto Overview
● HBase Architecture
● Operations
● Table Design
● Questions?
Bronto Overview
Bronto Software provides a cloud-based
marketing platform for organizations to drive
revenue through their email, mobile and social
campaigns
Bronto Contd.
● ESP for E-Commerce retailers
● Our customers are marketers
● Charts, graphs, reports
● Market segmentation
● Automation
● We are also hiring
Where We Use HBase
● High volume scenarios
● Realtime data
● Batch processing
● HDFS staging area
● Sorting/Indexing not a priority
○ We are working on this
HBase Overview
● Implementation of Google’s BigTable
● Sparse, sorted, versioned map
● Built on top of HDFS
● Row level ACID
● Get, Put, Scan
● Assorted RMW operations
Tables Overview
Tables are sorted (lexicographically) key value
pairs of uninterpreted byte[]s. Keyspace is
divided up into regions of keys. Each region is
hosted by exactly one machine.
R3R1
Server 1
Key Value
a byte[]
aa byte[]
b byte[]
bb byte[]
c byte[]
ca byte[]
R1: [a, b)
R2: [b, c)
R3: [c, d)
R2
Server 1
Table Overview
Operations
● Layers of complexity
● Normal failure modes
○ Hardware dies (or combust)
○ Human error
● JVM
● HDFS considerations
● Lots of knobs
Cascading Failure
1. High write volume fragments heap
2. GC promotion failure
3. Stop the world GC
4. ZK timeout
5. Receive YouAreDeadException, die
6. Failover
7. Goto 1
Useful Tunings
● MSLAB enabled
● hbase.regionserver.handler.count
○ Increasing puts more IO load on RS
○ 50 is our sweet spot
● JVM tuning
○ UseConcMarkSweepGC
○ UseParNewGC
Monitoring Tools
● Nagios for hardware checks
● Cloudera Manager
○ Reporting and health checks
○ Apache Ambari and MapR provide similar tools
● Hannibal + custom scripts
○ Identify hot regions for splitting
Table Design
● Table design is deceptively simple
● Main Considerations:
○ Row key structure
○ Number of column families
● Know your queries in advance
Additional Context
● SAAS environment
○ “Twitter clone” model won’t work
● Thousands of users millions, of attributes
● Skewed customer base
○ Biggest clients have 10MM+ contacts
○ Smallest have thousands
Row Keys
● Most important decision
● The only (native) index in HBase
● Random reads and writes are fast
○ Sorted on disk and in memory
○ Bloom filters speed read performance (not in use)
Hotspotting
● Associated with monotonically increasing
keys
○ MySql AUTO_INCREMENT
● Writes lock onto one region at a time
● Consequences:
○ Flush and compaction storms
○ $500K cluster limited by $10K machine
Row Key Advice
● Read/Write ratio should drive design
○ We pay a write time penalty for faster reads
● Identify queries you need to support
● Consider composite keys instead of indexes
● Bucketed/Salted keys are an option
○ Distribute writes across N buckets
○ Rebucketing is difficult
○ Requires N reads, slow workers
Variable Width Keys
customer_hash::email
● Allows scans for a single customer
● Hashed id distributes customers
● Sorted by email address
○ Could also use reverse domain for gmail, yahoo, etc.
Fixed Width Keys
site::contact::create::email
● FuzzyRowFilter
○ Can fix site, contact, and reverse_create
○ Can search for any email address
○ Could use a fixed width encoding for domain
■ Search for just gmail, yahoo, etc
● Distributes sites and users
● Contacts sorted by create date
Column Families
● Groupings of named columns
● Versioning, compression, TTL
● Different than BigTable
○ BigTable: 100s
○ HBase: 1 or 2
Column Family Example
Id d {VERSIONS => 2} s7 {TTL => 604800}
a (address) p (phone) o:3-27 (open) c:3-20 (click)
dfajkdh byte[] byte[]:555-5555 byte[]
hnvdzu9 byte[]:1234 St. XXXX
hnvdzu9 byte[]:1233 St.
hnvdzu9 XXXX byte[]
er9asyjk byte[]: 324 Ave
Column Family Example
● PROTIP: Keep CF and qualifier names short
○ They are repeated on disk for every cell
● “d” supports 2 versions of each column, maps to demographics
● “s7” has seven day TTL, maps to stats kept for 7 days.
MemStore
HDFS
s2s1 s3
f1
Column Families In Depth
MemStore
HDFS
s2s1
f2
my_table,,1328551097416.12921
bbc0c91869f88ba6a044a6a1c50.
● StoreFile(s) for each CF
in region
● Sparse
● One memstore per CF
○ Must flush together
● Compactions happen at
region level
(Region)
(family) (family)
Compactions
● Rewrites StoreFiles
○ Improves read performance
○ IO Intensive
● Region scope
● Used to take > 50 hours
● Custom script took it down to 18
○ Can (theoretically) run during the day
MemStore
HDFS
S1
f1
my_table,,
1328551097416.12921
bbc0c91869f88ba6a044a6a1c5
0.
(Region)
MemStore
HDFS
s
2
s
1 s3
f1
my_table,,1328551097416.12921
bbc0c91869f88ba6a044a6a1c50.
(Region)
Compaction Before and After
s4 s5 s6
Before After
K-Way
Merge
The Table From Hell
● 19 Column Families
● 60% of our region count
● Skewed write pattern
○ KB size store files
○ Frequent compaction storms
○ hbase.hstore.compaction.min.size (HBASE-5461)
● Moved to it’s own cluster
And yet...
● Cluster remained operational
○ Table is still in use today
● Met read and write demand
● Regions only briefly active
○ Rowkeys by date and customer
What saved us
● Keyed by customer and date
● Effectively write once
○ Kept “active” region count low
● Custom compaction script
○ Skipped old regions
● More hardware
● Were able to selectively migrate
Column Family Advice
● Bad choice for fine grained partitioning
● Good for
○ Similarly typed data
○ Varying versioning/retention requirements
● Prefer intra row scans
○ CF and qualifiers are sorted
○ ColumnRangeFilter
Questions?

More Related Content

What's hot

Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Eric Evans
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simpleDori Waldman
 
HBase, dances on the elephant back.
HBase, dances on the elephant back.HBase, dances on the elephant back.
HBase, dances on the elephant back.Roman Nikitchenko
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloudAvinash Ramineni
 
Datomic rtree-pres
Datomic rtree-presDatomic rtree-pres
Datomic rtree-presjsofra
 
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRGlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRTheophanis Kontogiannis
 
Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.TigerGraph
 
Sharding - patterns & antipatterns, Константин Осипов, Алексей Рыбак
Sharding -  patterns & antipatterns, Константин Осипов, Алексей РыбакSharding -  patterns & antipatterns, Константин Осипов, Алексей Рыбак
Sharding - patterns & antipatterns, Константин Осипов, Алексей РыбакOntico
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudShubham Tagra
 
Tms training
Tms trainingTms training
Tms trainingChi Lee
 
A journey through cosmos - 5th el
A journey through cosmos - 5th elA journey through cosmos - 5th el
A journey through cosmos - 5th elAvinashRamakanth
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow CacheAlluxio, Inc.
 
Redis Overview
Redis OverviewRedis Overview
Redis OverviewHoang Long
 
Thrashing allocation frames.43
Thrashing allocation frames.43Thrashing allocation frames.43
Thrashing allocation frames.43myrajendra
 

What's hot (19)

Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
HBase, dances on the elephant back.
HBase, dances on the elephant back.HBase, dances on the elephant back.
HBase, dances on the elephant back.
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
Datomic rtree-pres
Datomic rtree-presDatomic rtree-pres
Datomic rtree-pres
 
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRGlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
 
RubiX
RubiXRubiX
RubiX
 
Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.
 
Sharding - patterns & antipatterns, Константин Осипов, Алексей Рыбак
Sharding -  patterns & antipatterns, Константин Осипов, Алексей РыбакSharding -  patterns & antipatterns, Константин Осипов, Алексей Рыбак
Sharding - patterns & antipatterns, Константин Осипов, Алексей Рыбак
 
Hadoop at datasift
Hadoop at datasiftHadoop at datasift
Hadoop at datasift
 
Hadoop at datasift
Hadoop at datasiftHadoop at datasift
Hadoop at datasift
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
 
Tms training
Tms trainingTms training
Tms training
 
A journey through cosmos - 5th el
A journey through cosmos - 5th elA journey through cosmos - 5th el
A journey through cosmos - 5th el
 
In-memory database
In-memory databaseIn-memory database
In-memory database
 
Falando de MySQL
Falando de MySQLFalando de MySQL
Falando de MySQL
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow Cache
 
Redis Overview
Redis OverviewRedis Overview
Redis Overview
 
Thrashing allocation frames.43
Thrashing allocation frames.43Thrashing allocation frames.43
Thrashing allocation frames.43
 

Viewers also liked

HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designphanleson
 
Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?Daniel Abadi
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guidelarsgeorge
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming OverviewStratio
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databasesArangoDB Database
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)alexbaranau
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented DatabaseSuvradeep Rudra
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesLuis Cipriani
 

Viewers also liked (20)

HBASE Overview
HBASE OverviewHBASE Overview
HBASE Overview
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
 
Hbase at Salesforce.com
Hbase at Salesforce.comHbase at Salesforce.com
Hbase at Salesforce.com
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
 
Intro to column stores
Intro to column storesIntro to column stores
Intro to column stores
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databases
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databases
 

Similar to HBase in Production: Tips for Table Design and Operations

Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBjhugg
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreMariaDB plc
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceScott Mansfield
 
Cassandra in production
Cassandra in productionCassandra in production
Cassandra in productionvalstadsve
 
Amazon Redshift
Amazon RedshiftAmazon Redshift
Amazon RedshiftJeff Patti
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive Omid Vahdaty
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Jelena Zanko
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSqlOmid Vahdaty
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyGuillaume Lefranc
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Scott Mansfield
 

Similar to HBase in Production: Tips for Table Design and Operations (20)

Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden Microservice
 
Cassandra in production
Cassandra in productionCassandra in production
Cassandra in production
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Amazon Redshift
Amazon RedshiftAmazon Redshift
Amazon Redshift
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
Amazon DynamoDB 深入探討
Amazon DynamoDB 深入探討Amazon DynamoDB 深入探討
Amazon DynamoDB 深入探討
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)
 

More from trihug

TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Rangertrihug
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparktrihug
 
TriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache SentryTriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache Sentrytrihug
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Sharktrihug
 
Impala presentation
Impala presentationImpala presentation
Impala presentationtrihug
 
Practical pig
Practical pigPractical pig
Practical pigtrihug
 
Financial services trihug
Financial services trihugFinancial services trihug
Financial services trihugtrihug
 
TriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris ShainTriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris Shaintrihug
 
TriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan GatesTriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan Gatestrihug
 
TriHUG November Pig Talk by Alan Gates
TriHUG November Pig Talk by Alan GatesTriHUG November Pig Talk by Alan Gates
TriHUG November Pig Talk by Alan Gatestrihug
 
MapR, Implications for Integration
MapR, Implications for IntegrationMapR, Implications for Integration
MapR, Implications for Integrationtrihug
 

More from trihug (11)

TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on spark
 
TriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache SentryTriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache Sentry
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Shark
 
Impala presentation
Impala presentationImpala presentation
Impala presentation
 
Practical pig
Practical pigPractical pig
Practical pig
 
Financial services trihug
Financial services trihugFinancial services trihug
Financial services trihug
 
TriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris ShainTriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris Shain
 
TriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan GatesTriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan Gates
 
TriHUG November Pig Talk by Alan Gates
TriHUG November Pig Talk by Alan GatesTriHUG November Pig Talk by Alan Gates
TriHUG November Pig Talk by Alan Gates
 
MapR, Implications for Integration
MapR, Implications for IntegrationMapR, Implications for Integration
MapR, Implications for Integration
 

Recently uploaded

定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleanscorenetworkseo
 

Recently uploaded (20)

定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleans
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 

HBase in Production: Tips for Table Design and Operations

  • 2. Contents ● Bronto Overview ● HBase Architecture ● Operations ● Table Design ● Questions?
  • 3. Bronto Overview Bronto Software provides a cloud-based marketing platform for organizations to drive revenue through their email, mobile and social campaigns
  • 4. Bronto Contd. ● ESP for E-Commerce retailers ● Our customers are marketers ● Charts, graphs, reports ● Market segmentation ● Automation ● We are also hiring
  • 5. Where We Use HBase ● High volume scenarios ● Realtime data ● Batch processing ● HDFS staging area ● Sorting/Indexing not a priority ○ We are working on this
  • 6. HBase Overview ● Implementation of Google’s BigTable ● Sparse, sorted, versioned map ● Built on top of HDFS ● Row level ACID ● Get, Put, Scan ● Assorted RMW operations
  • 7. Tables Overview Tables are sorted (lexicographically) key value pairs of uninterpreted byte[]s. Keyspace is divided up into regions of keys. Each region is hosted by exactly one machine.
  • 8. R3R1 Server 1 Key Value a byte[] aa byte[] b byte[] bb byte[] c byte[] ca byte[] R1: [a, b) R2: [b, c) R3: [c, d) R2 Server 1 Table Overview
  • 9. Operations ● Layers of complexity ● Normal failure modes ○ Hardware dies (or combust) ○ Human error ● JVM ● HDFS considerations ● Lots of knobs
  • 10. Cascading Failure 1. High write volume fragments heap 2. GC promotion failure 3. Stop the world GC 4. ZK timeout 5. Receive YouAreDeadException, die 6. Failover 7. Goto 1
  • 11. Useful Tunings ● MSLAB enabled ● hbase.regionserver.handler.count ○ Increasing puts more IO load on RS ○ 50 is our sweet spot ● JVM tuning ○ UseConcMarkSweepGC ○ UseParNewGC
  • 12. Monitoring Tools ● Nagios for hardware checks ● Cloudera Manager ○ Reporting and health checks ○ Apache Ambari and MapR provide similar tools ● Hannibal + custom scripts ○ Identify hot regions for splitting
  • 13. Table Design ● Table design is deceptively simple ● Main Considerations: ○ Row key structure ○ Number of column families ● Know your queries in advance
  • 14. Additional Context ● SAAS environment ○ “Twitter clone” model won’t work ● Thousands of users millions, of attributes ● Skewed customer base ○ Biggest clients have 10MM+ contacts ○ Smallest have thousands
  • 15. Row Keys ● Most important decision ● The only (native) index in HBase ● Random reads and writes are fast ○ Sorted on disk and in memory ○ Bloom filters speed read performance (not in use)
  • 16. Hotspotting ● Associated with monotonically increasing keys ○ MySql AUTO_INCREMENT ● Writes lock onto one region at a time ● Consequences: ○ Flush and compaction storms ○ $500K cluster limited by $10K machine
  • 17. Row Key Advice ● Read/Write ratio should drive design ○ We pay a write time penalty for faster reads ● Identify queries you need to support ● Consider composite keys instead of indexes ● Bucketed/Salted keys are an option ○ Distribute writes across N buckets ○ Rebucketing is difficult ○ Requires N reads, slow workers
  • 18. Variable Width Keys customer_hash::email ● Allows scans for a single customer ● Hashed id distributes customers ● Sorted by email address ○ Could also use reverse domain for gmail, yahoo, etc.
  • 19. Fixed Width Keys site::contact::create::email ● FuzzyRowFilter ○ Can fix site, contact, and reverse_create ○ Can search for any email address ○ Could use a fixed width encoding for domain ■ Search for just gmail, yahoo, etc ● Distributes sites and users ● Contacts sorted by create date
  • 20. Column Families ● Groupings of named columns ● Versioning, compression, TTL ● Different than BigTable ○ BigTable: 100s ○ HBase: 1 or 2
  • 21. Column Family Example Id d {VERSIONS => 2} s7 {TTL => 604800} a (address) p (phone) o:3-27 (open) c:3-20 (click) dfajkdh byte[] byte[]:555-5555 byte[] hnvdzu9 byte[]:1234 St. XXXX hnvdzu9 byte[]:1233 St. hnvdzu9 XXXX byte[] er9asyjk byte[]: 324 Ave Column Family Example ● PROTIP: Keep CF and qualifier names short ○ They are repeated on disk for every cell ● “d” supports 2 versions of each column, maps to demographics ● “s7” has seven day TTL, maps to stats kept for 7 days.
  • 22. MemStore HDFS s2s1 s3 f1 Column Families In Depth MemStore HDFS s2s1 f2 my_table,,1328551097416.12921 bbc0c91869f88ba6a044a6a1c50. ● StoreFile(s) for each CF in region ● Sparse ● One memstore per CF ○ Must flush together ● Compactions happen at region level (Region) (family) (family)
  • 23. Compactions ● Rewrites StoreFiles ○ Improves read performance ○ IO Intensive ● Region scope ● Used to take > 50 hours ● Custom script took it down to 18 ○ Can (theoretically) run during the day
  • 25. The Table From Hell ● 19 Column Families ● 60% of our region count ● Skewed write pattern ○ KB size store files ○ Frequent compaction storms ○ hbase.hstore.compaction.min.size (HBASE-5461) ● Moved to it’s own cluster
  • 26. And yet... ● Cluster remained operational ○ Table is still in use today ● Met read and write demand ● Regions only briefly active ○ Rowkeys by date and customer
  • 27. What saved us ● Keyed by customer and date ● Effectively write once ○ Kept “active” region count low ● Custom compaction script ○ Skipped old regions ● More hardware ● Were able to selectively migrate
  • 28. Column Family Advice ● Bad choice for fine grained partitioning ● Good for ○ Similarly typed data ○ Varying versioning/retention requirements ● Prefer intra row scans ○ CF and qualifiers are sorted ○ ColumnRangeFilter