SlideShare a Scribd company logo
1 of 38
HBase Low Latency
Nick Dimiduk, Hortonworks (@xefyr)
Nicolas Liochon, Scaled Risk (@nkeywal)
Hadoop Summit June 4, 2014
Agenda
• Latency, what is it, how to measure it
• Write path
• Read path
• Next steps
What’s low latency
Latency is about percentiles
• Average != 50% percentile
• There are often order of magnitudes between « average » and « 95
percentile »
• Post 99% = « magical 1% ». Work in progress here.
• Meaning from micro seconds (High Frequency
Trading) to seconds (interactive queries)
• In this talk milliseconds
Measure latency
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
• More options related to HBase: autoflush, replicas, …
• Latency measured in micro second
• Easier for internal analysis
YCSB - Yahoo! Cloud Serving Benchmark
• Useful for comparison between databases
• Set of workload already defined
Write path
• Two parts
• Single put (WAL)
• The client just sends the put
• Multiple puts from the client (new behavior since 0.96)
• The client is much smarter
• Four stages to look at for latency
• Start (establish tcp connections, etc.)
• Steady: when expected conditions are met
• Machine failure: expected as well
• Overloaded system
Single put: communication & scheduling
• Client: TCP connection to the server
• Shared: multitheads on the same client are using the same TCP connection
• Pooling is possible and does improve the performances in some circonstances
• hbase.client.ipc.pool.size
• Server: multiple calls from multiple threads on multiple machines
• Can become thousand of simultaneous queries
• Scheduling is required
Single put: real work
• The server must
• Write into the WAL queue
• Sync the WAL queue (HDFS flush)
• Write into the memstore
• WALs queue is shared between all the regions/handlers
• Sync is avoided if another handlers did the work
• Your handler may flush more data than expected
Simple put: A small run
Percentile Time in ms
Mean 1.21
50% 0.95
95% 1.50
99% 2.12
Latency sources
• Candidate one: network
• 0.5ms within a datacenter
• Much less between nodes in the same rack
Percentile Time in ms
Mean 0.13
50% 0.12
95% 0.15
99% 0.47
Latency sources
• Candidate two: HDFS Flush
• We can still do better: HADOOP-7714 & sons.
Percentile Time in ms
Mean 0.33
50% 0.26
95% 0.59
99% 1.24
Latency sources
• Millisecond world: everything can go wrong
• JVM
• Network
• OS Scheduler
• File System
• All this goes into the post 99% percentile
• Requires monitoring
• Usually using the latest version helps
Latency sources
• Split (and presplits)
• Autosharding is great!
• Puts have to wait
• Impacts: seconds
• Balance
• Regions move
• Triggers a retry for the client
• hbase.client.pause = 100ms since HBase 0.96
• Garbage Collection
• Impacts: 10’s of ms, even with a good config
• Covered with the read path of this talk
From steady to loaded and overloaded
• Number of concurrent tasks is a factor of
• Number of cores
• Number of disks
• Number of remote machines used
• Difficult to estimate
• Queues are doomed to happen
• hbase.regionserver.handler.count
• So for low latency
• Replable scheduler since HBase 0.98 (HBASE-8884). Requires specific code.
• RPC Priorities: work in progress (HBASE-11048)
From loaded to overloaded
• MemStore takes too much room: flush, then blocksquite quickly
• hbase.regionserver.global.memstore.size.lower.limit
• hbase.regionserver.global.memstore.size
• hbase.hregion.memstore.block.multiplier
• Too many Hfiles: block until compactions keep up
• hbase.hstore.blockingStoreFiles
• Too many WALs files: Flush and block
• hbase.regionserver.maxlogs
Machine failure
• Failure
• Dectect
• Reallocate
• Replay WAL
• Replaying WAL is NOT required for puts
• hbase.master.distributed.log.replay
• (default true in 1.0)
• Failure = Dectect + Reallocate + Retry
• That’s in the range of ~1s for simple failures
• Silent failures leads puts you in the 10s range if the hardware does not help
• zookeeper.session.timeout
Single puts
• Millisecond range
• Spikes do happen in steady mode
• 100ms
• Causes: GC, load, splits
Streaming puts
Htable#setAutoFlushTo(false)
Htable#put
Htable#flushCommit
• As simple puts, but
• Puts are grouped and send in background
• Load is taken into account
• Does not block
Multiple puts
hbase.client.max.total.tasks (default 100)
hbase.client.max.perserver.tasks (default 5)
hbase.client.max.perregion.tasks (default 1)
• Decouple the client from a latency spike of a region server
• Increase the throughput by 50% compared to old multiput
• Makes split and GC more transparent
Conclusion on write path
• Single puts can be very fast
• It’s not a « hard real time » system: there are spikes
• Most latency spikes can be hidden when streaming puts
• Failure are NOT that difficult for the write path
• No WAL to replay
And now for the read path
Read path
• Get/short scan are assumed for low-latency operations
• Again, two APIs
• Single get: HTable#get(Get)
• Multi-get: HTable#get(List<Get>)
• Four stages, same as write path
• Start (tcp connection, …)
• Steady: when expected conditions are met
• Machine failure: expected as well
• Overloaded system: you may need to add machines or tune your workload
Multi get / Client
Group Gets by
RegionServer
Execute them
one by one
Multi get / Server
Multi get / Server
Access latency magnidesStorage hierarchy: a different view
Dean/2009
Memory is 100000x
faster than disk!
Disk seek = 10ms
Known unknowns
• For each candidate HFile
• Exclude by file metadata
• Timestamp
• Rowkey range
• Exclude by bloom filter
StoreFileScanner#
shouldUseScanner()
Unknown knowns
• Merge sort results polled from Stores
• Seek each scanner to a reference KeyValue
• Retrieve candidate data from disk
• Multiple HFiles => mulitple seeks
• hbase.storescanner.parallel.seek.enable=true
• Short Circuit Reads
• dfs.client.read.shortcircuit=true
• Block locality
• Happy clusters compact!
HFileBlock#
readBlockData()
BlockCache
• Reuse previously read data
• Maximize cache hit rate
• Larger cache
• Temporal access locality
• Physical access locality
BlockCache#getBlock()
BlockCache Showdown
• LruBlockCache
• Default, onheap
• Quite good most of the time
• Evictions impact GC
• BucketCache
• Offheap alternative
• Serialization overhead
• Large memory configurations
http://www.n10k.com/blog/block
cache-showdown/
L2 off-heap BucketCache
makes a strong showing
Latency enemies: Garbage Collection
• Use heap. Not too much. With CMS.
• Max heap
• 30GB (compressed pointers)
• 8-16GB if you care about 9’s
• Healthy cluster load
• regular, reliable collections
• 25-100ms pause on regular interval
• Overloaded RegionServer suffers GC overmuch
Off-heap to the rescue?
• BucketCache (0.96, HBASE-7404)
• Network interfaces (HBASE-9535)
• MemStore et al (HBASE-10191)
Latency enemies: Compactions
• Fewer HFiles => fewer seeks
• Evict data blocks!
• Evict Index blocks!!
• hfile.block.index.cacheonwrite
• Evict bloom blocks!!!
• hfile.block.bloom.cacheonwrite
• OS buffer cache to the rescue
• Compactected data is still fresh
• Better than going all the way back to disk
Failure
• Detect + Reassign + Replay
• Strong consistency requires replay
• Locality drops to 0
• Cache starts from scratch
Hedging our bets
• HDFS Hedged reads (2.4, HDFS-5776)
• Reads on secondary DataNodes
• Strongly consistent
• Works at the HDFS level
• Timeline consistency (HBASE-10070)
• Reads on « Replica Region »
• Not strongly consistent
Read latency in summary
• Steady mode
• Cache hit: < 1 ms
• Cache miss: + 10 ms per seek
• Writing while reading => cache churn
• GC: 25-100ms pause on regular interval
Network request + (1 - P(cache hit)) * (10 ms * seeks)
• Same long tail issues as write
• Overloaded: same scheduling issues as write
• Partial failures hurt a lot
HBase ranges for 99% latency
Put
Streamed
Multiput Get Timeline get
Steady milliseconds milliseconds milliseconds milliseconds
Failure seconds seconds seconds milliseconds
GC
10’s of
milliseconds milliseconds
10’s of
milliseconds milliseconds
What’s next
• Less GC
• Use less objects
• Offheap
• Compressed BlockCache (HBASE-8894)
• Prefered location (HBASE-4755)
• The « magical 1% »
• Most tools stops at the 99% latency
• What happens after is much more complex
Thanks!
Nick Dimiduk, Hortonworks (@xefyr)
Nicolas Liochon, Scaled Risk (@nkeywal)
Hadoop Summit June 4, 2014

More Related Content

What's hot

Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...HostedbyConfluent
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Hadoopの概念と基本的知識
Hadoopの概念と基本的知識Hadoopの概念と基本的知識
Hadoopの概念と基本的知識Ken SASAKI
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Vinoth Chandar
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Yongho Ha
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 

What's hot (20)

Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
HBase at LINE 2017
HBase at LINE 2017HBase at LINE 2017
HBase at LINE 2017
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Hadoopの概念と基本的知識
Hadoopの概念と基本的知識Hadoopの概念と基本的知識
Hadoopの概念と基本的知識
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 

Viewers also liked

Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy CarolScaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy CarolHakka Labs
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBaseCon
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)tatsuya6502
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at XiaomiHBaseCon
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBaseCon
 
Нейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
Нейронечёткая классификация слабо формализуемых данных | Тимур ГильмуллинНейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
Нейронечёткая классификация слабо формализуемых данных | Тимур ГильмуллинPositive Hack Days
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Amazon Web Services
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
 

Viewers also liked (17)

Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy CarolScaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
Scaling HBase (nosql store) to handle massive loads at Pinterest by Jeremy Carol
 
Scheduling Policies in YARN
Scheduling Policies in YARNScheduling Policies in YARN
Scheduling Policies in YARN
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at Xiaomi
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
Нейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
Нейронечёткая классификация слабо формализуемых данных | Тимур ГильмуллинНейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
Нейронечёткая классификация слабо формализуемых данных | Тимур Гильмуллин
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
 

Similar to HBase Low Latency

HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014Nick Dimiduk
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
004 architecture andadvanceduse
004 architecture andadvanceduse004 architecture andadvanceduse
004 architecture andadvanceduseScott Miao
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015polo li
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Bob Pusateri
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014Modern Data Stack France
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseCloudera, Inc.
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Bob Pusateri
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)jmhsieh
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Alexey Rybak
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveGluster.org
 
Caching: A Guided Tour - 10/12/2010
Caching: A Guided Tour - 10/12/2010Caching: A Guided Tour - 10/12/2010
Caching: A Guided Tour - 10/12/2010Jason Ragsdale
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014Hassan Islamov
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Cosmin Lehene
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleYifeng Jiang
 

Similar to HBase Low Latency (20)

HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
004 architecture andadvanceduse
004 architecture andadvanceduse004 architecture andadvanceduse
004 architecture andadvanceduse
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015Hadoop Robot from eBay at China Hadoop Summit 2015
Hadoop Robot from eBay at China Hadoop Summit 2015
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014HBASE by  Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
HBASE by Nicolas Liochon - Meetup HUGFR du 22 Sept 2014
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep Dive
 
Caching: A Guided Tour - 10/12/2010
Caching: A Guided Tour - 10/12/2010Caching: A Guided Tour - 10/12/2010
Caching: A Guided Tour - 10/12/2010
 
Intro to big data choco devday - 23-01-2014
Intro to big data   choco devday - 23-01-2014Intro to big data   choco devday - 23-01-2014
Intro to big data choco devday - 23-01-2014
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

HBase Low Latency

  • 1. HBase Low Latency Nick Dimiduk, Hortonworks (@xefyr) Nicolas Liochon, Scaled Risk (@nkeywal) Hadoop Summit June 4, 2014
  • 2. Agenda • Latency, what is it, how to measure it • Write path • Read path • Next steps
  • 3. What’s low latency Latency is about percentiles • Average != 50% percentile • There are often order of magnitudes between « average » and « 95 percentile » • Post 99% = « magical 1% ». Work in progress here. • Meaning from micro seconds (High Frequency Trading) to seconds (interactive queries) • In this talk milliseconds
  • 4. Measure latency bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation • More options related to HBase: autoflush, replicas, … • Latency measured in micro second • Easier for internal analysis YCSB - Yahoo! Cloud Serving Benchmark • Useful for comparison between databases • Set of workload already defined
  • 5. Write path • Two parts • Single put (WAL) • The client just sends the put • Multiple puts from the client (new behavior since 0.96) • The client is much smarter • Four stages to look at for latency • Start (establish tcp connections, etc.) • Steady: when expected conditions are met • Machine failure: expected as well • Overloaded system
  • 6. Single put: communication & scheduling • Client: TCP connection to the server • Shared: multitheads on the same client are using the same TCP connection • Pooling is possible and does improve the performances in some circonstances • hbase.client.ipc.pool.size • Server: multiple calls from multiple threads on multiple machines • Can become thousand of simultaneous queries • Scheduling is required
  • 7. Single put: real work • The server must • Write into the WAL queue • Sync the WAL queue (HDFS flush) • Write into the memstore • WALs queue is shared between all the regions/handlers • Sync is avoided if another handlers did the work • Your handler may flush more data than expected
  • 8. Simple put: A small run Percentile Time in ms Mean 1.21 50% 0.95 95% 1.50 99% 2.12
  • 9. Latency sources • Candidate one: network • 0.5ms within a datacenter • Much less between nodes in the same rack Percentile Time in ms Mean 0.13 50% 0.12 95% 0.15 99% 0.47
  • 10. Latency sources • Candidate two: HDFS Flush • We can still do better: HADOOP-7714 & sons. Percentile Time in ms Mean 0.33 50% 0.26 95% 0.59 99% 1.24
  • 11. Latency sources • Millisecond world: everything can go wrong • JVM • Network • OS Scheduler • File System • All this goes into the post 99% percentile • Requires monitoring • Usually using the latest version helps
  • 12. Latency sources • Split (and presplits) • Autosharding is great! • Puts have to wait • Impacts: seconds • Balance • Regions move • Triggers a retry for the client • hbase.client.pause = 100ms since HBase 0.96 • Garbage Collection • Impacts: 10’s of ms, even with a good config • Covered with the read path of this talk
  • 13. From steady to loaded and overloaded • Number of concurrent tasks is a factor of • Number of cores • Number of disks • Number of remote machines used • Difficult to estimate • Queues are doomed to happen • hbase.regionserver.handler.count • So for low latency • Replable scheduler since HBase 0.98 (HBASE-8884). Requires specific code. • RPC Priorities: work in progress (HBASE-11048)
  • 14. From loaded to overloaded • MemStore takes too much room: flush, then blocksquite quickly • hbase.regionserver.global.memstore.size.lower.limit • hbase.regionserver.global.memstore.size • hbase.hregion.memstore.block.multiplier • Too many Hfiles: block until compactions keep up • hbase.hstore.blockingStoreFiles • Too many WALs files: Flush and block • hbase.regionserver.maxlogs
  • 15. Machine failure • Failure • Dectect • Reallocate • Replay WAL • Replaying WAL is NOT required for puts • hbase.master.distributed.log.replay • (default true in 1.0) • Failure = Dectect + Reallocate + Retry • That’s in the range of ~1s for simple failures • Silent failures leads puts you in the 10s range if the hardware does not help • zookeeper.session.timeout
  • 16. Single puts • Millisecond range • Spikes do happen in steady mode • 100ms • Causes: GC, load, splits
  • 17. Streaming puts Htable#setAutoFlushTo(false) Htable#put Htable#flushCommit • As simple puts, but • Puts are grouped and send in background • Load is taken into account • Does not block
  • 18. Multiple puts hbase.client.max.total.tasks (default 100) hbase.client.max.perserver.tasks (default 5) hbase.client.max.perregion.tasks (default 1) • Decouple the client from a latency spike of a region server • Increase the throughput by 50% compared to old multiput • Makes split and GC more transparent
  • 19. Conclusion on write path • Single puts can be very fast • It’s not a « hard real time » system: there are spikes • Most latency spikes can be hidden when streaming puts • Failure are NOT that difficult for the write path • No WAL to replay
  • 20. And now for the read path
  • 21. Read path • Get/short scan are assumed for low-latency operations • Again, two APIs • Single get: HTable#get(Get) • Multi-get: HTable#get(List<Get>) • Four stages, same as write path • Start (tcp connection, …) • Steady: when expected conditions are met • Machine failure: expected as well • Overloaded system: you may need to add machines or tune your workload
  • 22. Multi get / Client Group Gets by RegionServer Execute them one by one
  • 23. Multi get / Server
  • 24. Multi get / Server
  • 25. Access latency magnidesStorage hierarchy: a different view Dean/2009 Memory is 100000x faster than disk! Disk seek = 10ms
  • 26. Known unknowns • For each candidate HFile • Exclude by file metadata • Timestamp • Rowkey range • Exclude by bloom filter StoreFileScanner# shouldUseScanner()
  • 27. Unknown knowns • Merge sort results polled from Stores • Seek each scanner to a reference KeyValue • Retrieve candidate data from disk • Multiple HFiles => mulitple seeks • hbase.storescanner.parallel.seek.enable=true • Short Circuit Reads • dfs.client.read.shortcircuit=true • Block locality • Happy clusters compact! HFileBlock# readBlockData()
  • 28. BlockCache • Reuse previously read data • Maximize cache hit rate • Larger cache • Temporal access locality • Physical access locality BlockCache#getBlock()
  • 29. BlockCache Showdown • LruBlockCache • Default, onheap • Quite good most of the time • Evictions impact GC • BucketCache • Offheap alternative • Serialization overhead • Large memory configurations http://www.n10k.com/blog/block cache-showdown/ L2 off-heap BucketCache makes a strong showing
  • 30. Latency enemies: Garbage Collection • Use heap. Not too much. With CMS. • Max heap • 30GB (compressed pointers) • 8-16GB if you care about 9’s • Healthy cluster load • regular, reliable collections • 25-100ms pause on regular interval • Overloaded RegionServer suffers GC overmuch
  • 31. Off-heap to the rescue? • BucketCache (0.96, HBASE-7404) • Network interfaces (HBASE-9535) • MemStore et al (HBASE-10191)
  • 32. Latency enemies: Compactions • Fewer HFiles => fewer seeks • Evict data blocks! • Evict Index blocks!! • hfile.block.index.cacheonwrite • Evict bloom blocks!!! • hfile.block.bloom.cacheonwrite • OS buffer cache to the rescue • Compactected data is still fresh • Better than going all the way back to disk
  • 33. Failure • Detect + Reassign + Replay • Strong consistency requires replay • Locality drops to 0 • Cache starts from scratch
  • 34. Hedging our bets • HDFS Hedged reads (2.4, HDFS-5776) • Reads on secondary DataNodes • Strongly consistent • Works at the HDFS level • Timeline consistency (HBASE-10070) • Reads on « Replica Region » • Not strongly consistent
  • 35. Read latency in summary • Steady mode • Cache hit: < 1 ms • Cache miss: + 10 ms per seek • Writing while reading => cache churn • GC: 25-100ms pause on regular interval Network request + (1 - P(cache hit)) * (10 ms * seeks) • Same long tail issues as write • Overloaded: same scheduling issues as write • Partial failures hurt a lot
  • 36. HBase ranges for 99% latency Put Streamed Multiput Get Timeline get Steady milliseconds milliseconds milliseconds milliseconds Failure seconds seconds seconds milliseconds GC 10’s of milliseconds milliseconds 10’s of milliseconds milliseconds
  • 37. What’s next • Less GC • Use less objects • Offheap • Compressed BlockCache (HBASE-8894) • Prefered location (HBASE-4755) • The « magical 1% » • Most tools stops at the 99% latency • What happens after is much more complex
  • 38. Thanks! Nick Dimiduk, Hortonworks (@xefyr) Nicolas Liochon, Scaled Risk (@nkeywal) Hadoop Summit June 4, 2014

Editor's Notes

  1. Random read/write database, latency is very important
  2. Micro-seconds? Seconds? We are talking milliseconds Everyone stops looking after the 99% -- the literature calls this “magical 1%”
  3. How to measure latency in HBase
  4. Client connects to region server with TCP connection Connection is shared by client threads Server manages lots of client connections Schedules client queries – synchronization, locks, queues, &c
  5. WAL queue shared between regions - sometimes the sync work has already been done for you, can help - sometimes your small edit is sync’d with another larger one, can hurt you
  6. Small cluster, 4-yo machines 1 put, 1 put, 1put… 99% is double the mean Servers doing other work, but nothing major
  7. Where do we spend time? How about the network? 0.5ms is conservative, usually much less TCP round trip, same cluster as previous slide 99% 4x mean
  8. Where do we spend time? How about HDFS? Flushing, writing, &c Just HDFS, 2.4, 1kb, flush, flush, flush 99% 5x mean – not bad
  9. What else? Millisecond world means minor things start to matter: - JVM 1.7 is better at blocking queues than 1.6 - forget to configure naggle? 40ms - older linux scheduler bugs, 50ms - facebook literature talks a lot about filesystems
  10. Regular cluster operations also hurt Region split – 1’s of seconds Region move – 100ms (better on 0.96) GC – 10’s ms even after configuration tuning
  11. Now add some load Load == concurrency == queues: 1ms 0.98 adds pluggable scheduler, so you can influence these queues yourself Work in progress to expose this through configuration
  12. Contract: save you from cluster explosion How to protect? Stop writing - too many WALs? Stop writes. - too many hfiles to compact? Stop writes. Lasts indefinitely. New default settings.
  13. Puts do not require reads, so WAL replay doesn’t block writes Simple crash: quickly detected, 1s Hung/frozen: - conservative detection takes longer - configured to 5-15s, but look like long GC pause
  14. New since 0.96
  15. Settings for average cluster, unaggressive client Decoupled client from issue of a single RS YSCB, single empty region, 50% better throughput (HBASE-6295)
  16. This talk: assume get/short scan implies low-latency requirements No fancy streaming client like Puts; waiting for slowest RS.
  17. Gets grouped by RS, groups sent in parallel, block until all groups return. Network call: 10’s micros, in parallel
  18. Read path in full detail is quite complex See Lar’s HBaseCon 2012 talk for the nitty-gritty Complexity optimizing around one invariant: 10ms seek
  19. Read path in full detail is quite complex See Lar’s HBaseCon 2012 talk for the nitty-gritty Complexity optimizing around one invariant: 10ms seek
  20. Complexity optimizing around one invariant: 10ms seek Aiming for 100 microSec world; how to work around this? Goal: avoid disk at all cost!
  21. Goal: don’t go to disk unless absolutely necessary. Tactic: Candidate HFile elimination. Regular compactions => 3-5 candidates Bloom filters on by default in 0.96+
  22. Mergesort over multiple files, multiple seeks More spindles = parallel scanning SCR avoids proxy process (DataNode) But remember! Goal: don’t go to disk unless absolutely necessary.
  23. “block” is a segment of an HFile Data blocks, index blocks, and bloom blocks Read blocks retained in BlockCache Seek to same and adjacent data become cache hits
  24. HBase ships with a variety of cache implementations Happy with 95% stats? Stick with LruBlockCache and modest heapsize Pushing 99%? Lru still okay, but watch that heapsize. Spent money on RAM? BucketCache
  25. GC is a part of healthy operation BlockCache garbage is awkward size and age, which means pauses Pause time is a function of heap size More like ~16GiB if you’re really worried about 99% Overloaded: more cache evictions, time in GC
  26. Why generate garbage at all? GC are smart, but maybe we know our pain spots better? Don’t know until we try
  27. Necessary for fewer scan candidates, fewer seeks Buffer cache to the rescue “That’s steady state and overloaded; let’s talk about failure”
  28. Replaying WALs takes time Unlucky: no data locality, talking to remove DataNode Empty cache “Failure isn’t binary. What about the sick and the dying?”
  29. Don’t wait for a slow machine!
  30. Reads dominated by disk seek, so keep that data in memory After cache miss, GC is the next candidate cause of latency “Ideal formula” P(cache hit): fn (cache size :: db size, request locality) Sometimes the jitter dominates
  31. Standard deployment, well designed schema Millisecond responses, seconds for failure recovery, and GC at a regular interval Everything we’ve focused on here is impactful of the 99% Beyond that there’s a lot of interesting problems to solve
  32. There’s always more work to be done Generate less garbage Compressed BlockCache Improve recovery time and locality
  33. Questions!