1©MapR Technologies - Confidential
M7 Technical Overview
M. C. Srivas
CTO/Founder, MapR
2©MapR Technologies - Confidential
MapR: Lights Out Data Center Ready
• Automated stateful failover
• Automated re-replication
• Self-healing from HW and SW
failures
• Load balancing
• Rolling upgrades
• No lost jobs or data
• Five nines (99.999%) of uptime
Reliable Compute Dependable Storage
• Business continuity with snapshots
and mirrors
• Recover to a point in time
• End-to-end check summing
• Strong consistency
• Built-in compression
• Mirror between two sites by RTO
policy
3©MapR Technologies - Confidential
MapR does MapReduce (fast)
TeraSort Record
1 TB in 54 seconds
1003 nodes
MinuteSort Record
1.5 TB in 59 seconds
2103 nodes
4©MapR Technologies - Confidential
MapR does MapReduce (faster)
TeraSort Record
1 TB in 54 seconds
1003 nodes
MinuteSort Record
1.5 TB in 59 seconds
2103 nodes
5©MapR Technologies - Confidential
NoSQL
(logo cloud: DynamoDB, ZopeDB, Shoal, CloudKit, VertexDB, FlockDB, and many more)
6©MapR Technologies - Confidential
HBase Table Architecture
 Tables are divided into key ranges (regions)
 Regions are served by nodes (RegionServers)
 Columns are divided into access groups (column families)
CF1 CF2 CF3 CF4 CF5
R1
R2
R3
R4
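The key-range-to-region mapping can be sketched with a sorted list of region start keys; this is an illustrative model (hypothetical names, not HBase's actual implementation):

```python
from bisect import bisect_right

# Illustrative sketch: a table is a sorted list of region start keys; a row
# key belongs to the region whose start key is the greatest one not
# exceeding it. Four regions: [-inf,g), [g,n), [n,t), [t,+inf)
region_starts = ["", "g", "n", "t"]

def region_for(row_key):
    """Return the index of the region serving row_key."""
    return bisect_right(region_starts, row_key) - 1
```

A RegionServer then serves some subset of these ranges; the client library caches the mapping and routes each get/put directly.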
7©MapR Technologies - Confidential
HBase Architecture is Better
 Strong consistency model
– when a write returns, all readers will see same value
– "eventually consistent" is often "eventually inconsistent"
 Scan works
– does not broadcast
– ring-based NoSQL databases (eg, Cassandra, Riak) suffer on scans
 Scales automatically
– Splits when regions become too large
– Uses HDFS to spread data, manage space
 Integrated with Hadoop
– map-reduce on HBase is straightforward
8©MapR Technologies - Confidential
M7
An integrated system for
unstructured and structured data
9©MapR Technologies - Confidential
MapR M7 Tables
 Binary compatible with Apache HBase
– no recompilation needed to access M7 tables
– Just set CLASSPATH
– including HBase CLI
 M7 tables accessed via pathname
– openTable( "hello") … uses HBase
– openTable( "/hello") … uses M7
– openTable( "/user/srivas/hello") … uses M7
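The naming rule above can be sketched in a few lines; `table_backend` is a hypothetical helper (not part of the M7 client) that mirrors the path-based dispatch:

```python
def table_backend(name):
    # M7's convention as described on the slide: names that look like
    # filesystem paths (start with '/') resolve to M7 tables stored in
    # MapR-FS; bare names fall through to Apache HBase.
    return "M7" if name.startswith("/") else "HBase"
```

This is why no recompilation is needed: the same HBase API call routes to either system purely on the table name.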
10©MapR Technologies - Confidential
Binary Compatible
 HBase applications work "as is" with M7
– No need to recompile; just set CLASSPATH
 Can run M7 and HBase side-by-side on the same cluster
– eg, during a migration
– can access both M7 table and HBase table in same program
 Use standard Apache HBase CopyTable tool to copy a table
from HBase to M7 or vice-versa, viz.,
% hbase org.apache.hadoop.hbase.mapreduce.CopyTable
--new.name=/user/srivas/mytable oldtable
11©MapR Technologies - Confidential
Features
 Unlimited number of tables
– HBase is typically 10-20 tables (max 100)
 No compaction
 Instant-On
– zero recovery time
 8x insert/update perf
 10x random scan perf
 10x faster with flash - special flash support
12©MapR Technologies - Confidential
M7: Remove Layers, Simplify
MapR M7
13©MapR Technologies - Confidential
M7 tables in a MapR Cluster
 M7 tables integrated into storage
– always available on every node
– no separate process to start/stop/monitor
– zero administration
– no tuning parameters … just works
 M7 tables work 'as expected'
– First copy local to writing client
– Snapshots and mirrors
– Quotas, replication factor, data placement
14©MapR Technologies - Confidential
Unified Namespace for Files and Tables
$ pwd
/mapr/default/user/dave
$ ls
file1 file2 table1 table2
$ hbase shell
hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds
$ ls
file1 file2 table1 table2 table3
$ hadoop fs -ls /user/dave
Found 5 items
-rw-r--r-- 3 mapr mapr 16 2012-09-28 08:34 /user/dave/file1
-rw-r--r-- 3 mapr mapr 22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:38 /user/dave/table3
15©MapR Technologies - Confidential
M7 – An Integrated System
16©MapR Technologies - Confidential
Tables for End Users
 Users can create and manage their own tables
– Unlimited # of tables
– first copy local
 Tables can be created in any directory
– Tables count towards volume and user quotas
 No admin intervention needed
– do stuff on the fly, no stop/restart servers
 Automatic data protection and disaster recovery
– Users can recover from snapshots/mirrors on their own
17©MapR Technologies - Confidential
M7 combines the best of LSM and BTrees
 LSM Trees reduce insert cost by deferring and batching index changes
– if you don't compact often enough, read perf is impacted
– if you compact too often, write perf is impacted
 B-Trees are great for reads
– but expensive to update in real-time
Can we combine both ideas?
Writes cannot be done better than W = 2.5x
write to log + write data to somewhere + update meta-data
18©MapR Technologies - Confidential
M7 from MapR
 Twisting B-Trees
– leaves are variable size (8K - 8M or larger)
– can stay unbalanced for long periods of time
• more inserts will balance it eventually
• automatically throttles updates to interior btree nodes
– M7 inserts "close to" where the data is supposed to go
 Reads
– Uses BTree structure to get "close" very fast
• very high branching with key-prefix-compression
– Utilizes a separate lower-level index to find it exactly
• updated in-place: bloom-filters for gets, range-maps for scans
 Overhead
– 1K record read will transfer about 32K from disk in logN seeks
19©MapR Technologies - Confidential
M7
Comparative Analysis with
Apache HBase, Level-DB and a BTree
20©MapR Technologies - Confidential
Apache HBase HFile Structure
64Kbyte blocks
are compressed
An index into the
compressed blocks is
created as a btree
Key-value
pairs are
laid out in
increasing
order
Each cell is an individual key + value
- a row repeats the key for each column
21©MapR Technologies - Confidential
HBase Region Operation
 Typical region size is a few GB, sometimes even 10G or 20G
 RS holds data in memory until full, then writes a new HFile
– Logical view of database constructed by layering these files, with the
latest on top
Key range represented by this region
newest
oldest
22©MapR Technologies - Confidential
HBase Read Amplification
 When a get/scan comes in, all the files have to be examined
– schema-less, so where is the column?
– Done in-memory and does not change what's on disk
• Bloom-filters do not help in scans
newest
oldest
With 7 files, a 1K-record get() takes about 30 seeks, 7 block decompressions, and a total data transfer of about 130K from HDFS.
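The 7-file figures can be reproduced with a back-of-envelope cost model; the per-file seek count and compressed block size below are assumptions chosen to be plausible, not measurements:

```python
def hbase_get_cost(n_files, seeks_per_file=4, compressed_block_kb=18):
    """Back-of-envelope model of HBase read amplification.

    Assumed constants (not from the slide): ~4 seeks per HFile to reach
    the trailer, index, bloom filter, and data block, and a 64KB data
    block that compresses to roughly 18KB.
    """
    seeks = n_files * seeks_per_file
    transfer_kb = n_files * compressed_block_kb
    return seeks, transfer_kb
```

With `n_files=7` this gives 28 seeks and 126KB transferred, in the ballpark of the ~30 seeks and ~130K cited on the slide.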
23©MapR Technologies - Confidential
HBase Write Amplification
 To reduce the read-amplification, HBase merges the HFiles
periodically
– process called compaction
– runs automatically when too many files
– usually turned off due to I/O storms
– and kicked-off manually on weekends
Compaction reads all files and merges
into a single HFile
24©MapR Technologies - Confidential
HBase Compaction Analysis
 Assume 10G per region, write 10% per day, grow 10% per week
– 1G of writes
– after 7 days, 7 files of 1G and 1file of 10G
 Compaction
– Total reads: 17G (= 7 x 1G + 1 x 10G)
– Total writes: 25G (= 7G wal + 7G flush + 11G write to new HFile)
 500 regions
– read 8.5T, write 12.5T → a major I/O outage on the node
– keeping fewer HFiles (compacting more often) only makes it worse
 Best practice: serve < 500G per node (50 regions)
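The slide's arithmetic can be checked with a short sketch (figures in decimal gigabytes, matching the slide; one region over one week):

```python
def weekly_compaction_io(region_gb=10, daily_write_gb=1, days=7, weekly_growth=0.10):
    # Per the slide's assumptions: 10G region, 1G of writes per day,
    # 10% growth per week, one major compaction at week's end.
    flushed_gb = days * daily_write_gb                  # seven 1G HFiles
    reads_gb = flushed_gb + region_gb                   # compaction reads: 7 + 10 = 17G
    merged_gb = region_gb * (1 + weekly_growth)         # new HFile: 11G
    writes_gb = days * daily_write_gb + flushed_gb + merged_gb  # WAL + flushes + merge = 25G
    return reads_gb, writes_gb
```

Scaled to 500 regions: 17G x 500 = 8.5T read and 25G x 500 = 12.5T written, the node-crippling totals above.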
25©MapR Technologies - Confidential
Level-DB
 Tiered, logarithmic increase
– L1: 2 x 1M files
– L2: 10 x 1M
– L3: 100 x 1M
– L4: 1,000 x 1M, etc
 Compaction overhead
– avoids IO storms (i/o done in smaller increments of ~10M)
– but significantly more total bandwidth than HBase
 Read overhead is still high
– 10-15 seeks, perhaps more if the lowest level is very large
– 40K - 60K read from disk to retrieve a 1K record
26©MapR Technologies - Confidential
BTree analysis
 Read finds data directly, proven to be fastest
– interior nodes only hold keys
– very large branching factor
– values only at leaves
– thus caches work
– R = logN seeks, if no caching
– 1K record read will transfer about logN blocks from disk
 Writes are slow on inserts
– inserted into correct place right away
– otherwise read will not find it
– requires btree to be continuously rebalanced
– causes extreme random i/o in insert path
– W = 2.5x + logN seeks if no caching
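The read and write costs above can be sketched as a small model; `btree_depth` is an illustrative helper that computes the uncached logN tree depth with an explicit capacity loop:

```python
def btree_depth(n_keys, branching):
    # Levels needed before interior-node fan-out covers n_keys.
    depth, capacity = 0, 1
    while capacity < n_keys:
        capacity *= branching
        depth += 1
    return depth

def btree_costs(n_keys, branching):
    # Uncached costs per the slide: a read costs logN seeks;
    # a write costs the 2.5x baseline (log + data + metadata)
    # plus the same logN seeks to insert the key in place.
    depth = btree_depth(n_keys, branching)
    return depth, 2.5 + depth
```

For a billion keys with a branching factor of 1000, a read costs 3 seeks and a write 2.5 + 3, which is why B-Trees read fast but insert slowly.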
Let’s look at some performance numbers for proof
29©MapR Technologies - Confidential
M7 vs. CDH: 50-50 Mix (Reads)
30©MapR Technologies - Confidential
M7 vs. CDH: 50-50 load (read latency)
31©MapR Technologies - Confidential
M7 vs. CDH: 50-50 Mix (Updates)
32©MapR Technologies - Confidential
M7 vs. CDH: 50-50 mix (update latency)
33©MapR Technologies - Confidential
MapR M7 Accelerates HBase Applications
Benchmark             MapR 3.0.1 (M7)  CDH 4.3.0 (HBase)  MapR Increase
50% read, 50% update  8000             1695               5.5x
95% read, 5% update   3716             602                6x
Reads                 5520             764                7.2x
Scans (50 rows)       1080             156                6.9x

CPU: 2 x Intel Xeon E5645 @ 2.40GHz (12 cores)
RAM: 48GB
Disk: 12 x 3TB (7200 RPM)
Record size: 1KB
Data size: 2TB
OS: CentOS Release 6.2 (Final)

Benchmark             MapR 3.0.1 (M7)  CDH 4.3.0 (HBase)  MapR Increase
50% read, 50% update  21328            2547               8.4x
95% read, 5% update   13455            2660               5x
Reads                 18206            1605               11.3x
Scans (50 rows)       1298             116                11.2x

CPU: 2 x Intel Xeon E5620 @ 2.40GHz (8 cores)
RAM: 24GB
Disk: 1 x 1.2TB Fusion-io ioDrive2
Record size: 1KB
Data size: 600GB
OS: CentOS Release 6.3 (Final)

MapR speedup with HDDs: 5x-7x; with SSDs: 5x-11.3x
34©MapR Technologies - Confidential
M7: Fileservers Serve Regions
 Region lives entirely inside a container
– Does not coordinate through ZooKeeper
 Containers support distributed transactions
– with replication built-in
 Only coordination in the system is for splits
– Between region-map and data-container
– already solved this problem for files and its chunks
37©MapR Technologies - Confidential
Server Reboot
 Full container-reports are tiny
– CLDB needs 2G dram for 1000-node cluster
 Volumes come online very fast
– each volume independent of others
– as soon as min-repl # of containers ready
– does not wait for whole cluster
(eg, HDFS waits for 99.9% blocks reporting)
 1000-node cluster restart < 5 mins
39©MapR Technologies - Confidential
M7 provides Instant Recovery
 0-40 microWALs per region
– idle WALs go to zero quickly, so most are empty
– region is up before all microWALs are recovered
– recovers region in background in parallel
– when a key is accessed, that microWAL is recovered inline
– 1000-10000x faster recovery
 Why doesn't HBase do this?
– M7 leverages unique MapR-FS capabilities, not impacted by HDFS
limitations
– No limit to # of files on disk
– No limit to # open files
– I/O path translates random writes to sequential writes on disk
MapR M7: Providing an enterprise quality Apache HBase API

  • 1. 1©MapR Technologies - Confidential M7 Technical Overview M. C. Srivas CTO/Founder, MapR
  • 2. 2©MapR Technologies - Confidential MapR: Lights Out Data Center Ready • Automated stateful failover • Automated re-replication • Self-healing from HW and SW failures • Load balancing • Rolling upgrades • No lost jobs or data • 99999’s of uptime Reliable Compute Dependable Storage • Business continuity with snapshots and mirrors • Recover to a point in time • End-to-end check summing • Strong consistency • Built-in compression • Mirror between two sites by RTO policy
  • 3. 3©MapR Technologies - Confidential MapR does MapReduce (fast) TeraSort Record 1 TB in 54 seconds 1003 nodes MinuteSort Record 1.5 TB in 59 seconds 2103 nodes
  • 4. 4©MapR Technologies - Confidential MapR does MapReduce (faster) TeraSort Record 1 TB in 54 seconds 1003 nodes MinuteSort Record 1.5 TB in 59 seconds 2103 nodes 1.65 300
  • 5. 5©MapR Technologies - Confidential Dynamo DB ZopeDB Shoal CloudKit Vertex DB FlockD B NoSQL
  • 6. 6©MapR Technologies - Confidential HBase Table Architecture  Tables are divided into key ranges (regions)  Regions are served by nodes (RegionServers)  Columns are divided into access groups (columns families) CF1 CF2 CF3 CF4 CF5 R1 R2 R3 R4
  • 7. 7©MapR Technologies - Confidential HBase Architecture is Better  Strong consistency model – when a write returns, all readers will see same value – "eventually consistent" is often "eventually inconsistent"  Scan works – does not broadcast – ring-based NoSQL databases (eg, Cassandra, Riak) suffer on scans  Scales automatically – Splits when regions become too large – Uses HDFS to spread data, manage space  Integrated with Hadoop – map-reduce on HBase is straightforward
  • 8. 8©MapR Technologies - Confidential M7 An integrated system for unstructured and structured data
  • 9. 9©MapR Technologies - Confidential MapR M7 Tables  Binary compatible with Apache HBase – no recompilation needed to access M7 tables – Just set CLASSPATH – including HBase CLI  M7 tables accessed via pathname – openTable( "hello") … uses HBase – openTable( "/hello") … uses M7 – openTable( "/user/srivas/hello") … uses M7 9
  • 10. 10©MapR Technologies - Confidential Binary Compatible  HBase applications work "as is" with M7 – No need to recompile , just set CLASSPATH  Can run M7 and HBase side-by-side on the same cluster – eg, during a migration – can access both M7 table and HBase table in same program  Use standard Apache HBase CopyTable tool to copy a table from HBase to M7 or vice-versa, viz., % hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=/user/srivas/mytable oldtable
  • 11. 11©MapR Technologies - Confidential Features  Unlimited number of tables – HBase is typically 10-20 tables (max 100)  No compaction  Instant-On – zero recovery time  8x insert/update perf  10x random scan perf  10x faster with flash - special flash support 11
  • 12. 12©MapR Technologies - Confidential M7: Remove Layers, Simplify MapR M7
  • 13. 13©MapR Technologies - Confidential M7 tables in a MapR Cluster  M7 tables integrated into storage – always available on every node – no separate process to start/stop/monitor – zero administration – no tuning parameters … just works  M7 tables work 'as expected' – First copy local to writing client – Snapshots and mirrors – Quotas , repl factor, data placement 13
  • 14. 14©MapR Technologies - Confidential Unified Namespace for Files and Tables $ pwd /mapr/default/user/dave $ ls file1 file2 table1 table2 $ hbase shell hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3' 0 row(s) in 0.1570 seconds $ ls file1 file2 table1 table2 table3 $ hadoop fs -ls /user/dave Found 5 items -rw-r--r-- 3 mapr mapr 16 2012-09-28 08:34 /user/dave/file1 -rw-r--r-- 3 mapr mapr 22 2012-09-28 08:34 /user/dave/file2 trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:32 /user/dave/table1 trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:33 /user/dave/table2 trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:38 /user/dave/table3
  • 15. 15©MapR Technologies - Confidential M7 – An Integrated System
  • 16. 16©MapR Technologies - Confidential Tables for End Users  Users can create and manage their own tables – Unlimited # of tables – first copy local  Tables can be created in any directory – Tables count towards volume and user quotas  No admin intervention needed – do stuff on the fly, no stop/restart servers  Automatic data protection and disaster recovery – Users can recover from snapshots/mirrors on their own
  • 17. M7 Combines the Best of LSM Trees and B-Trees
     LSM trees reduce insert cost by deferring and batching index changes
    – compact too rarely, and read performance suffers
    – compact too often, and write performance suffers
     B-trees are great for reads
    – but expensive to update in real time
    Can we combine both ideas?
    Writes cannot be done better than W = 2.5x:
    write to the log + write the data somewhere + update the metadata
  • 18. M7 from MapR
     Twisted B-trees
    – leaves are variable size (8K - 8M or larger)
    – can stay unbalanced for long periods of time
    • more inserts will balance the tree eventually
    • automatically throttles updates to interior B-tree nodes
    – M7 inserts "close to" where the data is supposed to go
     Reads
    – use the B-tree structure to get "close" very fast
    • very high branching factor with key-prefix compression
    – utilize a separate lower-level index, updated in place, to find the record exactly
    • bloom filters for gets, range maps for scans
     Overhead
    – a 1K record read transfers about 32K from disk in logN seeks
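The two-level read path above can be sketched as a toy model (hypothetical names, not MapR code): a coarse sorted index of leaf start-keys gets "close" to a key, a per-leaf exact index finds the record, and leaves split as they fill rather than being rebalanced on every insert.

```python
import bisect

# Toy sketch of a two-level lookup (illustrative only): a coarse index of
# leaf start-keys gets "close", then an exact per-leaf index finds the key.
class TwoLevelStore:
    def __init__(self, leaf_capacity=4):
        self.leaf_capacity = leaf_capacity
        self.leaf_starts = [""]   # first key of each leaf (coarse index)
        self.leaves = [{}]        # exact per-leaf index: key -> value

    def _leaf_for(self, key):
        return bisect.bisect_right(self.leaf_starts, key) - 1

    def put(self, key, value):
        i = self._leaf_for(key)
        self.leaves[i][key] = value
        if len(self.leaves[i]) > self.leaf_capacity:
            # Split the overfull leaf, much like a region/leaf split;
            # interior structure is only touched on splits, not every insert.
            keys = sorted(self.leaves[i])
            mid = len(keys) // 2
            right = {k: self.leaves[i].pop(k) for k in keys[mid:]}
            self.leaf_starts.insert(i + 1, keys[mid])
            self.leaves.insert(i + 1, right)

    def get(self, key):
        return self.leaves[self._leaf_for(key)].get(key)

store = TwoLevelStore()
for k in ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]:
    store.put(k, k.upper())
print(store.get("fig"))        # -> FIG
print(len(store.leaves))       # leaves have split as data grew
```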
  • 19. M7 Comparative Analysis with Apache HBase, LevelDB and a B-Tree
  • 20. Apache HBase HFile Structure
     Key-value pairs are laid out in increasing key order
     64-Kbyte blocks are compressed
     An index into the compressed blocks is maintained as a B-tree
     Each cell is an individual key + value – a row repeats the key for each column
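The layout above can be sketched as a toy model (illustrative, not the real on-disk format): sorted cells packed into blocks, each block compressed, with a small index mapping each block's first key to the block.

```python
import bisect
import zlib

BLOCK_CELLS = 4  # cells per block; stands in for the 64K size threshold

def build_hfile(cells):
    """cells: (key, value) pairs sorted by key -> (compressed blocks, index)."""
    blocks, first_keys = [], []
    for i in range(0, len(cells), BLOCK_CELLS):
        chunk = cells[i:i + BLOCK_CELLS]
        first_keys.append(chunk[0][0])   # index entry: first key in block
        raw = "\n".join(f"{k}={v}" for k, v in chunk).encode()
        blocks.append(zlib.compress(raw))   # only the blocks are compressed
    return blocks, first_keys

def hfile_get(blocks, first_keys, key):
    """Binary-search the index, decompress one block, scan its cells."""
    b = bisect.bisect_right(first_keys, key) - 1
    if b < 0:
        return None
    for cell in zlib.decompress(blocks[b]).decode().split("\n"):
        k, _, v = cell.partition("=")
        if k == key:
            return v
    return None

cells = sorted((f"row{i:03d}", f"val{i}") for i in range(10))
blocks, index = build_hfile(cells)
print(hfile_get(blocks, index, "row007"))   # -> val7
```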
  • 21. HBase Region Operation
     Typical region size is a few GB, sometimes even 10G or 20G
     The RegionServer holds data in memory until full, then writes out a new HFile
    – the logical view of the database is constructed by layering these files, with the latest on top
    (diagram: stack of HFiles, newest to oldest, covering the key range represented by the region)
  • 22. HBase Read Amplification
     When a get/scan comes in, all the files have to be examined
    – the store is schema-less, so which file holds the column?
    – done in memory; does not change what's on disk
    • bloom filters do not help in scans
    With 7 files, a 1K-record get() takes about 30 seeks, 7 block decompressions,
    and a total data transfer of about 130K from HDFS.
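A back-of-the-envelope model of that claim (the per-file seek and transfer constants below are assumptions, chosen to roughly match the slide's "7 files => ~30 seeks, ~130K" figures; real costs depend on cache hits and schema):

```python
SEEKS_PER_HFILE = 4     # e.g. trailer, block index, bloom filter, data block
BLOCK_TRANSFER_KB = 18  # one ~64K block after compression, per file probed

def get_cost(num_hfiles, record_kb=1):
    """Approximate cost of a point get that must probe every HFile."""
    seeks = num_hfiles * SEEKS_PER_HFILE
    transfer_kb = num_hfiles * BLOCK_TRANSFER_KB
    amplification = transfer_kb / record_kb
    return seeks, transfer_kb, amplification

seeks, kb, amp = get_cost(7)
print(f"{seeks} seeks, ~{kb}K transferred for a 1K get ({amp:.0f}x read amplification)")
```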
  • 23. HBase Write Amplification
     To reduce the read amplification, HBase merges HFiles periodically
    – a process called compaction
    – runs automatically when there are too many files
    – usually turned off due to I/O storms
    – and kicked off manually on weekends instead
    Compaction reads all the files and merges them into a single HFile
  • 24. HBase Compaction Analysis
     Assume 10G per region, writing 10% per day, growing 10% per week
    – 1G of writes per day
    – after 7 days: 7 files of 1G and 1 file of 10G
     Compaction
    – total reads: 17G (= 7 x 1G + 1 x 10G)
    – total writes: 25G (= 7G WAL + 7G flush + 11G written to the new HFile)
     500 regions
    – read 8.5T, write 12.5T
    – a major outage on the node – and with fewer HFiles, reads only get worse
     Best practice: serve < 500G per node (50 regions)
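The slide's arithmetic, written out (inputs are the slide's own assumptions: a 10G region accumulating 1G of writes as 7 x 1G flushes):

```python
def compaction_io(region_gb=10, flush_files=7, flush_gb=1, growth_gb=1):
    """Per-region I/O for one major compaction under the slide's assumptions."""
    reads = flush_files * flush_gb + region_gb   # read every HFile: 7G + 10G
    new_hfile_gb = region_gb + growth_gb         # merged output: 11G
    writes = (flush_files * flush_gb             # WAL: 7G
              + flush_files * flush_gb           # memstore flushes: 7G
              + new_hfile_gb)                    # new compacted HFile: 11G
    return reads, writes

r, w = compaction_io()
print(f"per region: read {r}G, write {w}G")                      # 17G / 25G
print(f"500 regions: read {r * 500 / 1000}T, write {w * 500 / 1000}T")
```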
  • 25. LevelDB
     Tiered levels, increasing logarithmically
    – L1: 2 x 1M files
    – L2: 10 x 1M
    – L3: 100 x 1M
    – L4: 1,000 x 1M, etc.
     Compaction overhead
    – avoids I/O storms (I/O is done in small increments of ~10M)
    – but uses significantly more bandwidth than HBase
     Read overhead is still high
    – 10-15 seeks, perhaps more if the lowest level is very large
    – 40K - 60K read from disk to retrieve a 1K record
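A rough model of the tiered layout (file counts are the slide's; the 3-seeks-per-file probe cost is an assumption): a get may touch one file per level, while compaction only ever moves a few small files at a time.

```python
FILES_PER_LEVEL = {1: 2, 2: 10, 3: 100, 4: 1000}  # files of ~1M each (slide)
SEEKS_PER_FILE = 3                                # e.g. index, bloom, data block

def total_data_mb():
    """Total data held across all levels, at ~1M per file."""
    return sum(FILES_PER_LEVEL.values())

def worst_case_get_seeks():
    """A get that misses every cache probes one file per level."""
    return SEEKS_PER_FILE * len(FILES_PER_LEVEL)

print(f"{total_data_mb()}M across {len(FILES_PER_LEVEL)} levels; "
      f"~{worst_case_get_seeks()} seeks for a worst-case get")
```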
  • 26. B-Tree Analysis
     Reads find the data directly – proven to be the fastest
    – interior nodes hold only keys; values live only at the leaves
    – very large branching factor, so caches work well
    – R = logN seeks if nothing is cached
    – a 1K record read transfers about logN blocks from disk
     Writes are slow on inserts
    – data must be inserted into the correct place right away, or reads will not find it
    – requires the B-tree to be continuously rebalanced
    – causes extreme random I/O in the insert path
    – W = 2.5x + logN seeks if nothing is cached
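The R = logN read cost can be made concrete with a small helper (the fanout and block size below are assumptions, not MapR's numbers): with fanout b, a tree over N records is log_b(N) levels deep, so an uncached read costs that many seeks and block transfers.

```python
def btree_read_cost(n_records, fanout=1000, block_kb=32):
    """Uncached B-tree read cost: (seeks, KB transferred) = depth, depth * block."""
    depth, span = 1, fanout
    while span < n_records:   # add a level until the tree covers N records
        depth += 1
        span *= fanout
    return depth, depth * block_kb

seeks, kb = btree_read_cost(10**9)
print(f"1B records: {seeks} seeks, ~{kb}K transferred per uncached 1K read")
```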
  • 27. Let’s look at some Performance Numbers for proof
  • 28. M7 vs. CDH: 50-50 Mix (Reads)
  • 29. M7 vs. CDH: 50-50 Mix (Read Latency)
  • 30. M7 vs. CDH: 50-50 Mix (Updates)
  • 31. M7 vs. CDH: 50-50 Mix (Update Latency)
  • 32. MapR M7 Accelerates HBase Applications

    HDD cluster (higher is better):
      Benchmark               MapR 3.0.1 (M7)   CDH 4.3.0 (HBase)   MapR Increase
      50% read, 50% update    8000              1695                5.5x
      95% read, 5% update     3716              602                 6x
      Reads                   5520              764                 7.2x
      Scans (50 rows)         1080              156                 6.9x
      CPU: 2 x Intel Xeon E5645 2.40GHz (12 cores); RAM: 48GB;
      Disk: 12 x 3TB (7200 RPM); Record size: 1KB; Data size: 2TB;
      OS: CentOS Release 6.2 (Final)

    SSD cluster (higher is better):
      Benchmark               MapR 3.0.1 (M7)   CDH 4.3.0 (HBase)   MapR Increase
      50% read, 50% update    21328             2547                8.4x
      95% read, 5% update     13455             2660                5x
      Reads                   18206             1605                11.3x
      Scans (50 rows)         1298              116                 11.2x
      CPU: 2 x Intel Xeon E5620 2.40GHz (8 cores); RAM: 24GB;
      Disk: 1 x 1.2TB Fusion-io ioDrive2; Record size: 1KB; Data size: 600GB;
      OS: CentOS Release 6.3 (Final)

    MapR speedup with HDDs: 5x-7x; with SSD: 5x-11.3x
  • 33. M7: Fileservers Serve Regions
     A region lives entirely inside a container
    – does not coordinate through ZooKeeper
     Containers support distributed transactions
    – with replication built in
     The only coordination in the system is for splits
    – between the region map and the data container
    – the same problem already solved for files and their chunks
  • 36. Server Reboot
     Full container reports are tiny
    – the CLDB needs 2G of DRAM for a 1000-node cluster
     Volumes come online very fast
    – each volume is independent of the others
    – online as soon as the min-repl number of containers is ready
    – does not wait for the whole cluster (e.g., HDFS waits for 99.9% of blocks to report)
     1000-node cluster restart in < 5 minutes
  • 38. M7 Provides Instant Recovery
     0-40 microWALs per region
    – idle WALs go to zero quickly, so most are empty
    – the region is up before all microWALs are recovered
    – recovers the region in the background, in parallel
    – when a key is accessed, its microWAL is recovered inline
    – 1000-10000x faster recovery
     Why doesn't HBase do this?
    – M7 leverages unique MapR-FS capabilities and is not impacted by HDFS limitations
    – no limit to the number of files on disk
    – no limit to the number of open files
    – the I/O path translates random writes into sequential writes on disk