HBase Operations & Best Practices
Venu Anuganti
July 2013
http://scalein.com/
Blog: http://venublog.com/
Twitter: @vanuganti
Who am I
o Data Architect, Technology Advisor
o Founder of ScaleIN, Data Consulting Company, 5+ years
o 100+ companies, 20+ from Fortune 200
o http://scalein.com/
o Architect, Implement & Support SQL, NoSQL and BigData
Solutions
 Industry: Databases, Games, Social, Video, SaaS,
Analytics, Warehouse, Web, Financial, Mobile,
Advertising & SEM Marketing
Agenda
 BigData - Hadoop & HBase Overview
 BigData Architecture
 HBase Cluster Setup Walkthrough
 High Availability
 Backup and Restore
 Operational Best Practices
BigData Overview
BigData Trends
• BigData is the latest industry buzz, many companies
adopting or migrating
o Not a replacement for OLTP or RDBMS systems
• Gartner – $28B in 2012 & $34B in 2013 spend
o 2013 top-10 technology trends – 6th place
• Solves large data problems that existed for years
o Social, User, Mobile growth demanded such a solution
o Google “BigTable” is the key, followed by Amazon “Dynamo”;
new papers like Dremel drive it further
o Hadoop & its ecosystem are becoming synonymous with BigData
• Combines vast structured/unstructured data
o Overcomes the limits of the legacy warehouse model
o Brings data analytics & data science
o Real-time, mining, insights, discovery & complex reporting
BigData
• Key factors - Pros
 Can handle any size
 Commodity hardware
 Scalable, Distributed, Highly
Available
 Ecosystem & growing
community
• Key factors – Cons
 Latency
 Hardware evolution, even
though designed for
commodity
 Not a fit for every use case
BigData Architecture
Low Level Architecture
Why HBase
• HBase is proven, widely adopted
 Tightly coupled with the Hadoop ecosystem
 Almost all major data-driven companies use it
• Scales linearly
 Read performance is its core; random, sequential reads
 Can store terabytes/petabytes of data
 Large scale scans, millions of records
 Highly distributed
• CAP Theorem – HBase is CP driven
• Competition: Cassandra (AP)
Hadoop/HBase Cluster Setup
Cluster Components
3 Major Components
 Master(s)
 HMaster
 Coordination
 Zookeeper
 Slave(s)
 Region server
[Diagram: the MASTER node runs Name Node, HMaster and Zookeeper; SLAVE 1, SLAVE 2 and SLAVE 3 each run a Data Node and a Region Server]
How It Works
[Diagram: CLIENT, HMASTER (DDL), REGION SERVERS (RS RS RS) on HDFS, ZOOKEEPER CLUSTER (ZK ZK ZK)]
Zookeeper
 Zookeeper
o Coordination for entire cluster
o Master selection
o Root region server lookup
o Node registration
o Client always communicates with Zookeeper for lookups
(cached for subsequent calls)
hbase(main):001:0> zk "ls /hbase"
[safe-mode, root-region-server, rs, master, shutdown,
replication]
Zookeeper Setup
 Zookeeper
• Dedicated nodes in the cluster
• Always in odd number
• Disk, memory, cpu usage is low
• Availability is key
Master Node
 HMaster
o Typically runs with Name Node
o Monitors all region servers, handles RS failover
o Handles all meta data changes
o Assigns regions
o Interface for all meta data changes
o Load balancing on idle times
Master Setup
• Dedicated Master Node
o Light on use, but should be on reliable hardware
o Good amount of memory and CPU can help
o Disk space is pretty nominal
• Must Have Redundancy
o Avoid single point of failure (SPOF)
o RAID preferred for redundancy or even JBOD
o DRBD or NFS is also preferred
Region Server
 Region Server
o Handles all I/O requests
o Flush MemStore to HDFS
o Splitting
o Compaction
o Basic element of table storage
o Table => Regions => one Store per Column Family (CF) per Region =>
one MemStore plus StoreFiles per Store => Blocks
o Maintains WAL (Write Ahead Log) for all changes
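The write path listed above (WAL first, then MemStore, then flush to a store file) can be sketched as a toy model. This is not HBase code: the class name `ToyStore`, the cell-count flush threshold and the in-memory "files" are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model of one Store (one column family within one region)."""
    flush_threshold: int = 3  # hypothetical: flush after this many cells
    wal: list = field(default_factory=list)          # write-ahead log entries
    memstore: dict = field(default_factory=dict)     # in-memory buffer
    store_files: list = field(default_factory=list)  # immutable flushed files

    def put(self, row: str, value: str) -> None:
        # 1. Record the change in the WAL first, so it can be replayed.
        self.wal.append((row, value))
        # 2. Apply it to the in-memory MemStore.
        self.memstore[row] = value
        # 3. Flush once the MemStore reaches the threshold.
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # A flush writes the MemStore out as a new sorted, immutable file.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row: str):
        # Reads check the MemStore, then store files from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for store_file in reversed(self.store_files):
            for key, value in store_file:
                if key == row:
                    return value
        return None
```

Each flush adds another store file that reads have to consult, which is one reason compaction appears in the list above.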
Region Server - Setup
• Should be stand-alone and dedicated
o JBOD disks
o Inexpensive
o Data node and region server should be co-located
• Network
o Dual 1G, 10G or InfiniBand, DNS lookup free
• Replication - at least 3, locality
• Tune region size for splits; too many regions, or
regions that are too small, hurt performance.
Cluster Setup – 10 Node
High Availability
• HBase Cluster - Failure Candidates
 Data Center
 Cluster
 Rack
 Network Switch
 Power Strip
 Region or Data Node
 Zookeeper Node
 HBase Master
 Name Node
HA - Data Center
• Cross data center, geo distributed
• Replication is the only solution
 Up-to-date data
 Active-active
 Active-passive
 Costly (can be sized)
 Need dedicated network
• On-demand offline cluster
 Only for disaster recovery
 No up-to-date copy
 Can be sized appropriately
 Need to reprocess for latest data
HA – Redundant Cluster
• Redundant cluster within a data center using
replication
• Mainly to have backup cluster for disasters
 Up-to-date data
 Restore to an earlier state using TTL-based history
 Restore deleted data by keeping deleted cells
 Run backups
 Read/write distributed with load balancer
 Support development or provide on-demand data
 Support low-priority activities
• Best practice: Avoid redundant cluster, rather have
one big cluster with high redundancy
HA – Rack, Network, Power
• Cluster nodes should be rack and switch aware
• Losing a rack or a network switch should not bring
the cluster down
• Hadoop has built-in rack awareness
 Assign nodes based on rack diagram
 Redundant nodes are within rack, across switch and
rack
 Manual or automatic setup to detect location
• Redundant power and network within each node
(master)
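One way to "assign nodes based on rack diagram" is a Hadoop topology script, which Hadoop calls with host names or IPs and which prints one rack path per host. A minimal sketch follows; the addresses and rack paths in `RACK_MAP` are made-up placeholders, and the property that points Hadoop at such a script (`topology.script.file.name` in Hadoop 1.x, `net.topology.script.file.name` in 2.x) should be checked against the version in use.

```python
import sys

# Hypothetical rack diagram: host or IP -> rack path. Replace with real data.
RACK_MAP = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve_rack(host: str) -> str:
    """Return the rack path for a host, or the default rack if unknown."""
    return RACK_MAP.get(host, DEFAULT_RACK)


if __name__ == "__main__":
    # Hosts arrive as command-line arguments; print one rack path per host.
    print(" ".join(resolve_rack(host) for host in sys.argv[1:]))
```

Keeping the mapping in one file that configuration management distributes is the "manual" setup; an automatic variant would derive the rack from a subnet or an inventory system instead.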
HA – Region Servers
• Losing a region server or data node is common,
and in many clusters frequent
• They are distributed and replicated
• Can be added/removed dynamically, taken out for
regular maintenance
• Replication factor of 3
– Each block survives the loss of up to 2 nodes holding its replicas
• Replication factor of 4
– Each block survives the loss of up to 3 nodes holding its replicas
HA – Zookeeper
• Zookeeper nodes are distributed
• Can be added/removed dynamically
• Should be implemented in odd number, due to
quorum (majority voting wins the active state)
• If 4, can lose 1 node (majority of 3)
• If 5, can lose 2 nodes (majority of 3)
• If 6, can lose 2 nodes (majority of 4)
• If 7, can lose 3 nodes (majority of 4)
• Best Practice: 5 or 7 with dedicated hardware.
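The quorum numbers above follow from simple majority arithmetic. A small sketch (the function names are illustrative, not part of any Zookeeper API):

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return ensemble - quorum_size(ensemble)


if __name__ == "__main__":
    for n in (4, 5, 6, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

An even-sized ensemble tolerates no more failures than the odd size just below it, which is why odd numbers are recommended.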
HA – HMaster
• HMaster - single point of failure
• HA - Multiple HMaster nodes within a cluster
 Zookeeper co-ordinates master failure
 Only one active at any given point of time
 Best practice: 2-3 HMasters, 1 per rack
Scalability
How to scale
• By design, cluster is highly distributed and scalable
• Keep adding more region servers to scale
 Region splits
 Replication factor
 Row key design is a key factor for scaling writes
 No single “hot” region
 Bulk loading, pre-split
 Native Java access vs. other protocols like Thrift
 Compaction at regular intervals
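One common way to avoid a single "hot" region is to prefix row keys with a hash-derived bucket and pre-split the table on those prefixes. The sketch below only builds keys and split points; the bucket count, the two-digit prefix format and the function names are assumptions, and creating the table with these split points is left to the HBase shell or client API.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical; often sized to the number of region servers


def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a stable bucket so sequential keys spread out."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{row_key}"


def split_points(buckets: int = NUM_BUCKETS) -> list:
    """Split keys that give each bucket prefix its own initial region."""
    return [f"{b:02d}-" for b in range(1, buckets)]
```

Because the prefix is derived from the key itself, a point read can recompute it; the trade-off is that a range scan over the original key order has to fan out across all buckets.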
Performance
 Benchmarking is key
• No single configuration fits all
• Simulate use cases and run the tests
o Bulk loading
o Random access, read/write
o Bulk processing
o Scan, filter
• Negative performance
o Replication factor
o Zookeeper nodes
o Network latency
o Slower disks, CPUs
o Hot regions, bad row key or bulk loading without pre-splits
Tuning
 Tune the cluster to best fit the environment
• Block Size, LRU cache, 64K default, per CF
• JBOD
• Memstore
• Compaction, manual
• WAL flush
• Avoid long GC pauses, JVM
• Region size, small is better, split based on “hot”
• Batch size
• In-memory column families
• Compression, LZO
• Timeouts
• Region handler count, threads/region
• Speculative execution
• Balancer, manual
Backup & (Point-in-time) Restore
Backup - Built-in
• In general no external backup needed
• HBase is highly distributed and has built-in
versioning, data retention policy
 No need to backup just for redundancy
 Point-in-time restore:
• Use TTL/Table/CF/C and keep the history for X hours/days
 Accidental deletes:
• Use ‘KeepDeletedCells’ to keep all deleted data
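The idea behind built-in point-in-time restore can be shown with a toy model of one versioned cell: every write keeps its timestamp, a delete is just another timestamped marker, and a read "as of" time T ignores anything newer than T. This is an illustration only, not the HBase API; `ToyCell` and its methods are hypothetical.

```python
class ToyCell:
    """Toy model of one cell that keeps history and delete markers."""

    def __init__(self, ttl: int):
        self.ttl = ttl          # how long history is retained
        self.versions = []      # (timestamp, value); None marks a delete

    def put(self, ts: int, value: str) -> None:
        self.versions.append((ts, value))

    def delete(self, ts: int) -> None:
        # The delete is recorded as a marker; older versions are kept.
        self.versions.append((ts, None))

    def read_as_of(self, ts: int, now: int):
        """Return the value visible at time ts, or None if absent/expired."""
        visible = [(t, v) for t, v in self.versions
                   if t <= ts and now - t <= self.ttl]
        if not visible:
            return None
        # The newest entry at or before ts wins; a marker yields None.
        return max(visible, key=lambda tv: tv[0])[1]
```

A read just before an accidental delete still returns the old value, as long as that version has not aged past the TTL.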
Backup - Tools
• Use Export/Import tool
 Based on timestamp; and use it for point-in-time
backup/restore
• Use region snapshots
 Take HFile snapshots and copy them over to new
storage location
 Copy HLog files for point-in-time roll-forward from
snapshot time (replay using WALPlayer post import).
 Table snapshots (0.94.6+)
Backup - Replication
• Use replicated cluster as one of the backup /
disaster recovery
• Statement based, write ahead log (WAL, HLog)
from each region server
 Asynchronous
 Active Active using 1-1 replication
 Active Passive using 1-N replication
 Can be of same or different node size
 0.92 onwards Active Active possible
Operational Best Practices
Hardware
• Commodity Hardware
• 1U or 2U preferred, avoid 4U or NAS or expensive
systems
• JBOD on slaves, RAID 1+0 on masters
• No SSDs, No virtualized storage
• Good number of cores (4-16), HT enabled
• Good amount of RAM (24-72G)
• Dual 1G network, 10G or InfiniBand
Disks
• SATA, 7/10/15K, cheaper the better
• Use RAID firmware drives, faster error detection &
enable disks to fail on h/w errors
• Limit to 6/8 drives on 8 core, allow 1 drive/core
= 100 IOPS/Drive
= 4 * 1T = 4T, 400 IOPS, 400MB/sec
= 8 * 500G = 4T, 800 IOPS
= not beyond 800/900MB/sec due to n/w saturation
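The per-node arithmetic above can be written out as a short calculation. The 100 IOPS per drive comes from the slide; the 100 MB/sec per drive is inferred from "4 drives, 400MB", and the 900 MB/sec cap is the upper end of the slide's network saturation range. All three are rough planning figures, not measurements.

```python
IOPS_PER_DRIVE = 100          # from the slide
MB_PER_SEC_PER_DRIVE = 100    # inferred from "4 * 1T ... 400MB"
NETWORK_CAP_MB_PER_SEC = 900  # upper end of the slide's 800/900MB/sec


def node_io(drives: int, drive_size_gb: int) -> dict:
    """Estimate capacity, IOPS and usable throughput for one node."""
    return {
        "capacity_tb": drives * drive_size_gb / 1000,
        "iops": drives * IOPS_PER_DRIVE,
        # Disk throughput beyond the network cap cannot be used.
        "throughput_mb_s": min(drives * MB_PER_SEC_PER_DRIVE,
                               NETWORK_CAP_MB_PER_SEC),
    }


if __name__ == "__main__":
    print(node_io(4, 1000))
    print(node_io(8, 500))
```

For the same 4T of capacity, eight smaller drives double the IOPS of four larger ones, which is the point of the comparison.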
• Ext3/ext4/XFS
• Mount => noatime, nodiratime
OS, Kernel
• RHEL or CentOS or Ubuntu
• Swappiness=0, and no swap files
• File limits to hadoop user
(/etc/security/limits.conf) => 64/128K
• JVM GC, HBase heap
• NTP
• Block size
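Two of the settings above (swappiness and open-file limits) are easy to verify on every node. A minimal sketch, assuming a Linux host and treating 64K (65536) as the minimum open-file limit; the function names are illustrative.

```python
import resource


def check_os_settings(swappiness: int, open_file_limit: int,
                      min_open_files: int = 65536) -> list:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if swappiness != 0:
        problems.append(f"vm.swappiness is {swappiness}, expected 0")
    if open_file_limit < min_open_files:
        problems.append(
            f"open file limit is {open_file_limit}, "
            f"expected >= {min_open_files}")
    return problems


def read_local_settings() -> tuple:
    """Read current values on a Linux host (run as the hadoop user)."""
    with open("/proc/sys/vm/swappiness") as f:
        swappiness = int(f.read().strip())
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return swappiness, soft_limit
```

Running `check_os_settings(*read_local_settings())` on each node and alerting on a non-empty result fits the automation approach described in the next section.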
Automation
• Automation is key in a distributed cluster setup
 To easily launch a new node
 To restore to base state
 Keep same packages, configurations across the cluster
• Use Puppet/Chef/existing process
 Keep as much as possible puppetized
 No accidental upgrades as it can restart the service
• Cloudera Manager (CM) for any node
management tasks
 You can also puppetize & automate the process
 CM will install all necessary packages
Load Balancer
• Internal
 Periodically run balancer to ensure data distribution
among region servers
• hadoop-daemon.sh start balancer -threshold 10
• External
 Has built-in load balancing capability
 If using Thrift bindings, the Thrift servers need to be
load balanced
 Future versions will address thrift balancing as well
Upgrades
• In general upgrades should be well planned
• To update changes to cluster nodes (OS, configs,
hardware, etc.); you can also do rolling restart
without taking cluster down
• Hadoop/HBase supports simple upgrade paths
with rollback strategy to go back to old version
• Make sure HBase/Hadoop versions are compatible
• Use rolling restart for minor version upgrades
Monitoring
• Quick Checks
 Use built-in web tools
 Cloudera manager
 Command line tools or wrapper scripts
• RRD, Monitoring
 Cloudera manager
 Ganglia, Cacti, Nagios, NewRelic
 OpenTSDB
 Need proper alerting system for all events
 Threshold monitoring for any surprises
Alerting System
 Need proper alerting system
 JMX exposes all metrics
 Ops Dashboard (Ganglia, Cacti, OpenTSDB, NewRelic)
 Small dashboard for critical events
 Define proper levels for escalation
 Critical
 Losing a Master or ZooKeeper Node
 +/- 10% drop in performance or latency
 Key thresholds (load, swap, IO)
 Losing 2 or more slave nodes
 Disk failures
 Losing a single slave node (critical in prime time)
 Un-balanced nodes
 FATAL errors in logs
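The "+/- 10%" rule above can be expressed as a simple baseline comparison. A minimal sketch; the function name and the idea of a single stored baseline per metric are assumptions, and a real setup would feed this from JMX or the monitoring system.

```python
def deviates(baseline: float, current: float,
             tolerance: float = 0.10) -> bool:
    """True if current differs from baseline by more than the tolerance."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return abs(current - baseline) / abs(baseline) > tolerance
```

For example, with a baseline read latency of 100 ms, a reading of 112 ms or 88 ms would be flagged, while 95 ms would not.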
Case Study
Case Study - 1
• 110 node cluster
 Dual Quad Core, Intel Xeon, 2.2GHz
 48G, no swap
 6 x 2T SATA, 7K
 Ubuntu 11.04
 Puppet
 Fabric for running commands on all nodes
 /home/hadoop is everything, symlinks
 Nagios
 OpenTSDB for Trending points, dashboard
 M/R limited to 50% of available RAM
Questions?
• http://scalein.com/
• http://venublog.com/
• venu@venublog.com
• Twitter: @vanuganti

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

HBase Operations and Best Practices

  • 1. HBase Operations & Best Practices Venu Anuganti July 2013 http://scalein.com/ Blog: http://venublog.com/ Twitter: @vanuganti
  • 2. Who am I o Data Architect, Technology Advisor o Founder of ScaleIN, Data Consulting Company, 5+ years o 100+ companies, 20+ from Fortune 200 o http://scalein.com/ o Architect, Implement & Support SQL, NoSQL and BigData Solutions  Industry: Databases, Games, Social, Video, SaaS, Analytics, Warehouse, Web, Financial, Mobile, Advertising & SEM Marketing
  • 3. Agenda  BigData - Hadoop & HBase Overview  BigData Architecture  HBase Cluster Setup Walkthrough  High Availability  Backup and Restore  Operational Best Practices
  • 5. BigData Trends • BigData is the latest industry buzz; many companies are adopting or migrating o Not a replacement for OLTP or RDBMS systems • Gartner – $28B in 2012 & $34B in 2013 spend o 2013 top-10 technology trends – 6th place • Solves large data problems that existed for years o Social, User, Mobile growth demanded such a solution o Google “BigTable” is the key, followed by Amazon “Dynamo”; new papers like Dremel drive it further o Hadoop & its ecosystem are becoming a synonym for BigData • Combines vast structured/unstructured data o Overcomes the limits of the legacy warehouse model o Brings data analytics & data science o Real-time, mining, insights, discovery & complex reporting
  • 6. BigData • Key factors – Pros o Can handle any size o Commodity hardware o Scalable, Distributed, Highly Available o Ecosystem & growing community • Key factors – Cons o Latency o Hardware evolution, even though designed for commodity o Not a fit for every workload
  • 8.
  • 11. Why HBase • HBase is proven, widely adopted o Tightly coupled with the Hadoop ecosystem o Almost all major data-driven companies use it • Scales linearly o Read performance is its core; random, sequential reads o Can store tera/petabytes of data o Large-scale scans, millions of records o Highly distributed • CAP Theorem – HBase is CP driven • Competition: Cassandra (AP)
  • 13. Cluster Components • 3 major components o Master(s): HMaster o Coordination: Zookeeper o Slave(s): Region Server • [Diagram: MASTER runs Name Node, HMaster and Zookeeper; SLAVE 1, SLAVE 2 and SLAVE 3 each run a Data Node and a Region Server]
  • 14. How It Works [Diagram labels: Client, DDL, HMaster; HDFS; Region Servers (RS, RS, RS); Zookeeper Cluster (ZK, ZK, ZK)]
  • 15. Zookeeper • Zookeeper o Coordination for entire cluster o Master selection o Root region server lookup o Node registration o Client always communicates with Zookeeper for lookups (cached for subsequent calls) hbase(main):001:0> zk "ls /hbase" [safe-mode, root-region-server, rs, master, shutdown, replication]
  • 16. Zookeeper Setup • Zookeeper o Dedicated nodes in the cluster o Always an odd number of nodes o Disk, memory, CPU usage is low o Availability is key
  • 17. Master Node  HMaster o Typically runs with Name Node o Monitors all region servers, handles RS failover o Handles all meta data changes o Assigns regions o Interface for all meta data changes o Load balancing on idle times
  • 18. Master Setup • Dedicated Master Node o Light on use, but should be on reliable hardware o Good amount of memory and CPU can help o Disk space is pretty nominal • Must Have Redundancy o Avoid single point of failure (SPOF) o RAID preferred for redundancy or even JBOD o DRBD or NFS is also preferred
  • 19. Region Server • Region Server o Handles all I/O requests o Flushes MemStore to HDFS o Splitting o Compaction o Basic element of table storage o Table => Regions => one Store per Column Family (CF) per region => one MemStore plus StoreFiles per Store => Blocks o Maintains WAL (Write Ahead Log) for all changes
  • 20. Region Server - Setup • Should be stand-alone and dedicated o JBOD disks o Inexpensive o Data node and region server should be co-located • Network o Dual 1G, 10G or InfiniBand, DNS lookup free • Replication - at least 3, locality • Region size for splits; too many regions or regions that are too small both hurt.
  • 21. Cluster Setup – 10 Node
  • 23. High Availability • HBase Cluster - Failure Candidates  Data Center  Cluster  Rack  Network Switch  Power Strip  Region or Data Node  Zookeeper Node  HBase Master  Name Node
  • 24. HA - Data Center • Cross data center, geo distributed • Replication is the only solution o Up-to-date data o Active-active o Active-passive o Costly (can be sized) o Need dedicated network • On-demand offline cluster o Only for disaster recovery o No up-to-date copy o Can be sized appropriately o Need to reprocess for latest data
  • 25. HA – Redundant Cluster • Redundant cluster within a data center using replication • Mainly to have a backup cluster for disasters o Up-to-date data o Restore to an earlier state using TTL-based retention o Restore deleted data by keeping deleted cells o Run backups o Read/write distributed with load balancer o Support development or provide on-demand data o Support low-priority activities • Best practice: Avoid a redundant cluster; rather have one big cluster with high redundancy
  • 26. HA – Rack, Network, Power • Cluster nodes should be rack and switch aware • Losing a rack or a network switch should not bring the cluster down • Hadoop has built-in rack awareness o Assign nodes based on rack diagram o Redundant nodes are within rack, across switch and rack o Manual or automatic setup to detect location • Redundant power and network within each node (master)
  • 27. HA – Region Servers • Losing a region server or data node is very common; in many cases it can be very frequent • They are distributed and replicated • Can be added/removed dynamically, taken out for regular maintenance • Replication factor of 3 – each block survives the loss of 2 of its 3 replicas (⅔ of the nodes holding it) • Replication factor of 4 – each block survives the loss of 3 of its 4 replicas (¾ of the nodes holding it)
  • 28. HA – Zookeeper • Zookeeper nodes are distributed • Can be added/removed dynamically • Should be deployed in odd numbers, due to quorum (a majority vote decides the active state) • If 4, can lose 1 node (majority of 3) • If 5, can lose 2 nodes (majority of 3) • If 6, can lose 2 nodes (majority of 4) • If 7, can lose 3 nodes (majority of 4) • Best Practice: 5 or 7 with dedicated hardware.
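
The quorum numbers on the slide above follow from simple majority arithmetic. A minimal sketch (the function names are illustrative, not part of any Zookeeper API):

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble with `ensemble` voting nodes."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return ensemble - quorum_size(ensemble)


# Print ensemble size, majority needed, and failures tolerated.
for n in (3, 4, 5, 6, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

An even-sized ensemble tolerates no more failures than the odd size just below it (4 and 3 both tolerate 1; 6 and 5 both tolerate 2), which is why odd sizes are recommended.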
  • 29. HA – HMaster • HMaster - single point of failure • HA - Multiple HMaster nodes within a cluster  Zookeeper co-ordinates master failure  Only one active at any given point of time  Best practice: 2-3 HMasters, 1 per rack
  • 31. How to scale • By design, cluster is highly distributed and scalable • Keep adding more region servers to scale o Region splits o Replication factor o Row key design is a key factor for scaling writes o No single “hot” region o Bulk loading, pre-split o Native Java access vs. other protocols like Thrift o Compaction at regular intervals
  • 32. Performance • Benchmarking is key • Nothing fits all • Simulate use cases and run the tests o Bulk loading o Random access, read/write o Bulk processing o Scan, filter • Negative performance factors o Replication factor o Zookeeper nodes o Network latency o Slower disks, CPUs o Hot regions, bad row key or bulk loading without pre-splits
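
One common way to avoid a single “hot” region with sequential keys is to prefix each row key with a hash-derived bucket and pre-split the table on the bucket boundaries. The sketch below only illustrates the idea; the bucket count, key format and helper names are assumptions, not something the deck prescribes:

```python
import zlib
from typing import List

NUM_BUCKETS = 16  # hypothetical bucket count; pick it from the expected region count


def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a stable bucket id so sequential keys spread out."""
    bucket = zlib.crc32(row_key.encode("utf-8")) % buckets
    return f"{bucket:02d}|{row_key}"


def split_points(buckets: int = NUM_BUCKETS) -> List[str]:
    """Region boundaries for pre-splitting: one region per bucket prefix."""
    return [f"{b:02d}|" for b in range(1, buckets)]


# Sequential keys (for example, an incrementing event id) land in different buckets.
for i in range(5):
    print(salted_key(f"event-{i}"))
print(split_points())
```

The trade-off is on the read side: a range scan over the original key order has to be issued once per bucket and merged by the client.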
  • 33. Tuning  Tune the cluster to best fit the environment • Block Size, LRU cache, 64K default, per CF • JBOD • Memstore • Compaction, manual • WAL flush • Avoid long GC pauses, JVM • Region size, small is better, split based on “hot” • Batch size • In-memory column families • Compression, LZO • Timeouts • Region handler count, threads/region • Speculative execution • Balancer, manual
  • 35. Backup - Built-in • In general no external backup needed • HBase is highly distributed and has built-in versioning, data retention policy o No need to back up just for redundancy o Point-in-time restore: use TTL/Table/CF/C and keep the history for X hours/days o Accidental deletes: use ‘KeepDeletedCells’ to keep all deleted data
  • 36. Backup - Tools • Use Export/Import tool  Based on timestamp; and use it for point-in-time backup/restore • Use region snapshots  Take HFile snapshots and copy them over to new storage location  Copy Hlog files for point-in-time roll-forward from snapshot time (replay using WALPlayer post import).  Table snapshots (0.94.6+)
  • 37. Backup - Replication • Use a replicated cluster as one of the backup / disaster recovery options • Statement based, write ahead log (WAL, HLog) from each region server o Asynchronous o Active Active using 1-1 replication o Active Passive using 1-N replication o Can be of same or different node size o 0.92 onwards Active Active possible
  • 39. Hardware • Commodity Hardware • 1U or 2U preferred, avoid 4U or NAS or expensive systems • JBOD on slaves, RAID 1+0 on masters • No SSDs, No virtualized storage • Good number of cores (4-16), HT enabled • Good amount of RAM (24-72G) • Dual 1G network, 10G or InfiniBand
  • 40. Disks • SATA, 7/10/15K, cheaper the better • Use RAID firmware drives, faster error detection & enable disks to fail on h/w errors • Limit to 6/8 drives on 8 core, allow 1 drive/core o ~100 IOPS/drive o 4 x 1T = 4T, 400 IOPS, 400MB/sec o 8 x 500G = 4T, 800 IOPS o Not beyond 800/900MB/sec due to n/w saturation • Ext3/ext4/XFS • Mount => noatime, nodiratime
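
The disk sizing on the slide above is per-drive arithmetic: the same 4T of capacity delivers twice the IOPS when spread over twice as many spindles. A small sketch, assuming roughly 100 IOPS and 100 MB/sec per SATA drive (the throughput figure is inferred from the slide's 4 drives = 400MB; both constants are rough planning numbers, not measurements):

```python
IOPS_PER_DRIVE = 100          # rule-of-thumb figure from the slide
MB_PER_SEC_PER_DRIVE = 100    # assumption inferred from "4 drives = 400MB"


def node_disk_profile(drives: int, drive_size_gb: int) -> dict:
    """Aggregate capacity, IOPS and sequential throughput for one node."""
    return {
        "capacity_tb": drives * drive_size_gb / 1000,
        "iops": drives * IOPS_PER_DRIVE,
        "throughput_mb_s": drives * MB_PER_SEC_PER_DRIVE,
    }


print(node_disk_profile(4, 1000))  # four 1T drives
print(node_disk_profile(8, 500))   # eight 500G drives, same capacity
```

With eight drives the aggregate throughput already reaches the 800/900MB/sec range that the slide gives as the network saturation point, so adding more spindles per node stops paying off there.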
  • 41. OS, Kernel • RHEL or CentOS or Ubuntu • Swappiness=0, and no swap files • File limits to hadoop user (/etc/security/limits.conf) => 64/128K • JVM GC, HBase heap • NTP • Block size
  • 42. Automation • Automation is key in distributed cluster setup o To easily launch a new node o To restore to base state o Keep same packages, configurations across the cluster • Use Puppet/Chef/existing process o Keep as much as possible puppetized o No accidental upgrades as it can restart the service • Cloudera Manager (CM) for any node management tasks o You can also puppetize & automate the process o CM will install all necessary packages
  • 43. Load Balancer • Internal o Periodically run balancer to ensure data distribution among region servers • hadoop-daemon.sh start balancer -threshold 10 • External o Has built-in load balancing capability o If using thrift bindings, then thrift servers need to be load balanced o Future versions will address thrift balancing as well
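
The `-threshold 10` argument in the balancer command above is a percentage: the HDFS balancer treats a data node as balanced when its disk utilization is within that many percentage points of the cluster-wide utilization. A sketch of that check, with made-up node names and sizes:

```python
from typing import Dict, List, Tuple


def nodes_outside_threshold(
    nodes: Dict[str, Tuple[float, float]], threshold_pct: float = 10.0
) -> List[str]:
    """Return nodes whose utilization deviates from the cluster average
    by more than threshold_pct. `nodes` maps name -> (used, capacity)."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100 * total_used / total_capacity
    return sorted(
        name
        for name, (used, capacity) in nodes.items()
        if abs(100 * used / capacity - cluster_pct) > threshold_pct
    )


# Hypothetical three-node cluster, sizes in GB; cluster utilization is 60%.
example = {"dn1": (900, 1000), "dn2": (500, 1000), "dn3": (400, 1000)}
print(nodes_outside_threshold(example))
```

A lower threshold gives a more even cluster but makes the balancer move more data and run longer.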
  • 44. Upgrades • In general upgrades should be well planned • To update changes to cluster nodes (OS, configs, hardware, etc.); you can also do rolling restart without taking cluster down • Hadoop/HBase supports simple upgrade paths with rollback strategy to go back to old version • Make sure HBase/Hadoop versions are compatible • Use rolling restart for minor version upgrades
  • 45. Monitoring • Quick Checks  Use built-in web tools  Cloudera manager  Command line tools or wrapper scripts • RRD, Monitoring  Cloudera manager  Ganglia, Cacti, Nagios, NewRelic  OpenTSDB  Need proper alerting system for all events  Threshold monitoring for any surprises
  • 46. Alerting System • Need proper alerting system o JMX exposes all metrics o Ops Dashboard (Ganglia, Cacti, OpenTSDB, NewRelic) o Small dashboard for critical events • Define proper levels for escalation o Critical o Losing a Master or ZooKeeper node o +/- 10% drop in performance or latency o Key thresholds (load, swap, IO) o Losing 2 or more slave nodes o Disk failures o Losing a single slave node (critical in prime time) o Unbalanced nodes o FATAL errors in logs
  • 48. Case Study - 1 • 110 node cluster o Dual Quad Core, Intel Xeon, 2.2GHz o 48G, no swap o 6 x 2T SATA, 7K o Ubuntu 11.04 o Puppet o Fabric for running commands on all nodes o /home/hadoop is everything, symlinks o Nagios o OpenTSDB for trending points, dashboard o M/R limited to 50% of available RAM
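
For a sense of scale, the case-study hardware can be turned into a rough capacity estimate. The sketch assumes all 110 nodes are data nodes and a replication factor of 3; the slide states neither, so treat the result as an upper bound:

```python
def cluster_capacity_tb(nodes: int, drives_per_node: int,
                        drive_tb: float, replication: int) -> tuple:
    """Return (raw_tb, usable_tb) before filesystem and temp-space overhead."""
    raw_tb = nodes * drives_per_node * drive_tb
    return raw_tb, raw_tb / replication


raw, usable = cluster_capacity_tb(nodes=110, drives_per_node=6,
                                  drive_tb=2, replication=3)
print(raw, usable)  # raw TB, usable TB under the assumptions above
```

In practice the usable figure is lower still, since master nodes, OS partitions and MapReduce scratch space all take a share.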
  • 49. Questions ? • http://scalein.com/ • http://venublog.com/ • venu@venublog.com • Twitter: @vanuganti

Editor's Notes

  1. C: Consistency – when you write a tuple, it's immediately available for read. A: Availability – losing a node will not bring the cluster down. P: Partition Tolerance – data is sharded across nodes, so if you lose a group of nodes, it's still available. Cassandra – AP