Referent
Einrichtung Titel des Vortrages 1
WP-Benchmarking Top NoSQL Databases
Apache Cassandra, Apache HBase and MongoDB
Presented By
Athiq Ahamed
Supriya
Introduction
 Enormous amounts of data (Big Data)
 Scalability issues in RDBMS
 Rise of NoSQL databases
 Amazon Dynamo
 Google Bigtable
 CAP Theorem
 BASE system
CAP Theorem
 Consistency
 Availability
 Partition tolerance
The CAP theorem states that a distributed system can guarantee at most two of these three properties at the same time.
RDBMS vs NoSQL
RDBMS | NoSQL
Supports a powerful query language | Typically supports a simpler query language
Has a fixed schema | No fixed schema
Follows ACID (Atomicity, Consistency, Isolation and Durability) | Often only eventually consistent
Supports transactions | Typically offers limited or no transaction support
Content: tutorialspoint.com
BASE
 Basically available: the system guarantees availability, in the CAP-theorem sense.
 Soft state: the state of the system may change over time, even without new input, because of the eventual consistency model.
 Eventual consistency: the system becomes consistent over time, once new writes stop arriving.
Content: www.edureka.in
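A minimal sketch of soft state and eventual consistency, assuming two replicas that reconcile with a last-write-wins rule. The `Replica` class and `sync` function are hypothetical illustrations, not taken from any of the databases discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; each key maps to (timestamp, value)."""
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the entry with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so the two replicas converge."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("user:1", "alice", timestamp=1)  # accepted by one replica only
stale = r2.read("user:1")                 # soft state: r2 has not seen it yet
sync(r1, r2)                              # background replication catches up
fresh = r2.read("user:1")
```

Between the write and the sync, a read from `r2` returns nothing; after the sync both replicas agree.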
Selection of NoSQL
 Fast performance is key.
 Proof-of-concept (POC) processes should include the right benchmarks:
   Configurations
   Parameters
   Workloads
Making the right choice!
Benchmark configuration
 Benchmark tool: Yahoo Cloud Serving Benchmark (YCSB)
 Databases tested: the top 3 NoSQL databases, Apache Cassandra, Apache HBase and MongoDB
 Amazon Web Services EC2 instances hosted the tests
 Each test was performed 3 times on 3 different days
Benchmark configuration
 The tests ran on large instances (15 GB RAM and 4 CPU cores).
 Instances used a customized Ubuntu image with Oracle Java 1.6 installed as a base.
 A custom script was written to drive the benchmark processes.
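A rough sketch of what a driver script for repeated benchmark runs could look like. This is not the authors' script and not YCSB itself: the in-memory dictionary stands in for a database, and `run_workload`, `benchmark` and all parameter values are assumptions made for illustration.

```python
import random
import time


def run_workload(store, operations, read_proportion, seed):
    """Run a read/update mix against `store`; return (ops_per_sec, avg_latency_s)."""
    rng = random.Random(seed)
    keys = list(store)
    latencies = []
    start = time.perf_counter()
    for _ in range(operations):
        key = rng.choice(keys)
        t0 = time.perf_counter()
        if rng.random() < read_proportion:
            _ = store[key]              # read
        else:
            store[key] = rng.random()   # update
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return operations / elapsed, sum(latencies) / len(latencies)


def benchmark(runs=3, operations=10_000, read_proportion=0.5):
    """Repeat the workload `runs` times and average throughput and latency."""
    results = []
    for run in range(runs):
        store = {f"user{i}": 0.0 for i in range(1_000)}  # fresh data per run
        results.append(run_workload(store, operations, read_proportion, seed=run))
    avg_throughput = sum(r[0] for r in results) / runs
    avg_latency = sum(r[1] for r in results) / runs
    return results, avg_throughput, avg_latency


results, avg_throughput, avg_latency = benchmark()
```

Repeating the run and averaging, as the tests here did over three days, reduces the effect of noise from a shared cloud environment.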
Understanding NoSQL Databases
 Each NoSQL system performs differently.
 The differences come from their components and internal workings.
 Apache Cassandra: column-family (wide-column) database model
 Apache HBase: column-family (wide-column) database model
 MongoDB: document database model
Apache Cassandra
 Cassandra is scalable, fault-tolerant, and tunably consistent. All nodes are equal.
 Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
 Key components: node, cluster, commit log, mem-table, SSTable and Bloom filter
Content: http://www.tutorialspoint.com/cassandra/cassandra_architecture.htm
Apache Cassandra
 Ring structure, peer-to-peer architecture
 All nodes are equal
 This improves general database availability
 Scaling up and scaling down is easier
 Cassandra is a key-value, column-oriented database
Apache Cassandra
Content: http://demoiselle.sourceforge.net/component/demoiselle-cassandra/1.0.0/images/datamodel1.png
Apache Cassandra
 Cassandra has an internal keyspace called system, which stores metadata about the cluster.
 Metadata:
   The node’s token
   The cluster name
   Keyspace and schema definitions (dynamic loading)
   Whether or not the node is bootstrapped
Content: https://www.edureka.co/blog/category/apache-cassandra/
Apache Cassandra
 Commit log: crash-recovery mechanism. Every write operation is written to the commit log.
 Mem-table: a memory-resident data structure that holds recent writes.
 SSTable: a disk file to which data is flushed from the mem-table.
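The interplay of the three components can be sketched as a toy single-node model. `ToyNode`, its flush threshold and the in-memory "SSTables" are hypothetical simplifications; real Cassandra writes these structures to disk and manages them quite differently.

```python
class ToyNode:
    """Hypothetical model of the commit log / mem-table / SSTable write path."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory, holds recent writes
        self.sstables = []     # immutable sorted tables, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for crash recovery
        self.memtable[key] = value            # 2. then update the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. Write the mem-table out as a sorted, immutable table.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # assumption: flushed data no longer needs replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            for k, v in table:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("b", 1)
node.write("a", 2)  # reaches the limit and triggers a flush
node.write("a", 3)  # the newer value stays in the mem-table
```

Reads check the mem-table first, so `node.read("a")` sees the newest value even though an older one sits in an SSTable.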
Apache Cassandra
 Bloom filters are used as a performance booster.
 A Bloom filter is a fast, probabilistic way to test whether an element is a member of a set.
 Bloom filters serve as a special kind of cache: lookups are quick because the filters reside in memory.
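A minimal Bloom filter sketch, assuming salted SHA-256 hashes and a small fixed bit array. The class name, sizes and hashing scheme are illustrative choices, not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for key in ("row1", "row2", "row3"):
    bf.add(key)
```

A negative answer lets a read skip an SSTable without touching disk. A positive answer can be a false positive, so the table still has to be checked.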
Apache Cassandra
 Gossip protocol: communication between nodes, coordination and failure detection
 Anti-entropy protocol: replica synchronization mechanism ensuring that data on different nodes is up to date (uses Merkle trees)
 Snitches determine host proximity
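A simplified sketch of how hash trees help replicas find out-of-date data without exchanging every row. The bucketing by `key % buckets` and the helper names are assumptions for illustration. A real Merkle tree comparison walks down the tree level by level, whereas this shortcut compares the root and then the leaves.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(rows, buckets=4):
    """Hash the contents of each bucket of a replica's rows."""
    groups = [[] for _ in range(buckets)]
    for key in sorted(rows):
        groups[key % buckets].append(f"{key}={rows[key]}")
    return [h("|".join(g).encode()) for g in groups]


def root_hash(leaves):
    """Combine hashes pairwise until a single root remains."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node if the count is odd
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def differing_buckets(rows_a, rows_b, buckets=4):
    la, lb = leaf_hashes(rows_a, buckets), leaf_hashes(rows_b, buckets)
    if root_hash(la) == root_hash(lb):
        return []  # replicas agree, nothing to repair
    return [i for i in range(buckets) if la[i] != lb[i]]


replica_a = {0: "x", 1: "y", 2: "z", 5: "w"}
replica_b = {0: "x", 1: "y", 2: "STALE", 5: "w"}
out_of_sync = differing_buckets(replica_a, replica_b)
```

Only the buckets listed in `out_of_sync` would need to be streamed between the replicas.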
Apache Cassandra: Read/Write operation
Apache HBase
 Sparse, distributed, sorted, multidimensional and consistent map.
 HBase is a key/value store.
 A key consists of row key, column family, column and timestamp.
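The data model can be pictured as a nested, sorted map. The sketch below is a hypothetical in-memory model (`ToyTable` and its methods are not the HBase client API) showing how row key, column family, column and timestamp together address a value.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value, timestamp):
        self.rows[row_key][f"{family}:{qualifier}"][timestamp] = value

    def get(self, row_key, family, qualifier):
        """Return the value with the newest timestamp, or None if the cell is empty."""
        versions = self.rows.get(row_key, {}).get(f"{family}:{qualifier}", {})
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for row_key in sorted(self.rows):
            if start_row <= row_key < stop_row:
                yield row_key


table = ToyTable()
table.put("user#002", "info", "name", "Bob", timestamp=1)
table.put("user#001", "info", "name", "Al", timestamp=1)
table.put("user#001", "info", "name", "Alice", timestamp=2)  # newer version
table.put("user#001", "stats", "logins", 7, timestamp=1)     # sparse: only this row has it
```

Rows come back in key order regardless of insertion order, and a cell that was never written simply does not exist.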
Apache HBase
Content:http://zhangjunhd.github.io/assets/2013-02-25-apache-hbase/rowkey-
Apache HBase Components
 Region: contiguous rows form a region.
 Region server (RS): serves one or more regions.
 Master server: daemon responsible for managing the HBase cluster.
 HDFS: distributed, open-source file system that stores HBase’s data.
 ZooKeeper: distributed, open-source coordination service for the master and region servers.
Content: https://www.mapr.com/blog/in-depth-look-hbase-architecture
Apache HBase Architecture
First Read/Write to HBase
 The client obtains the region server (RS) hosting the meta table from ZooKeeper.
 The client queries the meta table to find the RS that holds the corresponding row key.
 The client receives the row from the respective region server.
 The client caches this information along with the location of the meta table server.
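The lookup steps above can be sketched with hypothetical stand-ins for ZooKeeper and the meta table. `ToyCluster`, `ToyClient`, the region boundaries and the server names are all invented for illustration. The point is that only the first access to a region costs a meta lookup.

```python
import bisect


class ToyCluster:
    """Hypothetical stand-ins for ZooKeeper and the meta table."""

    def __init__(self):
        self.zookeeper = {"meta_location": "rs-meta"}
        # Meta table: sorted region start keys and the server hosting each region.
        self.region_starts = ["", "g", "p"]
        self.region_servers = ["rs-1", "rs-2", "rs-3"]
        self.meta_lookups = 0

    def lookup_meta(self, row_key):
        self.meta_lookups += 1
        i = bisect.bisect_right(self.region_starts, row_key) - 1
        end = self.region_starts[i + 1] if i + 1 < len(self.region_starts) else None
        return self.region_starts[i], end, self.region_servers[i]


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.meta_location = None
        self.region_cache = []  # (start key, end key, region server)

    def locate(self, row_key):
        for start, end, server in self.region_cache:
            if start <= row_key and (end is None or row_key < end):
                return server  # cache hit, no meta lookup needed
        if self.meta_location is None:
            # Ask ZooKeeper where the meta table lives, then remember it.
            self.meta_location = self.cluster.zookeeper["meta_location"]
        region = self.cluster.lookup_meta(row_key)
        self.region_cache.append(region)
        return region[2]


cluster = ToyCluster()
client = ToyClient(cluster)
first = client.locate("apple")    # goes to ZooKeeper and the meta table
second = client.locate("banana")  # same region, served from the cache
third = client.locate("kiwi")     # different region, one more meta lookup
```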
HBase RS Components
 WAL: the Write Ahead Log is a file on the distributed file system. It stores new data that has not yet been persisted to permanent storage.
 Block Cache: the read cache. It stores frequently read data in memory.
 MemStore: the write cache. It stores new data that has not yet been written to disk.
 HFiles store the rows as sorted key-values on disk.
HBase Write steps (1)
 The client’s write is first appended to the WAL file stored on disk.
 The WAL is used to recover data that has not yet been persisted in case a server crashes.
 Once the data is written to the WAL, it is placed in the MemStore.
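A toy illustration of why the WAL comes first, assuming a list stands in for the log file on disk and a dictionary for the MemStore. `ToyRegionServer` and its methods are hypothetical and much simpler than HBase's actual recovery process.

```python
class ToyRegionServer:
    def __init__(self, wal=None):
        self.wal = wal if wal is not None else []  # survives a crash (on "disk")
        self.memstore = {}                         # lost on a crash (in memory)

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # 1. append to the WAL first
        self.memstore[row_key] = value     # 2. then update the MemStore

    def recover(self):
        """Rebuild the MemStore by replaying the WAL in order."""
        self.memstore = {}
        for row_key, value in self.wal:
            self.memstore[row_key] = value


server = ToyRegionServer()
server.put("row1", "a")
server.put("row2", "b")
server.put("row1", "c")

# Simulate a crash: the in-memory MemStore is gone, the WAL is still on disk.
restarted = ToyRegionServer(wal=server.wal)
before = dict(restarted.memstore)
restarted.recover()
```

Replaying the log in write order restores the latest value for each row.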
HDFS Write steps (2)
 All writes and reads go to and from the primary node.
 HDFS replicates the WAL and HFile blocks. Replication happens automatically.
 When data is written to HDFS, one copy is written locally, then it is replicated to a secondary node and later to a tertiary node.
Cassandra and HBase
 Cassandra use case: availability and partition-tolerance requirements. Consistency is tunable: it can be raised through the consistency-level option.
 HBase use case: consistency and scalability. However, with a small number of nodes/threads, high availability is also achieved.
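Tunable consistency comes down to simple arithmetic: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The helper names below are illustrative only.

```python
def is_strongly_consistent(replication_factor, write_replicas, read_replicas):
    """True when every read set overlaps every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


def quorum(replication_factor):
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


n = 3
q = quorum(n)
quorum_both = is_strongly_consistent(n, q, q)  # QUORUM writes and QUORUM reads
one_both = is_strongly_consistent(n, 1, 1)     # ONE writes and ONE reads
all_write = is_strongly_consistent(n, n, 1)    # ALL writes and ONE reads
```

Raising the consistency level buys stronger guarantees at the cost of latency and availability, because more replicas must respond.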
MongoDB
 Document-oriented database
 High performance and automatic scaling
 Strong consistency and partition tolerance
 Replication and failover for high availability
 Low latency
 Flexible indexing
MongoDB Concepts
 A document is the basic unit of data in MongoDB (comparable to a row).
 A collection is similar to a table.
 A single instance can host multiple independent databases.
 Every document has a special key, “_id”.
 Powerful JavaScript shell for administration
 The config database contains the cluster’s metadata.
MongoDB Simple Architecture
Read/Write MongoDB
 A mongos instance receives queries from applications.
 It uses metadata from the config server to locate the data.
 Mongos directs write operations to a particular shard.
 Mongos uses the cluster metadata from the config database.
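A sketch of range-based routing, assuming a shard key named `user_id` and three chunks. `ToyConfigServer`, `ToyMongos`, the chunk boundaries and the shard names are hypothetical, and this is not the MongoDB driver API.

```python
import bisect


class ToyConfigServer:
    """Hypothetical chunk map: each chunk covers [lower bound, next lower bound)."""

    def __init__(self):
        self.chunk_lower_bounds = [float("-inf"), 100, 200]
        self.chunk_shards = ["shardA", "shardB", "shardC"]


class ToyMongos:
    def __init__(self, config_server):
        # The router keeps a copy of the cluster metadata.
        self.bounds = list(config_server.chunk_lower_bounds)
        self.shards = list(config_server.chunk_shards)
        self.storage = {name: {} for name in self.shards}

    def shard_for(self, shard_key):
        index = bisect.bisect_right(self.bounds, shard_key) - 1
        return self.shards[index]

    def insert(self, document):
        shard = self.shard_for(document["user_id"])  # "user_id" is the assumed shard key
        self.storage[shard][document["_id"]] = document
        return shard

    def find_by_shard_key(self, user_id):
        # A query that includes the shard key can be sent to a single shard.
        shard = self.shard_for(user_id)
        return [d for d in self.storage[shard].values() if d["user_id"] == user_id]


router = ToyMongos(ToyConfigServer())
s1 = router.insert({"_id": 1, "user_id": 42, "name": "Ann"})
s2 = router.insert({"_id": 2, "user_id": 150, "name": "Raj"})
s3 = router.insert({"_id": 3, "user_id": 200, "name": "Li"})
found = router.find_by_shard_key(150)
```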
Recap: Importance of Benchmark and Factors
 Scalability
 Availability
 Partition tolerance
 Consistency
Most important: PERFORMANCE
Yahoo Cloud Serving Benchmark (YCSB)
Results: Load Process
Results: Read/Write Mix Workload
Results: Read/Scan Mix Workload
Results: Read Latency across all workloads
Results: Insert Latency across all workloads
Let's MIGRATE from a traditional database!
Live Demo
Discussion
 Identify the data model for the application
 Know the corresponding data sets
 Determine whether the application requires replication
 Identify the performance requirements
 Prototype the application
 Test the performance of the prototype
Conclusion
 NoSQL databases have replaced traditional relational databases in many large-scale applications
 Performance is the key feature
 Benchmarks are important
 The performance of the top three NoSQL databases was tested
 Cassandra outperformed the other NoSQL databases in these tests
 Decide based on the application

Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB

  • 1. WP-Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB. Presented by Athiq Ahamed and Supriya.
  • 2. Introduction: enormous amounts of data (Big Data); scalability issues in RDBMS; rise of NoSQL databases; Amazon Dynamo; Bigtable; CAP theorem; BASE system.
  • 3. CAP Theorem: Consistency; Availability; Partition tolerance. The CAP theorem states that a distributed system can guarantee at most two of these three properties at the same time.
  • 4. RDBMS vs NoSQL (RDBMS / NoSQL): supports a powerful query language / supports a very simple query language; fixed schema / no fixed schema; follows ACID (Atomicity, Consistency, Isolation and Durability) / only eventually consistent; supports transactions / does not support transactions. Content: tutorialspoint.com
  • 5. BASE: Basically available: the system guarantees availability, in terms of the CAP theorem. Soft state: the state of the system may change over time because of the eventual consistency model. Eventual consistency: the system will become consistent over time. Content: www.edureka.in
  • 6. Selection of NoSQL: Making the right choice! Fast performance is the key. POC processes include the right benchmarks: configurations, parameters, workloads.
  • 7. Benchmark configuration: Yahoo Cloud Serving Benchmark (YCSB); top 3 NoSQL databases: Apache Cassandra, Apache HBase and MongoDB; Amazon Web Services EC2 instances hosted the tests; the tests were performed 3 times on 3 different days.
  • 8. Benchmark configuration: the tests ran on large instances (15 GB RAM and 4 CPU cores); the instances used a customized Ubuntu with Oracle Java 1.6 installed as a base; a customized script was written to drive the benchmark processes.
  • 9. Understanding NoSQL Databases: each NoSQL system performs differently; components and internal workings; Apache Cassandra: columnar database model; Apache HBase: columnar database model; MongoDB: document storage database model.
  • 10. Apache Cassandra: Cassandra is scalable, fault-tolerant, and consistent. All nodes are equal. Its distribution design is based on Amazon's Dynamo and its data model on Google's Bigtable. Key components: node, cluster, commit log, mem-table, SSTable and Bloom filter. Content: http://www.tutorialspoint.com/cassandra/cassandra_architecture.htm
  • 11. Apache Cassandra: ring structure, peer-to-peer architecture; all nodes are equal, which improves general database availability; scaling up and scaling down is easier; Cassandra is a key-value, column-oriented database.
  • 12. Apache Cassandra (data model figure). Content: http://demoiselle.sourceforge.net/component/demoiselle-cassandra/1.0.0/images/datamodel1.png
  • 13. Apache Cassandra: Cassandra has an internal keyspace called system, which stores metadata about the cluster. Metadata: the node's token; the cluster name; keyspace and schema definitions (dynamic loading); whether or not the node is bootstrapped. Content: https://www.edureka.co/blog/category/apache-cassandra/
  • 14. Apache Cassandra: Commit log: crash-recovery mechanism; every write operation is written to the commit log. Mem-table: a memory-resident data structure. SSTable: a disk file to which the data is flushed from the mem-table.
  • 15. Apache Cassandra: Bloom filters are used as a performance booster. A Bloom filter is a fast algorithm for testing whether an element is a member of a set. Bloom filters serve as a special kind of cache: lookups are quick because the filters reside in memory.
  • 16. Apache Cassandra: Gossip protocol: communication between nodes, coordination and failure detection. Anti-entropy protocol: replica synchronization mechanism ensuring that data on different nodes is kept up to date (Merkle trees). Snitches determine host proximity.
  • 17. Apache Cassandra: Read/Write operation
  • 18. Apache HBase: a sparse, distributed, consistent, multidimensional sorted map. HBase is a key/value store. A cell is addressed by row key, column family, column and timestamp.
  • 19. Apache HBase (data model figure). Content: http://zhangjunhd.github.io/assets/2013-02-25-apache-hbase/rowkey-
  • 20. Apache HBase Components: Region: contiguous rows form a region. Region server (RS): serves one or more regions. Master server: daemon responsible for managing the HBase cluster. HDFS: distributed, open-source file system containing HBase's data. ZooKeeper: distributed, open-source coordination service for the master and region servers. Content: https://www.mapr.com/blog/in-depth-look-hbase-architecture
  • 21. Apache HBase Architecture
  • 22. First Read/Write to HBase: the client obtains from ZooKeeper the region server that hosts the meta table; the client asks that server which region server holds the corresponding row key; the client receives the row from the respective region server; the client caches this information along with the location of the meta table server.
  • 23. HBase RS Components: WAL: the Write Ahead Log is a file on the distributed file system; it stores new data that has not yet been persisted. Block Cache: the read cache; it stores frequently read data in memory. MemStore: the write cache; it stores new data that has not yet been written to disk. HFiles store the rows as sorted key/values on disk.
  • 24. HBase Write steps (1): the client writes the data to the WAL file stored on disk; the WAL is used to recover data that has not yet been persisted in case a server crashes; once data is written to the WAL, it is placed in the MemStore.
  • 25. HDFS Write steps (2): all writes and reads go to and from the primary node. HDFS replicates the WAL and HFile blocks; replication happens automatically. When data is written to HDFS, one copy is written locally, then it is replicated to a secondary node and later to a tertiary node.
  • 26. Cassandra and HBase: Cassandra use case: availability and partition-tolerance requirements; consistency is tunable and can be set to a high level through configuration. HBase use case: consistency and scalability; however, with a small number of nodes/threads, high availability is also achieved.
  • 27. MongoDB: document-oriented database; high performance and automatic scaling; high consistency and partition tolerance; replication and failover for high availability; low latency; flexible indexing.
  • 28. MongoDB Concepts: a document is the basic unit of data in MongoDB (comparable to a row); a collection is similar to a table; a single instance can host multiple independent databases; every document has a special key, "_id"; a powerful JavaScript shell is available for administration; the config database contains the metadata of the cluster.
  • 29. MongoDB Simple Architecture
  • 30. Read/Write MongoDB: a mongos receives queries from applications; it uses metadata from the config server to locate the data; mongos directs write operations to a particular shard; mongos uses the cluster metadata from the config database.
  • 31. Recap: Importance of Benchmark and Factors: scalability; availability; partition tolerance; consistency; most important: performance. Yahoo Cloud Serving Benchmark (YCSB).
  • 32. Results: Load Process
  • 33. Results: Read/Write Mix Workload
  • 34. Results: Read/Scan Mix Workload
  • 35. Results: Read Latency across all workloads
  • 36. Results: Insert Latency across all workloads
  • 37. Live Demo: Let's migrate from a traditional database!
  • 38. Discussion: identify the data model for the application; the corresponding data sets have to be known; determine whether the application requires replication; identify the performance requirements; prototype the application; test the performance of the prototype.
  • 39. Conclusion: NoSQL has replaced traditional relational databases; performance is the key feature; importance of benchmarks; the performance of the top three NoSQL databases was tested; Cassandra outperformed the other NoSQL databases tested; decide based on the application.
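
The Cassandra write path from slide 14 (commit log, mem-table, SSTable) and the Bloom filter lookup from slide 15 can be illustrated with a small Python sketch. This is a toy model, not Cassandra's implementation: the class names `BloomFilter` and `ToyNode`, the `memtable_limit` parameter, the use of SHA-256 for the filter positions, and the decision to clear the commit log after each flush are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives possible, false negatives not."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several positions by salting a single hash function (assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToyNode:
    """Hypothetical single node: commit log -> mem-table -> SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list standing in for the log file
        self.memtable = {}     # in-memory writes not yet flushed
        self.sstables = []     # (bloom_filter, immutable sorted dict) pairs
        self.memtable_limit = memtable_limit
        self.disk_reads = 0    # counts simulated SSTable lookups

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for crash recovery
        self.memtable[key] = value            # 2. apply it to the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self._flush()                     # 3. flush to an SSTable when full

    def _flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}
        self.commit_log.clear()  # toy simplification: flushed data needs no log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bloom, table in reversed(self.sstables):  # newest SSTable first
            if not bloom.might_contain(key):
                continue          # the filter rules this table out: no "disk" read
            self.disk_reads += 1
            if key in table:
                return table[key]
        return None


node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # reaches the limit and triggers a flush
node.write("c", 3)   # stays in the mem-table
print(node.read("a"), node.read("c"), len(node.sstables))
```

The point of the filter is visible in `read`: an SSTable is only consulted when its in-memory filter says the key might be present, which is why the speaker notes suggest Bloom filters as one possible reason for fast reads.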

Editor's Notes

  1. Managing the startup, configuration and termination of EC2 instances; running the tests on clients.
  2. Apache Cassandra: columnar database model (combination of Amazon Dynamo and Bigtable). Apache HBase: columnar database model (Bigtable-inspired system on Hadoop).
  3. In Cassandra, rows are split; there is a row key for a range of rows (the primary key is hashed with MD5), and a column family (column name) with a value and a timestamp. In HBase, data is split column-wise; there is a row key for a range of rows, a column family, a column qualifier and a timestamp. Distribution is ordered rather than hashed. Frequently accessed columns are grouped together under a common family.
  4. The system keyspace stores metadata for the local node. The system keyspace cannot be modified or edited by users. The node's token is decided by the partitioner.
  5. Memory reads are faster than disk reads, so when we look at the test results, Cassandra outperforms the others, and Bloom filters could be one of the reasons because of the fast in-memory lookups.
  6. Cassandra nodes exchange Merkle trees with their neighbours. A Merkle tree is a hash tree representing the data in a column family. The trees are compared and, if there is any difference, a repair is launched for the ranges that do not agree. Read repair happens internally in the background. A snitch routes the client to the nearest node (there is no separate config database as in MongoDB, or ZooKeeper as in HBase, which may take additional time to respond). The snitch gives host proximity.
  7. Give the example of Facebook.
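
The Merkle-tree comparison described in note 6 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Cassandra's repair code: the helper names `build_merkle` and `differing_leaves` are hypothetical, each leaf is a single key/value pair standing in for a token range, SHA-256 is an arbitrary choice of hash, and both replicas are assumed to hold the same number of leaves.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(leaves):
    """Return the tree as a list of levels: leaf hashes first, root level last."""
    level = [_h(repr(item).encode()) for item in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # odd level: pair the last hash with itself
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a, b):
    """Indices of leaves that differ, descending only into mismatching subtrees."""
    if a[-1] == b[-1]:
        return []                          # equal roots: replicas agree
    diffs = []
    stack = [(len(a) - 1, 0)]              # (level, index), starting at the root
    while stack:
        lvl, idx = stack.pop()
        if a[lvl][idx] == b[lvl][idx]:
            continue                       # identical subtree: nothing to repair
        if lvl == 0:
            diffs.append(idx)              # mismatching leaf: needs repair
            continue
        for child in (2 * idx, 2 * idx + 1):
            if child < len(a[lvl - 1]):
                stack.append((lvl - 1, child))
    return sorted(diffs)


replica_a = [("k1", "v1"), ("k2", "v2"), ("k3", "v3"), ("k4", "v4"), ("k5", "v5")]
replica_b = [("k1", "v1"), ("k2", "v2"), ("k3", "stale"), ("k4", "v4"), ("k5", "v5")]
print(differing_leaves(build_merkle(replica_a), build_merkle(replica_b)))
```

Only the hashes travel between nodes in this scheme; the data itself is exchanged just for the leaves reported as different, which keeps the comparison cheap when replicas mostly agree.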