Slide 1
HBase Vs Cassandra Vs MongoDB - Choose the right NoSQL database
View NoSQL database courses at: www.edureka.in
Slide 2
Objectives of this Session
• Traditional databases
• Challenges with traditional databases
• CAP Theorem
• NoSQL to the rescue
• A BASE system
• Choose the right NoSQL database
For queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
Slide 3
Database Categories
RDBMS/OLTP/Real Time: Oracle, MySQL, MS SQL, DB2
DSS/OLAP/DW: Netezza, SAP Hana, Oracle Express
NoSQL/New SQL/BigData: MongoDB, HBase, Cassandra, CouchDB
Slide 4
A Traditional database solution
[Diagram: a web application and a caching layer in front of RDBMS instances. Labels: 1000 TPS, 5000 TPS, 300 ~ 500 SQL Transaction, 100 ~ 200 SQL Transaction, Applications Changing Data, Elastic Scale.]
Slide 5
A NoSQL database solution
[Diagram: a web application backed by Cassandra. Labels: 1000 TPS, 5000 TPS, 300 ~ 500 SQL Transaction, 100 ~ 200 SQL Transaction, Applications Changing Data, Elastic Scale.]
Slide 6
Challenges with traditional databases
• Not a good fit for large data volumes (petabytes of data) with varying data types, e.g. images, videos, text
• Can't scale for large data volumes, e.g. 15 - 20 petabytes of data in the Govt. of India "Aadhaar" project
• Scale-up: limited by memory and processing (CPU) capabilities
• Scale-out: cache-dependent read and write operations
• Complex RDBMS model: parsing, locking, logging, buffer pool, threads etc.
• Sharding causes operational problems, e.g. managing a shard failure
• Consistency: a bottleneck for scalability in RDBMS
• Satisfying ACID is a hindrance to scaling
• NoSQL databases relax consistency in order to scale out
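The shard-failure point above can be made concrete with a minimal sketch. It assumes a naive hash-modulo placement scheme, which the slides do not specify; the shard names, the key format and both helper functions are hypothetical.

```python
import hashlib

def shard_for(key, shards):
    """Pick a shard by hashing the key and taking it modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def moved_keys(keys, before, after):
    """Keys whose shard assignment differs between two shard lists."""
    return [k for k in keys if shard_for(k, before) != shard_for(k, after)]

keys = ["user:%d" % i for i in range(1000)]
before = ["shard-a", "shard-b", "shard-c", "shard-d"]
after = ["shard-a", "shard-b", "shard-c"]  # assumption: shard-d has failed

moved = moved_keys(keys, before, after)
print("%d of %d keys changed shard" % (len(moved), len(keys)))
```

Under this assumed scheme, losing one shard changes the modulus, so many keys that were on healthy shards are reassigned as well, not only the keys on the failed shard. Schemes such as consistent hashing are commonly used to limit that movement.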
Slide 7
CAP Theorem
We must understand the CAP theorem when we talk about NoSQL databases, or in fact when designing any distributed system.
The CAP theorem identifies 3 basic requirements that exist in a special relation when designing applications for a distributed architecture.
Consistency: the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
Availability: the system is always on (guaranteed service availability), with no downtime.
Partition Tolerance: the system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
Slide 8
CAP Theorem and NoSQL databases
• According to the CAP theorem, a distributed system can satisfy only 2 of the 3 requirements at the same time.
• It is theoretically impossible to fulfill all 3 requirements.
• Therefore, current NoSQL databases follow different combinations of C, A and P from the CAP theorem.
• CA - Single-site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.
• CP - Some data may not be accessible, but the rest is still consistent/accurate.
• AP - System is still available under partitioning, but some of the data returned may be inaccurate.
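The difference between the CP and AP choices can be illustrated with a toy model. This is a sketch under stated assumptions, not how any particular database is implemented: the two-replica `Cluster` class, its `mode` flag and the `partitioned` switch are all hypothetical.

```python
class Cluster:
    """Two replicas of one value; mode is 'CP' or 'AP' (illustrative labels)."""

    def __init__(self, mode, value=0):
        self.mode = mode
        self.values = {"left": value, "right": value}
        self.partitioned = False  # True when the replicas cannot communicate

    def write(self, side, value):
        if self.partitioned:
            if self.mode == "CP":
                # CP: refuse the write rather than let the replicas diverge
                raise RuntimeError("unavailable: other replica unreachable")
            # AP: accept the write locally; the replicas now disagree
            self.values[side] = value
            return
        # No partition: the write reaches both replicas
        self.values["left"] = value
        self.values["right"] = value

    def read(self, side):
        return self.values[side]

cp = Cluster("CP")
cp.partitioned = True
try:
    cp.write("left", 1)
except RuntimeError as err:
    print("CP:", err)

ap = Cluster("AP")
ap.partitioned = True
ap.write("left", 1)
print("AP:", ap.read("left"), ap.read("right"))  # prints "AP: 1 0"
```

In this model the CP cluster stays consistent by rejecting the write, while the AP cluster stays available but returns different values depending on which replica is read.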
Slide 9
NoSQL to the rescue
• A scale-out, shared-nothing architecture, capable of running on a large number of nodes
• A non-locking concurrency control mechanism, so real-time reads do not conflict with writes
• Scalable replication and distribution
• Thousands of machines with distributed data
• An architecture providing much higher per-node performance than is available from traditional SQL-based databases
• Schema-less data model
• Workloads that are mostly queries with few updates
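One way to let reads proceed without blocking on writes is to keep several versions of each value and have readers work from a snapshot. The slides do not name a specific mechanism, so the following is only a sketch of that multi-version idea; the `VersionedStore` class and its methods are hypothetical.

```python
class VersionedStore:
    """Append-only versions per key; readers pick the version as of a snapshot."""

    def __init__(self):
        self._versions = {}  # key -> list of (version, value), oldest first
        self._clock = 0      # logical version counter

    def write(self, key, value):
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Version number a reader can use for a stable view."""
        return self._clock

    def read(self, key, as_of):
        # Walk back from the newest version to the first one visible at as_of
        for version, value in reversed(self._versions.get(key, [])):
            if version <= as_of:
                return value
        return None

store = VersionedStore()
store.write("status", "draft")
snap = store.snapshot()            # reader starts here
store.write("status", "published") # writer continues without waiting
old_view = store.read("status", snap)
new_view = store.read("status", store.snapshot())
```

The reader that took `snap` keeps seeing the value that was current at that point, while later readers see the newer write.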
Slide 10
NoSQL database - A BASE not ACID system
• Basically Available: the system guarantees availability, in terms of the CAP theorem.
• Soft State: the state of the system may change over time, even without input. This is because of the eventual consistency model.
• Eventual Consistency: the system will become consistent over time, provided that it doesn't receive input during that time.
A BASE system gives up on immediate consistency in exchange for availability.
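Eventual consistency can be sketched as replicas that temporarily disagree and then converge once they exchange state. The example below assumes a last-write-wins rule based on timestamps, which is one common reconciliation strategy and not something the slides prescribe; the `merge` function and the sample data are hypothetical.

```python
def merge(local, remote):
    """Last-write-wins merge of two replica states: key -> (timestamp, value)."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Two replicas that accepted different writes while out of contact
replica_1 = {"cart": (1, ["book"])}
replica_2 = {"cart": (2, ["book", "pen"]), "name": (1, "Asha")}

# With no new input arriving, exchanging state both ways makes them converge
replica_1 = merge(replica_1, replica_2)
replica_2 = merge(replica_2, replica_1)
print(replica_1 == replica_2)
```

Before the exchange a client could read either version of `cart`; afterwards both replicas hold the entry with the higher timestamp.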
Slide 11
NoSQL database – Not a Panacea
There are ~150 NoSQL databases on the market.
Slide 12
NoSQL Database – Storage Architecture
NoSQL Database Types

Document Data Store Databases
  Example: CouchDB, MongoDB
  Data Model: Collection of key value connections
  Strength: Tolerant of incomplete data
  Weakness: Query performance, no standard query syntax

Columnar NoSQL Databases
  Example: HBase, Cassandra
  Data Model: Column families
  Strength: Fast look-ups
  Weakness: Very low-level API

Key Value Databases
  Example: Amazon Simple DB, Redis
  Data Model: Collection of key value pairs
  Strength: Fast look-ups
  Weakness: Stored data has no schema

Graph NoSQL Databases
  Example: InfoGrid, Infinite Graph
  Data Model: "Property Graph" - nodes
  Strength: Graph algorithms (shortest path, connectedness, etc.)
  Weakness: Not easy to cluster; must traverse the whole graph to get an answer
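To make the four data models easier to compare, the sketch below stores roughly the same user record in each shape using plain Python structures. It illustrates the shapes only and uses no database client; all field names, keys and the `followed_by` helper are hypothetical.

```python
import json

# Document store: one nested, self-describing document per entity
document = {"_id": "u1", "name": "Asha", "address": {"city": "Pune"}, "follows": ["u2"]}

# Key-value store: an opaque value looked up by its key
key_value = {"user:u1": json.dumps({"name": "Asha", "city": "Pune"})}

# Column family: row key -> column family -> column -> value (rows may be sparse)
column_family = {
    "u1": {"profile": {"name": "Asha", "city": "Pune"}, "follows": {"u2": ""}},
    "u2": {"profile": {"name": "Ravi"}},
}

# Property graph: nodes with properties plus labelled edges between them
nodes = {"u1": {"name": "Asha"}, "u2": {"name": "Ravi"}}
edges = [("u1", "FOLLOWS", "u2")]

def followed_by(node_id):
    """Names of the nodes that node_id follows, found by walking the edges."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == node_id and rel == "FOLLOWS"]

print(document["address"]["city"])
print(json.loads(key_value["user:u1"])["city"])
print(column_family["u1"]["profile"]["city"])
print(followed_by("u1"))
```

The key-value form needs the whole value decoded before any field can be inspected, whereas the document and column-family forms expose individual fields, and the graph form makes the relationship itself a first-class item.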
Slide 13
Selecting a NoSQL database
Step 1: Right Data Model
Step 2: Pros and Cons of Consistency
Step 3: Compromising Features of RDBMS
Slide 14
Where to Use Cassandra?
• If you are looking for simple setup, maintenance and code
• Very high-velocity random reads & writes
• Flexible sparse / wide column requirements
• No need for multiple secondary indexes
Slide 15
Cassandra Use Case - Twitter
Massive Scale, High Availability
Slide 16
Where NOT to Use Cassandra?
Do not use Cassandra if your application needs:
• Secondary indexes
• Relational data
• Transactions (rollback, commit)
• Primary & financial records
• Stringent security & authorization on data
• Dynamic queries on columns
• Searching column data
• Low latency
Slide 17
Where to Use HBase
• Optimized for reads
• Well suited for range-based scans
• Applications with strict consistency requirements
• Applications needing fast reads and writes with scalability
• Facebook uses it to manage its user statuses, photos, chat messages etc.
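Range-based scans work well when rows are kept sorted by row key, as HBase does. The sketch below imitates that idea in memory with the standard `bisect` module; it is not the HBase client API, and the `SortedTable` class and the `user#date` key layout are assumptions made for illustration.

```python
import bisect

class SortedTable:
    """Rows kept in row-key order so a range scan touches only a contiguous slice."""

    def __init__(self):
        self._keys = []  # sorted row keys
        self._rows = {}  # row key -> value

    def put(self, row_key, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def scan(self, start, stop):
        """Return rows with start <= row key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

table = SortedTable()
# Hypothetical composite row key: user id, then date
table.put("user42#2014-01-03", "msg c")
table.put("user41#2014-01-02", "msg x")
table.put("user42#2014-01-01", "msg a")
table.put("user42#2014-01-04", "msg d")
table.put("user42#2014-01-02", "msg b")

rows = table.scan("user42#2014-01-02", "user42#2014-01-04")
print(rows)
```

Because related rows sit next to each other in key order, the scan locates the start position and reads forward, without inspecting rows that belong to other users.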
Slide 18
HBase Use Case - Facebook Messenger
Consistency and Scale
Slide 19
Where Not to use HBase
• It is not optimized for classic transactional applications or even relational analytics
• Applications that need:
  • full table scans
  • data to be aggregated, rolled up or analysed across rows
Slide 20
Where to Use MongoDB
• RDBMS replacement for web applications
• Semi-structured content management
• Real-time analytics & high-speed logging
• Caching and high scalability
• Web 2.0, media, SaaS, gaming
http://www.mongodb.org/about/production-deployments/
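Semi-structured content fits a document model because documents in one collection do not all need the same fields. The sketch below shows that idea with plain Python dictionaries and a small equality filter; it does not use a MongoDB driver, and the `find` helper and sample documents are hypothetical.

```python
def find(collection, query):
    """Return documents whose fields equal every key/value pair in query."""
    return [doc for doc in collection
            if all(doc.get(field) == expected for field, expected in query.items())]

# Documents in the same collection with different sets of fields
posts = [
    {"_id": 1, "type": "article", "title": "Intro", "tags": ["nosql"]},
    {"_id": 2, "type": "video", "title": "Demo", "duration_s": 90},
    {"_id": 3, "type": "article", "title": "CAP"},
]

articles = find(posts, {"type": "article"})
videos_90s = find(posts, {"duration_s": 90})
print([doc["_id"] for doc in articles])
```

A new field such as `duration_s` can appear on some documents without any change to the others, which is the property the slides describe as schema-free.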
Slide 21
MongoDB Use Cases
High-performance and Schema-free
• MySQL for active posts
• MongoDB for archived posts
• Migrated two billion plus posts to MongoDB
• Migrated from RDBMS to MongoDB
• Storage of venues and check-ins
Slide 22
Where Not to use MongoDB
• Highly transactional applications
• Applications with traditional database system requirements such as foreign-key constraints
Slide 23
HBase Vs Cassandra Vs MongoDB

HBase
• Distributed and scalable big data store
• Strong consistency
• Built on top of Hadoop Distributed File System (HDFS)
• CP on CAP

Cassandra
• High availability
• Incremental scalability
• Eventually consistent
• Trade-offs between consistency and latency
• Minimal administration
• No SPOF (Single Point of Failure)
• AP on CAP

MongoDB
• Schemas can change as applications evolve (schema-free)
• Full index support for high performance
• Replication and failover for high availability
• Auto-sharding for easy scalability
• Rich document-based queries for easy readability
• CP on CAP
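The trade-off between consistency and latency listed for Cassandra is often reasoned about with replica quorums: with N replicas, a write acknowledged by W of them and a read that contacts R of them are guaranteed to share at least one replica only when R + W > N. The slides do not spell this rule out, so the brute-force check below is an added illustration; the function name is hypothetical.

```python
from itertools import combinations

def always_overlap(n, w, r):
    """True if every possible write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(set(write_set) & set(read_set)
               for write_set in combinations(replicas, w)
               for read_set in combinations(replicas, r))

n = 3
for w in range(1, n + 1):
    for r in range(1, n + 1):
        print("N=%d W=%d R=%d overlap=%s" % (n, w, r, always_overlap(n, w, r)))
```

Smaller W and R mean fewer replicas to wait for and therefore lower latency, but once R + W is no longer greater than N a read may miss the most recent write.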
Slide 24
Questions?
Buy NoSQL database courses at: www.edureka.in

HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database

  • 1. Slide 1 HBase Vs Cassandra Vs MongoDB - choose the right NoSQL database View NoSQL database Courses at : www.edureka.in *
  • 2. Slide 2 Objectives of this Session • Un For Queries during the session and class recording: Post on Twitter @edurekaIN: #askEdureka Post on Facebook /edurekaIN  Traditional databases  Challenges with traditional databases  CAP Theorem  NoSQL to the rescue  A BASE system  Choose the right NoSQL database www.edureka.in
  • 3. Slide 3 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions RDBMS/OLTP/Real Time NoSQL/New SQL/BigData DSS/OLAP/DW Oracle MySQL MS SQL DB2 Netezza SAP Hana Oracle Express MongoDB HBase Cassandra CouchDB Database Categories www.edureka.in
  • 4. Slide 4 www.edureka.in 5000 TPS Caching Layer 300 ~ 500 SQL Transaction 100 ~ 200 SQL Transaction 1000 TPS WEB APPLICATION RDBMS1 Applications Changing Data RDBMS1 Elastic Scale A Traditional database solution
  • 5. Slide 5 www.edureka.in 1000 TPS Elastic Scale WEB APPLICATION Applications Changing Data Elastic Scale CASSANDRA 300 ~ 500 SQL Transaction 100 ~ 200 SQL Transaction 5000 TPS A NoSQL database solution
  • 6. Slide 6 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions www.edureka.in Challenges with traditional databases  Not a good fit for large Data Volume (petabytes of data) with Varying data types e.g. images, videos, text etc.  Can’t scale for large data volume e.g. 15 - 20 petabyte data in Govt. of India “AADHAR” project  Scale-up - Limited by Memory and Processing (CPU) capabilities  Scale-out - Cache dependent ‘Read’ and ‘Write’ Operations  Complex RDBMS model – Parsing, Locking, Logging, Buffer pool, Threads etc.  Sharding causes operational problems e.g. managing a shard failure  Consistency – A bottleneck for Scalability in RDBMS  Satisfying ACID is an hindrance for Scaling  Relaxed consistency to scale out with NoSQL databases
  • 7. Slide 7 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions www.edureka.in CAP We must understand the CAP theorem when we talk about NoSQL databases or in fact when designing any distributed system. CAP theorem states that there are 3 basic requirements which exist in a special relation when designing applications for a distributed architecture. Consistency Availability Partition Tolerance CAP Theorem This means that the system is always on (service guarantee availability), no downtime. This means that the system continues to function even the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another. This means that the data in the database remains consistent after the execution of an operation. For example after an update operation all clients see the same data.
  • 8. Slide 8 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions  CAP provides the basic requirements for a distributed system to follow 2 of the 3 requirements.  In theoretically it is impossible to fulfill all 3 requirements.  Therefore all the current NoSQL database follow the different combinations of the C, A, P from the CAP theorem. CAP Theorem and NoSQL databases  CA - Single site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.  CP - Some data may not be accessible, but the rest is still consistent/accurate.  AP - System is still available under partitioning, but some of the data returned may be inaccurate. www.edureka.in
  • 9. Slide 9 NoSQL to the rescue
      • A scale-out, shared-nothing architecture, capable of running on a large number of nodes
      • A non-locking concurrency control mechanism, so real-time reads will not conflict with writes
      • Scalable replication and distribution
      • Thousands of machines with distributed data
      • An architecture providing much higher per-node performance than is available from traditional SQL-based databases
      • Schema-less data model
      • Mostly queries and few updates
  • 10. Slide 10 NoSQL database - A BASE not ACID system
      • Basically Available: the system guarantees availability, in terms of the CAP theorem.
      • Soft State: the state of the system may change over time, even without input. This is because of the eventual consistency model.
      • Eventual Consistency: the system will become consistent over time, provided that it receives no new input during that time.
    A BASE system gives up on immediate consistency in favour of availability.
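Soft state and eventual consistency can be illustrated with a small sketch. It assumes a last-write-wins rule based on timestamps and a pairwise "anti-entropy" exchange; both are common techniques but are assumptions here, not something the slides specify. `Node` and `anti_entropy` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # key -> (timestamp, value); the timestamp decides which write wins.
    store: dict = field(default_factory=dict)

    def put(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(x, y):
    """Exchange entries in both directions; the newer timestamp wins."""
    for key, (ts, value) in list(x.store.items()):
        y.put(key, value, ts)
    for key, (ts, value) in list(y.store.items()):
        x.put(key, value, ts)


n1, n2 = Node(), Node()
n1.put("status", "online", timestamp=1)
n2.put("status", "away", timestamp=2)

stale = n1.get("status")   # n1 has not yet seen the newer write
anti_entropy(n1, n2)       # state changes without any new client input
print(stale, n1.get("status"), n2.get("status"))
```

Before the exchange the two nodes answer differently; afterwards both return the value with the newer timestamp, which is the convergence that "eventual consistency" promises once writes stop.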
  • 11. Slide 11 ~150 NoSQL databases – Not a Panacea
    There are about 150 NoSQL databases on the market.
  • 12. Slide 12 NoSQL Database Types – Storage Architecture
      • Document data store databases
          • Examples: CouchDB, MongoDB
          • Data model: collection of key-value collections
          • Strength: tolerant of incomplete data
          • Weakness: query performance, no standard query syntax
      • Columnar NoSQL databases
          • Examples: HBase, Cassandra
          • Data model: column families
          • Strength: fast look-ups
          • Weakness: very low-level API
      • Key-value databases
          • Examples: Amazon SimpleDB, Redis
          • Data model: collection of key-value pairs
          • Strength: fast look-ups
          • Weakness: stored data has no schema
      • Graph NoSQL databases
          • Examples: InfoGrid, Infinite Graph
          • Data model: "property graph" of nodes
          • Strength: graph algorithms such as shortest path and connectedness
          • Weakness: not easy to cluster; the whole graph may have to be traversed to get an answer
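The four data models can be compared by writing the same small record in each shape. The sketch below uses plain Python structures only; the field names, the sample user and the `followers_of` helper are hypothetical and are not tied to any real database API.

```python
import json

# Key-value: an opaque value looked up by a single key; the store knows no schema.
key_value = {"user:42": json.dumps({"name": "Asha", "city": "Pune"})}

# Document: a nested, self-describing record; fields may differ per document.
document = {
    "_id": 42,
    "name": "Asha",
    "address": {"city": "Pune"},
    "tags": ["admin"],
}

# Column family: row key -> column family -> column -> value; rows may be sparse.
column_family = {
    "42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2014-01-01"},
    }
}

# Property graph: nodes and edges, each carrying its own properties.
nodes = {42: {"name": "Asha"}, 7: {"name": "Ravi"}}
edges = [(42, "FOLLOWS", 7, {"since": 2013})]


def followers_of(node_id):
    """Return the ids of nodes with a FOLLOWS edge pointing at node_id."""
    return [src for src, label, dst, _ in edges if label == "FOLLOWS" and dst == node_id]


print(json.loads(key_value["user:42"])["city"])
print(document["address"]["city"])
print(column_family["42"]["profile"]["city"])
print(followers_of(7))
```

The key-value store has to parse the whole value to reach one field, the document and column-family shapes expose fields directly, and only the graph shape answers relationship questions without extra joins.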
  • 13. Slide 13 Selecting a NoSQL database
      • Step 1: Right Data Model
      • Step 2: Pros and Cons of Consistency
      • Step 3: Compromising Features of RDBMS
  • 14. Slide 14 Where to Use Cassandra?
      • If looking for simple setup, maintenance and code
      • Very high velocity random reads and writes
      • Flexible sparse / wide column requirements
      • No need for multiple secondary indexes
  • 15. Slide 15 Cassandra Use Case - Twitter: Massive Scale, High Availability
  • 16. Slide 16 Where NOT to Use Cassandra?
    Do not use Cassandra if your application needs:
      • Secondary indexes
      • Relational data
      • Transactions (rollback, commit)
      • Primary and financial records
      • Stringent security and authorization on data
      • Dynamic queries on columns
      • Searching column data
      • Low latency
  • 17. Slide 17 Where to Use HBase
      • Optimized for reads
      • Well suited to range-based scans
      • Applications with strict consistency requirements
      • Applications that need fast reads and writes with scalability
      • Facebook uses it to manage its user statuses, photos, chat messages etc.
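Range-based scans are cheap when rows are kept sorted by row key, which is the property the slide relies on. The following is a minimal in-memory sketch of that idea, not the HBase client API: `SortedTable`, the `user#date` key layout and the sample rows are all hypothetical. The scan treats the start key as inclusive and the stop key as exclusive.

```python
from bisect import bisect_left


class SortedTable:
    """Toy table that keeps row keys in sorted order to support range scans."""

    def __init__(self):
        self._keys = []   # row keys, always sorted
        self._rows = {}   # row key -> row data

    def put(self, row_key, row):
        if row_key not in self._rows:
            idx = bisect_left(self._keys, row_key)
            self._keys.insert(idx, row_key)
        self._rows[row_key] = row

    def scan(self, start, stop):
        """Yield (key, row) for start <= key < stop, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = SortedTable()
table.put("user2#2014-01-01", {"msg": "hello"})
table.put("user1#2014-01-03", {"msg": "third"})
table.put("user1#2014-01-01", {"msg": "first"})
table.put("user1#2014-01-02", {"msg": "second"})

# All rows for user1: '$' is the character right after '#', so it bounds the prefix.
user1_keys = [key for key, _ in table.scan("user1#", "user1$")]
print(user1_keys)
```

Because related rows share a key prefix, one binary search finds the start of the range and the rest is a sequential read, which is why row-key design matters so much for scan-heavy workloads.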
  • 18. Slide 18 HBase Use Case - Facebook Messenger: Consistency and Scale
  • 19. Slide 19 Where Not to Use HBase
      • It is not optimized for classic transactional applications or even relational analytics
      • Applications that need:
          • full table scans
          • data to be aggregated, rolled up and analysed across rows
  • 20. Slide 20 Where to Use MongoDB
      • RDBMS replacement for web applications
      • Semi-structured content management
      • Real-time analytics and high-speed logging
      • Caching and high scalability
      • Web 2.0, media, SaaS, gaming
    http://www.mongodb.org/about/production-deployments/
  • 21. Slide 21 MongoDB Use Cases: High-performance and Schema-free
      • MySQL for active posts
      • MongoDB for archived posts
      • Migrated two billion plus posts to MongoDB
      • Migrated from RDBMS to MongoDB
      • Storage of venues and check-ins
  • 22. Slide 22 Where Not to Use MongoDB
      • Highly transactional applications
      • Applications with traditional database system requirements such as foreign-key constraints
  • 23. Slide 23 HBase Vs Cassandra Vs MongoDB
      • HBase
          • Distributed and scalable big data store
          • Strong consistency
          • Built on top of the Hadoop Distributed File System (HDFS)
          • CP on CAP
      • Cassandra
          • High availability
          • Incremental scalability
          • Eventually consistent
          • Trade-offs between consistency and latency
          • Minimal administration
          • No SPOF (Single Point of Failure)
          • AP on CAP
      • MongoDB
          • Schemas change as applications evolve (schema-free)
          • Full index support for high performance
          • Replication and failover for high availability
          • Auto-sharding for easy scalability
          • Rich document-based queries for easy readability
          • CP on CAP
  • 24. Slide 24 Questions? Buy NoSQL database courses at: www.edureka.in