NoSQL covers a wide range of database technologies that were developed in response to the surging volume of stored data. Relational databases struggle to cope with this volume and face agility challenges. This is where NoSQL databases come into play, and they have become popular because of their features. The session covers the following topics to help you choose the right NoSQL database:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
1. Slide 1
HBase Vs Cassandra Vs MongoDB - choose the right NoSQL database
View NoSQL database Courses at : www.edureka.in
2. Slide 2
Objectives of this Session
• Un
For Queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
3. Slide 3 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Database Categories
RDBMS/OLTP/Real Time: Oracle, MySQL, MS SQL, DB2
NoSQL/New SQL/BigData: MongoDB, HBase, Cassandra, CouchDB
DSS/OLAP/DW: Netezza, SAP Hana, Oracle Express
4. Slide 4
A Traditional database solution
[Figure: a web application, a caching layer and RDBMS1 instances. Labels: 5000 TPS, 1000 TPS, 300 ~ 500 SQL transactions, 100 ~ 200 SQL transactions, Applications Changing Data, Elastic Scale.]
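In a traditional solution like this, the caching layer absorbs most reads before they reach the RDBMS, while every write still has to hit the database. The following is a minimal sketch of the cache-aside pattern, assuming a plain Python dict as the cache and a hypothetical `FakeRDBMS` class standing in for the relational database; it is an illustration, not the deck's actual architecture.

```python
class FakeRDBMS:
    """Hypothetical stand-in for a single relational database."""

    def __init__(self):
        self.rows = {}
        self.queries = 0  # counts how many requests actually reach the database

    def select(self, key):
        self.queries += 1
        return self.rows.get(key)

    def update(self, key, value):
        self.queries += 1
        self.rows[key] = value


class CacheAside:
    """Cache-aside: read from the cache first, fall back to the database on a miss."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: the database is not touched
        value = self.db.select(key)  # cache miss: query the database once
        self.cache[key] = value
        return value

    def put(self, key, value):
        self.db.update(key, value)  # every write still goes to the single RDBMS
        self.cache.pop(key, None)  # invalidate so the next read reloads fresh data


db = FakeRDBMS()
app = CacheAside(db)
app.put("user:1", "Alice")
for _ in range(1000):
    app.get("user:1")
print(db.queries)  # 2: one write plus one read miss; the other 999 reads were cache hits
```

The design choice shows the limitation listed under scale-out: reads scale with the cache, but write throughput is still bounded by the database behind it.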
6. Slide 6
Challenges with traditional databases
Not a good fit for large data volumes (petabytes of data) with varying data types, e.g. images, videos, text, etc.
Can’t scale for large data volumes, e.g. the 15 - 20 petabytes of data in the Govt. of India “Aadhaar” project
Scale-up - limited by memory and processing (CPU) capabilities
Scale-out - cache-dependent ‘Read’ and ‘Write’ operations
Complex RDBMS model – parsing, locking, logging, buffer pool, threads, etc.
Sharding causes operational problems, e.g. managing a shard failure
Consistency – a bottleneck for scalability in RDBMS
Satisfying ACID is a hindrance to scaling
NoSQL databases relax consistency to scale out
7. Slide 7
CAP Theorem
We must understand the CAP theorem when we talk about NoSQL databases, or in fact when designing any distributed system.
The CAP theorem states that there are three basic requirements which exist in a special relation when designing applications for a distributed architecture:
Consistency - the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
Availability - the system is always on (the service guarantees availability), with no downtime.
Partition Tolerance - the system continues to function even if communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
8. Slide 8
According to CAP, a distributed system can satisfy at most 2 of the 3 requirements; in theory, it is impossible to fulfill all 3. Therefore, current NoSQL databases each follow a different combination of C, A and P from the CAP theorem.
CAP Theorem and NoSQL databases
CA - Single-site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.
CP - Some data may not be accessible, but the rest is still consistent/accurate.
AP - System is still available under partitioning, but some of the data returned may be inaccurate.
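The CP and AP behaviours above can be made concrete with a toy two-replica store. This is a minimal sketch, assuming a hypothetical `TwoReplicaStore` class where clients write through replica `a` and read through replica `b`; CA is not modelled because it assumes partitions never happen.

```python
class Replica:
    """One copy of the data held by a server."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoReplicaStore:
    """Toy store with two replicas, used only to illustrate the CP vs AP choice."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot reach each other

    def write(self, key, value):
        """A client writes through replica a; returns True if the write was accepted."""
        if not self.partitioned:
            self.a.data[key] = value
            self.b.data[key] = value  # replicate synchronously while connected
            return True
        if self.mode == "CP":
            return False  # refuse the write: stays consistent, gives up availability
        self.a.data[key] = value  # AP: accept locally; replica b is now stale
        return True

    def read_from_b(self, key):
        """A client reads through replica b."""
        return self.b.data.get(key)


cp, ap = TwoReplicaStore("CP"), TwoReplicaStore("AP")
for store in (cp, ap):
    store.write("x", 1)  # normal operation: both replicas get x = 1
    store.partitioned = True  # now the network splits a from b

print(cp.write("x", 2), cp.read_from_b("x"))  # False 1: write rejected, reads stay consistent
print(ap.write("x", 2), ap.read_from_b("x"))  # True 1: write accepted, but b returns stale data
```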
9. Slide 9
NoSQL to the rescue
A scale-out, shared-nothing architecture, capable of running on a large number of nodes
A non-locking concurrency control mechanism, so real-time reads will not conflict with writes
Scalable replication and distribution
Thousands of machines with distributed data
An architecture providing much higher per-node performance than is available from traditional SQL-based databases
Schema-less data model
Mostly queries and few updates
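Spreading data over many shared-nothing nodes needs a rule for which node owns which key. One widely used technique is consistent hashing (Cassandra, for example, places data on a token ring). The sketch below assumes MD5 for ring positions, 100 virtual nodes per server and hypothetical node names; it is an illustration of the technique, not any database's actual partitioner.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map any string to a position on a 2**32 ring using MD5 (stable across runs).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2 ** 32)


class HashRing:
    """Consistent-hash ring: each key is owned by the next node position clockwise."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Virtual nodes spread each server over many ring positions for better balance.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def nodes_for(self, key, replicas=3):
        """Return up to `replicas` distinct nodes for `key`, walking clockwise."""
        if not self.ring:
            return []
        start = bisect.bisect(self.ring, (_hash(key), ""))
        owners = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replicas:
                break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.nodes_for(k, replicas=1)[0] for k in keys}

ring.add_node("node-e")  # scale out by one machine
after = {k: ring.nodes_for(k, replicas=1)[0] for k in keys}
moved = sum(before[k] != after[k] for k in keys)

# Only keys taken over by node-e move; with 5 nodes, roughly a fifth is expected.
print(moved, ring.nodes_for("user:42"))
```

Adding a node moves only the keys the new node takes over, instead of reshuffling everything as a plain `hash(key) % n` scheme would.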
10. Slide 10
NoSQL database - A BASE not ACID system
Basically Available - the system does guarantee availability, in terms of the CAP theorem.
Soft State - the state of the system may change over time, even without input, because of the eventual consistency model.
Eventual Consistency - the system will become consistent over time, given that it doesn't receive input during that time.
A BASE system gives up on strong (immediate) consistency.
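Soft state and eventual consistency can be illustrated with replicas that temporarily disagree and then converge. This is a minimal sketch assuming last-write-wins timestamps and a hypothetical `anti_entropy` sync step; real systems use their own reconciliation mechanisms, and this one assumes unique timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; each value is stored with the timestamp of its write."""

    name: str
    store: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        current = self.store.get(key)
        # Last-write-wins: keep the value with the highest timestamp.
        # This sketch assumes timestamps are unique; real systems need a tie-breaker.
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Push every replica's entries to every other replica; newer timestamps win."""
    for src in replicas:
        for dst in replicas:
            for key, (ts, value) in list(src.store.items()):
                dst.write(key, value, ts)


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
r1.write("cart", ["book"], ts=1)  # write accepted by one replica only
r3.write("cart", ["book", "pen"], ts=2)  # a later write lands on another replica
print([r.read("cart") for r in (r1, r2, r3)])  # [['book'], None, ['book', 'pen']]: soft state
anti_entropy([r1, r2, r3])  # no new input arrives during the sync
print([r.read("cart") for r in (r1, r2, r3)])  # all three now return ['book', 'pen']
```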
11. Slide 11
There are ~150 NoSQL databases in the market
NoSQL database – Not a Panacea
12. Slide 12
NoSQL Database – Storage Architecture
NoSQL Database Types:
Document Data Store Databases - Example: CouchDB, MongoDB; Data Model: collection of key-value collections; Strength: tolerant of incomplete data; Weakness: query performance, no standard query syntax
Columnar NoSQL Databases - Example: HBase, Cassandra; Data Model: column families; Strength: fast look-ups; Weakness: very low-level API
Key Value Databases - Example: Amazon SimpleDB, Redis; Data Model: collection of key-value pairs; Strength: fast look-ups; Weakness: stored data has no schema
Graph NoSQL Databases - Example: InfoGrid, InfiniteGraph; Data Model: “property graph” - nodes; Strength: graph algorithms – shortest path, connectedness, etc.; Weakness: not easy to cluster, must traverse the whole graph to get an answer
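The four data models differ mainly in how a record is shaped. The sketch below uses plain Python structures as stand-ins for each model (not any product's API), with hypothetical sample data, plus a breadth-first search to show the kind of shortest-path query graph databases are good at.

```python
from collections import deque

# Key-value model: opaque values looked up by key; the store imposes no schema on them.
kv = {
    "user:1": '{"name": "Asha", "city": "Pune"}',
    "user:2": '{"name": "Ravi"}',
}

# Document model: each document is a nested collection of key-value pairs,
# and documents in one collection may have different fields.
users = [
    {"_id": 1, "name": "Asha", "address": {"city": "Pune"}, "tags": ["admin"]},
    {"_id": 2, "name": "Ravi"},  # incomplete data is tolerated
]

# Column-family model: row key -> column family -> column -> value (sparse rows).
wide_rows = {
    "user#1": {"info": {"name": "Asha", "city": "Pune"}, "stats": {"logins": 42}},
    "user#2": {"info": {"name": "Ravi"}},
}

# Property-graph model: nodes with properties plus directed edges between them.
nodes = {1: {"name": "Asha"}, 2: {"name": "Ravi"}, 3: {"name": "Meera"}, 4: {"name": "Kiran"}}
follows = {1: [2], 2: [3], 3: [4], 4: []}


def shortest_path(graph, start, goal):
    """Breadth-first search: fewest-hop path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(wide_rows["user#1"]["info"]["city"])  # Pune
print([u.get("address", {}).get("city") for u in users])  # ['Pune', None]
print(shortest_path(follows, 1, 4))  # [1, 2, 3, 4]
```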
13. Slide 13
Selecting a NoSQL database
Step 1: Right Data Model
Step 2: Pros and Cons of Consistency
Step 3: Compromising Features of RDBMS
14. Slide 14
Where to Use Cassandra?
If you are looking for simple setup, maintenance and code
Very High Velocity Random Reads & Writes
Flexible Sparse / Wide Column Requirements
No need for multiple secondary indexes
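The sparse/wide-column fit can be pictured as a partition key that groups many rows kept sorted by a clustering key, so a typical read touches one partition. This is a minimal pure-Python sketch of that idea with a hypothetical `WideRowTable` and a timeline-style example; it is not the Cassandra driver API and not any company's real schema.

```python
import bisect
from collections import defaultdict


class WideRowTable:
    """Toy wide-row table: partition key -> rows kept sorted by a clustering key."""

    def __init__(self):
        # partition key -> (sorted clustering keys, row columns in the same order)
        self.partitions = defaultdict(lambda: ([], []))

    def insert(self, partition_key, clustering_key, **columns):
        # Each row may carry a different set of columns (sparse rows).
        keys, rows = self.partitions[partition_key]
        i = bisect.bisect_right(keys, clustering_key)
        keys.insert(i, clustering_key)
        rows.insert(i, columns)

    def latest(self, partition_key, limit):
        """Newest-first slice of a single partition: one partition lookup."""
        keys, rows = self.partitions.get(partition_key, ([], []))
        return list(reversed(list(zip(keys, rows))[-limit:]))


timeline = WideRowTable()
timeline.insert("alice", 1001, text="hello")
timeline.insert("alice", 1003, text="third", media="img.png")
timeline.insert("alice", 1002, text="second")  # out-of-order write still lands sorted
timeline.insert("bob", 1001, text="hi")
print(timeline.latest("alice", 2))
# [(1003, {'text': 'third', 'media': 'img.png'}), (1002, {'text': 'second'})]
```

Because rows are grouped by partition and pre-sorted, no secondary index is needed for this access pattern, which matches the "no multiple secondary index needs" point.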
15. Slide 15
Massive Scale, High Availability
Cassandra Use Case - Twitter
16. Slide 16
Where NOT to Use Cassandra?
Do not use Cassandra if your application needs:
Secondary indexes
Relational data
Transactions (rollback, commit)
Primary & financial records
Stringent security & authorization on data
Dynamic queries on columns
Searching column data
Low latency
17. Slide 17
Where to Use HBase
Optimized for reads
Well suited for range-based scans
Applications with strict consistency requirements
Applications needing fast reads and writes with scalability
Facebook uses it to manage its user statuses, photos, chat messages, etc.
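Range scans are efficient in HBase because rows are stored sorted by row key, so a scan over a key range reads one contiguous slice. The sketch below imitates that with a sorted list; the `SortedRowStore` class and the "user#sequence" row-key layout are hypothetical, and the real HBase client expresses the same idea with a `Scan` bounded by start and stop rows.

```python
import bisect


class SortedRowStore:
    """Toy table that keeps row keys in lexicographic order, as HBase does within a table."""

    def __init__(self):
        self.keys = []
        self.rows = {}

    def put(self, row_key, value):
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)
        self.rows[row_key] = value

    def scan(self, start_row, stop_row):
        """Yield rows with start_row <= key < stop_row (stop row exclusive)."""
        lo = bisect.bisect_left(self.keys, start_row)
        hi = bisect.bisect_left(self.keys, stop_row)
        for key in self.keys[lo:hi]:
            yield key, self.rows[key]


messages = SortedRowStore()
for key, text in [("bob#0003", "c"), ("alice#0002", "b"), ("alice#0001", "a"), ("carol#0001", "d")]:
    messages.put(key, text)

# '$' sorts just after '#', so this range covers every row key that starts with 'alice#'.
print(list(messages.scan("alice#", "alice$")))  # [('alice#0001', 'a'), ('alice#0002', 'b')]
```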
18. Slide 18
Consistency and Scale
HBase Use Case - Facebook Messenger
19. Slide 19
Where Not to use HBase
HBase is not optimized for classic transactional applications or even relational analytics
Applications that need:
full table scans
data to be aggregated, rolled up or analysed across rows
20. Slide 20
Where to Use MongoDB
RDBMS replacement for Web Applications
Semi-structured Content Management
Real-time Analytics & High-Speed Logging
Caching and High Scalability
Web 2.0, Media, SaaS, Gaming
http://www.mongodb.org/about/production-deployments/
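MongoDB's document queries can reach into nested fields with dot notation; with a live server, the equivalent PyMongo call would be `collection.find({"author.name": "Asha"})`. The sketch below imitates only equality matching on dotted paths in plain Python, using hypothetical `get_path` and `find` helpers and sample data, to show why semi-structured content fits this model.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'author.name' into nested dicts; None if missing."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, query):
    """Return documents whose fields equal every value in `query` (equality matches only)."""
    return [doc for doc in collection if all(get_path(doc, k) == v for k, v in query.items())]


posts = [
    {"_id": 1, "title": "Intro", "author": {"name": "Asha"}, "tags": ["nosql"]},
    {"_id": 2, "title": "Sharding", "author": {"name": "Ravi"}, "views": 120},
    {"_id": 3, "title": "Indexes", "author": {"name": "Asha"}},  # fields vary per document
]
print([d["_id"] for d in find(posts, {"author.name": "Asha"})])  # [1, 3]
```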
21. Slide 21
MongoDB Use Cases
MySQL for active posts, MongoDB for archived posts; migrated two billion plus posts to MongoDB
Migrated from RDBMS to MongoDB for storage of venues and check-ins; high-performance and schema-free
22. Slide 22
Where Not to use MongoDB
Highly transactional applications
Applications with traditional database system requirements, such as foreign-key constraints
23. Slide 23
HBase Vs Cassandra Vs MongoDB
HBase: distributed and scalable big data store; strong consistency; built on top of the Hadoop Distributed File System (HDFS); CP on CAP
Cassandra: high availability; incremental scalability; eventually consistent; trade-offs between consistency and latency; minimal administration; no SPF (single point of failure); AP on CAP
MongoDB: schemas can change as applications evolve (schema-free); full index support for high performance; replication and failover for high availability; auto-sharding for easy scalability; rich document-based queries for easy readability; CP on CAP
24. Slide 24
Questions?