NoSQL Data Modeling
Concepts and Cases


Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com
NoSQL?
NoSQL: Various Shapes and Sizes

• Document Databases


• Column-family Oriented Stores


• Key/value Data stores


• XML Databases


• Object Databases


• Graph Databases
Key Questions

• How do I model data for my application?


• How do I determine which one is right for me?


• Can I easily shift from one database to the other?


• Is there a standard way of storing, accessing, and querying data?
Agenda for this session

• Explore some of the main NoSQL products


• Understand how they are similar and different


• How best to use these products in the stack


Document Databases




• also GenieDB, SimpleDB
What is a document db?

• One that stores documents


• Popular options:


  • MongoDB -- C++


  • CouchDB -- Erlang


  • Also Amazon’s SimpleDB


• ...what exactly is a document?
In the real world




• (Source: http://guide.couchdb.org/draft/why.html)
In terms of JSON

• { "name": "John Doe", "zip": 10001 }
What about db schema?

• Schema-less


• Different documents could be stored in a single collection
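
A minimal sketch of what schema-less means in practice, assuming a plain Python list stands in for a collection. The `find` helper and the second document are hypothetical illustrations, not MongoDB's API:

```python
# A list of dicts stands in for a collection; no schema is declared anywhere.
collection = [
    {"name": "John Doe", "zip": 10001},
    {"title": "NoSQL notes", "tags": ["db", "json"]},  # different fields, same collection
]


def find(docs, query):
    """Return the documents whose fields equal every key/value pair in query."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]


matches = find(collection, {"zip": 10001})
print(matches)
```

Documents that lack a queried field simply do not match, so nothing has to be migrated when a new shape of document is added.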
Data types: MongoDB

• Essential JSON types:


• string


• integer


• boolean


• double
Data types: MongoDB (...cont)

• Additional JSON types


• null, array and object


• BSON types -- binary-encoded serialization of JSON-like documents


   • date, binary data, object id, regular expression and code


   • (Reference: bsonspec.org)
A BSON example: object id
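
An ObjectId is 12 bytes, and its first 4 bytes hold a big-endian Unix timestamp. A minimal sketch of decoding that prefix, using the ObjectId value from the MongoDB read examples in this deck. The helper name `object_id_timestamp` is hypothetical, and the remaining 8 bytes are left uninterpreted because their layout has varied between versions:

```python
from datetime import datetime, timezone


def object_id_timestamp(hex_id):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")  # big-endian seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = object_id_timestamp("4c97053abe67000000003857")
print(created.isoformat())
```

Because the timestamp leads the value, ObjectIds sort roughly by creation time.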
Data types: CouchDB

• Everything JSON


• Large objects: attachments
CRUD operations for documents

• Create


• Read


• Update


• Delete
MongoDB: Create Document

• use mydb


• w = {name: "John Doe", zip: 10001};


• db.location.save(w);
Create db and collection

• Lazily created


• Implicitly created


• use mydb


• db.collection.save(w)
MongoDB: Read Document

• db.location.find({zip: 10001});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Read Document (...cont)

• db.location.find({name: "John Doe"});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Update Document

• Atomic operations on single documents


• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
CouchDB: RESTful

• Supports REST verbs: GET, HEAD, PUT, POST, DELETE


• Supports Replication


• Supports the notion of attachments


• Can work offline and supports small-footprint profiles
Sorted Ordered Column-family Datastores

• Sorted


• Ordered


• Distributed


• Map
Essential schema
Multi-dimensional View
A Map/Hash View

• {
    "row_key_1" : {
      "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
      "location" : { "zip" : "94301" }
    }
  }
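
The map view can be mirrored with nested dictionaries: row key, then column family, then column. A minimal sketch, assuming an in-memory dict stands in for the store. The `scan` helper and the rows `row_key_2` and `row_key_3` are hypothetical additions to show sparse rows and ordered range scans:

```python
import bisect

# row key -> column family -> column -> value
table = {
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
    },
    "row_key_2": {"location": {"zip": "10001"}},    # hypothetical sparse row
    "row_key_3": {"name": {"first_name": "Jane"}},  # hypothetical sparse row
}

sorted_keys = sorted(table)  # the store keeps rows ordered by row key


def scan(start_key, stop_key):
    """Return (row_key, row) pairs with start_key <= row_key < stop_key."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return [(k, table[k]) for k in sorted_keys[lo:hi]]


zip_code = table["row_key_1"]["location"]["zip"]
rows = scan("row_key_1", "row_key_3")
print(zip_code, [k for k, _ in rows])
```

Keeping row keys sorted is what makes the range scan a pair of binary searches rather than a full pass over the data.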
Architectural View (HBase)
The Persistence Mechanism
Model Wrappers (The GAE Way)

• Python


  • Model, Expando, PolyModel


• Java


  • JDO, JPA
HBase Data Access

• Thrift + Avro


• Java API -- HTable, HBaseAdmin


• Hive (SQL-like)


• MapReduce -- sink and/or source
Transactions

• Atomic row level


• GAE Entity Groups
Indexes

• Row ordered


• Secondary indexes


• GAE style multiple indexes


  • thinking from output to query
Use cases

• Many of Google's products


• Facebook Messaging


• StumbleUpon


  • OpenTSDB


• Mahalo, Ning, Meetup, Twitter, Yahoo!


• Lily -- open source CMS built on HBase & Solr
Brewer’s CAP Theorem




• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf


• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
Distributed Systems & Consistency (case: success)
Distributed Systems & Consistency (case: failure)
Binding by Transactions
Consistency Spectrum
Inconsistency Window
RWN Math

• R – Number of nodes that are read from.


• W – Number of nodes that are written to.


• N – Total number of nodes in the cluster.




• In general: R < N and W < N for higher availability
R+W>N

• Easy to determine consistent state


• R + W = 2N


  • absolutely consistent, can provide ACID guarantee


• In all cases when R + W > N there is some overlap between read and write
  nodes.
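
The overlap claim can be checked by brute force. A minimal sketch, assuming nodes are numbered 0 to N-1; the helper name `quorums_always_overlap` is hypothetical:

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Check by brute force that every read set shares a node with every write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(n, r, w, "R+W>N:", r + w > n, "overlap:", quorums_always_overlap(n, r, w))
```

For every small cluster tried, an overlap is guaranteed exactly when R + W > N, so at least one node in any read set holds the latest write.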
R = 1, W = N

• more reads than writes


• W = N


  • 1 node failure = entire system unavailable for writes
R = N, W = 1

• W = 1


 • Chance of data inconsistency quite high


• R = N


 • Read only possible when all nodes in the cluster are available
R = W = ceiling ((N + 1)/2)
Effective quorum for eventual consistency
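
A short worked example of the quorum formula, as a sketch; the function name `majority_quorum` is hypothetical. It prints the quorum size for a few cluster sizes, along with how many nodes can be down while a quorum is still reachable:

```python
import math


def majority_quorum(n):
    """Smallest R = W given by ceiling((N + 1) / 2)."""
    return math.ceil((n + 1) / 2)


for n in (3, 4, 5, 7):
    q = majority_quorum(n)
    tolerated = n - q  # nodes that can be down while a quorum is still reachable
    print(f"N={n}: R=W={q}, R+W={2 * q}, tolerates {tolerated} node failure(s)")
```

With this choice R + W always exceeds N, so read and write quorums overlap while neither operation needs every node to be up.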
Eventual consistency variants

• Causal consistency -- if A writes and then informs B, B always sees the
  updated value


• Read-your-writes consistency -- after A writes a new value, A never sees the
  old one


• Session consistency -- read-your-writes consistency within a client session


• Monotonic read consistency -- once a process has seen a new value, it never
  sees a previous value


• Monotonic write consistency -- writes by the same process are serialized
Dynamo Techniques

• Consistent Hashing (Incremental scalability)


• Vector clocks (high availability for writes)


• Sloppy quorum and hinted handoff (recover from temporary failure)


• Gossip-based membership protocol (periodic, pairwise, inter-process
  interactions, low reliability, random peer selection)


• Anti-entropy using Merkle trees


• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
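
Of the techniques listed, vector clocks are the easiest to show in a few lines. A minimal sketch, assuming a clock is a dict from node name to counter. The node names and helper functions are hypothetical, and this is not Dynamo's actual implementation:

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b on every node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions as 'equal', 'after', 'before', or 'concurrent'."""
    a_sees_b, b_sees_a = descends(a, b), descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "after"
    if b_sees_a:
        return "before"
    return "concurrent"  # conflicting siblings; the application must reconcile


v1 = increment({}, "node_a")   # write handled by node_a
v2 = increment(v1, "node_a")   # later write, same coordinator
v3 = increment(v1, "node_b")   # divergent write handled by node_b
print(compare(v2, v1), compare(v2, v3))
```

Writes are never rejected: concurrent versions are kept side by side and reconciled later, which is how the technique favors write availability.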
Consistent Hashing
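
A minimal sketch of a consistent hash ring, assuming SHA-256 as the ring hash and 64 virtual nodes per server. Both choices, the `HashRing` class and the node names are hypothetical illustrations. It counts how many keys change owner when a fourth node joins:

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring (SHA-256 chosen only for illustration)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node is placed at several points (virtual nodes) to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """Walk clockwise from the key's position to the first node point."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["node_a", "node_b", "node_c"])
after = HashRing(["node_a", "node_b", "node_c", "node_d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node_d")
```

Only keys that now belong to the new node move; every other key keeps its owner, which is the incremental scalability property.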
CouchDB MVCC Style




• (Source: http://guide.couchdb.org/draft/consistency.html)
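
CouchDB documents carry a revision (`_rev`), and an update must name the revision it was based on; a stale revision is rejected as a conflict. A minimal in-memory sketch of that check, assuming integer revisions for simplicity (CouchDB's real revision strings look different). The `RevisionedStore` class is hypothetical:

```python
class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class RevisionedStore:
    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        """Store body only if rev matches the current revision; return the new one."""
        current_rev = self._docs[doc_id][0] if doc_id in self._docs else None
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = RevisionedStore()
store.put("john", {"name": "John Doe", "zip": 10001})
rev_a, doc_a = store.get("john")       # client A reads
rev_b, doc_b = store.get("john")       # client B reads the same revision
doc_a["zip"] = 94301
store.put("john", doc_a, rev=rev_a)    # A's update is accepted
try:
    doc_b["name"] = "Jane Doe"
    store.put("john", doc_b, rev=rev_b)  # B is stale and must re-read
    conflict = False
except ConflictError:
    conflict = True
print("conflict detected:", conflict)
```

No locks are taken: readers never block, and the losing writer re-reads the document and retries.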
Key/value Stores

• Memcached


• Membase


• Redis


• Tokyo Cabinet


• Kyoto Cabinet


• Berkeley DB
Questions?




• blog: shanky.org | twitter: @tshanky


• st@treasuryofideas.com

SDEC2011 NoSQL Data modelling
