NoSQL Talk at eBuddy


Published in: Technology
  1. 1. Agenda: What is NoSQL; Databases Overview; Aggregate Data Models; Distribution Models; Consistency; NWR
  2. 2. Purpose of this talk: to share some information; to spend the time pleasantly; to facilitate discussion (questions are welcome)
  3. 3. Rise of NoSQL: inspired by two papers, Amazon's Dynamo and Google's Bigtable
  4. 4. What is NoSQL: not a well-defined term (originally just the name of a single meetup held in San Francisco in 2009)
  5. 5. So, what does it stand for? It is better to pay attention to what it means than to what it stands for
  6. 6. Common characteristics of NoSQL ● Don't use SQL as a query language (they provide their own query mechanisms) ● Non-relational ● Open-source projects ● Run on clusters ● Developed in the 21st century ● Schemaless
  7. 7. Schemaless: although the database is schemaless, there is still an implicit schema in the application code
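To illustrate the implicit schema, here is a minimal sketch that uses plain Python dictionaries as stand-ins for documents in a schemaless store. The `users` collection, its field names, and the `display_name` helper are all hypothetical and not taken from any particular database.

```python
# Two documents in a hypothetical schemaless "users" collection.
# A schemaless store would accept both; nothing enforces a common shape.
users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "full_name": "Bob"},  # older record that uses a different field name
]


def display_name(user: dict) -> str:
    """Return a printable name; the field names assumed here are the implicit schema."""
    if "name" in user:
        return user["name"]
    # Fall back to the legacy field: the code has to know about both shapes.
    return user.get("full_name", "<unknown>")


names = [display_name(u) for u in users]
print(names)
```

The store never complains about the second record, but the application still has to know which fields to expect, so the schema has moved into the code instead of disappearing.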
  8. 8. Why use NoSQL: to operate on big data spread across multiple machines in a cluster; to increase developer productivity (even when there is no big-data requirement)
  9. 9. What is wrong with traditional RDBMS ● Nothing really, and they will not disappear (who knows ;) ● Well-established tools (there is even a whole profession, the DBA, behind them) ● The choice is not black and white: NoSQL and RDBMS will continue to work closely together, hence the rise of Polyglot Persistence
  10. 10. But RDBMS is not perfect: impedance mismatch; running on a cluster is a challenge
  11. 11. NoSQL World (major ones): Document-Oriented; Key-Value; Column-Family; Graph Databases
  12. 12. Data Model: Aggregate-Oriented vs. Relational - Access by key - Makes it easier to manage data storage across clusters - Usually you adapt your aggregate/data model to the query pattern of your application. An aggregate is a collection of related objects that we wish to treat as a unit
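As a rough sketch of an aggregate, the example below stores an order together with its line items and shipping address as one unit under a single key. The in-memory `store` dictionary stands in for a key-value or document store, and the field names and the `save_order` / `load_order` helpers are assumptions made for illustration.

```python
import json

# One order aggregate: the customer reference, shipping address and line items
# travel together and are read and written as a single unit.
order = {
    "order_id": "o-1001",
    "customer_id": "c-42",
    "shipping_address": {"city": "Amsterdam", "country": "NL"},
    "items": [
        {"sku": "book-1", "qty": 2, "price": 15.0},
        {"sku": "pen-7", "qty": 1, "price": 3.5},
    ],
}

store = {}  # hypothetical stand-in for a key-value or document store


def save_order(o: dict) -> None:
    """Write the whole aggregate under one key."""
    store[f"order:{o['order_id']}"] = json.dumps(o)


def load_order(order_id: str) -> dict:
    """Read the whole aggregate back with a single key lookup."""
    return json.loads(store[f"order:{order_id}"])


save_order(order)
loaded = load_order("o-1001")
total = sum(item["qty"] * item["price"] for item in loaded["items"])
print(total)
```

Because the application's query pattern is "fetch an order with everything in it", one key lookup returns all the data, and the whole aggregate can live on a single node of the cluster.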
  13. 13. ACID: NoSQL databases do offer ACID guarantees, but only within the scope of one aggregate (a single aggregate can be manipulated atomically at a time). Graph databases actually have full ACID support
  14. 14. Distribution Models ● Single server (no distribution at all) ● Sharding (can be combined with replication; the shard key is range-based or hash-based) ● Master-slave replication (gives read scalability: writes go to the master, reads can be served from the slaves; the master is a single point of failure) ● Peer-to-peer replication (common in column-family stores; raises consistency issues)
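The two shard-key strategies mentioned above can be sketched as follows. This is a simplified illustration, not how any specific database routes keys: the shard names, the range boundaries, and the choice of MD5 as the hash are all assumptions.

```python
import bisect
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def hash_shard(key: str) -> str:
    """Hash-based sharding: spread keys evenly, regardless of key order."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Range-based sharding: keys below "h" go to shard-0, keys from "h" up to
# (but not including) "q" go to shard-1, everything else goes to shard-2.
RANGE_BOUNDS = ["h", "q"]  # assumed boundaries, chosen only for this example


def range_shard(key: str) -> str:
    """Range-based sharding: neighbouring keys land on the same shard."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


for k in ["alice", "mike", "zoe"]:
    print(k, range_shard(k), hash_shard(k))
```

Range-based keys keep adjacent keys together, which helps range scans but can create hot spots; hash-based keys distribute load more evenly at the cost of scattering neighbouring keys.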
  15. 15. (Eventual) Consistency: the actual trade-off is between latency and consistency
  16. 16. NWR ● N – number of nodes to replicate to (the replication factor, i.e. the number of copies in the cluster) ● W – number of nodes that must acknowledge a write before it is considered successful ● R – number of nodes that must respond to a read before it is considered successful
  17. 17. NWR ● W + R <= N – eventual consistency (eventually all the nodes in the cluster will get the data) ● W = N, R = 1 – consistency by writes (what an RDBMS does) ● W = 1, R = N – consistency by reads (conflicts must be resolved somehow) ● W + R > N – consistency by quorum
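The rules on this slide can be written down as a small classifier. This is only a sketch of the slide's four cases; the `classify` function and its return strings are hypothetical names introduced here.

```python
def classify(n: int, w: int, r: int) -> str:
    """Map an (N, W, R) setting to the consistency mode named on the slide."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    if w + r <= n:
        # A read set and a write set may miss each other entirely.
        return "eventual consistency"
    # From here on W + R > N, so every read overlaps the latest write.
    if w == n and r == 1:
        return "consistency by writes"
    if w == 1 and r == n:
        return "consistency by reads"
    return "consistency by quorum"


for setting in [(3, 1, 1), (3, 3, 1), (3, 1, 3), (3, 2, 2)]:
    print(setting, classify(*setting))
```

Note that "consistency by writes" and "consistency by reads" are the two extreme cases of W + R > N; the classifier checks them first and reports every other overlapping setting as a quorum.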
  18. 18. Quorum (W + R > N): read from more than half of the replicas and write to more than half (QUORUM = floor(N/2) + 1, i.e. N/2 rounded down, plus one)
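Why a quorum gives consistency can be checked by brute force: when W + R > N, every possible set of R nodes read shares at least one node with every possible set of W nodes written. The sketch below assumes nodes are simply numbered 0 to N-1; the `quorum` and `always_overlap` helpers are hypothetical.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Smallest number of nodes that is more than half of N."""
    return n // 2 + 1


def always_overlap(n: int, w: int, r: int) -> bool:
    """True if every W-node write set shares a node with every R-node read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


n = 5
q = quorum(n)
print(q, always_overlap(n, q, q))   # quorum reads and writes always overlap
print(always_overlap(n, 2, 3))      # W + R == N: some read misses the write
```

The overlapping node is the one that guarantees a read sees the most recent successful write, which is why quorum reads and writes do not need all N replicas to respond.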
  19. 19. Books