SlideShare a Scribd company logo
1 of 38
Download to read offline
Modelling data in a
schema-free world?
          FrOSCon
         2012-08-26

   Jan Steemann (triAGENS)




              © 2012 triAGENS GmbH | 2012-08-26   1
About me

   Professional database developer @ triAGENS GmbH, Cologne,
    Germany
   Interested in all kind of databases
   Currently working on an open source mixed-model nosql database




                                      © 2012 triAGENS GmbH | 2012-08-26   2
Schemas in relational databases

   Relational databases are around for a long time already
   In most relational databases we can deal with these schema elements:
       tables                           relationships
       columns / fields                 views
       data types                       triggers
       indexes                          procedures / functions
   Schema elements are auxiliary means to organise and validate the
    actual data!




                                    © 2012 triAGENS GmbH | 2012-08-26   3
Normalisation in relational databases

   Relational databases like your data to be in a highly normalized and
    de-duplicated form
   Put simple, normalisation means
       dividing big entities (tables, fields) into smaller ones
       defining relationships between them
   Example data, not normalised:
        user_id   user_name group_name         status
           1          foo       Power user     Active
           2          bar         Rookie      Inactive
           3          baz         Rookie       Active




                                                 © 2012 triAGENS GmbH | 2012-08-26   4
Normalisation in relational databases

   Example data, normalised (relationships not depicted):
                Table „users“                Table „groups“                      Table „status“
          id    name    group    status       id      name                        id     name
          1      foo      1        1          1       Power user                 1       Active
          2      bar      2        2          2         Rookie                   2      Inactive
          3      baz      2        1

   Normalisation helps to
       reduce redundancies and storage requirements
       increase consistency and integrity
   Normalised data is re-arranged on retrieval using joins, projections etc.
   Due to normalisation, data may be fetched from several tables

                                             © 2012 triAGENS GmbH | 2012-08-26                     5
Issues with relational databases

   Issues of relational databases:
       scaling is hard:
        This is due to the ACID (atomicity, consistency, isolation, durability) guarantees
        they usually provide.
       rigid database schemas can obstruct rapid application development:
        Initial and ongoing database schema development and evolution takes time,
        but you must do it in order to have your database accept any data at all,
        designing good schemas is hard
       fixed database schemas block deployment of changes:
        When deploying application changes, the database schema needs to stay in sync
        with the application. Deploying application changes may even cause downtime due
        to long-running database migrations (e.g. ALTER TABLE statements)




                                                © 2012 triAGENS GmbH | 2012-08-26            6
Issues with relational databases

   Some more issues:
       static schemas do not play well with dynamically typed languages:
        many languages support dynamic keys for their container types, dynamic
        variables, run-time typing etc. To use this with a relational database, one needs
        workarounds (e.g. entity-attribute value modelling, auxilliary tables) that increase
        complexity and make it slow
       saving multi-valued data is „forbidden“:
        Mutli-value containers (e.g. hashes, dictionaries, lists, arrays) are used
        frequently in programming but must not be saved in the database as such
        Complex workarounds are necessary to handle that
       saving objects can be expensive:
        saving one object in a relational way may create a lot of noise in relational
        databases (though this may be obscured by frameworks)




                                                © 2012 triAGENS GmbH | 2012-08-26         7
NoSQL databases – why?

   In the past 5 years, many new database products have emerged
   The common term for these new products is „NoSQL“ databases,
    mainly interpreted as „not only SQL“
   „non-relational“ is also a good name for them because almost no
    NoSQL database follows the relational model
   Instead, most NoSQL databases overcome specific issues that
    relational databases have, especially
       scaling
       schema rigidity




                                    © 2012 triAGENS GmbH | 2012-08-26   8
NoSQL databases – scaling

   Scaling problems of relational databases are addressed in most
    NoSQL databases by reducing ACID guarantees and removing
    relational features:
       Atomicity is guaranteed only for some operations, not all
       Isolation is only guaranteed for some operations
       Strict consistency is relaxed to eventual consistency
       Strict durability is normally turned off (but is configurable by admin)
       Don't offer referential integrity or do not enforce it
       Complex data operations as joins etc. are highly restricted or completely
        disallowed
   These measures help to reduce locking and synchronisation
    overhead and make scaling possible at all

                                                  © 2012 triAGENS GmbH | 2012-08-26   9
NoSQL databases – schema rigidity

   Most NoSQL databases lift the requirement of having to define
    explicit database schemas:
       Some NoSQL databases don't use schemas at all, they are fully schema-free
       Some require the user to only predefine highest level schema elements
        (e.g. databases, collections, column families) upfront (and some allow it as you go)
       Some use schemas implicitly by analysing incoming data, using dynamic schemas
        for each data element inserted instead of predefined schemas
   Benefits:
       Less time spent on upfront schema development and ongoing schema evolution,
        more time can be used for actual application development
       Can store and retrieve programming language objects more easily
       No need to sync application changes deployment with database schema changes,
        no database downtime required for schema modifications

                                              © 2012 triAGENS GmbH | 2012-08-26        10
NoSQL databases – overview

   There is a multitude of different NoSQL databases around
   http://nosql-databases.org/ currently lists 122 NoSQL databases
   Some of the most popular ones:
       MongoDB                       Riak
       CouchDB                       Redis
       RavenDB                       Tokyo Cabinet
       Hadoop                        Voldemort
       Hypertable                    Neo4j
       Cassandra                     OrientDB
       SimpleDB                      InfiniteGraph
   They are all different, but a basic categorisation follows

                                         © 2012 triAGENS GmbH | 2012-08-26   11
NoSQL databases – categorisation

   Key-value stores:
                                                                       key             value
       Map value data to unique string keys (identifiers)
                                                                      obj3         „{ a: 1, b: 1 }“
       Allow access to data by key only
                                                                      obj1         „{ a: 2, b: 1 }“
       Treat data as opaque (data has no schema)
                                                                      obj2         „{ a: 1, b: 2 }“
       Do not save keys in predicatable order
                                                                  numUsers              „25“
       Sometimes disallow key enumeration
        because it is expensive                                     userIds         „1, 2, 3, 4“
       May have one or multiple key spaces to logically similar group keys
       Can implement scaling and partitioning easily due to simplistic data model




                                               © 2012 triAGENS GmbH | 2012-08-26                      12
NoSQL databases – categorisation

   Key-value store extensions (containers, counters, expiration):
       may allow querying on value data, too, by offering secondary indexes.
        secondary index values must be provided by application because store
        still treats data as opaque (data has no schema)
       may offer atomic container operations, counters (e.g. inc / dec) or value
        expiration features that allow broader queries and use cases

          key          value          index on „a“        atomic operations
          obj3     „{ a: 1, b: 1 }“        1
          obj1     „{ a: 2, b: 1 }“        1
          obj2     „{ a: 1, b: 2 }“        2
        numUsers        „25“                                    increment
         userIds     „1, 2, 3, 4“                             add 5 to set


                                                © 2012 triAGENS GmbH | 2012-08-26   13
NoSQL databases – categorisation

   Ordered key-value stores:
       Co-locate values with adjacent keys so keys are stored sorted
       Also allow range queries on keys
       Still treat value data as opaque (data has no schema)
            key           value
          prj1_cnt         „25“
         prj1_user1   „{ a: 2, b: 1 }“
         prj1_user2   „{ a: 1, b: 2 }“
          prj2_cnt         „42“
         prj2_user1   „{ a: 5, b: 1 }“
         prj2_user2   „{ a: 1, b: 3 }“



                                              © 2012 triAGENS GmbH | 2012-08-26   14
NoSQL databases – categorisation

   Wide column stores:
       Save values in an n-level map-of-maps substructure,
        e.g. column families, supercolumns and columns with value & timestamp
       Allow querying specific attributes for keys (at least at the top levels)
           key                           columns
          user1        name: „foo“       city: „bar“
          user2        name: „bar“       city: „baz“          work: „foo“
          user3        name: „baz“

            key       supercolumns       columns
          user1          address         city: „bar“         street: „foo“
                          name           first: „foo“         last: „bar“



                                                   © 2012 triAGENS GmbH | 2012-08-26   15
NoSQL databases – categorisation

   Document stores:
       Normally based on key-value stores (each document still has a unique key)
       Allow to save documents with logical similarity in „databases“ or „collections“
       Treat data records as attribute-structured documents (data is no more opaque)
       Use inherent schemas of documents to categorise attributes and sub-attributes
       Introduce data types (primitive types, compound types)
       Allow arbitrary nesting and document complexity using lists, arrays etc.
       Allow individual documents to have different attributes, even if in same collection
       Often allow querying and indexing document attributes




                                               © 2012 triAGENS GmbH | 2012-08-26          16
NoSQL databases – categorisation

   Example documents in a document store:
        key                    attributes and sub-attributes
       user1     address       city: „bar“        street: „foo“
                name: „foo“
                 birthday      month: 12              day: 1                year: 1983
                   likes        „skiing“            „fishing“
               avgScore: 3.5

       user2     address       city: „baz“      street: „foobar“
                name: „bar“

      user42   banana: true




                                        © 2012 triAGENS GmbH | 2012-08-26                17
NoSQL databases – categorisation

   Graph databases:
       Often based on key-value or document stores
       Additionally save relationships (edges) between documents (vertices)
       Allow querying relationships between documents
       Relationships can have attributes as well

          user1       name: „foo“                                      user2       name: „bar“
                        age: 32                                                     age: 35

                                       rated           value: 3

                                       rated           value: 5




                                               © 2012 triAGENS GmbH | 2012-08-26                 18
Caveat emptor

   Things you should be aware of when using a NoSQL database:
       multi-query, multi-document or multi-key „transactions“ may not be atomic
        and not be isolated
       the application may need to handle inconsistent / stale reads
       the application may need to resolve version conflicts or handle multiple schema
        versions of documents
       referential integrity needs to be enforced by the application
       if the datastore treats data as opaque, it will not at all validate it
       by design, the database may not be able to answer some classes of queries,
        e.g. joins need to be performed by the application, not by the database
   Put simple: database will do less, application must do more!



                                                  © 2012 triAGENS GmbH | 2012-08-26   19
Which datastore to use?

   Relational databases are still a good fit for many problems,
    but not for all
   The same is true for NoSQL databases
   Each category of NoSQL databases and each individual product
    has its own sweet spots (and downsides)
   Check what you want to achieve first, then pick the right tool(s)
    for the job
   It may make sense to use different databases in parallel




                                      © 2012 triAGENS GmbH | 2012-08-26   20
Why care about data modelling in NoSQL?

   Even in NoSQL databases, your data will have some structure
   This effective data model determines
       the consistency level you're able to achieve
       the retrieval and update performance
       the storage space required for saving the data
       the types of queries you will be able to run on the data
   So even if you don't define explicit schemas, you still need to care
    about structuring your data properly




                                               © 2012 triAGENS GmbH | 2012-08-26   21
Some NoSQL data modelling considerations

   Data modelling for schema-less or dynamic schema datastores
    is different than modelling for relational databases
   Different NoSQL databases require different modelling
   The techniques to apply are not revolutionary, but in the relational
    world they are unnecessary or even anti-patterns
   A few things to look at:
       Proper key and attribute naming
       Building „indexes“ by hand for aggregation
       Data denormalisation and duplication




                                               © 2012 triAGENS GmbH | 2012-08-26   22
Picking „good“ keys – key length

   In plain key-value stores, keys are the only thing that you can query
   Keys are also present in other categories of NoSQL databases.
    Data elements are still identified using unique keys
   Keys should be picked so that they are sensible and legible
   Keys are normally stored just once, but...
   ...in some NoSQL databases, key values may be duplicated a lot of
    times in secondary indexes, append-only log structures etc.
   In these situations, keys should also be as short as can be tolerated
    to not waste disk space / RAM




                                      © 2012 triAGENS GmbH | 2012-08-26   23
Key length – CouchDB example

   Case: insert 1000 documents with unique keys into vanilla CouchDB
   Key prefix is „very-long-id“
   For each document, we also store an attribute named „very-long-value“
   Setup:
    for i in `seq 1000 1999`
     do
      curl -X POST                              
          --header "Content-Type: application/json" 
          --data "{ "_id": "very-long-id-$i", 
                  "very-long-value": "$i" }" 
          http://127.0.0.1:5984/keytest
     done
                                      © 2012 triAGENS GmbH | 2012-08-26   24
Key length – CouchDB example

   Now check how often the key prefix occurs in the datafile:
    strings /db/keytest.couch | grep -c "very-long-id"
    strings /db/keytest.couch | grep -c "very-long-value"
   Results:
       the non-key attribute name „very-long-value“ is saved 1,000 times
       the key prefix is stored 42,492 times instead of the expected 1,000 times !!
   This is due to MVCC that organises data in copy-on-write B+ trees




                                               © 2012 triAGENS GmbH | 2012-08-26       25
Picking „good“ keys – hierarchical keys

   If in a key-value store only one global keyspace is available, use
    „hierarchical keys“ to distinguish keys from multiple applications or
    multiple users. This helps preventing unintended key collisions.
    Avoid these keys:            Instead, use such keys:
       user1                       app1_user1
       profile42                   client6_profile42
       item23                      app1_client6_item23
   If your datastore allows efficient key enumeration with prefixes,
    this also allows hierarchical operations by prefix, e.g.
    DELETE client6_*, GET app1_client6_item*
   Don't need this in document stores due to collections

                                        © 2012 triAGENS GmbH | 2012-08-26   26
Attribute names in value data

   The value data you save is often a combination of attribute names and
    values, e.g. { „age“: 35, „gender“: „m“, „name“: „foo“ }
   In a fully schema-free datastore (e.g. plain key-value store), the store
    treats data as opaque and cannot tell attribute names and values in the
    value data apart
   In document stores, data is no more opaque. However, there are no
    fixed schemas. Each document, even if in the same collection, could
    have different attributes
   This means non-key attribute names are saved redundantly for EACH
    record/document
   Result: using attribute name „f“ will normally result in much lower disk
    footprint than using attribute name „firstName“.
   Check these MongoDB-specific links:
                                           © 2012 triAGENS GmbH | 2012-08-26   27
       http://christophermaier.name/blog/2011/05/22/MongoDB-key-names
Aggregation – home-made indexes

   NoSQL stores normally don't support aggregation queries, but you
    can workaround that by creating „index“ keys on your own.
    Initial data:
        key                  value
       user1       „{ state: „CA“, age: 35 }“

       user2       „{ state: „CA“, age: 26 }“

       user3       „{ state: „FL“, age: 21 }“

       user42      „{ state: „FL“, age: 39 }“

   From that you can derive „index“ keys that provide users by state.
    You can now run count queries easily:
      users_CA          „user1, user2“
      users_FL         „user3, user42“


                                                © 2012 triAGENS GmbH | 2012-08-26   28
Aggregation – home-made indexes

   „Index“ keys that provides users by age:
         key               value
      users_39            „user42“
      users_35             „user1“
      users_21             „user3“
      users_26             „user2“

   Can also directly aggregate categories in „index“ keys if there are
    (too) many distinct category values:
     users_20_29       „user3, user2“
     users_30_39       „user1, user42“

    (I know, I should have used better key names)

                                         © 2012 triAGENS GmbH | 2012-08-26   29
Aggregation – home-made indexes

   Home-made „indexes“ provide about the same functionality as
    materialised views do in relational databases
   And as in materialised views, this means data duplication!
   The database won't maintain the „indexes“ for you, so you have to
    maintain them yourself (from out of the application)
   Keeping the „indexes“ up-to-date may require data updates
    in several places
   Can easily get ouf of sync or provide an inconsistent view of the data
   Accuracy may still be good enough, but this depends on the use case




                                     © 2012 triAGENS GmbH | 2012-08-26   30
Aggregation – hierarchical keys

   If your datastore allows efficient key range enumeration using leftmost
    prefix queries, using hierarchical „index“ keys can also be used for
    multi-dimensional aggregation
                 key                              query
        Americas_CA_Toronto
        Americas_US_New York
           Asia_CN_Beijing               Query: GET Asia_*
           Asia_TW_Taipeh
           key
          Europe_DE_Berlin             Query: GET Europe_*
         Europe_DE_Cologne

          Europe_FR_Paris            Query: GET Europe_FR_*

         Europe_FR_Toulose


                                     © 2012 triAGENS GmbH | 2012-08-26   31
Aggregation – key buckets

   If the datastore does not allow querying attribute values, buckets
    with predictable key names can be used to support aggegation
   Example case: users online on a website during the last x minutes
   Key name convention: „usersOnline_${slot}“, with ${slot} being
    slot = (time() / 60).floor() * 60
    (the current Unix timestamp rounded to a 60 second slot)
   This will produce predictable key names such as
       usersOnline_1345732920
       usersOnline_1345732980
       UsersOnline_1345733040
   You can now easily save and query which users were online in
    which time slot and create reports
                                      © 2012 triAGENS GmbH | 2012-08-26   32
Denormalisation – nesting sub elements

   It may be beneficial to all nest sub elements of a logical unit in the
    unit itself instead of saving the sub elements separately
   Document stores are built for exactly this case (nesting attributes),
    but this pattern can be used in key-value stores as well
   Instead of saving all sub elements individually
         key                          value
     user1_name                         ...
    user1_address                       ...
     user1_likes                        ...

    Can store all sub elements of a user in one key / document only:
        user1         { name: ..., address: ..., likes: .... }


                                              © 2012 triAGENS GmbH | 2012-08-26   33
Denormalisation – nesting sub elements

   Benefit: nesting sub elements reduces the number of elements that
    keep data for a logical unit (here: „user“)
   This reduces the number of read / write operations for each unit
   Accessing data of a single key is normally an atomic operation, so
    nesting prevents sub elements from getting out sync, avoiding
    inconsistent reads and writes of logical units
   In NoSQL databases, nesting is also a natural way to avoid joins,
    which are not offered by most NoSQL products




                                     © 2012 triAGENS GmbH | 2012-08-26   34
Denormalisation – nesting sub elements

   Downside: nesting will increase the data volume stored for each key,
    increasing the overhead for each read and write in both the datastore
    and the application
   If you overdo nesting, you might end up having the same data stored
    in multiple places, and no clear responsibility for it
   This will waste space and cause consistency issues
   For example, a change to user2's „likes“ attribute requires 3 writes:
        key                                                    value
        user1      { likes: [ ... ], friends: [ { user2: { likes: [ ... ] } }, user15: { likes: [ ... ] } }, ... ] }

        user2      { likes: [ ... ], friends: [ { user67: { likes: [ ... ] } }, user1: { likes: [ ... ] } }, ... ] }

       user15      { likes: [ ... ], friends: [ { user2: { likes: [ ... ] } }, user99: { likes: [ ... ] } }, ... ] }



                                                   © 2012 triAGENS GmbH | 2012-08-26                         35
Denormalisation – nesting sub elements

   Instead of nesting such elements fully, prefer referencing them by
    key / id:
        key                             value
        user1        { likes: [ ... ], friends: [ user2, user15 ] }

        user2        { likes: [ ... ], friends: [ user67, user1 ] }

       user15        { likes: [ ... ], friends: [ user2, user99 ] }
   This will allow you to update data in only one place,
    saving storage space and improving consistency
   The price you pay: to retrieve all details for the referenced items, you
    now need additional read operations




                                                © 2012 triAGENS GmbH | 2012-08-26   36
Summary

   Data modelling for NoSQL databases is different than for relational
    databases:
       Instead of normalising your schema, you'd rather denormalise your data
       Data duplication is common to support analysis and aggregation queries, and for
        fast retrieval
   Different NoSQL databases have different features and to be used
    effectively,
    may require different data models
   To make best use of a specific store, you should have some
    knowledge about how it organises data internally and what guarantees
    it provides:
       Atomic operations should be exploited where possible to improve consistency
       Cross-record/document consistency is mostly not guaranteed.
        If your store at all offers it, it will be slowed down by it
                                             © 2012 triAGENS GmbH | 2012-08-26        37
   In general logic is moved from the database to the application
Misc

   NoSQL user group Cologne:
    every 1st Wednesday / month
    http://www.nosql-cologne.org/
   NoSQL matters conference Barcelona:
    October 6th, 2012
    http://www.nosql-matters.org/
   ArangoDB 1.0 release:
    September 2012
    http://www.arangodb.org/




                                    © 2012 triAGENS GmbH | 2012-08-26   38

More Related Content

What's hot

Deep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBDeep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBArangoDB Database
 
Assignment_4
Assignment_4Assignment_4
Assignment_4Kirti J
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelMax Neunhöffer
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.Navdeep Charan
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLbalwinders
 
Big Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data ModelingBig Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data ModelingDATAVERSITY
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Ahmed Rashwan
 
Mule database-connectors
Mule database-connectorsMule database-connectors
Mule database-connectorsPhaniu
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra nehabsairam
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills
 
SQL or NoSQL, is this the question? - George Grammatikos
SQL or NoSQL, is this the question? - George GrammatikosSQL or NoSQL, is this the question? - George Grammatikos
SQL or NoSQL, is this the question? - George GrammatikosGeorge Grammatikos
 

What's hot (20)

Deep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBDeep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDB
 
Assignment_4
Assignment_4Assignment_4
Assignment_4
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Big Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data ModelingBig Challenges in Data Modeling: NoSQL and Data Modeling
Big Challenges in Data Modeling: NoSQL and Data Modeling
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?
 
Mule database-connectors
Mule database-connectorsMule database-connectors
Mule database-connectors
 
NOSQL vs SQL
NOSQL vs SQLNOSQL vs SQL
NOSQL vs SQL
 
No sq lv2
No sq lv2No sq lv2
No sq lv2
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample material
 
SQL or NoSQL, is this the question? - George Grammatikos
SQL or NoSQL, is this the question? - George GrammatikosSQL or NoSQL, is this the question? - George Grammatikos
SQL or NoSQL, is this the question? - George Grammatikos
 

Viewers also liked

UNIFAL - MySQL 5.6 - Replicação
UNIFAL - MySQL 5.6 - ReplicaçãoUNIFAL - MySQL 5.6 - Replicação
UNIFAL - MySQL 5.6 - ReplicaçãoWagner Bianchi
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
MySQL Multi Master Replication
MySQL Multi Master ReplicationMySQL Multi Master Replication
MySQL Multi Master ReplicationMoshe Kaplan
 
Processing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelProcessing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelArangoDB Database
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 

Viewers also liked (6)

UNIFAL - MySQL 5.6 - Replicação
UNIFAL - MySQL 5.6 - ReplicaçãoUNIFAL - MySQL 5.6 - Replicação
UNIFAL - MySQL 5.6 - Replicação
 
No sql std
No sql stdNo sql std
No sql std
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
MySQL Multi Master Replication
MySQL Multi Master ReplicationMySQL Multi Master Replication
MySQL Multi Master Replication
 
Processing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelProcessing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) Pregel
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 

Similar to Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2012-08-26)

Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sqlRam kumar
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...IJCERT JOURNAL
 
Complete first chapter rdbm 17332
Complete first chapter rdbm 17332Complete first chapter rdbm 17332
Complete first chapter rdbm 17332Tushar Wagh
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introductionPooyan Mehrparvar
 
1. introduction to no sql
1. introduction to no sql1. introduction to no sql
1. introduction to no sqlAnuja Gunale
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL DatabasesAbiral Gautam
 
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...Vivek Adithya Mohankumar
 
DB Luminous... Know Your Data
DB Luminous... Know Your DataDB Luminous... Know Your Data
DB Luminous... Know Your DataRuss Pierce
 
Big data technology unit 3
Big data technology unit 3Big data technology unit 3
Big data technology unit 3RojaT4
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic WebStefan Ceriu
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebStefan Prutianu
 

Similar to Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2012-08-26) (20)

Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
unit2-ppt1.pptx
unit2-ppt1.pptxunit2-ppt1.pptx
unit2-ppt1.pptx
 
No sql database
No sql databaseNo sql database
No sql database
 
orical
oricalorical
orical
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
Complete first chapter rdbm 17332
Complete first chapter rdbm 17332Complete first chapter rdbm 17332
Complete first chapter rdbm 17332
 
Unit 3 MongDB
Unit 3 MongDBUnit 3 MongDB
Unit 3 MongDB
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Nosql
NosqlNosql
Nosql
 
Artigo no sql x relational
Artigo no sql x relationalArtigo no sql x relational
Artigo no sql x relational
 
Nosql
NosqlNosql
Nosql
 
1. introduction to no sql
1. introduction to no sql1. introduction to no sql
1. introduction to no sql
 
NoSQL
NoSQLNoSQL
NoSQL
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL Databases
 
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
NoSQL Databases: An Introduction and Comparison between Dynamo, MongoDB and C...
 
DB Luminous... Know Your Data
DB Luminous... Know Your DataDB Luminous... Know Your Data
DB Luminous... Know Your Data
 
Unit-10.pptx
Unit-10.pptxUnit-10.pptx
Unit-10.pptx
 
Big data technology unit 3
Big data technology unit 3Big data technology unit 3
Big data technology unit 3
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 

More from ArangoDB Database

ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022ArangoDB Database
 
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at ScaleArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at ScaleArangoDB Database
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBArangoDB Database
 
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale ArangoDB Database
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDBArangoDB Database
 
Getting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisGetting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisArangoDB Database
 
Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBArangoDB Database
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsArangoDB Database
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarArangoDB Database
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?ArangoDB Database
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoDB Database
 
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at ScaleArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at ScaleArangoDB Database
 
Webinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisWebinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisArangoDB Database
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB Database
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBArangoDB Database
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databasesArangoDB Database
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed systemArangoDB Database
 

More from ArangoDB Database (20)

ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
 
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at ScaleArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at Scale
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDB
 
Getting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisGetting Started with ArangoDB Oasis
Getting Started with ArangoDB Oasis
 
Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDB
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
 
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at ScaleArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at Scale
 
Webinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisWebinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB Oasis
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
 
3.5 webinar
3.5 webinar 3.5 webinar
3.5 webinar
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDB
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databases
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed system
 

Recently uploaded

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2012-08-26)

  • 1. Modelling data in a schema-free world? FrOSCon 2012-08-26 Jan Steemann (triAGENS) © 2012 triAGENS GmbH | 2012-08-26 1
  • 2. About me  Professional database developer @ triAGENS GmbH, Cologne, Germany  Interested in all kind of databases  Currently working on an open source mixed-model nosql database © 2012 triAGENS GmbH | 2012-08-26 2
  • 3. Schemas in relational databases  Relational databases are around for a long time already  In most relational databases we can deal with these schema elements:  tables  relationships  columns / fields  views  data types  triggers  indexes  procedures / functions  Schema elements are auxiliary means to organise and validate the actual data! © 2012 triAGENS GmbH | 2012-08-26 3
  • 4. Normalisation in relational databases  Relational databases like your data to be in a highly normalized and de-duplicated form  Put simple, normalisation means  dividing big entities (tables, fields) into smaller ones  defining relationships between them  Example data, not normalised: user_id user_name group_name status 1 foo Power user Active 2 bar Rookie Inactive 3 baz Rookie Active © 2012 triAGENS GmbH | 2012-08-26 4
  • 5. Normalisation in relational databases  Example data, normalised (relationships not depicted): Table „users“ Table „groups“ Table „status“ id name group status id name id name 1 foo 1 1 1 Power user 1 Active 2 bar 2 2 2 Rookie 2 Inactive 3 baz 2 1  Normalisation helps to  reduce redundancies and storage requirements  increase consistency and integrity  Normalised data is re-arranged on retrieval using joins, projections etc.  Due to normalisation, data may be fetched from several tables © 2012 triAGENS GmbH | 2012-08-26 5
  • 6. Issues with relational databases  Issues of relational databases:  scaling is hard: This is due to the ACID (atomicity, consistency, isolation, durability) guarantees they usually provide.  rigid database schemas can obstruct rapid application development: Initial and ongoing database schema development and evolution takes time, but you must do it in order to have your database accept any data at all, designing good schemas is hard  fixed database schemas block deployment of changes: When deploying application changes, the database schema needs to stay in sync with the application. Deploying application changes may even cause downtime due to long-running database migrations (e.g. ALTER TABLE statements) © 2012 triAGENS GmbH | 2012-08-26 6
  • 7. Issues with relational databases  Some more issues:  static schemas do not play well with dynamically typed languages: many languages support dynamic keys for their container types, dynamic variables, run-time typing etc. To use this with a relational database, one needs workarounds (e.g. entity-attribute value modelling, auxilliary tables) that increase complexity and make it slow  saving multi-valued data is „forbidden“: Mutli-value containers (e.g. hashes, dictionaries, lists, arrays) are used frequently in programming but must not be saved in the database as such Complex workarounds are necessary to handle that  saving objects can be expensive: saving one object in a relational way may create a lot of noise in relational databases (though this may be obscured by frameworks) © 2012 triAGENS GmbH | 2012-08-26 7
  • 8. NoSQL databases – why?  In the past 5 years, many new database products have emerged  The common term for these new products is „NoSQL“ databases, mainly interpreted as „not only SQL“  „non-relational“ is also a good name for them because almost no NoSQL database follows the relational model  Instead, most NoSQL databases overcome specific issues that relational databases have, especially  scaling  schema rigidity © 2012 triAGENS GmbH | 2012-08-26 8
  • 9. NoSQL databases – scaling  Scaling problems of relational databases are addressed in most NoSQL databases by reducing ACID guarantees and removing relational features:  Atomicity is guaranteed only for some operations, not all  Isolation is only guaranteed for some operations  Strict consistency is relaxed to eventual consistency  Strict durability is normally turned off (but is configurable by admin)  Don't offer referential integrity or do not enforce it  Complex data operations as joins etc. are highly restricted or completely disallowed  These measures help to reduce locking and synchronisation overhead and make scaling possible at all © 2012 triAGENS GmbH | 2012-08-26 9
  • 10. NoSQL databases – schema rigidity  Most NoSQL databases lift the requirement of having to define explicit database schemas:  Some NoSQL databases don't use schemas at all, they are fully schema-free  Some require the user to only predefine highest level schema elements (e.g. databases, collections, column families) upfront (and some allow it as you go)  Some use schemas implicitly by analysing incoming data, using dynamic schemas for each data element inserted instead of predefined schemas  Benefits:  Less time spent on upfront schema development and ongoing schema evolution, more time can be used for actual application development  Can store and retrieve programming language objects more easily  No need to sync application changes deployment with database schema changes, no database downtime required for schema modifications © 2012 triAGENS GmbH | 2012-08-26 10
  • 11. NoSQL databases – overview  There is a multitude of different NoSQL databases around  http://nosql-databases.org/ currently lists 122 NoSQL databases  Some of the most popular ones:  MongoDB  Riak  CouchDB  Redis  RavenDB  Tokyo Cabinet  Hadoop  Voldemort  Hypertable  Neo4j  Cassandra  OrientDB  SimpleDB  InfiniteGraph  They are all different, but a basic categorisation follows © 2012 triAGENS GmbH | 2012-08-26 11
  • 12. NoSQL databases – categorisation  Key-value stores: key value  Map value data to unique string keys (identifiers) obj3 „{ a: 1, b: 1 }“  Allow access to data by key only obj1 „{ a: 2, b: 1 }“  Treat data as opaque (data has no schema) obj2 „{ a: 1, b: 2 }“  Do not save keys in predicatable order numUsers „25“  Sometimes disallow key enumeration because it is expensive userIds „1, 2, 3, 4“  May have one or multiple key spaces to logically similar group keys  Can implement scaling and partitioning easily due to simplistic data model © 2012 triAGENS GmbH | 2012-08-26 12
  • 13. NoSQL databases – categorisation  Key-value store extensions (containers, counters, expiration):  may allow querying on value data, too, by offering secondary indexes. secondary index values must be provided by application because store still treats data as opaque (data has no schema)  may offer atomic container operations, counters (e.g. inc / dec) or value expiration features that allow broader queries and use cases key value index on „a“ atomic operations obj3 „{ a: 1, b: 1 }“ 1 obj1 „{ a: 2, b: 1 }“ 1 obj2 „{ a: 1, b: 2 }“ 2 numUsers „25“ increment userIds „1, 2, 3, 4“ add 5 to set © 2012 triAGENS GmbH | 2012-08-26 13
  • 14. NoSQL databases – categorisation  Ordered key-value stores:  Co-locate values with adjacent keys so keys are stored sorted  Also allow range queries on keys  Still treat value data as opaque (data has no schema) key value prj1_cnt „25“ prj1_user1 „{ a: 2, b: 1 }“ prj1_user2 „{ a: 1, b: 2 }“ prj2_cnt „42“ prj2_user1 „{ a: 5, b: 1 }“ prj2_user2 „{ a: 1, b: 3 }“ © 2012 triAGENS GmbH | 2012-08-26 14
  • 15. NoSQL databases – categorisation  Wide column stores:  Save values in an n-level map-of-maps substructure, e.g. column families, supercolumns and columns with value & timestamp  Allow querying specific attributes for keys (at least at the top levels) key columns user1 name: „foo“ city: „bar“ user2 name: „bar“ city: „baz“ work: „foo“ user3 name: „baz“ key supercolumns columns user1 address city: „bar“ street: „foo“ name first: „foo“ last: „bar“ © 2012 triAGENS GmbH | 2012-08-26 15
  • 16. NoSQL databases – categorisation  Document stores:  Normally based on key-value stores (each document still has a unique key)  Allow to save documents with logical similarity in „databases“ or „collections“  Treat data records as attribute-structured documents (data is no more opaque)  Use inherent schemas of documents to categorise attributes and sub-attributes  Introduce data types (primitive types, compound types)  Allow arbitrary nesting and document complexity using lists, arrays etc.  Allow individual documents to have different attributes, even if in same collection  Often allow querying and indexing document attributes © 2012 triAGENS GmbH | 2012-08-26 16
  • 17. NoSQL databases – categorisation  Example documents in a document store: key attributes and sub-attributes user1 address city: „bar“ street: „foo“ name: „foo“ birthday month: 12 day: 1 year: 1983 likes „skiing“ „fishing“ avgScore: 3.5 user2 address city: „baz“ street: „foobar“ name: „bar“ user42 banana: true © 2012 triAGENS GmbH | 2012-08-26 17
  • 18. NoSQL databases – categorisation  Graph databases:  Often based on key-value or document stores  Additionally save relationships (edges) between documents (vertices)  Allow querying relationships between documents  Relationships can have attributes as well user1 name: „foo“ user2 name: „bar“ age: 32 age: 35 rated value: 3 rated value: 5 © 2012 triAGENS GmbH | 2012-08-26 18
  • 19. Caveat emptor  Things you should be aware of when using a NoSQL database:  multi-query, multi-document or multi-key „transactions“ may not be atomic and not be isolated  the application may need to handle inconsistent / stale reads  the application may need to resolve version conflicts or handle multiple schema versions of documents  referential integrity needs to be enforced by the application  if the datastore treats data as opaque, it will not at all validate it  by design, the database may not be able to answer some classes of queries, e.g. joins need to be performed by the application, not by the database  Put simple: database will do less, application must do more! © 2012 triAGENS GmbH | 2012-08-26 19
  • 20. Which datastore to use?  Relational databases are still a good fit for many problems, but not for all  The same is true for NoSQL databases  Each category of NoSQL databases and each individual product has its own sweet spots (and downsides)  Check what you want to achieve first, then pick the right tool(s) for the job  It may make sense to use different databases in parallel © 2012 triAGENS GmbH | 2012-08-26 20
  • 21. Why care about data modelling in NoSQL?  Even in NoSQL databases, your data will have some structure  This effective data model determines  the consistency level you're able to achieve  the retrieval and update performance  the storage space required for saving the data  the types of queries you will be able to run on the data  So even if you don't define explicit schemas, you still need to care about structuring your data properly © 2012 triAGENS GmbH | 2012-08-26 21
  • 22. Some NoSQL data modelling considerations  Data modelling for schema-less or dynamic schema datastores is different than modelling for relational databases  Different NoSQL databases require different modelling  The techniques to apply are not revolutionary, but in the relational world they are unnecessary or even anti-patterns  A few things to look at:  Proper key and attribute naming  Building „indexes“ by hand for aggregation  Data denormalisation and duplication © 2012 triAGENS GmbH | 2012-08-26 22
  • 23. Picking „good“ keys – key length  In plain key-value stores, keys are the only thing that you can query  Keys are also present in other categories of NoSQL databases. Data elements are still identified using unique keys  Keys should be picked so that they are sensible and legible  Keys are normally stored just once, but...  ...in some NoSQL databases, key values may be duplicated a lot of times in secondary indexes, append-only log structures etc.  In these situations, keys should also be as short as can be tolerated to not waste disk space / RAM © 2012 triAGENS GmbH | 2012-08-26 23
  • 24. Key length – CouchDB example  Case: insert 1000 documents with unique keys into vanilla CouchDB  Key prefix is „very-long-id“  For each document, we also store an attribute named „very-long-value“  Setup: for i in `seq 1000 1999` do curl -X POST --header "Content-Type: application/json" --data "{ "_id": "very-long-id-$i", "very-long-value": "$i" }" http://127.0.0.1:5984/keytest done © 2012 triAGENS GmbH | 2012-08-26 24
  • 25. Key length – CouchDB example  Now check how often the key prefix occurs in the datafile: strings /db/keytest.couch | grep -c "very-long-id" strings /db/keytest.couch | grep -c "very-long-value"  Results:  the non-key attribute name „very-long-value“ is saved 1,000 times  the key prefix is stored 42,492 times instead of the expected 1,000 times !!  This is due to MVCC that organises data in copy-on-write B+ trees © 2012 triAGENS GmbH | 2012-08-26 25
  • 26. Picking „good“ keys – hierarchical keys  If in a key-value store only one global keyspace is available, use „hierarchical keys“ to distinguish keys from multiple applications or multiple users. This helps preventing unintended key collisions. Avoid these keys: Instead, use such keys:  user1  app1_user1  profile42  client6_profile42  item23  app1_client6_item23  If your datastore allows efficient key enumeration with prefixes, this also allows hierarchical operations by prefix, e.g. DELETE client6_*, GET app1_client6_item*  Don't need this in document stores due to collections © 2012 triAGENS GmbH | 2012-08-26 26
  • 27. Attribute names in value data  The value data you save is often a combination of attribute names and values, e.g. { „age“: 35, „gender“: „m“, „name“: „foo“ }  In a fully schema-free datastore (e.g. plain key-value store), the store treats data as opaque and cannot tell attribute names and values in the value data apart  In document stores, data is no more opaque. However, there are no fixed schemas. Each document, even if in the same collection, could have different attributes  This means non-key attribute names are saved redundantly for EACH record/document  Result: using attribute name „f“ will normally result in much lower disk footprint than using attribute name „firstName“.  Check these MongoDB-specific links: © 2012 triAGENS GmbH | 2012-08-26 27  http://christophermaier.name/blog/2011/05/22/MongoDB-key-names
  • 28. Aggregation – home-made indexes  NoSQL stores normally don't support aggregation queries, but you can workaround that by creating „index“ keys on your own. Initial data: key value user1 „{ state: „CA“, age: 35 }“ user2 „{ state: „CA“, age: 26 }“ user3 „{ state: „FL“, age: 21 }“ user42 „{ state: „FL“, age: 39 }“  From that you can derive „index“ keys that provide users by state. You can now run count queries easily: users_CA „user1, user2“ users_FL „user3, user42“ © 2012 triAGENS GmbH | 2012-08-26 28
  • 29. Aggregation – home-made indexes  „Index“ keys that provides users by age: key value users_39 „user42“ users_35 „user1“ users_21 „user3“ users_26 „user2“  Can also directly aggregate categories in „index“ keys if there are (too) many distinct category values: users_20_29 „user3, user2“ users_30_39 „user1, user42“ (I know, I should have used better key names) © 2012 triAGENS GmbH | 2012-08-26 29
  • 30. Aggregation – home-made indexes  Home-made „indexes“ provide about the same functionality as materialised views do in relational databases  And as in materialised views, this means data duplication!  The database won't maintain the „indexes“ for you, so you have to maintain them yourself (from out of the application)  Keeping the „indexes“ up-to-date may require data updates in several places  Can easily get ouf of sync or provide an inconsistent view of the data  Accuracy may still be good enough, but this depends on the use case © 2012 triAGENS GmbH | 2012-08-26 30
  • 31. Aggregation – hierarchical keys  If your datastore allows efficient key range enumeration using leftmost prefix queries, using hierarchical „index“ keys can also be used for multi-dimensional aggregation key query Americas_CA_Toronto Americas_US_New York Asia_CN_Beijing Query: GET Asia_* Asia_TW_Taipeh key Europe_DE_Berlin Query: GET Europe_* Europe_DE_Cologne Europe_FR_Paris Query: GET Europe_FR_* Europe_FR_Toulose © 2012 triAGENS GmbH | 2012-08-26 31
  • 32. Aggregation – key buckets  If the datastore does not allow querying attribute values, buckets with predictable key names can be used to support aggegation  Example case: users online on a website during the last x minutes  Key name convention: „usersOnline_${slot}“, with ${slot} being slot = (time() / 60).floor() * 60 (the current Unix timestamp rounded to a 60 second slot)  This will produce predictable key names such as  usersOnline_1345732920  usersOnline_1345732980  UsersOnline_1345733040  You can now easily save and query which users were online in which time slot and create reports © 2012 triAGENS GmbH | 2012-08-26 32
  • 33. Denormalisation – nesting sub elements  It may be beneficial to all nest sub elements of a logical unit in the unit itself instead of saving the sub elements separately  Document stores are built for exactly this case (nesting attributes), but this pattern can be used in key-value stores as well  Instead of saving all sub elements individually key value user1_name ... user1_address ... user1_likes ... Can store all sub elements of a user in one key / document only: user1 { name: ..., address: ..., likes: .... } © 2012 triAGENS GmbH | 2012-08-26 33
  • 34. Denormalisation – nesting sub elements  Benefit: nesting sub elements reduces the number of elements that keep data for a logical unit (here: „user“)  This reduces the number of read / write operations for each unit  Accessing data of a single key is normally an atomic operation, so nesting prevents sub elements from getting out sync, avoiding inconsistent reads and writes of logical units  In NoSQL databases, nesting is also a natural way to avoid joins, which are not offered by most NoSQL products © 2012 triAGENS GmbH | 2012-08-26 34
  • 35. Denormalisation – nesting sub elements  Downside: nesting will increase the data volume stored for each key, increasing the overhead for each read and write in both the datastore and the application  If you overdo nesting, you might end up having the same data stored in multiple places, and no clear responsibility for it  This will waste space and cause consistency issues  For example, a change to user2's „likes“ attribute requires 3 writes: key value user1 { likes: [ ... ], friends: [ { user2: { likes: [ ... ] } }, user15: { likes: [ ... ] } }, ... ] } user2 { likes: [ ... ], friends: [ { user67: { likes: [ ... ] } }, user1: { likes: [ ... ] } }, ... ] } user15 { likes: [ ... ], friends: [ { user2: { likes: [ ... ] } }, user99: { likes: [ ... ] } }, ... ] } © 2012 triAGENS GmbH | 2012-08-26 35
  • 36. Denormalisation – nesting sub elements  Instead of nesting such elements fully, prefer referencing them by key / id: key value user1 { likes: [ ... ], friends: [ user2, user15 ] } user2 { likes: [ ... ], friends: [ user67, user1 ] } user15 { likes: [ ... ], friends: [ user2, user99 ] }  This will allow you to update data in only one place, saving storage space and improving consistency  The price you pay: to retrieve all details for the referenced items, you now need additional read operations © 2012 triAGENS GmbH | 2012-08-26 36
  • 37. Summary  Data modelling for NoSQL databases is different than for relational databases:  Instead of normalising your schema, you'd rather denormalise your data  Data duplication is common to support analysis and aggregation queries, and for fast retrieval  Different NoSQL databases have different features and to be used effectively, may require different data models  To make best use of a specific store, you should have some knowledge about how it organises data internally and what guarantees it provides:  Atomic operations should be exploited where possible to improve consistency  Cross-record/document consistency is mostly not guaranteed. If your store at all offers it, it will be slowed down by it © 2012 triAGENS GmbH | 2012-08-26 37  In general logic is moved from the database to the application
  • 38. Misc  NoSQL user group Cologne: every 1st Wednesday / month http://www.nosql-cologne.org/  NoSQL matters conference Barcelona: October 6th, 2012 http://www.nosql-matters.org/  ArangoDB 1.0 release: September 2012 http://www.arangodb.org/ © 2012 triAGENS GmbH | 2012-08-26 38