Modern Database Systems (for Genealogy)
@spf13
AKA
Steve Francia

Chief Evangelist @
responsible for drivers, integrations, web & docs
What’s the Point?
๏   Goal: Discover & identify the ideal
    storage solution for our needs
๏   History is important
๏   Many options today
๏   Document databases are good
    for Genealogy
History of the World
Over 5500 years ago: 2 People
1804: 1 Billion People
1927: 2 Billion People
World Population Growth
(last ~200 years, in billions)
Year:      1804  1927  1960  1974  1987  1999  2012
Billions:     1     2     3     4     5     6     7
Really Big Data
In the last 50 years...
over 4% of all the world's people were born...
in less than 1% of the time
History of Databases
1970
๏ Oracle creates the relational database
๏ Everyone happily uses it for the next 43 years
What really happened
Let’s start at the beginning
It’s a story about...
Storing & Retrieving Information
Even today we still use the same media for data storage
With the advent of the computer, things really took off
1960 : DBMS Emerges
๏   Ordered set of fixed-length fields
๏   Low-level pointer operations (flat
    files)
๏   Most popular was IMS (created at
    IBM)
๏   Shockingly, still in use today at IBM &
    American Airlines
Lots of Problems
๏   Complex and inflexible
๏   Users had to know the physical structure of the
    DB in order to query for information
๏   Adding a field to the DB required rewriting
    the underlying access/modification scheme
๏   Records isolated (no relations)
๏   Emphasis on records to be processed, not on
    overall structure
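To make the fixed-length-field problem concrete, here is a minimal sketch of records stored as ordered sets of fixed-width fields. It is not IMS itself; the layouts, field names, and widths are hypothetical. The reader locates each field purely by byte offset, so adding a field shifts the offsets and any access routine written against the old layout silently reads the wrong bytes.

```javascript
// Hypothetical fixed-length layout: each field has a fixed width in characters.
// Readers locate fields purely by physical offset, as in early flat-file systems.
const LAYOUT_V1 = [
  { name: "surname", width: 10 },
  { name: "given", width: 10 },
  { name: "born", width: 4 },
];

// Pad or truncate each value to its field width and concatenate them.
function encode(layout, record) {
  return layout
    .map(({ name, width }) => String(record[name] ?? "").padEnd(width).slice(0, width))
    .join("");
}

// Slice the raw string at the offsets implied by the layout.
function decode(layout, raw) {
  const out = {};
  let offset = 0;
  for (const { name, width } of layout) {
    out[name] = raw.slice(offset, offset + width).trimEnd();
    offset += width;
  }
  return out;
}

// "Adding a field": v2 inserts a middle name before "born", shifting its offset.
const LAYOUT_V2 = [
  { name: "surname", width: 10 },
  { name: "given", width: 10 },
  { name: "middle", width: 10 },
  { name: "born", width: 4 },
];

const rawV2 = encode(LAYOUT_V2, { surname: "smith", given: "john", middle: "peter", born: "1928" });
// A reader still using the v1 layout now reads the wrong bytes for "born".
console.log(decode(LAYOUT_V1, rawV2).born); // pete
console.log(decode(LAYOUT_V2, rawV2).born); // 1928
```

The old reader does not fail loudly; it simply returns garbage, which is why every access and modification routine had to be rewritten whenever the physical layout changed.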
1970 : Relational DB
๏   Edgar Frank “Ted” Codd
๏   Relational Database
    theory
๏   Codd’s 13 rules
    (numbered 0 to 12, aka 12 rules)
3 HUGE Advantages
๏   Data independence from hardware
    and storage implementation
๏   Ability to process more than one
    record at a time with a single
    operation
๏   Establishing a relationship
    between records
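Two of these advantages can be sketched in memory, with plain arrays standing in for tables (the table and column names are hypothetical, and real engines plan and execute this very differently): one operation applied to a whole set of records, and a relationship between records expressed through a shared key. The join below is a simple hash join, one of several strategies an RDBMS may choose.

```javascript
// Two hypothetical "tables" as arrays of row objects.
const people = [
  { id: 1, name: "john smith", born: 1928 },
  { id: 2, name: "mary smith", born: 1931 },
  { id: 3, name: "peter sandvik", born: 1955 },
];
const deaths = [
  { personId: 1, year: 1989, state: "fl" },
  { personId: 2, year: 2001, state: "vt" },
];

// Set-at-a-time selection: one operation over every row,
// roughly SELECT * FROM people WHERE born < 1950.
const select = (rows, predicate) => rows.filter(predicate);

// Hash join on a key: index the right table, then pair matching rows,
// roughly SELECT ... FROM people JOIN deaths ON people.id = deaths.personId.
function join(left, right, leftKey, rightKey) {
  const index = new Map();
  for (const r of right) {
    if (!index.has(r[rightKey])) index.set(r[rightKey], []);
    index.get(r[rightKey]).push(r);
  }
  return left.flatMap((l) => (index.get(l[leftKey]) ?? []).map((r) => ({ ...l, ...r })));
}

const earlyBorn = select(people, (p) => p.born < 1950);
const withDeaths = join(people, deaths, "id", "personId");
console.log(earlyBorn.map((p) => p.name)); // [ 'john smith', 'mary smith' ]
console.log(withDeaths.map((r) => `${r.name} d. ${r.year}`)); // [ 'john smith d. 1989', 'mary smith d. 2001' ]
```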
IBM vs Codd
๏ IBM bets on IMS
๏ Codd bets on relational DB
๏ Eventually 2 relational prototypes emerge
Ingres
๏ Built at UC Berkeley
๏ Uses QUEL
๏ Inspires Sybase & MSSQL
System R
๏   Built at IBM
๏   Leads to SEQUEL... later SQL
๏   Evolved into SQL/DS, which
    evolved into DB2
๏   Project concludes that relational
    model is viable
Oracle
๏   Larry Ellison watches IBM
๏   Starts Relational Software Inc.
๏   Oracle, the 1st commercial RDBMS,
    released in 1979
๏   Beats IBM to market by 2 years
Entity Relationship
๏   Proposed by Peter
    Chen in 1976
๏   Focuses on data use
    and not logical table
    structure
1980s
๏ RDBMS dominates
๏ Some fields (medicine, physics, multimedia) need
  more than RDBMS offers
๏ Object Databases emerge
Object Databases
๏   Inspired by Entity Relationship
๏   More flexible than the relational model permits
๏   Tightly coupled with OO
    programming languages (C++, later
    Java)
๏   Full object: data & methods stored
1990s
๏ Internet emerges
๏ Data demand spikes
๏ Databases used for archiving historical data
Early 2000s
๏ Internet booms
๏ RDBMS fails to scale
๏ In desperation, we take a step backwards
MemcacheD
๏ 1 dimensional
๏ No persistence
๏ No ACI or D
๏ but...
... FAST
2005-ish
๏   Relational + MemcacheD
    broken (and we didn’t know it)
๏   Scale redefined with high
    volume & social
๏   Infrastructure reinvented with
    cloud computing & SSDs
Alternatives Emerge
๏ Dynamo / Key Value
๏ Document
๏ Graph
Modern Data Storage
A lot going on
Easiest to define databases in broad terms
• What is a record? (data model)
• CAP: CA, AP, or CP? (infrastructure model)
Data Storage Structure
1D: Key → Value
2D: a flat set of Key → Value pairs
nD: Key → Value(s), where a value can itself hold nested Key → Value(s) pairs
Database structure
1D: Key Value / Dynamo, Graph
2D: Relational
nD: Document
CAP Theorem
[Diagram: triangle with corners Availability, Partitioning, Consistency]
CAP Theorem
[Diagram: an App talking to two Nodes, with the link between the Nodes cut (xx)]
CAP Theorem
๏ Availability + Partition Tolerant (Inconsistent): Dynamo, Key Value, NoSQLs
๏ Availability + Consistency (partition Intolerant): RDBMS
๏ Partition Tolerant + Consistency (Unavailable): MongoDB, BigTable
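The trade-off on the CAP slides can be illustrated with a toy two-replica register. This is a sketch under heavy simplifying assumptions (the `ToyCluster` class and its "CP"/"AP" modes are hypothetical, and no real product replicates this way): when the link between the nodes is cut, a consistency-first cluster refuses writes it cannot replicate, so it becomes unavailable, while an availability-first cluster accepts them and lets the replicas disagree.

```javascript
// Toy model: two replicas of a single value plus a network-partition flag.
class ToyCluster {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP" (hypothetical labels for the two choices)
    this.replicas = [{ value: null }, { value: null }];
    this.partitioned = false;
  }

  // Write through replica i; copying to the other replica needs the link.
  write(i, value) {
    if (this.partitioned && this.mode === "CP") {
      // Consistency first: refuse rather than let the replicas diverge.
      return { ok: false, reason: "unavailable during partition" };
    }
    this.replicas[i].value = value;
    if (!this.partitioned) this.replicas[1 - i].value = value;
    return { ok: true };
  }

  read(i) {
    return this.replicas[i].value;
  }
}

const cp = new ToyCluster("CP");
cp.write(0, "born 1928-04-06");
cp.partitioned = true;
const cpResult = cp.write(1, "born 1928-04-16"); // rejected

const ap = new ToyCluster("AP");
ap.write(0, "born 1928-04-06");
ap.partitioned = true;
ap.write(1, "born 1928-04-16"); // accepted on one side only
console.log(cpResult.ok, cp.read(0) === cp.read(1)); // false true
console.log(ap.read(0) === ap.read(1)); // false
```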
Key Value
๏   1 Dimensional storage (tuple)
๏   Query key only
๏   Bucket index (range) on keys
๏   Records cannot be updated, only replaced
๏   Often multi-master... meaning availability over consistency
๏   Partitioning easy thanks to single value

Cassandra, Redis, MemcacheD, Riak, DynamoDB
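A minimal sketch of the key-value properties listed above, assuming a hypothetical `PartitionedKV` class, three in-memory nodes, and a 32-bit FNV-1a hash (illustrative choices, not how any of the listed products work). Lookups go by exact key only, a put replaces the whole value, and because each record is one self-contained value, hashing the key is enough to decide which node owns it.

```javascript
// 32-bit FNV-1a hash over UTF-16 code units (illustrative, not cryptographic).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class PartitionedKV {
  constructor(nodeCount) {
    // Each "node" is just a Map in this sketch.
    this.nodes = Array.from({ length: nodeCount }, () => new Map());
  }
  nodeFor(key) {
    return fnv1a(key) % this.nodes.length;
  }
  // put replaces the entire value; there is no partial update of a field.
  put(key, value) {
    this.nodes[this.nodeFor(key)].set(key, value);
  }
  // The only query is by exact key.
  get(key) {
    return this.nodes[this.nodeFor(key)].get(key);
  }
}

const kv = new PartitionedKV(3);
kv.put("1XYK-KQJ", JSON.stringify({ first: "john", last: "smith" }));
// To "change" one field: read, modify, and write back the whole value.
const rec = JSON.parse(kv.get("1XYK-KQJ"));
rec.last = "sandvik";
kv.put("1XYK-KQJ", JSON.stringify(rec));
console.log(kv.get("1XYK-KQJ")); // {"first":"john","last":"sandvik"}
```

Plain modulo placement reshuffles most keys when the node count changes; Dynamo-style systems use consistent hashing to limit that movement.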
Relational
๏   2 Dimensional storage (map)
๏   Query any field
๏   BTree Indexes
๏   Single master, meaning consistency > availability
๏   Partitioning hard due to transactions & joins

Oracle, MSSQL, MySQL, PostgreSQL, DB2
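Both the relational and document models list BTree indexes. A real B-tree is a balanced multi-way tree; the sketch below deliberately substitutes a sorted array with binary search (the `SortedIndex` class and sample rows are hypothetical) to show the property an ordered index provides: equality and range lookups on a field without scanning every record.

```javascript
// Simplified ordered index: sorted [key, rowId] pairs instead of a real B-tree.
class SortedIndex {
  constructor(rows, field) {
    this.entries = rows
      .map((row, rowId) => [row[field], rowId])
      .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  }

  // First position whose key is >= target (binary search, O(log n)).
  lowerBound(target) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.entries[mid][0] < target) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Row ids with low <= key < high, found without a full scan.
  range(low, high) {
    const ids = [];
    for (let i = this.lowerBound(low); i < this.entries.length && this.entries[i][0] < high; i++) {
      ids.push(this.entries[i][1]);
    }
    return ids;
  }
}

const rows = [
  { name: "john", born: "1928-04-06" },
  { name: "mary", born: "1931-02-11" },
  { name: "peter", born: "1955-09-30" },
];
// ISO date strings sort correctly as plain strings.
const byBirth = new SortedIndex(rows, "born");
console.log(byBirth.range("1925-01-01", "1950-01-01").map((id) => rows[id].name)); // [ 'john', 'mary' ]
```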
Document
๏   n Dimensional storage (hash w/ nesting)
๏   Query any field at any level
๏   BTree Indexes
๏   Single master, meaning consistency > availability
๏   Partitioning easy thanks to richer data model

MongoDB, CouchDB, RethinkDB
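"Query any field at any level" can be sketched with a small equality-only matcher. This is a simplification (the `valuesAtPath` and `matches` helpers and the sample documents are hypothetical; MongoDB's real query semantics cover many more operators and edge cases). It follows a dotted path such as `name.first` through nested objects and looks inside arrays along the way, which is why a filter on `name.first` can match a document whose `first` is an array of name variants.

```javascript
// Collect every value reachable by a dotted path, descending into arrays
// along the way (simplified document-query semantics).
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) next.push(item[key]);
      }
    }
    current = next;
  }
  // A terminal array matches if any element matches, so flatten one level.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// Equality-only filter: every path in the filter must reach the wanted value.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, wanted]) =>
    valuesAtPath(doc, path).some((v) => v === wanted)
  );
}

const people = [
  {
    afn: "1XYK-KQJ",
    name: { first: ["john", "johannes"], middle: "peter", last: ["smith", "sandvik"] },
    events: [{ death: { date: "1989-07-14", location: { state: "fl" } } }],
  },
  { afn: "hypothetical-2", name: { first: ["mary"], last: ["smith"] }, events: [] },
];
const found = people.filter((p) => matches(p, { "name.first": "john", "name.middle": "peter" }));
const diedInFl = people.filter((p) => matches(p, { "events.death.location.state": "fl" }));
console.log(found.map((p) => p.afn)); // [ '1XYK-KQJ' ]
```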
Graph
๏   1 Dimensional storage... but grouped to appear 2D
๏   Differentiated by indexes
๏   Large indexes cover many relationships
๏   Query time depends on the number of records returned,
    not the distance to get them
๏   Doesn’t require traversing to determine
    relationship

Neo4j, and about 20 more... that nobody talks much about
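The claim that query time depends on the records returned rather than on the size of the data can be illustrated with an adjacency index. This minimal sketch uses a hypothetical `Graph` class and relationship name, not Neo4j's API: each node keeps an index of its own relationships, so finding someone's parents is a direct lookup, and walking several generations only touches the nodes actually found.

```javascript
// Adjacency-indexed graph: nodeId -> Map(relType -> Set(nodeId)).
class Graph {
  constructor() {
    this.adjacency = new Map();
  }
  relate(from, relType, to) {
    if (!this.adjacency.has(from)) this.adjacency.set(from, new Map());
    const byType = this.adjacency.get(from);
    if (!byType.has(relType)) byType.set(relType, new Set());
    byType.get(relType).add(to);
  }
  // Direct lookup: no scan over other nodes' relationships.
  neighbors(from, relType) {
    return [...(this.adjacency.get(from)?.get(relType) ?? [])];
  }
  // Ancestors up to `depth` generations; work grows with the nodes visited.
  ancestors(from, depth) {
    let frontier = [from];
    const found = [];
    for (let d = 0; d < depth; d++) {
      frontier = frontier.flatMap((id) => this.neighbors(id, "CHILD_OF"));
      found.push(...frontier);
    }
    return found;
  }
}

const g = new Graph();
g.relate("john", "CHILD_OF", "hans");
g.relate("john", "CHILD_OF", "anna");
g.relate("hans", "CHILD_OF", "olaf");
console.log(g.neighbors("john", "CHILD_OF")); // [ 'hans', 'anna' ]
console.log(g.ancestors("john", 2)); // [ 'hans', 'anna', 'olaf' ]
```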
MongoDB for Genealogy
Right Data Model
Types of genealogy data
๏   Events (birth, death, etc)
๏   Official records
๏   Census
๏   Names
๏   Relationships
๏   Photographs
๏   Diaries & letters
๏   Ship passenger list
๏   Occupation
๏   and more
Challenges of genealogy data
๏   Lots of possible data points... need flexible
    schema
๏   Multiple versions of the same data point
    (3 different dates for death date, 4 variations on
    name)
๏   Lots of data associated with physical records
๏   Multiple versions of the same nodes
    (intelligent nondestructive merge needed)
๏   Need to have metadata associated
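One way an "intelligent nondestructive merge" might look is sketched below. The merge policy, the `mergeIndividuals` and `unionBy` helpers, and the event shape (an object mapping each event type to an array of versions) are all hypothetical, not FamilySearch's or MongoDB's method. When two records turn out to describe the same individual, every name variant and every distinct event version is kept instead of one record overwriting the other.

```javascript
// Union two arrays, dropping exact duplicates (compared by JSON form,
// so objects only count as duplicates if their keys are in the same order).
function unionBy(a = [], b = []) {
  const seen = new Set();
  const out = [];
  for (const item of [...a, ...b]) {
    const key = JSON.stringify(item);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(item);
    }
  }
  return out;
}

// Hypothetical merge policy: nothing from either input is discarded.
function mergeIndividuals(x, y) {
  const eventTypes = new Set([...Object.keys(x.events ?? {}), ...Object.keys(y.events ?? {})]);
  const events = {};
  for (const type of eventTypes) {
    events[type] = unionBy(x.events?.[type], y.events?.[type]);
  }
  return {
    name: {
      first: unionBy(x.name?.first, y.name?.first),
      last: unionBy(x.name?.last, y.name?.last),
    },
    events,
    mergedFrom: [x.afn, y.afn], // keep provenance of both source records
  };
}

const a = {
  afn: "1XYK-KQJ",
  name: { first: ["john"], last: ["smith"] },
  events: { birth: [{ date: "1928-04-06", contributor: "c1" }] },
};
const b = {
  afn: "hypothetical-AFN",
  name: { first: ["johannes", "john"], last: ["sandvik"] },
  events: { birth: [{ date: "1928-04-16", contributor: "c2" }], death: [{ date: "1989-07-14" }] },
};
const merged = mergeIndividuals(a, b);
console.log(merged.name.first); // [ 'john', 'johannes' ]
console.log(merged.events.birth.length); // 2
```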
Individual: AFN, Modification Date
Name: First[], Middle[], Last[]
Events[]: type, date, contributor[], record[]
Location: city, state, county, country, coordinates[]
Record: contributor, type, thumbnail, content, description, tags[]
User: Name, Email Address, Password, Individual_id
Individual
individual = {
   _id : ObjectId("4f2978dfaa999d9db02618ce"),
   AFN : '1XYK-KQJ',
   name: {
      first: ['john', 'johannes'],
      middle: 'peter',
      last: ['smith', 'sandvik']
    }
}


// dotted paths must be quoted; 'john' matches any element of name.first
db.individual.find(
{'name.first' : 'john', 'name.middle' : 'peter'})
Individual.Events
events : [
  { death : {
       date : ISODate('1989-07-14'),
       location : {
           city: 'pensacola',
           state: 'fl',
           county: 'escambia',
           country: 'usa',
           coordinates : [30.26,87.12]},
       contributor : ObjectId("4eeac...691")}}]

db.individual.find(
{'events.death.date' : ISODate('1989-07-14')})

// $near needs a geospatial ("2d") index on the coordinates field
db.individual.createIndex({'events.death.location.coordinates' : '2d'})
db.individual.find(
{'events.death.location.coordinates' : { $near:[30,90]}})
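`$near` returns documents ordered from nearest to farthest. With a legacy `2d` index, distance is measured on a flat plane, so the ordering can be sketched with Euclidean distance on the raw coordinate pairs. The `near` helper and the sample places below are hypothetical and only mimic the ordering (and an optional maximum distance), not the index itself; spherical (`2dsphere`) queries compute distance differently.

```javascript
// Planar (Euclidean) distance between two [x, y] pairs.
function planarDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x1 - x2, y1 - y2);
}

// Return documents sorted nearest-first relative to `point`, optionally
// limited to `maxDistance` (in the same units as the coordinates).
function near(docs, getCoords, point, maxDistance = Infinity) {
  return docs
    .map((doc) => ({ doc, dist: planarDistance(getCoords(doc), point) }))
    .filter(({ dist }) => dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist)
    .map(({ doc }) => doc);
}

const places = [
  { city: "brattleboro", coordinates: [42.51, 72.34] },
  { city: "pensacola", coordinates: [30.26, 87.12] },
];
const nearest = near(places, (d) => d.coordinates, [30, 90]);
console.log(nearest.map((d) => d.city)); // [ 'pensacola', 'brattleboro' ]
```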
Event Versions
events : [
  { birth : [ {
        date : ISODate('1928-04-06'),
        location : {
           city: 'brattleboro',
           state: 'vt',
           county: 'windham',
           country: 'usa',
           coordinates : [42.51,72.34]},
        contributor : ObjectId("4ee...00000"),
        records: ObjectId("4ed8a...7b000000")
     },
     {
        date : ISODate('1928-04-16'),
        location : {
           city: 'brattleboro',
           state: 'vt',
           county: 'windham',
           country: 'usa',
           coordinates : [42.51,72.34]},
        contributor : ObjectId("4ee...37bb"),
        records: ObjectId("4eea...0000c8")
     }]
  }
]
Query with Versioned Events
events : [
   { birth : [
      { date : ISODate('1928-04-06')},
      { date : ISODate('1928-04-16')}
   ]}
]

db.individual.find(
{'events.birth.date' : ISODate('1928-04-16')})
Records
record1 = {
    _id : ObjectId("4ed8aea7d8562f7d7b"),
    contributor : ObjectId("4eeab...1537bb"),
    type : 'birth certificate',
    thumbnail : BinData(0,"/9j/4AAQSkZJ...."),
    content : BinData(0,"j6b/Id11lWqs..."),
    tags : ['NY', 'certified'],
    description : "John's birth certificate"
}
Right Scale
MongoDB: Scale built in
๏   Intelligent replication
๏   Automatic partitioning of data
    (user configurable)
๏   Horizontal Scale
๏   Targeted Queries
๏   Parallel Processing
Intelligent Replication
[Diagram: Node 3 (Primary) replicates to Node 1 (Secondary) and Node 2 (Secondary); the nodes exchange heartbeats]
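The replication arrows on the Intelligent Replication slide can be sketched as an ordered operation log that each secondary replays from its last applied position. MongoDB replica sets do replicate through an oplog, but the `Primary`/`Secondary` classes and the entry format here are hypothetical, and the sketch leaves out heartbeats, elections, and failover entirely.

```javascript
// The primary applies writes locally and appends them to an ordered log.
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = []; // entries: { seq, op: "set" | "delete", key, value }
  }
  set(key, value) {
    this.data.set(key, value);
    this.oplog.push({ seq: this.oplog.length, op: "set", key, value });
  }
  delete(key) {
    this.data.delete(key);
    this.oplog.push({ seq: this.oplog.length, op: "delete", key });
  }
}

// A secondary pulls the entries it has not applied yet and replays them in order.
class Secondary {
  constructor() {
    this.data = new Map();
    this.applied = 0; // number of oplog entries applied so far
  }
  syncFrom(primary) {
    for (const entry of primary.oplog.slice(this.applied)) {
      if (entry.op === "set") this.data.set(entry.key, entry.value);
      else this.data.delete(entry.key);
      this.applied = entry.seq + 1;
    }
  }
}

const primary = new Primary();
const node1 = new Secondary();
const node2 = new Secondary();
primary.set("1XYK-KQJ", "john smith");
node1.syncFrom(primary); // node1 catches up; node2 lags behind
primary.set("1XYK-KQJ", "john sandvik");
primary.set("hypothetical-id", "mary smith");
primary.delete("hypothetical-id");
node1.syncFrom(primary);
node2.syncFrom(primary);
console.log(node1.data.get("1XYK-KQJ"), node2.data.get("1XYK-KQJ")); // john sandvik john sandvik
```

Because entries are replayed strictly in order, a lagging secondary converges on the same state as one that synced more often.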
Scalable Architecture
[Diagram: three App Servers, each talking to a Mongos router; the Mongos routers sit in front of three Shards, alongside three Config Servers]
High Availability in Shards
[Diagram: each Shard is either a single Mongod or a replica set of one Primary and two Secondaries]
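Targeted Requests
[Diagram: (1) the request reaches Mongos, (2) Mongos sends it to the one Shard holding the data, (3) that Shard responds, (4) Mongos returns the result]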
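What the Mongos router does on the Targeted Requests slide can be sketched under these assumptions: documents are split across shards by ranges of a hypothetical shard key (`afn`), the chunk boundaries are made up, and each shard is an in-memory array. A query that includes the shard key can go to the single shard owning that range; a query without it has to be sent to every shard.

```javascript
// Hypothetical chunk map: each shard owns shard-key values in [min, max).
const chunks = [
  { min: "", max: "H", shard: 0 },
  { min: "H", max: "P", shard: 1 },
  { min: "P", max: "\uffff", shard: 2 },
];
const shards = [[], [], []]; // each shard is just an array of documents here

function shardFor(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

function insert(doc) {
  shards[shardFor(doc.afn)].push(doc);
}

// Route an equality query and report which shards were contacted.
function routeFind(filter) {
  const targets = "afn" in filter ? [shardFor(filter.afn)] : shards.map((_, i) => i);
  const results = targets.flatMap((i) =>
    shards[i].filter((doc) => Object.entries(filter).every(([k, v]) => doc[k] === v))
  );
  return { targets, results };
}

insert({ afn: "1XYK-KQJ", last: "smith" });
insert({ afn: "KQ22-ABC", last: "smith" });
insert({ afn: "ZZ91-XYZ", last: "sandvik" });
console.log(routeFind({ afn: "KQ22-ABC" }).targets); // [ 1 ] (targeted)
console.log(routeFind({ last: "smith" }).targets); // [ 0, 1, 2 ] (broadcast)
```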
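Parallel processing
[Diagram: (1) the request reaches Mongos, (2) Mongos sends it to every Shard, (3) each Shard processes its own data, (4) each Shard returns a partial result, (5) Mongos merges the partial results, (6) Mongos returns the combined result]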
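The numbered steps on the Parallel processing slide follow a scatter-gather pattern. In the sketch below (hypothetical data; shards simulated with arrays and async functions), the same count-by-state aggregation is dispatched to every shard, each shard computes a partial result on its own data, and the router merges the partials. Counts are used because partial counts can simply be added; an average, for instance, would need each shard to return a sum and a count. In a single Node process the async calls model concurrent dispatch rather than true parallel execution.

```javascript
// Each shard holds its own documents (hypothetical data).
const shardData = [
  [{ state: "fl" }, { state: "vt" }],
  [{ state: "fl" }],
  [{ state: "vt" }, { state: "vt" }, { state: "ny" }],
];

// Step 3: a shard computes a partial count per state on its own data.
async function shardCountByState(docs) {
  const counts = {};
  for (const { state } of docs) counts[state] = (counts[state] ?? 0) + 1;
  return counts;
}

// Steps 2, 4 and 5: dispatch to all shards, gather the partials, and merge them.
async function countByState() {
  const partials = await Promise.all(shardData.map(shardCountByState));
  const merged = {};
  for (const partial of partials) {
    for (const [state, n] of Object.entries(partial)) merged[state] = (merged[state] ?? 0) + n;
  }
  return merged;
}

countByState().then((totals) => console.log(totals)); // { fl: 2, vt: 3, ny: 1 }
```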
Right Feature Set
Broad Feature Set
๏   Rich query language
๏   Native drivers for over 12 programming languages
๏   GeoSpatial
๏   Text search
๏   Aggregation & MapReduce
๏   GridFS
    (distributed & replicated file storage)
๏   Integration with Hadoop, Solr & more
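GridFS stores a file's metadata in an `fs.files` collection and its contents as numbered pieces in `fs.chunks`, each chunk carrying `files_id`, `n`, and `data` fields; the default chunk size is 255 KiB in recent versions. The sketch below shows only that chunking idea with hypothetical `toChunks`/`fromChunks` helpers. In practice the driver's GridFS API does this for you.

```javascript
// Split a file's bytes into fixed-size, numbered chunks, GridFS-style.
function toChunks(fileId, bytes, chunkSize = 255 * 1024) {
  const chunks = [];
  for (let n = 0; n * chunkSize < bytes.length; n++) {
    chunks.push({ files_id: fileId, n, data: bytes.subarray(n * chunkSize, (n + 1) * chunkSize) });
  }
  return { file: { _id: fileId, length: bytes.length, chunkSize }, chunks };
}

// Reassemble a file: order chunks by n (they may arrive in any order) and concatenate.
function fromChunks(chunks) {
  return Buffer.concat([...chunks].sort((a, b) => a.n - b.n).map((c) => c.data));
}

// A hypothetical 600 KiB scan of a birth certificate.
const bytes = Uint8Array.from({ length: 600 * 1024 }, (_, i) => i % 251);
const stored = toChunks("birth-certificate-scan", bytes);
console.log(stored.chunks.length); // 3
console.log(fromChunks(stored.chunks.slice().reverse()).equals(Buffer.from(bytes))); // true
```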
Last Year I presented on Graph in MongoDB
http://j.mp/XvJ3dl
FamilySearch presented in December 2012
http://j.mp/X03TXp
http://spf13.com
http://github.com/spf13
@spf13
Questions?
download at mongodb.org

More Related Content

What's hot

Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB James Serra
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsSpringPeople
 
Zed attack proxy [ What is ZAP(Zed Attack Proxy)? ]
Zed attack proxy [ What is ZAP(Zed Attack Proxy)? ]Zed attack proxy [ What is ZAP(Zed Attack Proxy)? ]
Zed attack proxy [ What is ZAP(Zed Attack Proxy)? ]raj upadhyay
 
백기선의 스프링 부트
백기선의 스프링 부트백기선의 스프링 부트
백기선의 스프링 부트Keesun Baik
 
Office 365 Incident Response 2019 B-Sides Orlando
Office 365 Incident Response 2019 B-Sides OrlandoOffice 365 Incident Response 2019 B-Sides Orlando
Office 365 Incident Response 2019 B-Sides OrlandoAlex Parsons
 
A Day In The Life Of A DBA Manager
A Day In The Life Of A DBA ManagerA Day In The Life Of A DBA Manager
A Day In The Life Of A DBA ManagerMahesh Vallampati
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 
Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기Joonsung Lee
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon Web Services Korea
 
Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 GamingAurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 GamingAmazon Web Services Korea
 
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from TextUse Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from TextAmazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Amazon Web Services
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018Amazon Web Services
 
MongoDB World 2019: The Sights (and Smells) of a Bad Query
MongoDB World 2019: The Sights (and Smells) of a Bad QueryMongoDB World 2019: The Sights (and Smells) of a Bad Query
MongoDB World 2019: The Sights (and Smells) of a Bad QueryMongoDB
 
Amazon SageMaker 오버뷰 - 강성문, AWS AI/ML 스페셜리스트 :: AIML 특집 웨비나
Amazon SageMaker 오버뷰 - 강성문, AWS AI/ML 스페셜리스트 :: AIML 특집 웨비나Amazon SageMaker 오버뷰 - 강성문, AWS AI/ML 스페셜리스트 :: AIML 특집 웨비나
Amazon SageMaker 오버뷰 - 강성문, AWS AI/ML 스페셜리스트 :: AIML 특집 웨비나Amazon Web Services Korea
 
Retail referencearchitecture productcatalog
Retail referencearchitecture productcatalogRetail referencearchitecture productcatalog
Retail referencearchitecture productcatalogMongoDB
 
Why the Brick Schema is a Game Changer for Smart Buildings?
Why the Brick Schema is a Game Changer for Smart Buildings?Why the Brick Schema is a Game Changer for Smart Buildings?
Why the Brick Schema is a Game Changer for Smart Buildings?Memoori
 
Integração do Zabbix com Grafana
Integração do Zabbix com GrafanaIntegração do Zabbix com Grafana
Integração do Zabbix com GrafanaAécio Pires
 
Ultimate Free SQL Server Toolkit
Ultimate Free SQL Server ToolkitUltimate Free SQL Server Toolkit
Ultimate Free SQL Server ToolkitKevin Kline
 

What's hot (20)

Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorialsMongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
Mongo DB: Fundamentals & Basics/ An Overview of MongoDB/ Mongo DB tutorials
 
Zed attack proxy [ What is ZAP(Zed Attack Proxy)? ]
Zed attack proxy [ What is ZAP(Zed Attack Proxy)? ]Zed attack proxy [ What is ZAP(Zed Attack Proxy)? ]
Zed attack proxy [ What is ZAP(Zed Attack Proxy)? ]
 
백기선의 스프링 부트
백기선의 스프링 부트백기선의 스프링 부트
백기선의 스프링 부트
 
Office 365 Incident Response 2019 B-Sides Orlando
Office 365 Incident Response 2019 B-Sides OrlandoOffice 365 Incident Response 2019 B-Sides Orlando
Office 365 Incident Response 2019 B-Sides Orlando
 
A Day In The Life Of A DBA Manager
A Day In The Life Of A DBA ManagerA Day In The Life Of A DBA Manager
A Day In The Life Of A DBA Manager
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 
Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
 
Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 GamingAurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
Aurora는 어떻게 다른가 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
 
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from TextUse Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
Use Amazon Comprehend and Amazon SageMaker to Gain Insight from Text
 
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
Building Serverless ETL Pipelines with AWS Glue - AWS Summit Sydney 2018
 
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
What's New with Amazon Redshift ft. McDonald's (ANT350-R1) - AWS re:Invent 2018
 
Msbi Architecture
Msbi ArchitectureMsbi Architecture
Msbi Architecture
 
MongoDB World 2019: The Sights (and Smells) of a Bad Query
MongoDB World 2019: The Sights (and Smells) of a Bad QueryMongoDB World 2019: The Sights (and Smells) of a Bad Query
MongoDB World 2019: The Sights (and Smells) of a Bad Query
 
Amazon SageMaker 오버뷰 - 강성문, AWS AI/ML 스페셜리스트 :: AIML 특집 웨비나
Amazon SageMaker 오버뷰 - 강성문, AWS AI/ML 스페셜리스트 :: AIML 특집 웨비나Amazon SageMaker 오버뷰 - 강성문, AWS AI/ML 스페셜리스트 :: AIML 특집 웨비나
Amazon SageMaker 오버뷰 - 강성문, AWS AI/ML 스페셜리스트 :: AIML 특집 웨비나
 
Retail referencearchitecture productcatalog
Retail referencearchitecture productcatalogRetail referencearchitecture productcatalog
Retail referencearchitecture productcatalog
 
Why the Brick Schema is a Game Changer for Smart Buildings?
Why the Brick Schema is a Game Changer for Smart Buildings?Why the Brick Schema is a Game Changer for Smart Buildings?
Why the Brick Schema is a Game Changer for Smart Buildings?
 
Integração do Zabbix com Grafana
Integração do Zabbix com GrafanaIntegração do Zabbix com Grafana
Integração do Zabbix com Grafana
 
Ultimate Free SQL Server Toolkit
Ultimate Free SQL Server ToolkitUltimate Free SQL Server Toolkit
Ultimate Free SQL Server Toolkit
 

Similar to Modern Database Systems (for Genealogy)

Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLYan Cui
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBWilliam LaForest
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBjhugg
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social WebBogdan Gaza
 
Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterestMohit Jain
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Saltmarch Media
 
SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 OverviewDavid Chou
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxData Con LA
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistTony Rogerson
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklNeo4j
 
SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.Denis Reznik
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to CassandraJon Haddad
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLJoe Drumgoole
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 

Similar to Modern Database Systems (for Genealogy) (20)

Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDB
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
 
Nosql
NosqlNosql
Nosql
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social Web
 
Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterest
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?
 
SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 Overview
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/Specialist
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quickl
 
SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
iForum 2015: SQL vs. NoSQL
iForum 2015: SQL vs. NoSQLiForum 2015: SQL vs. NoSQL
iForum 2015: SQL vs. NoSQL
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQL
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 

More from Steven Francia

State of the Gopher Nation - Golang - August 2017
State of the Gopher Nation - Golang - August 2017State of the Gopher Nation - Golang - August 2017
State of the Gopher Nation - Golang - August 2017Steven Francia
 
Building Awesome CLI apps in Go
Building Awesome CLI apps in GoBuilding Awesome CLI apps in Go
Building Awesome CLI apps in GoSteven Francia
 
The Future of the Operating System - Keynote LinuxCon 2015
The Future of the Operating System -  Keynote LinuxCon 2015The Future of the Operating System -  Keynote LinuxCon 2015
The Future of the Operating System - Keynote LinuxCon 2015Steven Francia
 
7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)Steven Francia
 
What every successful open source project needs
What every successful open source project needsWhat every successful open source project needs
What every successful open source project needsSteven Francia
 
7 Common mistakes in Go and when to avoid them
7 Common mistakes in Go and when to avoid them7 Common mistakes in Go and when to avoid them
7 Common mistakes in Go and when to avoid themSteven Francia
 
Go for Object Oriented Programmers or Object Oriented Programming without Obj...
Go for Object Oriented Programmers or Object Oriented Programming without Obj...Go for Object Oriented Programmers or Object Oriented Programming without Obj...
Go for Object Oriented Programmers or Object Oriented Programming without Obj...Steven Francia
 
Painless Data Storage with MongoDB & Go
Painless Data Storage with MongoDB & Go Painless Data Storage with MongoDB & Go
Painless Data Storage with MongoDB & Go Steven Francia
 
Getting Started with Go
Getting Started with GoGetting Started with Go
Getting Started with GoSteven Francia
 
Build your first MongoDB App in Ruby @ StrangeLoop 2013
Build your first MongoDB App in Ruby @ StrangeLoop 2013Build your first MongoDB App in Ruby @ StrangeLoop 2013
Build your first MongoDB App in Ruby @ StrangeLoop 2013Steven Francia
 
Introduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopIntroduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopSteven Francia
 
MongoDB, Hadoop and humongous data - MongoSV 2012
MongoDB, Hadoop and humongous data - MongoSV 2012MongoDB, Hadoop and humongous data - MongoSV 2012
MongoDB, Hadoop and humongous data - MongoSV 2012Steven Francia
 
Big data for the rest of us
Big data for the rest of usBig data for the rest of us
Big data for the rest of usSteven Francia
 
OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialSteven Francia
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoverySteven Francia
 
Multi Data Center Strategies
Multi Data Center StrategiesMulti Data Center Strategies
Multi Data Center StrategiesSteven Francia
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 
MongoDB, Hadoop and Humongous Data
MongoDB, Hadoop and Humongous DataMongoDB, Hadoop and Humongous Data
MongoDB, Hadoop and Humongous DataSteven Francia
 

More from Steven Francia (20)

State of the Gopher Nation - Golang - August 2017
Modern Database Systems (for Genealogy)

  • 1. Modern Database Systems
  • 2. @spf13 AKA Steve Francia Chief Evangelist @ responsible for drivers, integrations, web & docs
  • 3. What’s the Point? ๏ Goal: Discover & identify ideal storage solution for our needs ๏ History is important ๏ Many options today ๏ Document databases are good for Genealogy
  • 5. Over 5500 years ago 2 People
  • 9. World Population Growth (last ~200 years, in billions): 1 in 1804, 2 in 1927, 3 in 1960, 4 in 1974, 5 in 1987, 6 in 1999, 7 in 2012
  • 10. Really Big Data: In the last 50 years... over 4% of all the people who have ever lived were born... in less than 1% of the time
  • 12. 1970 ๏ Oracle creates the relational database ๏ Everyone happily uses it for the next 43 years
  • 14. Let’s start at the beginning
  • 15. It’s a story about... Storing & Retrieving Information
  • 20. Even today we still use the same mediums for data storage
  • 23. With the advent of the computer, things really took off
  • 24. 1960 : DBMS Emerges ๏ Ordered set of fixed length fields ๏ Low level pointer operations (flat files) ๏ Most popular was IMS (created at IBM) ๏ Shockingly still in use today at IBM & American Airlines
  • 25. Lots of Problems ๏ Complex and inflexible ๏ User had to know physical structure of the DB in order to query for information ๏ Adding a field to the DB required rewriting the underlying access/modification scheme ๏ Records isolated (no relations) ๏ Emphasis on records to be processed, not overall structure
  • 26. 1970 : Relational DB ๏ Edgar Frank “Ted” Codd ๏ Relational Database theory ๏ Codd’s 13 rules (aka 12 rules)
  • 27. 3 HUGE Advantages ๏ Data independence from hardware and storage implementation ๏ Ability to process more than one record at a time with a single operation ๏ Establishing a relationship between records
  • 28. IBM vs Codd ๏ IBM bets on IMS ๏ Codd bets on relational DB ๏ Eventually 2 relational prototypes emerge
  • 29. Ingres ๏ Built at UC Berkeley ๏ Uses QUEL ๏ Inspires Sybase & MSSQL
  • 30. System R ๏ Built at IBM ๏ Leads to SEQUEL... later SQL ๏ Evolved into SQL/DS which evolved into DB2 ๏ Project concludes that relational model is viable
  • 31. Oracle ๏ Larry Ellison watches IBM ๏ Starts Relational Software Inc. ๏ Oracle, the 1st commercial RDBMS, released in 1979 ๏ Beats IBM to market by 2 years
  • 32. Entity Relationship ๏ Proposed by Peter Chen in 1976 ๏ Focuses on data use and not logical table structure
  • 33. 1980s ๏ RDBMS dominates ๏ Some fields (medicine, physics, multimedia) need more than RDBMS offers ๏ Object Databases emerge
  • 34. Object Databases ๏ Inspired by Entity Relationship ๏ More flexible than relational permits ๏ Tightly coupled with an OO programming language (C++, later Java) ๏ Full object: data & methods stored
  • 35. 1990s ๏ Internet emerges ๏ Data demand spikes ๏ Databases used for archiving historical data
  • 36. Early 2000s ๏ Internet booms ๏ RDBMS fails to scale ๏ In desperation we take a step backwards
  • 37. MemcacheD ๏ 1 dimensional ๏ No persistence ๏ No ACI or D ๏ but...
  • 39. 2005-ish ๏ Relational + MemcacheD broken (and we didn’t know it) ๏ Scale redefined with high volume & social ๏ Infrastructure reinvented with cloud computing & SSDs
  • 40. Alternatives Emerge ๏ Dynamo / Key Value ๏ Document ๏ Graph
  • 41. Modern Data Storage
  • 42. A lot going on. Easiest to define databases in broad terms: • What is a record? (data model) • CAP: CA, AP, or CP? (infrastructure model)
  • 43. Data Storage Structure [Diagram: records drawn as 1D (a key with a value), 2D (a key with several values), and nD (keys whose values can themselves hold keys and values)]
  • 44. Database structure [Diagram placing each database type on a 1D / 2D / nD scale: Key Value, Relational, Document, Dynamo, Graph]
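A hypothetical sketch in plain JavaScript (no database involved) of the same person stored in each of the three shapes. The key format `individual:<AFN>`, the `|`-delimited value, and the field names are assumptions for illustration; the values are borrowed from the genealogy examples later in the deck.

```javascript
// Hypothetical example: one person's data stored in three shapes.

// 1D (key/value): the value is an opaque string; only the key is indexed.
const kvStore = new Map([
  ["individual:1XYK-KQJ", "john|peter|smith|1928-04-06"],
]);

// 2D (relational row): flat, fixed columns; any column can be queried.
const individualsTable = [
  { afn: "1XYK-KQJ", first: "john", middle: "peter", last: "smith", birth: "1928-04-06" },
];

// nD (document): nested fields and arrays, e.g. several spellings of a name.
const individualDoc = {
  afn: "1XYK-KQJ",
  name: { first: ["john", "johannes"], middle: "peter", last: ["smith", "sandvik"] },
  events: { birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] },
};

// 1D: to find people named "john", scan every value and parse it yourself.
const fromKv = [...kvStore.values()].filter((v) => v.split("|")[0] === "john");
// 2D: filter on a column, but the column holds only one spelling.
const fromTable = individualsTable.filter((row) => row.first === "john");
// nD: the array holds every variant, so "johannes" finds the same person.
const fromDoc = [individualDoc].filter((d) => d.name.first.includes("johannes"));

console.log(fromKv.length, fromTable.length, fromDoc.length); // 1 1 1
```

The flat row has room for only one first name, which is exactly the limitation the genealogy slides run into.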
  • 45. CAP Theorem [Diagram: a triangle with vertices Availability, Partitioning, Consistency]
  • 47. CAP Theorem [Diagram: a triangle with vertices Availability, Consistency, and Partition Tolerant. CA side (partition intolerant): RDBMS. AP side (inconsistent): Dynamo, Key Value NoSQLs. CP side (unavailable): MongoDB, BigTable]
  • 48. Key Value ๏ Often 1 Dimensional storage (tuple) ๏ Query key only ๏ Bucket index (range) on keys ๏ Records cannot be updated, only replaced ๏ MultiMaster... meaning availability over consistency ๏ Partitioning easy thanks to single value. Examples: Cassandra, Redis, MemcacheD, Riak, DynamoDB
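A minimal sketch of the key/value properties listed above, assuming a plain in-memory `Map`. The class and method names are hypothetical and do not mirror the API of any of the listed products. It shows that reads go by key only, that changing one field means replacing the whole value, and that the key alone is enough to pick a partition (here with a simple string hash chosen only for illustration).

```javascript
// Hypothetical in-memory key/value store; not the API of any real product.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // Reads are by key only; the store never looks inside the value.
  get(key) {
    return this.data.get(key);
  }

  // Writes always replace the whole value stored under the key.
  put(key, value) {
    this.data.set(key, value);
  }

  // "Updating" one field means read, change a copy, and write it all back.
  updateField(key, field, newValue) {
    const current = JSON.parse(this.get(key) ?? "{}");
    current[field] = newValue;
    this.put(key, JSON.stringify(current));
  }
}

// The key alone decides placement; this simple string hash is an
// illustrative choice, not what any particular product uses.
function partitionFor(key, partitionCount) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash % partitionCount;
}

const store = new KeyValueStore();
store.put("individual:1XYK-KQJ", JSON.stringify({ first: "john", last: "smith" }));
store.updateField("individual:1XYK-KQJ", "last", "sandvik");
console.log(store.get("individual:1XYK-KQJ")); // {"first":"john","last":"sandvik"}
```

Because the value is opaque, a question like "who was born in 1928?" cannot be answered without reading every value, which is the "query key only" trade-off.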
  • 49. Relational ๏ 2 Dimensional storage (map) ๏ Query any field ๏ BTree Indexes ๏ Single master, meaning consistency > availability ๏ Partitioning hard due to transactions & joins. Examples: Oracle, MSSQL, MySQL, PostgreSQL, DB2
  • 50. Document ๏ n Dimensional storage (hash w/ nesting) ๏ Query any field at any level ๏ BTree Indexes ๏ Single master, meaning consistency > availability ๏ Partitioning easy thanks to richer data model. Examples: MongoDB, CouchDB, RethinkDB
  • 51. Graph ๏ 1 Dimensional storage... but grouped to appear 2D ๏ Differentiated by indexes ๏ Large indexes cover many relationships ๏ Query time depends on # records returned, not distance to get them ๏ Doesn’t require traversing to determine relationship. Examples: Neo4j, and about 20 more that nobody talks much about
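A hypothetical sketch of the graph idea applied to genealogy: relationships live in an index (here a `Map` from a person to their parents), so finding ancestors is a series of direct lookups rather than joins. The ids and the parent-only relationship are assumptions for illustration; real graph databases such as Neo4j have their own storage engines and query languages.

```javascript
// Hypothetical relationship index: person id -> ids of that person's parents.
const parentsOf = new Map([
  ["john", ["peter", "mary"]],
  ["peter", ["hans"]],
  ["mary", []],
  ["hans", []],
]);

// Breadth-first walk over the index. Each step is a direct lookup of stored
// relationships, so the work grows with the relationships actually visited,
// not with the total number of people stored.
function ancestors(personId) {
  const found = [];
  const seen = new Set([personId]);
  const queue = [personId];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const parent of parentsOf.get(current) ?? []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        found.push(parent);
        queue.push(parent);
      }
    }
  }
  return found;
}

console.log(ancestors("john")); // [ 'peter', 'mary', 'hans' ]
```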
  • 53. Right Data Model
  • 54. Types of genealogy data ๏ Events (birth, death, etc) ๏ Official records ๏ Census ๏ Names ๏ Relationships ๏ Photographs ๏ Diaries & letters ๏ Ship passenger list ๏ Occupation ๏ and more
  • 55. Challenges of genealogy data ๏ Lots of possible data points... need flexible schema ๏ Multiple versions of the same data point (e.g. 3 different death dates, 4 variations on a name) ๏ Lots of data associated with physical records ๏ Multiple versions of the same nodes (intelligent nondestructive merge needed) ๏ Need to have metadata associated
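One possible way to do the "intelligent nondestructive merge" mentioned above, sketched in JavaScript under the assumption that names and events are stored as arrays of variants, as in the schema that follows. `mergeIndividuals` and the `from` / `mergedFrom` metadata fields are hypothetical; the point is only that every variant survives the merge and stays attributable to its source record.

```javascript
// Hypothetical nondestructive merge: keep every variant from both records.
function mergeIndividuals(a, b) {
  const union = (x = [], y = []) => [...new Set([...x, ...y])];

  // Keep every version of each event, tagged with the record it came from.
  const eventTypes = new Set([
    ...Object.keys(a.events ?? {}),
    ...Object.keys(b.events ?? {}),
  ]);
  const events = {};
  for (const type of eventTypes) {
    events[type] = [
      ...(a.events?.[type] ?? []).map((e) => ({ ...e, from: a.id })),
      ...(b.events?.[type] ?? []).map((e) => ({ ...e, from: b.id })),
    ];
  }

  return {
    mergedFrom: [a.id, b.id], // metadata: which source records were combined
    name: {
      first: union(a.name.first, b.name.first),
      last: union(a.name.last, b.name.last),
    },
    events,
  };
}

const recordA = { id: "a", name: { first: ["john"], last: ["smith"] }, events: { birth: [{ date: "1928-04-06" }] } };
const recordB = { id: "b", name: { first: ["johannes"], last: ["smith"] }, events: { birth: [{ date: "1928-04-16" }] } };
const merged = mergeIndividuals(recordA, recordB);
console.log(merged.name.first); // [ 'john', 'johannes' ]
console.log(merged.events.birth.length); // 2
```

Nothing is overwritten: the conflicting birth dates both remain, and a later step (or a person) can decide which one to trust.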
  • 56. [Schema diagram]
      Individual: Name, AFN, Modification Date
      User: Email Address, Password, Individual_id
      Events[]: type, date, contributor[], record[]
      Name: First[], Middle[], Last[]
      Location: city, state, county, country, coordinates[]
      Record: contributor, type, thumbnail, content, description, tags[]
  • 57. Individual
      individual = {
        _id : ObjectId("4f2978dfaa999d9db02618ce"),
        AFN : '1XYK-KQJ',
        name : { first : ['john', 'johannes'], middle : 'peter', last : ['smith', 'sandvik'] }
      }
      db.individual.find({ 'name.first' : 'john', 'name.middle' : 'peter' })
  • 58. Individual.Events
      events : [ {
        death : {
          date : ISODate('1989-07-14'),
          location : { city : 'pensacola', state : 'fl', county : 'escambia', country : 'usa',
                       coordinates : [30.26, 87.12] },
          contributor : ObjectId("4eeac...691")
        }
      } ]
      db.individual.find({ 'events.death.date' : ISODate('1989-07-14') })
      // $near requires a geospatial (2d) index on the coordinates field
      db.individual.createIndex({ 'events.death.location.coordinates' : '2d' })
      db.individual.find({ 'events.death.location.coordinates' : { $near : [30, 90] } })
  • 59. Event Versions
      events : [ {
        birth : [
          { date : ISODate('1928-04-06'),
            location : { city : 'brattleboro', state : 'vt', county : 'windham', country : 'usa',
                         coordinates : [42.51, 72.34] },
            contributor : ObjectId("4ee...00000"),
            records : ObjectId("4ed8a...7b000000") },
          { date : ISODate('1928-04-16'),
            location : { city : 'brattleboro', state : 'vt', county : 'windham', country : 'usa',
                         coordinates : [42.51, 72.34] },
            contributor : ObjectId("4ee...37bb"),
            records : ObjectId("4eea...0000c8") }
        ]
      } ]
  • 60. Query with Versioned Events
      events : [ { birth : [ { date : ISODate('1928-04-06') }, { date : ISODate('1928-04-16') } ] } ]
      db.individual.find({ 'events.birth.date' : ISODate('1928-04-16') })
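A simplified model, in plain JavaScript, of why the query above matches: when a dotted path such as `events.birth.date` reaches an array, each element is examined, so a query value can match any one of several recorded versions. `valuesAtPath` and `matches` are hypothetical helpers; MongoDB's actual matching rules cover many more cases (operators, whole-array equality, positional paths), and dates are plain strings here instead of `ISODate` values.

```javascript
// Simplified model of matching a dotted path against a document: whenever the
// path reaches an array, every element is examined. Illustration only.
function valuesAtPath(value, parts) {
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (parts.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// True if any value reached by the path equals the expected value.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).some((v) => v === expected);
}

// Same shape as the slides, with dates as plain strings instead of ISODate.
const individual = {
  name: { first: ["john", "johannes"], middle: "peter" },
  events: [{ birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }],
};

console.log(matches(individual, "events.birth.date", "1928-04-16")); // true
console.log(matches(individual, "name.first", "johannes")); // true
console.log(matches(individual, "events.birth.date", "1928-04-07")); // false
```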
  • 61. Records
      record1 = {
        _id : ObjectId("4ed8aea7d8562f7d7b"),
        contributor : ObjectId("4eeab...1537bb"),
        type : 'birth certificate',
        thumbnail : BinData(0, "/9j/4AAQSkZJ...."),
        content : BinData(0, "j6b/Id11lWqs..."),
        tags : ['NY', 'certified'],
        description : "John's birth certificate"
      }
  • 63. MongoDB: Scale built in ๏ Intelligent replication ๏ Automatic partitioning of data (user configurable) ๏ Horizontal Scale ๏ Targeted Queries ๏ Parallel Processing
  • 64. Intelligent Replication [Diagram: Node 3 (Primary) replicates to Node 1 and Node 2 (both Secondary); Node 1 and Node 2 exchange a heartbeat]
  • 65. Scalable Architecture [Diagram: three App Servers, each with its own Mongos router; three Config Servers; three Shards, each a replica set of nodes (Node 1 plus Secondaries)]
  • 66. High Availability in Shards [Diagram: a Shard is either a single Mongod or a replica set of a Primary and Secondaries]
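  • 67. Targeted Requests [Diagram: (1) the request reaches Mongos, (2) Mongos routes it to the single Shard holding the data, (3) that Shard replies, (4) Mongos returns the result]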
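  • 68. Parallel processing [Diagram: (1) the request reaches Mongos, (2) Mongos sends it to every Shard, (3) each Shard processes it in parallel, (4) each Shard returns its results, (5) Mongos merges them, (6) the combined result is returned]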
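A toy model of the two routing modes in the slides above, assuming range-based sharding on last name across three in-memory shards. The shard names, key ranges, and `find` function are hypothetical; in MongoDB the router (mongos) makes this decision from chunk metadata held by the config servers. Queries that include the shard key go to one shard, and all others are sent to every shard with the results merged. A real router contacts the shards concurrently, whereas this sketch simply loops over them.

```javascript
// Hypothetical cluster: each shard owns a range of the shard key (last name),
// from min (inclusive) up to max (exclusive); null means "no upper bound".
const shards = [
  { name: "shard0", min: "", max: "h", docs: [] },
  { name: "shard1", min: "h", max: "p", docs: [] },
  { name: "shard2", min: "p", max: null, docs: [] },
];

// Finds the single shard whose range contains the given shard key value.
function shardFor(lastName) {
  return shards.find((s) => lastName >= s.min && (s.max === null || lastName < s.max));
}

function insert(doc) {
  shardFor(doc.last).docs.push(doc);
}

// Router sketch: a query containing the shard key is targeted at one shard;
// any other query is sent to every shard and the partial results are merged.
function find(query) {
  const targets = query.last !== undefined ? [shardFor(query.last)] : shards;
  const matchesQuery = (doc) => Object.entries(query).every(([k, v]) => doc[k] === v);
  const results = targets.flatMap((shard) => shard.docs.filter(matchesQuery));
  return { shardsQueried: targets.map((s) => s.name), results };
}

insert({ first: "john", last: "smith" });
insert({ first: "johannes", last: "sandvik" });
insert({ first: "peter", last: "brown" });

console.log(find({ last: "smith" }).shardsQueried); // [ 'shard2' ]
console.log(find({ first: "peter" }).shardsQueried); // [ 'shard0', 'shard1', 'shard2' ]
```

The choice of shard key decides which queries stay targeted: here, searching by last name touches one shard, while searching by first name touches them all.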
  • 70. Broad Feature Set ๏ Rich query language ๏ Native support for over 12 languages ๏ GeoSpatial ๏ Text search ๏ Aggregation & MapReduce ๏ GridFS (distributed & replicated file storage) ๏ Integration with Hadoop, Solr & more
  • 71. Last year I presented on graphs in MongoDB: http://j.mp/XvJ3dl
  • 76. http://spf13.com http://github.com/spf13 @spf13 Questions? download at mongodb.org