Storing the Family Tree with MongoDB
We’re going to talk about
MongoDB Intro & Fundamentals
MongoDB for Genealogy data
Scaling MongoDB for all the generations
The Family Tree
Storing a graph in MongoDB
Steve                  @sp
15+ years building the internet
Father, husband, skateboarder, genealogist at ❤
Chief Solutions Architect @
responsible for drivers, integrations, web & docs
Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic

Well Funded: Sequoia, Union Square, Flybridge
Introduction to MongoDB
A bit of history
1974
The relational database is created
1979   1994   1995
Computers in 1995
100 MHz Pentium
10BASE-T
16 MB RAM
200 MB HD
Cloud in 1995
(Windows 95 cloud wallpaper)
Cell Phones in 2012
Dual-core 1.5 GHz
802.11n (300+ Mbps)
1 GB RAM
64 GB Solid State
MongoDB
Application
Document Oriented
    { author : "steve",
      date : new Date(),
      text : "About MongoDB...",
      tags : ["tech", "database"] }
High Performance
Fully Consistent
Horizontally Scalable
MongoDB philosophy
 Keep functionality when we can (key/value
 stores are great, but we need more)
 Non-relational (no joins) makes scaling
 horizontally practical
 Document data models are good
 Database technology should run anywhere
 virtualized, cloud, metal, etc
Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files
i.e. read-through write-through
memory caching.
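Database Landscape
[Chart: Scalability & Performance (vertical axis) against Depth of Functionality (horizontal axis). Memcache sits highest on scalability with the least functionality, RDBMS sits lowest with the most, and MongoDB falls between the two.]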
“MongoDB has the best features of key/value stores, document databases and relational databases in one.”
         John Nunemaker
Relational made normalized data look like this
Category
• Name
• Url
User
• Name
• Email Address
Article
• Name
• Slug
• Publish date
• Text
Tag
• Name
• Url
Comment
• Comment
• Date
• Author
Document databases make normalized data look like this
User
• Name
• Email Address
Article
• Name
• Slug
• Publish date
• Text
• Author
  Comment[]
  • Comment
  • Date
  • Author
  Tag[]
  • Value
  Category[]
  • Value
But we’ve been using a relational database for 40 years!
How do people store documents in real life?
Think about a doctor’s office
There are two ways they could organize their files
Each document type in its own drawer
Drawers: MRIs, X-rays, Lab, Invoices, History, Medications, Lab, Forms (plus an Index)
2. Group related records
    Patient 1   Patient 2   Patient 3   ...
    Vendor 1    Vendor 2    Vendor 3
Databases work the same way
Relational: each record type in its own table (Category, User, Article, Tag, Comment)
Document: related records grouped together, like Patient 1 or Vendor 1 (User; Article with embedded Comment[], Tag[] and Category[])
Terminology
RDBMS         ➜   Mongo
Table, View   ➜   Collection
Row           ➜   Document
Index         ➜   Index
Join          ➜   Embedded Document
Foreign Key   ➜   Reference
Partition     ➜   Shard
Why MongoDB
My Top 10 Reasons
10. Great developer experience
 9. Speaks your language
 8. Scale horizontally
 7. Fully consistent data w/atomic operations
 6. Memory caching integrated
 5. Open source
 4. Flexible, rich & structured data format not just K:V
 3. Ludicrously fast (without going plaid)
 2. Simplify infrastructure & application
 1. It’s web scale
MongoDB
Use Cases
CMS / Blog
Needs:
• Business needed modern data store for rapid development and
  scale

Solution:
• Use PHP & MongoDB

Results:
• Real time statistics
• All data, images, etc stored together
  easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth
Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver

Solution:
• Use MongoDB instead of Oracle

Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
Customer Analytics
Problem:
• Deal with massive data volume across all customer sites

Solution:
• Use MongoDB to replace Google Analytics / Omniture options

Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100mil)
• MongoDB for archive (2+ billion)
Results:
• No more alter table statements taking over 2 months to run
• Sharding fixed vertical scale problem
• Very happily looking at other places to use MongoDB
Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
E-commerce
Problem:
• Multi-vertical E-commerce impossible to model (efficiently) in
  RDBMS

Solution:
• Switched from MySQL to MongoDB

Results:
•   Massive simplification of code base
•   Rapidly build, halving time to market (and cost)
•   Eliminated need for external caching system
•   50x+ performance improvement over MySQL
Tons more
   MongoDB casts a wide net

  people keep coming up with
 new and brilliant ways to use it
In Good Company




   and 1000s more
MongoDB
Start with an object (or array, hash, dict, etc.)

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ]
}
Inserting the record
Initial Data Load

> db.places.insert(place1)
Querying
{
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ]
}

> db.places.findOne({ zip: "10011",
            tags: "awesome" })

> db.places.find({ tags: "business" })
Nested Documents
  { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author : "roger",
    date : "Sat Apr 24 2011 19:47:11",
    text : "About MongoDB...",
    tags : [ "tech", "databases" ],
    comments : [
      {
        author : "Fred",
        date : "Sat Apr 25 2010 20:51:03",
        text : "Best Post Ever!"
      }
    ]
  }
Object ID
> db.places.insert(place1)

object(MongoId)#4 (1) {
  ["$id"]=> string(24) "4e9cc76a4a1817fd21000000"
}

   4e9cc76a4a1817fd21000000
   |------||----||--||----|
      ts    mac  pid  inc
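The field breakdown above can be checked by slicing the hex string. The following is a minimal sketch in plain JavaScript, assuming the legacy layout shown on the slide (4-byte timestamp, 3-byte machine, 2-byte process id, 3-byte counter); `parseLegacyObjectId` is a hypothetical helper, not a driver function. As far as I know, newer drivers fill the middle five bytes with a random value, so only the timestamp and counter would carry over there.

```javascript
// Split a legacy 24-character ObjectId hex string into the four fields
// shown on the slide: timestamp, machine, process id, counter.
// Assumption: the ts/mac/pid/inc layout from the slide applies.
function parseLegacyObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    seconds: seconds,                          // seconds since the Unix epoch
    timestamp: new Date(seconds * 1000),       // creation time, 1 s resolution
    machine: hex.slice(8, 14),                 // 3 bytes, kept as hex
    pid: parseInt(hex.slice(14, 18), 16),      // 2 bytes
    counter: parseInt(hex.slice(18, 24), 16)   // 3 bytes
  };
}

const parts = parseLegacyObjectId("4e9cc76a4a1817fd21000000");
console.log(parts.timestamp.toISOString(), parts.machine, parts.pid, parts.counter);
```

Because the timestamp leads the value, ids created later generally sort after ids created earlier.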
A More Complex Document

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ],
 latlong : [40.0,72.0],
    tips : [ { user : "ryan",
               time : ISODate("2011-06-26"),
                tip : "stop by for office hours"},
             {.....}]
}
Indexing & Adv Querying
// Index nested documents
db.posts.ensureIndex({ "comments.author":1 })
db.posts.find({'comments.author':'Fred'})

// Regular Expressions
db.posts.find({'comments.author': /^Fr/})

// Index on tags (multi-key index)
db.posts.ensureIndex({ tags: 1})
db.posts.find( { tags: 'tech' } )

// geospatial index
db.posts.ensureIndex({ "author.location": "2d" })
db.posts.find({"author.location":{$near:[22,42]}})
Updating
> db.places.update(
  {name : "10gen HQ"},
  { $push :
       { tips :
           { user : "nosh",
             time : ISODate("2011-07-14"),
             tip : "Office hours are great!"
           }
       }
  }
)

place1 = {
    name : "10gen HQ",
 address : "578 Broadway 7th Floor",
    city : "New York",
     zip : "10011",
    tags : [ "business", "awesome" ],
 latlong : [40.0,72.0],
    tips : [ { user : "ryan",
               time : ISODate("2011-06-26"),
                tip : "stop by for office hours on Wednesdays from 4-6pm"},
             { user : "nosh",
               time : ISODate("2011-07-14"),
                tip : "Office hours are great!"}
           ]
}
Atomic Operations
$set   $unset   $rename
$push   $pop   $pull
$addToSet   $inc
Cursors
$cursor = $c->find(array("foo" => "bar"));

foreach ($cursor as $id => $value) {
   echo "$id: ";
   var_dump( $value );
}

$a = iterator_to_array($cursor);
Paging
page_num = 3;              // zero-based: 0 is the first page
results_per_page = 10;

cursor = db.collection.find()
  .sort({ "ts" : -1 })
  .skip(page_num * results_per_page)
  .limit(results_per_page);
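The skip/limit arithmetic is easy to get off by one, so here is a small sketch of it against a plain in-memory array rather than a live collection. `pageOf` is a hypothetical helper, and the page number is assumed to be zero-based, which is what `skip(page_num * results_per_page)` implies.

```javascript
// Return one page from an already-sorted array, mirroring skip() and limit().
// Assumption: pageNum is zero-based, so page 0 is the first page.
function pageOf(sortedDocs, pageNum, resultsPerPage) {
  const skip = pageNum * resultsPerPage;
  return sortedDocs.slice(skip, skip + resultsPerPage);
}

// 35 sample documents with increasing timestamps.
const docs = [];
for (let ts = 1; ts <= 35; ts++) docs.push({ ts: ts });
docs.sort((a, b) => b.ts - a.ts); // newest first, like sort({ ts: -1 })

const firstPage = pageOf(docs, 0, 10);
const page3 = pageOf(docs, 3, 10);
console.log(firstPage.length, page3.map(d => d.ts));
```

With one-based page numbers the skip would be `(pageNum - 1) * resultsPerPage` instead.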
GridFS
Storing Files
Under 16 MB
Storing Big Files
Over 16 MB: stored in chunks
Works with replicated and sharded systems
A better network FS
GridFS files are seamlessly sharded & replicated.
No OS constraints...
No file size limits
No naming constraints
No folder limits
Standard across different OSs
MongoDB automatically generates the MD5 hash of the file
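The chunking idea can be illustrated without a database. This is a rough sketch, not GridFS itself: `toChunks` and `fromChunks` are hypothetical helpers, the tiny chunk size is chosen only to keep the example readable, and the `files_id` / `n` / `data` field names are used on the assumption that they mirror GridFS chunk documents.

```javascript
// Split a byte array into fixed-size chunks, each tagged with its position.
function toChunks(fileId, bytes, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId,                              // which file the chunk belongs to
      n: n,                                          // position of the chunk in the file
      data: bytes.slice(offset, offset + chunkSize)  // the bytes themselves
    });
  }
  return chunks;
}

// Reassemble the original bytes, regardless of the order chunks arrive in.
function fromChunks(chunks) {
  const ordered = chunks.slice().sort((a, b) => a.n - b.n);
  const total = ordered.reduce((sum, c) => sum + c.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of ordered) {
    out.set(c.data, offset);
    offset += c.data.length;
  }
  return out;
}

const file = new Uint8Array(10).map((_, i) => i); // bytes 0..9
const chunks = toChunks("file-1", file, 4);
const restored = fromChunks(chunks.slice().reverse());
console.log(chunks.length, Array.from(restored));
```

Since each chunk is an ordinary document, it replicates and shards like any other document.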
MongoDB for Genealogy Data
Types of genealogy data
Events (birth, death, etc)
Official records
Census
Names
Relationships
Photographs
Diaries & letters
Ship passenger list
Occupation
and more
Challenges of genealogy data
Lots of possible data points... need flexible schema
Multiple versions of same data point (3 different dates for death date, 4 variations on name).
Data related to records
Multiple versions of same nodes (intelligent nondestructive merge needed)
Need to have meta data associated
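To make "nondestructive merge" concrete, here is one possible sketch in plain JavaScript. It assumes a record shape with a `name` object of variant lists and an `events` object of lists per event type; `union` and `mergeIndividuals` are hypothetical helpers, and a real merge would also need to track who contributed each value.

```javascript
// Combine two lists (or single values) without dropping or repeating anything.
function union(a, b) {
  const out = [];
  for (const v of [].concat(a || [], b || [])) {
    if (!out.includes(v)) out.push(v);
  }
  return out;
}

// Merge two versions of the same person into a new record.
// Neither input is modified; competing values are kept side by side.
function mergeIndividuals(x, y) {
  const merged = { name: {}, events: {} };
  for (const part of ["first", "middle", "last"]) {
    merged.name[part] = union(x.name && x.name[part], y.name && y.name[part]);
  }
  const xe = x.events || {};
  const ye = y.events || {};
  for (const type of union(Object.keys(xe), Object.keys(ye))) {
    merged.events[type] = [].concat(xe[type] || [], ye[type] || []);
  }
  return merged;
}

const versionA = {
  name: { first: ["john"], middle: "peter", last: ["smith"] },
  events: { birth: [{ date: "1928-04-06" }] }
};
const versionB = {
  name: { first: ["johannes", "john"], last: ["sandvik"] },
  events: { birth: [{ date: "1928-04-16" }] }
};
const mergedPerson = mergeIndividuals(versionA, versionB);
console.log(JSON.stringify(mergedPerson));
```

Both birth dates survive the merge, which is the point: the conflict is recorded, not resolved.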
Genealogy is changing
Recognize
0 @I2@ INDI
1 NAME Charles Phillip /Ingalls/
1 SEX M
1 BIRT
2 DATE 10 JAN 1836
2 PLAC Cuba, Allegheny, NY
1 DEAT
2 DATE 08 JUN 1902
2 PLAC De Smet, Kingsbury, Dakota Territory
1 FAMC @F2@
1 FAMS @F3@
0 @I3@ INDI
1 NAME Caroline Lake /Quiner/
1 SEX F
1 BIRT
2 DATE 12 DEC 1839
GEDCOM
File format, not a database
Handles the great variety of data well
Doesn’t really scale beyond a local user.
Doesn’t provide good mechanism for storing
external documents (birth certificates, etc).
Built to solve problem of sharing data
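Moving GEDCOM data into documents starts with reading its level-prefixed lines. The sketch below is a simplified reader in plain JavaScript that handles only the `LEVEL [@XREF@] TAG [VALUE]` shape seen in the sample; `parseGedcom`, `child` and `toIndividual` are hypothetical helpers, and the output field names are illustrative rather than a fixed schema.

```javascript
// Parse GEDCOM lines ("LEVEL [@XREF@] TAG [VALUE]") into nested nodes.
function parseGedcom(text) {
  const roots = [];
  const stack = []; // stack[level] is the most recent node seen at that level
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const m = /^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$/.exec(line);
    if (!m) continue; // skip lines this sketch does not understand
    const level = Number(m[1]);
    const node = { tag: m[3], children: [] };
    if (m[2]) node.xref = m[2];
    if (m[4] !== undefined) node.value = m[4];
    if (level === 0) {
      roots.push(node);
    } else if (stack[level - 1]) {
      stack[level - 1].children.push(node);
    }
    stack[level] = node;
    stack.length = level + 1; // forget deeper levels from earlier records
  }
  return roots;
}

function child(node, tag) {
  return node.children.find(c => c.tag === tag);
}

// Turn one INDI node into a document with a list per event type.
function toIndividual(indi) {
  const doc = { gedcom_id: indi.xref, events: {} };
  const name = child(indi, "NAME");
  if (name) doc.name = name.value;
  for (const [tag, key] of [["BIRT", "birth"], ["DEAT", "death"]]) {
    const ev = child(indi, tag);
    if (!ev) continue;
    const date = child(ev, "DATE");
    const place = child(ev, "PLAC");
    doc.events[key] = [{ date: date && date.value, place: place && place.value }];
  }
  return doc;
}

const sample = [
  "0 @I2@ INDI",
  "1 NAME Charles Phillip /Ingalls/",
  "1 SEX M",
  "1 BIRT",
  "2 DATE 10 JAN 1836",
  "2 PLAC Cuba, Allegheny, NY",
  "1 DEAT",
  "2 DATE 08 JUN 1902",
  "0 @I3@ INDI",
  "1 NAME Caroline Lake /Quiner/",
  "1 BIRT",
  "2 DATE 12 DEC 1839"
].join("\n");

const people = parseGedcom(sample)
  .filter(n => n.tag === "INDI")
  .map(toIndividual);
console.log(JSON.stringify(people, null, 2));
```

Dates are left as GEDCOM strings here; converting them to real date values is a separate step.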
Genealogy & MongoDB

Genealogy is anything but rigid and fixed
Flexible schema fits genealogy data well
Packaging things together makes sense
Relating records doesn’t require a relational database
Individual
• AFN
• Modification Date
  Name
  • First[]
  • Middle[]
  • Last[]
  Events[]
  • type
  • date
  • contributor[]
  • record[]
    Location
    • city
    • state
    • county
    • country
    • coordinates[]
User
• Name
• Email Address
• Password
• Individual_id
Record
• contributor
• type
• thumbnail
• content
• description
• tags[]
Individual
individual = {
  _id : ObjectId("4f2978dfaa999d9db02618ce"),
  AFN : '1XYK-KQJ',
  name: {
     first: ['john', 'johannes'],
     middle: 'peter',
     last: ['smith', 'sandvik']
   }
}

db.individual.find(
{"name.first" : 'john', "name.middle" : 'peter'})
Events
events : {
   death : {
    date : ISODate('1989-07-14'),
    location : {
      city: 'pensacola',
      state: 'fl',
      county: 'escambia',
      country: 'usa',
      coordinates : [30.26,87.12]},
    contributor : ObjectId("4eeac...691")}}

db.individual.find(
{"events.death.date" : ISODate('1989-07-14')})

db.individual.find(
{"events.death.location.coordinates" : { $near:[30,90]}})
Duplicate Events
events : {
  birth : [ {
      date : ISODate('1928-04-06'),
      location : {
        city: 'brattleboro',
        state: 'vt',
        county: 'windham',
        country: 'usa',
        coordinates : [42.51,72.34]},
      contributor : ObjectId("4ee...00000"),
      records: ObjectId("4ed8a...7b000000")
  },
  {
      date : ISODate('1928-04-16'),
      location : {
        city: 'brattleboro',
        state: 'vt',
        county: 'windham',
        country: 'usa',
        coordinates : [42.51,72.34]},
      contributor : ObjectId("4ee...37bb"),
      records: ObjectId("4eea...0000c8")
  }]
}
Duplicate Events
events : {
  birth : [ { date : ISODate('1928-04-06')},
            { date : ISODate('1928-04-16')}]
}

db.individual.find(
{"events.birth.date" : ISODate('1928-04-16')})

Same Query Works!!
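The reason the same query keeps working is that a dotted path steps into arrays it meets along the way. The following plain-JavaScript imitation shows the idea; it is a simplified model, not MongoDB's matching code, `valuesAt` and `matches` are hypothetical helpers, and dates are plain strings to keep equality checks simple.

```javascript
// Collect every value reachable at a dotted path, stepping into arrays
// along the way so a field works whether it holds one value or many.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      const candidates = Array.isArray(item) ? item : [item];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  // Flatten one last level so array-valued fields match on their elements.
  return current.flatMap(v => (Array.isArray(v) ? v : [v]));
}

function matches(doc, path, value) {
  return valuesAt(doc, path).some(v => v === value);
}

const singleBirth = { events: { birth: { date: "1928-04-16" } } };
const duplicateBirths = {
  events: { birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }
};

const hitSingle = matches(singleBirth, "events.birth.date", "1928-04-16");
const hitDuplicate = matches(duplicateBirths, "events.birth.date", "1928-04-16");
const miss = matches(duplicateBirths, "events.birth.date", "1930-01-01");
console.log(hitSingle, hitDuplicate, miss);
```

This is why a second, conflicting birth date can be added later without rewriting existing queries.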
Multiple Events
marriage : [{
  date : ISODate('1939-08-11'),
  end_date : ISODate('1940-02-19'),
  to : ObjectId("4f297978aa999d9db02618cf"),
  location : {
    city: 'raleigh',
    state: 'nc',
    county: 'wake',
    country: 'usa',
    coordinates : [35.49,78.38]},
  contributor : ObjectId("4eeac...91537bb")},
{
  date : ISODate('1944-04-19'),
  to : ObjectId("4f2978dfaa999d9db02618ce"),
  location : {
    city: 'atlanta',
    state: 'ga',
    county: 'fulton',
    country: 'usa',
    coordinates : [33.45,84.23]},
  contributor : ObjectId("4eeb...37bb")}]
All together
individual = {
   _id : ObjectId("4f2978dfaa999d9db02618ce"),
   AFN : '1XYK-KQJ',
   name: {
      first: ['john', 'johannes'],
      middle: 'peter',
      last: ['smith', 'sandvik']
   },
   events : {
      birth : [
         {
             date : ISODate('1928-04-06'),
             location : {
                city: 'brattleboro',
                state: 'vt',
                county: 'windham',
                country: 'usa',
                coordinates : [42.51,72.34]
             },
             contributor : ObjectId("4eeabc958b691537bb000000"),
             records: ObjectId("4ed8aea7d8562f7d7b000000")
         },
         {
             date : ISODate('1928-04-16'),
             location : {
                city: 'brattleboro',
                ...
Records
record1 = {
   _id : ObjectId("4ed8aea7d8562f7d7b000000"),
   contributor : ObjectId("4eeab...1537bb"),
   type : 'birth certificate',
   thumbnail : BinData(0,"/9j/4AAQSkZJ...."),
   content : BinData(0,"j6b/Id11lWqs..."),
   tags : ['NY', 'certified'],
   description : "John's birth certificate"
}
Users
user = {
  _id : ObjectId("4eeabc958b691537bb000000"),
  username : 'spf13',
  email_address : 'genealogy@spf13.com',
  password : 'a.long.passphrase18',
  individual_id : ObjectId("4f2f...0ce")
}
Scaling MongoDB for all the generations
Replica Sets
• Primary, Secondary, Secondary
• Primary, Secondary, Arbiter
• Primary, Secondary, Secondary, Secondary, Secondary
Sharding
App Server + MongoS (x3)
ConfigD (x3)
MongoD: 4 shards, 3 MongoD per shard
The Family Tree
It’s not a tree at all,
  It’s really a graph
     ... and an odd one at that
It would be easy if it
always looked like this
All sorts of mess
Step & adopted relationships
Duplicate nodes
Lots of missing nodes
Divorces and re-marriages
Multiple names for the same person
Multiple dates for the same event
How to make sense of it all
Storing a graph in MongoDB
Graphs are important




Without them we couldn’t store family relationships
Trees / graphs in MongoDB
Since MongoDB data structures are essentially objects, there is a good degree of flexibility here.
Think of how you would structure them in your application
Each node is stored as a document
Contains references to related nodes
What is “related” depends on your application
References vs Relation
MongoDB uses references
Unlike foreign keys, references don’t
enforce integrity
Reference is really just a reference
For many applications a reference is
sufficient
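Because nothing enforces a reference, the application has to notice when one points nowhere. A minimal sketch of such a check over in-memory documents follows; `danglingReferences` is a hypothetical helper, and the sample data deliberately leaves out node "f".

```javascript
// Return every reference in `field` that points at an _id that is not stored.
function danglingReferences(docs, field) {
  const ids = new Set(docs.map(d => d._id));
  const missing = [];
  for (const doc of docs) {
    for (const ref of doc[field] || []) {
      if (!ids.has(ref)) missing.push({ from: doc._id, to: ref });
    }
  }
  return missing;
}

const partialFamily = [
  { _id: "a" },
  { _id: "b" },
  { _id: "e", parents: ["a", "b"] },
  { _id: "g", parents: ["e", "f"] } // "f" was never stored
];

const broken = danglingReferences(partialFamily, "parents");
console.log(JSON.stringify(broken));
```

For genealogy a dangling reference is not always an error, since missing nodes are common, but it helps to be able to list them.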
Simple relationship
{ _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" }
{ _id: "e", parents: ["a", "b"] }
{ _id: "f", parents: ["c", "d"] }
{ _id: "g", parents: ["e", "f"] }

• Easy to access nodes in either direction
• Good for trees / graphs
• Can grab large sets
• Minimum amount of maintenance
• Balanced
• Implied relationships

//find all direct descendants of b:
> db.family.find({ parents : 'b' })

//find all ancestors of g:
var g = db.family.findOne({ _id: 'g' });
ancestorFind = function(child) {
  var rv = [];
  if ( ! child.parents) return rv;
  // grab direct parents
  var parents = db.family.find( { _id : { $in : child.parents } } ).toArray();
  rv = rv.concat(parents);
  for (var i in parents) {
    rv = rv.concat(ancestorFind(parents[i]));
  }
  return rv;
}
ancestorFind(g);

//find all descendants of b:
var b = db.family.find({ _id: 'b' }).toArray();
descendantsFind = function(par) {
  var rv = [];
  for (var i in par) {
    var k = db.family.find( { parents : par[i]._id } ).toArray();
    rv = rv.concat(k);
    rv = rv.concat(descendantsFind(k));
  }
  return rv;
}
descendantsFind(b);
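A recursive walk over parent references can report the same person more than once when two branches share an ancestor. As a hedged variant, the sketch below does the same traversal breadth-first with a visited set, using an in-memory array in place of the `family` collection; `ancestorIds` and `descendantIds` are hypothetical helper names.

```javascript
// family: array of { _id, parents? }. Returns ancestor ids without repeats.
function ancestorIds(family, startId) {
  const byId = new Map(family.map(d => [d._id, d]));
  const seen = new Set();
  const queue = [startId];
  while (queue.length > 0) {
    const node = byId.get(queue.shift());
    for (const p of (node && node.parents) || []) {
      if (!seen.has(p)) {
        seen.add(p);
        queue.push(p);
      }
    }
  }
  return Array.from(seen);
}

// Walk the other way: anyone listing the current node as a parent.
function descendantIds(family, startId) {
  const seen = new Set();
  const queue = [startId];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const d of family) {
      if ((d.parents || []).includes(current) && !seen.has(d._id)) {
        seen.add(d._id);
        queue.push(d._id);
      }
    }
  }
  return Array.from(seen);
}

const familyDocs = [
  { _id: "a" }, { _id: "b" }, { _id: "c" }, { _id: "d" },
  { _id: "e", parents: ["a", "b"] },
  { _id: "f", parents: ["c", "d"] },
  { _id: "g", parents: ["e", "f"] }
];

const ancestorsOfG = ancestorIds(familyDocs, "g").sort();
const descendantsOfB = descendantIds(familyDocs, "b").sort();
console.log(ancestorsOfG, descendantsOfB);
```

Against a real collection each loop iteration would become a query, so deep walks cost one round trip per generation at best.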
Bi-directional
 {   _id:   "a", children: ["e"] }
 {   _id:   "b", children: ["e"] }
 {   _id:   "c", children: ["f"] }
 {   _id:   "d", children: ["f"] }
 {   _id:   "e", children: ["g"], parents: ["a", "b" ]}
 {   _id:   "f", children: ["g"], parents: ["c", "d" ]}
 {   _id:   "g", children: [] , parents: ["e", "f"] }


•Doesn’t really add much beyond the first example
•More maintenance
•Duplication of each relationship
•Only real advantage is ability to grab all related
nodes (both directions) with one query.
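The extra maintenance comes from having to touch two documents for every relationship. A small sketch of that bookkeeping over an in-memory map follows; `linkParentChild` and `addUnique` are hypothetical helpers, and in a real deployment the two writes would be separate updates that the application must keep in step.

```javascript
// Append a value only if it is not already present.
function addUnique(list, value) {
  if (!list.includes(value)) list.push(value);
}

// Record one parent/child relationship on both documents.
function linkParentChild(byId, parentId, childId) {
  const parent = byId.get(parentId);
  const child = byId.get(childId);
  if (!parent || !child) throw new Error("both nodes must exist");
  parent.children = parent.children || [];
  child.parents = child.parents || [];
  addUnique(parent.children, childId); // first write
  addUnique(child.parents, parentId);  // second write
}

const nodes = new Map([
  ["a", { _id: "a" }],
  ["b", { _id: "b" }],
  ["e", { _id: "e" }]
]);

linkParentChild(nodes, "a", "e");
linkParentChild(nodes, "b", "e");
linkParentChild(nodes, "a", "e"); // repeating a link changes nothing
console.log(JSON.stringify(Array.from(nodes.values())));
```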
Array of Ancestors
{ _id: "a" }
{ _id: "b" }
{ _id: "c" }
{ _id: "d" }
{ _id: "e", ancestors: [ "a", "b" ], parents: ["a", "b"] }
{ _id: "f", ancestors: [ "c", "d" ], parents: ["c", "d"] }
{ _id: "g", ancestors: [ "a", "b", "c", "d", "e", "f" ], parents: ["e", "f"] }

• Great for small trees (or subsets).
• Could be used to store X generations of ancestors
• Optimized for retrieving entire tree
• Uses implied relationships
• No help on specifics... is this person my grandson?

//find all descendants of b:
> db.tree.find({ ancestors: 'b' })
//find all direct descendants of b:
> db.tree.find({ parents: 'b' })
//find all ancestors of g:
> g = db.tree.findOne( { _id: 'g' } )
> db.tree.find( { _id: { $in : g.ancestors } } )
Easier retrieval at expense of costlier maintenance
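The maintenance cost shows up when a node is added: its ancestors array has to be derived from its parents. A minimal in-memory sketch of that step is below; `makeNode` is a hypothetical helper, and it assumes parents are always stored before their children.

```javascript
// Build and store a node whose ancestors are its parents plus their ancestors.
function makeNode(byId, id, parentIds) {
  const ancestors = new Set();
  for (const pid of parentIds) {
    const parent = byId.get(pid);
    if (!parent) throw new Error("unknown parent: " + pid);
    ancestors.add(pid);
    for (const a of parent.ancestors) ancestors.add(a);
  }
  const node = {
    _id: id,
    parents: parentIds.slice(),
    ancestors: Array.from(ancestors)
  };
  byId.set(id, node);
  return node;
}

const tree = new Map();
for (const id of ["a", "b", "c", "d"]) makeNode(tree, id, []);
makeNode(tree, "e", ["a", "b"]);
makeNode(tree, "f", ["c", "d"]);
const nodeG = makeNode(tree, "g", ["e", "f"]);

// Equivalent of find({ ancestors: 'b' }): all descendants of b.
const belowB = Array.from(tree.values())
  .filter(n => n.ancestors.includes("b"))
  .map(n => n._id);
console.log(nodeG.ancestors, belowB);
```

Inserting a node is cheap, but attaching a newly discovered parent to an existing person means rewriting the ancestors array of every descendant.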
Relations (basic)
{   _id     : "b",
    relations : [
       {
         id      : "a",
         relation : "parent"},
       {
         id      : "c",
         relation : "grandparent"},
       {
         id      : "d",
         relation : "parent"}]}
Relations (detailed)
{   _id     : "b",
    relations : [
       {
         id      : "a",
         relation : "parent",
         type      : "mother",
         subtype : "biological" },
       {
         id      : "c",
         relation : "parent",
         type      : "father",
         subtype : "adopted"},
       {
         id      : "d",
         relation : "parent",
         type      : "father",
         subtype : "biological"}]}
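With the detailed form, questions such as "who are the biological parents?" become a filter over the relations array. Here is a small sketch in plain JavaScript; `relatedIds` is a hypothetical helper, and the field names follow the document above.

```javascript
// Return the ids of relations whose fields all equal the given criteria.
function relatedIds(doc, criteria) {
  return (doc.relations || [])
    .filter(r => Object.keys(criteria).every(k => r[k] === criteria[k]))
    .map(r => r.id);
}

const personB = {
  _id: "b",
  relations: [
    { id: "a", relation: "parent", type: "mother", subtype: "biological" },
    { id: "c", relation: "parent", type: "father", subtype: "adopted" },
    { id: "d", relation: "parent", type: "father", subtype: "biological" }
  ]
};

const biologicalParents = relatedIds(personB, { relation: "parent", subtype: "biological" });
const fathers = relatedIds(personB, { type: "father" });
console.log(biologicalParents, fathers);
```

Step and adopted relationships fit the same structure by adding further subtype values.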
Shouldn’t I store my
family tree in a graph
     database?
   They are built to store trees after all
Graphs are great at
traversing deep in a tree

              • Is this node my
                relative?


              • Retrieve my paternal
                great, great, great,
                great grandpa
Unfortunately that’s not how we commonly work
Typically we are working with a node and its immediate neighbors
The significant majority of our operations aren’t traversing

If those operations are important, perhaps a hybrid graph & document solution makes sense
http://spf13.com
http://github.com/s
@spf13
Question
download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com
MongoDB for Genealogy

MongoDB for Genealogy

  • 1. Storing the Family Tree with
  • 2. We’re going to talk about: MongoDB Intro & Fundamentals; MongoDB for Genealogy data; Scaling MongoDB for all the generations; The Family Tree; Storing a graph in MongoDB
  • 3. Steve @sp A 15+ years building the internet Father, husband, skateboarder, genealogist at ❤ Chief Solutions Architect @ responsible for drivers, integrations, web & docs
  • 4. Company behind MongoDB. Offices in NYC, Palo Alto, London & Dublin. 100+ employees. Support, consulting, training. Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic. Well funded: Sequoia, Union Square, Flybridge
  • 5. Introduction to MongoDB
  • 10. 1979
  • 11. 1979 1994
  • 12. 1979 1994 1995
  • 13. Computers in 1995: 100 MHz Pentium, 10BASE-T, 16 MB RAM, 200 MB HD
  • 14. Cloud in 1995 (Windows 95 cloud wallpaper)
  • 15. Cell Phones in 2012: dual-core 1.5 GHz, 802.11n (300+ Mbps), 1 GB RAM, 64 GB solid state
  • 16. MongoDB: Application, Document Oriented, High Performance, Fully Consistent, Horizontally Scalable. Example document: { author : "steve", date : new Date(), text : "About MongoDB...", tags : ["tech", "database"] }
  • 17. MongoDB philosophy: Keep functionality when we can (key/value stores are great, but we need more). Non-relational (no joins) makes scaling horizontally practical. Document data models are good. Database technology should run anywhere: virtualized, cloud, metal, etc.
  • 18. Under the hood: Written in C++. Runs nearly everywhere. Data serialized to BSON. Extensive use of memory-mapped files, i.e. read-through, write-through memory caching.
  • 19. Database Landscape (chart of Scalability & Performance against Depth of Functionality): MemCache, MongoDB, RDBMS
  • 20. "MongoDB has the best features of key/value stores, document databases and relational databases in one." John Nunemaker
  • 21. Relational made normalized data look like this: Category (Name, Url); Article (Name, Slug, Publish date, Text); User (Name, Email Address); Tag (Name, Url); Comment (Comment, Date, Author)
  • 22. Document databases make normalized data look like this: Article (Name, Slug, Publish date, Text, Author, Comment[] (Comment, Date, Author), Tag[] (Value), Category[] (Value)); User (Name, Email Address)
  • 23. But we’ve been using a relational database for 40 years!
  • 24. How do people store documents in real life?
  • 25. Think about a doctor's office. There are two ways they could organize their files.
  • 26. Each document type in its own drawer: MRIs, X-rays, Lab, Invoices, Index, History, Medications, Lab Forms (each drawer holding a file labeled 1)
  • 29. 2. Group related records: Patient 1, Patient 2, Patient 3, ...; Vendor 1, Vendor 2, Vendor 3
  • 30. 2. Group related records: Patient 1, Patient 3, ..., Patient 2; Vendor 1, Vendor 2, Vendor 3
  • 31. Databases work the same way. Relational: Category (Name, Url); Article (Name, Slug, Publish date); User (Name, Email Address); Tag (Name, Url); Comment (Comment, Date, Author). Document (Patient 1, Vendor 1): Article (Name, Slug, Publish date, Text, Author, Comment[] (Comment, Date, Author), Tag[] (Value), Category[] (Value)); User (Name, Email Address)
  • 32. Terminology (RDBMS ➜ Mongo): Table, View ➜ Collection; Row ➜ Document; Index ➜ Index; Join ➜ Embedded Document; Foreign Key ➜ Reference; Partition ➜ Shard
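
The terminology mapping above (a join becoming an embedded document, a foreign key becoming a reference) can be sketched in plain JavaScript objects. This is only an illustration under stated assumptions: the sample values, the `users` map, and the `withAuthor` helper are hypothetical and not from the talk, and no MongoDB driver is involved.

```javascript
// Hypothetical sample data; field names loosely follow the Article/User slides.
const users = new Map([
  ["u1", { _id: "u1", name: "steve", email: "steve@example.com" }],
]);

// One article document: comments, tags and categories are embedded,
// while the author is stored as a reference (the user's _id).
const article = {
  _id: "a1",
  name: "About MongoDB",
  slug: "about-mongodb",
  publishDate: new Date("2012-01-01T00:00:00Z"),
  text: "About MongoDB...",
  author: "u1",
  comments: [
    { comment: "Nice intro", date: new Date("2012-01-02T00:00:00Z"), author: "u1" },
  ],
  tags: ["tech", "database"],
  categories: ["databases"],
};

// Resolve the author reference in application code, the step a JOIN
// would perform in an RDBMS. Returns a copy; the input is not mutated.
function withAuthor(doc, userIndex) {
  return { ...doc, author: userIndex.get(doc.author) || null };
}

const resolved = withAuthor(article, users);
console.log(resolved.author.name, resolved.comments.length, resolved.tags);
```

Reading the article needs a single lookup for everything embedded; only the referenced user costs a second lookup, which is the trade-off between embedding and referencing.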
  • 33. Why MongoDB: My Top 10 Reasons. 10. Great developer experience. 9. Speaks your language. 8. Scale horizontally. 7. Fully consistent data w/atomic operations. 6. Memory caching integrated. 5. Open source. 4. Flexible, rich & structured data format, not just K:V. 3. Ludicrously fast (without going plaid). 2. Simplify infrastructure & application. 1. It’s web scale.
  • 36. CMS / Blog Needs: • Business needed modern data store for rapid development and scale Solution: • Use PHP & MongoDB Results: • Real time statistics • All data, images, etc stored together easy access, easy deployment, easy high availability • No need for complex migrations • Enabled very rapid development and growth
  • 37. Photo Meta-Data Problem: • Business needed more flexibility than Oracle could deliver Solution: • Use MongoDB instead of Oracle Results: • Developed application in one sprint cycle • 500% cost reduction compared to Oracle • 900% performance improvement compared to Oracle
  • 38. Customer Analytics Problem: • Deal with massive data volume across all customer sites Solution: • Use MongoDB to replace Google Analytics / Omniture options Results: • Less than one week to build prototype and prove business case • Rapid deployment of new features
  • 39. Archiving Why MongoDB: • Existing application built on MySQL • Lots of friction with RDBMS based archive storage • Needed more scalable archive storage backend Solution: • Keep MySQL for active data (100mil) • MongoDB for archive (2+ billion) Results: • No more alter table statements taking over 2 months to run • Sharding fixed vertical scale problem • Very happily looking at other places to use MongoDB
  • 40. Online Dictionary Problem: • MySQL could not scale to handle their 5B+ documents Solution: • Switched from MySQL to MongoDB Results: • Massive simplification of code base • Eliminated need for external caching system • 20x performance improvement over MySQL
  • 41. E-commerce Problem: • Multi-vertical E-commerce impossible to model (efficiently) in RDBMS Solution: • Switched from MySQL to MongoDB Results: • Massive simplification of code base • Rapidly build, halving time to market (and cost) • Eliminated need for external caching system • 50x+ performance improvement over MySQL
  • 42. Tons more: MongoDB casts a wide net. People keep coming up with new and brilliant ways to use it
  • 43. In Good Company and 1000s more
  • 45. Start with an object (or array, hash, dict, etc.) place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] }
  • 46. Inserting the record Initial Data Load > db.places.insert(place1)
  • 47. Querying { name : "10gen HQ", address : "134 5th Avenue 3rd Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ] } > db.places.findOne({ zip: "10011", tags: "awesome" }) > db.places.find({tags: "business" })
  • 48. Nested Documents { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Apr 24 2011 19:47:11", text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [ { author : "Fred", date : "Sat Apr 25 2010 20:51:03", text : "Best Post Ever!" } ] }
  • 49. Object ID > db.places.insert(place1) object(MongoId)#4 (1) { ["$id"]=> string(24) "4e9cc76a4a1817fd21000000" } 4e9cc76a 4a1817 fd21 000000 = ts (4 bytes), mac (3 bytes), pid (2 bytes), inc (3 bytes)
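The ts / mac / pid / inc breakdown on this slide can be reproduced by slicing the hex string. The following is only a sketch, assuming the layout shown on the slide (later MongoDB versions replaced the mac and pid bytes with a random value); `parseObjectId` is a hypothetical helper, not a driver function.

```javascript
// Split a 24-character ObjectId hex string into the four fields named on the slide.
// Assumes the legacy layout: 4-byte timestamp, 3-byte machine id, 2-byte pid, 3-byte counter.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    ts: new Date(seconds * 1000),          // seconds since the Unix epoch
    mac: hex.slice(8, 14),                 // kept as hex, it is an opaque identifier
    pid: parseInt(hex.slice(14, 18), 16),
    inc: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = parseObjectId("4e9cc76a4a1817fd21000000");
console.log(parts.ts.toISOString(), parts.mac, parts.pid, parts.inc);
```

Because the timestamp leads the value, sorting on _id roughly follows insertion time, which is handy for genealogy records that arrive without a reliable date of their own.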
  • 50. A More Complex Document place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ], latlong : [40.0,72.0], tips : [ { user : "ryan", time : 6/26/2011, tip : "stop by for office hours"}, {.....}] }
  • 51. Indexing & Adv Querying // Index nested documents db.posts.ensureIndex({ "comments.author":1 }) db.posts.find({'comments.author':'Fred'}) // Regular Expressions db.posts.find({'comments.author': /^Fr/}) // Index on tags (multi-key index) db.posts.ensureIndex({ tags: 1}) db.posts.find( { tags: 'tech' } ) // geospatial index db.posts.ensureIndex({ "author.location": "2d" }) db.posts.find({"author.location":{$near:[22,42]}})
  • 52. Updating > db.places.update( {name : "10gen HQ"}, { $push : { tips : { user : "nosh", time : 6/26/2011, tip : "Office hours are great!" } } } ) place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "awesome" ], latlong : [40.0,72.0], tips : [ { user : "ryan", time : 6/26/2011, tip : "stop by for office hours on Wednesdays from 4-6pm"}, { user : "nosh", time : 7/14/2011, tip : "Office hours are great!"} ] }
  • 54. Atomic Operations $set $unset $rename $push $pop $pull $addToSet $inc
  • 55. Cursors $cursor = $c->find(array("foo" => "bar")); foreach ($cursor as $id => $value) { echo "$id: "; var_dump( $value ); } $a = iterator_to_array($cursor);
  • 56. Paging page_num = 3; results_per_page = 10; cursor = db.collection.find() .sort({ "ts" : -1 }) .skip(page_num * results_per_page) .limit(results_per_page);
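The skip/limit arithmetic can be checked without a server. This is a minimal in-memory sketch, assuming `page_num` is zero-based (with a one-based page number the skip would be `(page_num - 1) * results_per_page`); the `page` helper and the sample documents are made up for illustration.

```javascript
// In-memory stand-in for find().sort({ ts: -1 }).skip(...).limit(...).
// pageNum is zero-based here (an assumption of this sketch).
function page(docs, pageNum, resultsPerPage) {
  return docs
    .slice()                       // copy so the caller's array is not reordered
    .sort((a, b) => b.ts - a.ts)   // newest first, like sort({ ts: -1 })
    .slice(pageNum * resultsPerPage, (pageNum + 1) * resultsPerPage);
}

// 35 sample documents with increasing timestamps.
const docs = Array.from({ length: 35 }, (_, i) => ({ _id: i, ts: i }));
const firstPage = page(docs, 0, 10);
const lastPage = page(docs, 3, 10);   // only five documents remain for this page
console.log(firstPage.length, lastPage.length);
```

On a real collection the server still has to walk past every skipped document, so deep pages get slower; paging by a range on the sort key is a common alternative.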
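  • 59. Storing Big Files: files over the 16 MB document limit are stored as chunks in GridFS (each chunk is much smaller than 16 MB; 255 KB is the current driver default)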
  • 60. Storing Big Files Works with replicated and sharded systems
  • 61. A better network FS GridFS files are seamlessly sharded & replicated. No OS constraints... No file size limits No naming constraints No folder limits Standard across different OSs MongoDB automatically generates the MD5 hash of the file
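What chunked storage with an MD5 looks like can be sketched in a few lines. This is an illustration under stated assumptions, not the driver's implementation: `toChunks` is a hypothetical helper, the 256-byte chunk size is chosen only to keep the example small, and the field names follow the GridFS files/chunks layout as I understand it.

```javascript
const crypto = require("crypto");

// Split a file's bytes into fixed-size chunks and compute its MD5,
// loosely mirroring the file document and chunk documents GridFS keeps.
function toChunks(filesId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    // n is the chunk's position, used to reassemble the file in order
    chunks.push({ files_id: filesId, n: n, data: data.subarray(offset, offset + chunkSize) });
  }
  const md5 = crypto.createHash("md5").update(data).digest("hex");
  return {
    file: { _id: filesId, length: data.length, chunkSize: chunkSize, md5: md5 },
    chunks: chunks,
  };
}

const data = Buffer.alloc(1000, 7);                      // stand-in for a scanned certificate
const stored = toChunks("certificate-scan", data, 256);
console.log(stored.chunks.length, stored.file.md5);
```

Reading the file back is a matter of fetching the chunks for one `files_id` sorted by `n` and concatenating them; the stored hash lets a client verify the result.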
  • 63. Types of genealogy data: Events (birth, death, etc), Photographs, Diaries & letters, Official records, Ship passenger list, Census, Occupation, Names, Relationships, and more
  • 64. Challenges of genealogy data Lots of possible data points... need flexible schema Multiple versions of same data point (3 different dates for death date, 4 variations on name). Data related to records Multiple versions of same nodes (intelligent nondestructive merge needed) Need to have meta data associated
  • 66. Recognize 0 @I2@ INDI 1 NAME Charles Phillip /Ingalls/ 1 SEX M 1 BIRT 2 DATE 10 JAN 1836 2 PLAC Cuba, Allegheny, NY 1 DEAT 2 DATE 08 JUN 1902 2 PLAC De Smet, Kingsbury, Dakota Territory 1 FAMC @F2@ 1 FAMS @F3@ 0 @I3@ INDI 1 NAME Caroline Lake /Quiner/ 1 SEX F 1 BIRT 2 DATE 12 DEC 1839
  • 67. GEDCOM File format, not a database Handles the great variety of data well Doesn’t really scale beyond a local user. Doesn’t provide good mechanism for storing external documents (birth certificates, etc). Built to solve problem of sharing data
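Since GEDCOM is a line-oriented format with a level number on every line, turning it into nested documents is mostly bookkeeping. The sketch below assumes each line has the shape `LEVEL [@XREF@] TAG [VALUE]` and ignores continuation lines and character sets; `parseGedcom` and the node shape (`tag`, `value`, `xref`, `children`) are illustrative choices, not part of any library.

```javascript
// Parse GEDCOM lines ("LEVEL [@XREF@] TAG [VALUE]") into nested records.
// A minimal sketch: no CONT/CONC handling, no character-set handling.
function parseGedcom(text) {
  const roots = [];
  const stack = []; // stack[level] is the most recent node seen at that level
  for (const raw of text.split("\n")) {
    const m = raw.trim().match(/^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$/);
    if (!m) continue; // skip blank or malformed lines
    const level = Number(m[1]);
    const node = { tag: m[3], children: [] };
    if (m[2]) node.xref = m[2];
    if (m[4] !== undefined) node.value = m[4];
    if (level === 0) roots.push(node);
    else if (stack[level - 1]) stack[level - 1].children.push(node);
    stack[level] = node;
    stack.length = level + 1; // forget deeper levels left over from earlier lines
  }
  return roots;
}

const sample = [
  "0 @I2@ INDI",
  "1 NAME Charles Phillip /Ingalls/",
  "1 SEX M",
  "1 BIRT",
  "2 DATE 10 JAN 1836",
  "2 PLAC Cuba, Allegheny, NY",
  "1 FAMC @F2@",
].join("\n");

const people = parseGedcom(sample);
console.log(JSON.stringify(people[0], null, 2));
```

The nested result is already document-shaped, which is one reason this kind of data maps onto a document store without much reshaping.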
  • 68. Genealogy & MongoDB Genealogy is anything but rigid and fixed Flexible schema fits genealogy data well Packaging things together makes sense Relating records doesn’t require a relational database
  • 69. Individual: AFN, Modification Date, Name (First[], Middle[], Last[]), Events[] (type, date, contributor[], record[], Location (city, state, county, country))
  • 70. Individual: AFN, Modification Date, Name (First[], Middle[], Last[]), Events[] (type, date, contributor[], record[], Location (city, state, county, country, coordinates[])). User: Name, Email Address, Password, Individual_id. Record: contributor, type, thumbnail, content, description, tags[]
  • 71. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] } }
  • 72. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] } } db.individual.find( {"name.first" : 'john', "name.middle" : 'peter'})
  • 73. Events events : [ { death : { date : ISODate('1989-07-14'), location : { city: 'pensacola', state: 'fl', county: 'escambia', country: 'usa', coordinates : [30.26,87.12]}, contributor : ObjectId("4eeac...691")} } ]
  • 74. Events events : [ { death : { date : ISODate('1989-07-14'), location : { city: 'pensacola', state: 'fl', county: 'escambia', country: 'usa', coordinates : [30.26,87.12]}, contributor : ObjectId("4eeac...691")} } ] db.individual.find( {"events.death.date" : ISODate('1989-07-14')}) db.individual.find( {"events.death.location.coordinates" : { $near : [30,90] }})
  • 75. Duplicate Events events : [ { birth : [ { date : ISODate('1928-04-06'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34]}, contributor : ObjectId("4ee...00000"), records: ObjectId("4ed8a...7b000000") },
  • 76. Duplicate Events county: 'windham', country: 'usa', coordinates : [42.51,72.34]}, contributor : ObjectId("4ee...00000"), records: ObjectId("4ed8a...7b000000") }, { date : ISODate('1928-04-16'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34]}, contributor : ObjectId("4ee...37bb"), records: ObjectId("4eea...0000c8") } ] } ]
  • 77. Duplicate Events events : [ { birth : [ { date : ISODate('1928-04-06')}, { date : ISODate('1928-04-16')} ] } ] db.individual.find( {"events.birth.date" : ISODate('1928-04-16')}) Same Query Works!!
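One way to see why the same query keeps working once an event holds several versions: a dotted path is followed through arrays as well as through single sub-documents. The sketch below imitates that behaviour in memory. It is a simplification of the real query matcher, the helper names `valuesAt` and `matches` are hypothetical, and dates are plain strings to keep the comparison trivial.

```javascript
// Collect every value reachable via a dotted path, descending into arrays at each step.
// This imitates, in simplified form, how a dotted query path treats arrays.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      const candidates = Array.isArray(item) ? item : [item];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  // A final array value matches on any of its elements.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

const matches = (doc, path, value) => valuesAt(doc, path).includes(value);

// One birth date versus two competing birth dates for the same person.
const single = { events: [{ birth: { date: "1928-04-16" } }] };
const multiple = { events: [{ birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }] };

console.log(matches(single, "events.birth.date", "1928-04-16"));
console.log(matches(multiple, "events.birth.date", "1928-04-16"));
```

The practical consequence is that a field can start as a single value and grow into an array of competing versions without the read path changing.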
  • 78. Multiple Events marriage : [{ date : ISODate('1939-08-11'), end_date : ISODate('1940-02-19'), to : ObjectId("4f297978aa999d9db02618cf"), location : { city: 'raleigh', state: 'nc', county: 'wake', country: 'usa', coordinates : [35.49,78.38]}, contributor : ObjectId("4eeac...91537bb")}, { date : ISODate('1944-04-19'), to : ObjectId("4f2978dfaa999d9db02618ce"), location : {
  • 79. Multiple Events marriage : [{ date : ISODate('1939-08-11'), end_date : ISODate('1940-02-19'), to : ObjectId("4f297978aa999d9db02618cf"), location : { city: 'raleigh', state: 'nc', county: 'wake', country: 'usa', coordinates : [35.49,78.38]}, contributor : ObjectId("4eeac...91537bb")}, { date : ISODate('1944-04-19'), to : ObjectId("4f2978dfaa999d9db02618ce"), location : { city: 'atlanta', state: 'ga', county: 'fulton', country: 'usa', coordinates : [33.45,84.23]}, contributor : ObjectId("4eeb...37bb")}]
  • 80. All together individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] }, events : [ { birth : [ { date : ISODate('1928-04-06'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4eeabc958b691537bb000000"), records: ObjectId("4ed8aea7d8562f7d7b000000") }, { date : ISODate('1928-04-16'), location : { city: 'brattleboro',
  • 81. Records record1 = { _id : ObjectId("4ed8aea7d8562f7d7b"), contributor : ObjectId("4eeab...1537bb"), type : 'birth certificate', thumbnail : BinData(0,"/9j/4AAQSkZJ...."), content : BinData(0,"j6b/Id11lWqs..."), tags : ['NY', 'certified'], description : "John's birth certificate" }
  • 82. Users user = { _id : ObjectId("4eeabc958b691537bb"), username : 'spf13', email_address : 'genealogy@spf13.com', password : 'a.long.passphrase18', individual_id : ObjectId("4f2f...0ce"), }
  • 83. Scaling MongoDB for all the generations
  • 84. Replica Sets [diagram: 3 Primary, 7 Secondary, 1 Arbiter]
  • 85. Sharding [diagram: 3 App Servers, 3 MongoS, 3 ConfigD, 12 MongoD]
  • 87. It’s not a tree at all, It’s really a graph ... and an odd one at that
  • 88. It would be easy if it always looked like this
  • 90. All sorts of mess Step & adopted relationships Duplicate nodes Lots of missing nodes Divorces and re-marriages Multiple names for the same person Multiple dates for the same event
  • 91. How to make sense of it all
  • 93. Graphs are important Without them we couldn’t store family relationships
  • 94. Trees / graphs in MongoDB Since MongoDB data structures are essentially objects, there is a good degree of flexibility here. Think of how you would structure them in your application
  • 95. Trees / graphs in MongoDB Each node is stored as a document Contains references to related nodes What is “related” depends on your application
  • 96. References vs Relation MongoDB uses references Unlike foreign keys, references don’t enforce integrity Reference is really just a reference For many applications a reference is sufficient
  • 97. Simple relationship
    { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" }
    { _id: "e", parents: ["a", "b" ]}
    { _id: "f", parents: ["c", "d" ]}
    { _id: "g", parents: ["e", "f" ]}
    •Easy to access nodes in either direction
    •Good for trees / graphs
    •Can grab large sets
    •Minimum amount of maintenance
    •Balanced
    •Implied relationships
    //find all direct descendants of b:
    > db.family.find({ parents : 'b' })
    //find all ancestors of g:
    var g = db.family.findOne({ _id: 'g' });
    ancestorFind = function(child) {
      var rv = [];
      if ( ! child.parents) return rv;
      var parents = db.family.find( { _id : { $in : child.parents } } ).toArray();
      rv = rv.concat(parents);
      for (var i in parents) {
        rv = rv.concat( ancestorFind(parents[i]) );
      }
      return rv;
    }
    ancestorFind(g);
    //find all descendants of b:
    var b = db.family.find({ _id: 'b' }).toArray();
    descendantsFind = function(par) {
      var rv = [];
      for (var i in par) {
        var k = db.family.find( { parents : par[i]._id } ).toArray();
        rv = rv.concat(k);
        rv = rv.concat(descendantsFind(k));
      }
      return rv;
    }
    descendantsFind(b);
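Because a family "tree" is really a graph, a person can be reached along more than one path (cousins marrying, duplicate links), and a purely recursive walk can then return the same ancestor twice. The sketch below is one possible variation, not the approach from the slides: it walks one generation at a time and keeps a `seen` set. A `Map` stands in for `db.family`, and `ancestorIds` / `descendantIds` are hypothetical names.

```javascript
// The seven nodes from the slide, keyed by _id (an in-memory stand-in for db.family).
const family = new Map([
  ["a", { _id: "a" }], ["b", { _id: "b" }], ["c", { _id: "c" }], ["d", { _id: "d" }],
  ["e", { _id: "e", parents: ["a", "b"] }],
  ["f", { _id: "f", parents: ["c", "d"] }],
  ["g", { _id: "g", parents: ["e", "f"] }],
]);

// Walk up one generation at a time; `seen` prevents revisiting shared ancestors.
function ancestorIds(startId) {
  const seen = new Set();
  let frontier = [startId];
  while (frontier.length > 0) {
    const next = [];
    for (const id of frontier) {
      const node = family.get(id);
      for (const parentId of (node && node.parents) || []) {
        if (!seen.has(parentId)) {
          seen.add(parentId);
          next.push(parentId);
        }
      }
    }
    frontier = next; // against MongoDB this could be one $in query per generation
  }
  return [...seen];
}

// Walk down: each round finds every node that lists a frontier member as a parent.
function descendantIds(startId) {
  const seen = new Set();
  let frontier = [startId];
  while (frontier.length > 0) {
    const next = [];
    for (const node of family.values()) {
      const hit = (node.parents || []).some((p) => frontier.includes(p));
      if (hit && !seen.has(node._id)) {
        seen.add(node._id);
        next.push(node._id);
      }
    }
    frontier = next;
  }
  return [...seen];
}

console.log(ancestorIds("g"), descendantIds("b"));
```

The number of round trips equals the number of generations rather than the number of people, which matters once the tree gets wide.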
  • 101. Bi-directional { _id: "a", children: ["e"] } { _id: "b", children: ["e"] } { _id: "c", children: ["f"] } { _id: "d", children: ["f"] } { _id: "e", children: ["g"], parents: ["a", "b" ]} { _id: "f", children: ["g"], parents: ["c", "d" ]} { _id: "g", children: [] , parents: ["e", "f"] } •Doesn’t really add much beyond the first example •More maintenance •Duplication of each relationship •Only real advantage is ability to grab all related nodes (both directions) with one query.
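The extra maintenance mentioned here comes from having to touch two documents for every relationship. A small sketch of that bookkeeping, with a plain object standing in for the collection; `link` and `relatedIds` are hypothetical helpers, and against MongoDB each push would be a separate `$addToSet` update.

```javascript
// Record a parent/child link on both documents so the two arrays cannot drift apart.
// `nodes` is a plain object standing in for the collection (an assumption of this sketch).
function link(nodes, parentId, childId) {
  const parent = (nodes[parentId] = nodes[parentId] || { _id: parentId, children: [], parents: [] });
  const child = (nodes[childId] = nodes[childId] || { _id: childId, children: [], parents: [] });
  if (!parent.children.includes(childId)) parent.children.push(childId); // like $addToSet
  if (!child.parents.includes(parentId)) child.parents.push(parentId);
}

// With both arrays present, one lookup yields every directly related id.
function relatedIds(nodes, id) {
  const node = nodes[id];
  return node ? [...node.parents, ...node.children] : [];
}

const nodes = {};
link(nodes, "a", "e");
link(nodes, "b", "e");
link(nodes, "e", "g");
link(nodes, "e", "g"); // repeating a link changes nothing

console.log(relatedIds(nodes, "e"));
```

The two writes are not atomic as a pair, so a failure between them leaves a one-sided link; that is part of the duplication cost the slide points out.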
  • 102. Array of Ancestors
    { _id: "a" } { _id: "b" } { _id: "c" } { _id: "d" }
    { _id: "e", ancestors: [ "a", "b" ], parents: ["a", "b" ]}
    { _id: "f", ancestors: [ "c", "d" ], parents: ["c", "d" ]}
    { _id: "g", ancestors: [ "a", "b", "c", "d", "e", "f" ], parents: ["e", "f"] }
    Great for small trees (or subsets).
    Could be used to store X generations of ancestors
    Optimized for retrieving entire tree
    Uses implied relationships
    No help on specifics... is this person my grandson?
    Easier retrieval at expense of costlier maintenance
    //find all descendants of b:
    > db.tree.find({ ancestors: 'b' })
    //find all direct descendants of b:
    > db.tree.find({ parents: 'b' })
    //find all ancestors of g:
    > g = db.tree.findOne( { _id: 'g' } )
    > db.tree.find( { _id: { $in : g.ancestors } } )
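The "costlier maintenance" is the work of keeping each `ancestors` array correct. A minimal sketch of the insert path, assuming parents are always stored before their children; `makeNode` and `isAncestor` are hypothetical helpers and a `Map` stands in for `db.tree`.

```javascript
// Build a new node whose `ancestors` array is derived from its parents' arrays.
// `tree` is a Map standing in for db.tree; parents must already be stored.
function makeNode(tree, id, parentIds) {
  const ancestors = new Set();
  for (const pid of parentIds) {
    const parent = tree.get(pid);
    if (!parent) throw new Error("unknown parent: " + pid);
    for (const a of parent.ancestors || []) ancestors.add(a);
    ancestors.add(pid);
  }
  const node = { _id: id, parents: parentIds, ancestors: [...ancestors].sort() };
  tree.set(id, node);
  return node;
}

const tree = new Map(["a", "b", "c", "d"].map((id) => [id, { _id: id }]));
makeNode(tree, "e", ["a", "b"]);
makeNode(tree, "f", ["c", "d"]);
const g = makeNode(tree, "g", ["e", "f"]);

// "Is b an ancestor of g?" becomes a membership test, with no traversal.
const isAncestor = (ancestorId, node) => (node.ancestors || []).includes(ancestorId);

console.log(g.ancestors, isAncestor("b", g));
```

Inserting is cheap, but discovering a previously missing great-grandparent means rewriting the array on every descendant, and the array alone still cannot say how many generations apart two people are.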
  • 104. Relations (basic) { _id : "b", relations : [ { id : "a", relation : "parent"}, { id : "c", relation : "grandparent"}, { id : "d", relation : "parent"}]}
  • 105. Relations (detailed) { _id : "b", relations : [ { id : "a", relation : "parent", type : "mother", subtype : "biological" }, { id : "c", relation : "parent", type : "father", subtype : "adopted"}, { id : "d", relation : "parent", type : "father", subtype : "biological"}]}
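With typed relations stored in one array, questions such as "who are the biological parents?" turn into a filter over that array. The sketch below does the filtering in memory; `relatedBy` is a hypothetical helper, and on the server the equivalent would be a query on the `relations` array (for example with `$elemMatch`).

```javascript
// The document from the slide.
const person = {
  _id: "b",
  relations: [
    { id: "a", relation: "parent", type: "mother", subtype: "biological" },
    { id: "c", relation: "parent", type: "father", subtype: "adopted" },
    { id: "d", relation: "parent", type: "father", subtype: "biological" },
  ],
};

// Return the ids of relations that match every key in `criteria`.
function relatedBy(doc, criteria) {
  return (doc.relations || [])
    .filter((r) => Object.keys(criteria).every((k) => r[k] === criteria[k]))
    .map((r) => r.id);
}

const biologicalParents = relatedBy(person, { relation: "parent", subtype: "biological" });
const fathers = relatedBy(person, { type: "father" });
console.log(biologicalParents, fathers);
```

Keeping step, adopted, and biological links side by side in one array is what lets the messy cases listed earlier coexist without special-case fields.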
  • 106. Shouldn’t I store my family tree in a graph database? They are built to store trees after all
  • 107. Graphs are great at traversing deep in a tree • Is this node my relative? • Retrieve my paternal great, great, great, great grandpa
  • 108. Graphs are great at traversing deep in a tree • Is this node my relative? • Retrieve my paternal great, great, great, great grandpa
  • 109. Graphs are great at traversing deep in a tree • Is this node my relative? • Retrieve my paternal great, great, great, great grandpa
  • 110. Unfortunately that’s not how we commonly work Typically we are working with a node and its immediate neighbors The significant majority of our operations aren’t traversing If those operations are important, perhaps a hybrid graph & document solution makes sense
  • 111. http://spf13.com http://github.com/s @spf13 Question download at mongodb.org We’re hiring!! Contact us at jobs@10gen.com

Editor's Notes

  9. Remember, in 1995 there were around 10,000 websites. Mosaic, Lynx, Mozilla (pre-Netscape) and IE 2.0 were the only web browsers. Apache (Dec ’95), Java (’96), PHP (June ’95), and .NET didn’t exist yet. Linux just barely (1.0 in ’94).
  16. By reducing transactional semantics the db provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
  96. Store an array of the ids of the ancestors of a given document