MongoDB Schema Design:
                        Insights and Tradeoffs


                                     Montse Medina
                                    COO,

Saturday, May 5, 12
Social content is useful
                  in context


Social context is
       useful in context
Algorithms
                             +
                      Infrastructure




Technology Stack




                                Apache Kafka

Outline
    I. Schema design
        ‣    Relational vs. Document-oriented

        ‣    Schema-less design

        ‣    Case study: Publishers & Subscribers

    II. Lessons learned for schema design
    III. Things to remember about MongoDB
Relational vs. Document-oriented

  Relational:

    Users                  Graph
    id    name             from   to
    1     Robert           1      5
    2     Monica           1      20
    3     Lucas            2      1
    ...   ...              2      5
                           ...    ...

  vs.

  Document-oriented:

    Users
    { id: 1, name: "Robert", from: [2], to: [5, 20] }
    { id: 2, name: "Monica", from: [23], to: [1, 5] }
    ...

Find all the "to" edges for user 5

  Relational (Graph table, stored in disk blocks):

    from   to
    1      5
    1      20
    2      1
    2      5
    3      4
    3      23
    3      12
    4      5
    ...    ...

    Potentially as many disk seeks as "to" edges!

  vs.

  Document-oriented (Users):

    { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }

    1 disk seek guaranteed!
Advantages of doc-oriented schema
         •Avoid joins
         •Disk locality when fetching relations (everything
             is stored within a doc record)



          Considerations for schema design
        •One-to-many and many-to-many relations become lists (arrays)
        •Denormalization is more common
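To make the "relations become lists" point concrete, here is a minimal in-memory sketch in JavaScript (the language of the shell examples) that derives the slide's Users documents from the relational Users and Graph tables. It assumes each Graph row `{from, to}` is a directed edge; `toUserDocs` is a hypothetical helper name, not part of any driver. Each edge is written into two documents, which is exactly the denormalization mentioned above.

```javascript
// Hypothetical helper: turns relational rows into document-oriented user records.
// users: [{ id, name }], edges: [{ from, to }] meaning "from -> to".
function toUserDocs(users, edges) {
  const docs = new Map();
  for (const u of users) {
    docs.set(u.id, { id: u.id, name: u.name, from: [], to: [] });
  }
  for (const { from, to } of edges) {
    // Denormalization: the same edge is stored in both endpoints' lists.
    if (docs.has(from)) docs.get(from).to.push(to);
    if (docs.has(to)) docs.get(to).from.push(from);
  }
  return [...docs.values()];
}

const users = [
  { id: 1, name: "Robert" },
  { id: 2, name: "Monica" },
];
const edges = [
  { from: 1, to: 5 },
  { from: 1, to: 20 },
  { from: 2, to: 1 },
  { from: 2, to: 5 },
];
const docs = toUserDocs(users, edges);
console.log(JSON.stringify(docs));
```

Only the rows shown on the slide are used, so Monica's `from` list stays empty here instead of `[23]`.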

Schema-less design
        { id: 1, network: "Twitter", name: "Robert",
          from: [2], to: [5, 20], screenName: "robertE" }

        { id: 2, network: "Facebook", name: "Maria",
          from: [23], to: [1, 5], likes: ["biking", "hiking"] }
        ...

        Leverage the schemaless nature of Mongo, but protect yourself with types in your code!

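One way to "put protection with types in your code" is a small validation layer that every document passes through before it is written or after it is read. This is a minimal sketch under stated assumptions: the required fields per network, and the names `COMMON_FIELDS`, `NETWORK_FIELDS`, and `toUser`, are hypothetical and chosen to match the two example documents.

```javascript
// Hypothetical field "types": MongoDB accepts any shape, so the checks live in code.
const COMMON_FIELDS = { id: "number", network: "string", name: "string", from: "array", to: "array" };
const NETWORK_FIELDS = {
  Twitter: { screenName: "string" },
  Facebook: { likes: "array" },
};

function typeOf(value) {
  return Array.isArray(value) ? "array" : typeof value;
}

function checkFields(doc, spec) {
  for (const [field, type] of Object.entries(spec)) {
    if (typeOf(doc[field]) !== type) {
      throw new TypeError(`${field} should be ${type}, got ${typeOf(doc[field])}`);
    }
  }
}

// Validate a raw document before inserting it or after reading it back.
function toUser(doc) {
  checkFields(doc, COMMON_FIELDS);
  if (!Object.prototype.hasOwnProperty.call(NETWORK_FIELDS, doc.network)) {
    throw new TypeError(`unknown network: ${doc.network}`);
  }
  checkFields(doc, NETWORK_FIELDS[doc.network]);
  return doc;
}

const robert = toUser({
  id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE",
});
console.log(robert.name); // Robert
```

The collection still stores heterogeneous documents; the code simply refuses shapes it does not understand.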
Read-Friendly

                      Case Study: Publishers & Subscribers




Read-Friendly Approach
       [Figure: the same "Hi!" post delivered to three subscribers]
       Post:
       { _id: postId,
         owner: ownerId,
         recipient: recipientId,
         text: "message", ... }
Read-Friendly Approach
                                    db.posts.find({recipient: uid})



                                            Sharding Key:
                                                 recipient



                      Fast retrieval, easy sharding
                      Slow writes, enormous amount of storage (one copy of each post per subscriber)

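The read-friendly approach is fan-out on write: publishing a post inserts one document per recipient. The following in-memory sketch stands in for the `posts` collection (no MongoDB server involved); `publishReadFriendly` and `feedReadFriendly` are hypothetical names. It shows why reads are a single-key lookup while writes and storage grow with the number of subscribers.

```javascript
// Read-friendly (fan-out on write): one post document per recipient.
// In-memory stand-in for the posts collection; ids are hypothetical.
const posts = [];
let nextId = 1;

function publishReadFriendly(ownerId, subscriberIds, text) {
  // One insert per subscriber: writes and storage grow with the audience size.
  for (const recipientId of subscriberIds) {
    posts.push({ _id: nextId++, owner: ownerId, recipient: recipientId, text });
  }
}

// Mirrors db.posts.find({recipient: uId}): a single-key lookup,
// which is also why "recipient" works well as a shard key.
function feedReadFriendly(uId) {
  return posts.filter((p) => p.recipient === uId);
}

publishReadFriendly("u4", ["u1", "u2", "u3"], "Hi!");
console.log(posts.length);                  // 3 documents for one message
console.log(feedReadFriendly("u2").length); // 1
```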

Write-Friendly

                      Case Study: Publishers & Subscribers




Write-Friendly Approach

        [Figure: a single "Hi!" post stored once, with its owner]
        Post:
        { _id: postId,
          owner: oId,
          text: "message", ... }
Write-Friendly Approach

                             db.posts.find({owner: {$in: user.from}})


                                            Sharding Key:
                                                   ?



                      Fast writes, slim storage
                      Slow reads, harder queries (every feed read gathers posts from all publishers in user.from)

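The write-friendly approach is fan-out on read: each post is stored once, and the reader collects posts from every publisher it follows. Below is an in-memory sketch of the `$in` query above. The helper names and the `user` shape (a `from` list of publisher ids, as on the slide) are assumptions for illustration.

```javascript
// Write-friendly (fan-out on read): each post is stored exactly once.
const posts = [];
let nextId = 1;

function publishWriteFriendly(ownerId, text) {
  posts.push({ _id: nextId++, owner: ownerId, text }); // a single insert
}

// Mirrors db.posts.find({owner: {$in: user.from}}): the reader gathers
// posts from every publisher in its list, wherever they are stored.
function feedWriteFriendly(user) {
  const publishers = new Set(user.from);
  return posts.filter((p) => publishers.has(p.owner));
}

publishWriteFriendly("u4", "Hi!");
publishWriteFriendly("u7", "Hello");
publishWriteFriendly("u9", "Not followed");
const user = { _id: "u1", from: ["u4", "u7"] };
console.log(feedWriteFriendly(user).map((p) => p.text)); // [ 'Hi!', 'Hello' ]
```

Because matching posts are keyed by owner rather than by reader, there is no shard key that keeps one reader's feed on a single shard, which is the "?" on the slide.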

Hybrid Approach

                      Case Study: Publishers & Subscribers




Hybrid Approach

     [Figure: a single "Hi!" post that lists all of its recipients]
     Post:
     { _id: postId,
       owner: ownerId,
       recipients: [u1, u2, u3, u5],
       text: "message", ... }
Hybrid Approach

                                db.posts.find({recipients: uId})


                                          Sharding Key:
                                              random :)



                        Fast writes, slim storage,
                        reasonable read speed

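The hybrid approach stores each post once but embeds its recipient list, so `{recipients: uId}` matches any post whose array contains that user (MongoDB matches a scalar query value against array elements, and a multikey index on `recipients` can serve it). A minimal in-memory sketch with hypothetical helper names:

```javascript
// Hybrid: one document per post, carrying the list of recipients.
const posts = [];
let nextId = 1;

function publishHybrid(ownerId, subscriberIds, text) {
  // Single insert; only the recipient ids are repeated, not the whole post.
  posts.push({ _id: nextId++, owner: ownerId, recipients: [...subscriberIds], text });
}

// Mirrors db.posts.find({recipients: uId}): array-membership match.
function feedHybrid(uId) {
  return posts.filter((p) => p.recipients.includes(uId));
}

publishHybrid("u4", ["u1", "u2", "u3", "u5"], "Hi!");
console.log(posts.length);            // 1
console.log(feedHybrid("u5").length); // 1
console.log(feedHybrid("u4").length); // 0 (the owner is not a recipient)
```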


Random sharding is not random!
           [Diagram: three ways of distributing posts across shards]
           Best -- Impossible for our data
           Worse
           Optimal solution

           Minimize the number of disk seeks per shard!

Outline
    I. Schema design
    II. Lessons learned for schema design
        ‣    Indexes

        ‣    Concurrency

        ‣    Reducing collection size

    III. Things to remember about MongoDB
Indexes
                                           Primary Key
                       link: {
                                    _id: ObjectId(...),
                                    url: "www.jetlore.com",
                                    title: "Jetlore is a search platform for social content",
                                    description: "..."
                                }


                      link: {
                                 _id: "www.jetlore.com",
                                 title: "Jetlore is a search platform for social content",
                                 description: "..."
                            }

                      If your data has a natural PK, use it instead of the default ObjectId.


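MongoDB always maintains a unique index on `_id`. With an ObjectId key, you would need a second unique index on `url` to avoid duplicate links and to look them up by URL. Using the URL itself as `_id` gets both from the one mandatory index. In this sketch, a `Map` stands in for the collection and its `_id` index, and `saveLink` is a hypothetical upsert helper.

```javascript
// Hypothetical in-memory "links" collection keyed by _id, mimicking the
// unique _id index MongoDB maintains on every collection.
const links = new Map();

// Upsert keyed on the natural key: the URL itself is the _id.
function saveLink(url, title, description) {
  links.set(url, { _id: url, title, description });
}

saveLink("www.jetlore.com", "Jetlore is a search platform for social content", "...");
saveLink("www.jetlore.com", "Jetlore is a search platform for social content", "..."); // same key, no duplicate

console.log(links.size); // 1
// Lookup by URL is a primary-key lookup; no extra "url" index is needed.
console.log(links.get("www.jetlore.com").title);
```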

Indexes
              Augment your schema to enable the
                    most selective index

                         post: {
                                   _id: ObjectId(...),
                                   recipients: [...],
                                   likes: [...],
                                   likesCount: ...,
                                   ...}

                         Add a new "likesCount" field!
                         db.posts.ensureIndex({recipients: 1, likesCount: -1})

                      Want all posts that a user can view sorted by
                      the number of likes:
                         db.posts.find({recipients: uId}).sort({likesCount: -1})


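An index cannot sort by the length of the `likes` array, so the count has to be stored as its own field and kept in step with the list. The sketch below simulates that in memory. The update shown in the comment is one plausible shell form (a `$ne` guard so a repeated like neither duplicates the entry nor inflates the count); it is an assumption, not taken from the slides.

```javascript
// Hypothetical in-memory posts. "likesCount" is kept alongside "likes"
// so an index such as {recipients: 1, likesCount: -1} could serve the sort.
const posts = [
  { _id: 1, recipients: ["u1", "u2"], likes: [], likesCount: 0 },
  { _id: 2, recipients: ["u1"], likes: [], likesCount: 0 },
  { _id: 3, recipients: ["u2"], likes: [], likesCount: 0 },
];

// Mirrors db.posts.update({_id: postId, likes: {$ne: userId}},
//   {$push: {likes: userId}, $inc: {likesCount: 1}}): both fields change together.
function like(postId, userId) {
  const post = posts.find((p) => p._id === postId);
  if (post && !post.likes.includes(userId)) {
    post.likes.push(userId);
    post.likesCount += 1;
  }
}

// Mirrors db.posts.find({recipients: uId}).sort({likesCount: -1}).
function mostLiked(uId) {
  return posts
    .filter((p) => p.recipients.includes(uId))
    .sort((a, b) => b.likesCount - a.likesCount)
    .map((p) => p._id);
}

like(2, "u7");
like(2, "u8");
like(1, "u7");
like(1, "u7"); // a repeated like is ignored
console.log(mostLiked("u1")); // [ 2, 1 ]
```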


Indexes
                      Make sure to use the proper index

                           db.posts.find({recipients: uId}).sort({date: -1})

                           db.posts.ensureIndex({recipients: 1})
                           db.posts.ensureIndex({date: 1})

                                                   vs

                           db.posts.ensureIndex({recipients: 1, date: -1})

                      Always test with explain()


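Why the compound index wins: with two single-field indexes, a query of that era would generally use only one of them. Using `{recipients: 1}` still leaves the matches to be sorted in memory. A compound index stores entries ordered by recipient and then by date, so one user's posts form a contiguous range that is already in date order. The sketch below models a compound multikey index as a sorted array of `[recipient, date, _id]` entries. It is a simplification, not MongoDB's B-tree, and all names are hypothetical.

```javascript
// A compound (multikey) index modeled as a sorted array of
// [recipient, date, _id] entries: recipient ascending, date descending.
function buildIndex(posts) {
  const entries = [];
  for (const p of posts) {
    for (const r of p.recipients) entries.push([r, p.date, p._id]);
  }
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : b[1] - a[1]));
  return entries;
}

// Binary search for the first entry whose recipient is >= uId.
function lowerBound(entries, uId) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid][0] < uId) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// find({recipients: uId}).sort({date: -1}).limit(n): walk one contiguous
// range that is already date-ordered, so no separate sort step is needed.
function newestFor(entries, uId, n) {
  const out = [];
  for (let i = lowerBound(entries, uId); i < entries.length && entries[i][0] === uId && out.length < n; i++) {
    out.push(entries[i][2]);
  }
  return out;
}

const posts = [
  { _id: "a", recipients: ["u1", "u2"], date: 10 },
  { _id: "b", recipients: ["u1"], date: 30 },
  { _id: "c", recipients: ["u2"], date: 20 },
  { _id: "d", recipients: ["u1"], date: 20 },
];
const index = buildIndex(posts);
console.log(newestFor(index, "u1", 2)); // [ 'b', 'd' ]
```

`explain()` is how you confirm the server actually picked the compound index and avoided an in-memory sort.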


Concurrency
                         Try to avoid "save()" in drivers
                      thread1: { _id: u1,                    thread2: { _id: u1,
                                      name: "Robert",                        name: "Bob",
                                      from: [u2, u3]                         from: []
                                    }                                      }

                            db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})

                        db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})


                                                      …but!
                          db.users.update({_id: u1}, {$set: {_id: u1, name: ...}}, true, false)


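The problem with `save()`: it writes back the caller's whole copy of the document, so when two threads each read the same user and change different fields, the last save silently discards the other's change. Writing only the changed field with `$set` keeps both. The sketch below assumes an original document `{name: "Robert", from: []}`, which is consistent with the two thread copies on the slide. The `save` and `updateSet` helpers are in-memory stand-ins, not driver calls.

```javascript
// In-memory "users" collection; each thread works on its own copy.
const clone = (o) => JSON.parse(JSON.stringify(o));
const base = { _id: "u1", name: "Robert", from: [] };

// Stand-in for save(): replaces the stored document with the caller's whole copy.
function save(store, doc) {
  store.set(doc._id, clone(doc));
}

// Stand-in for update(..., {$set: fields}): only the listed fields are written.
function updateSet(store, id, fields) {
  Object.assign(store.get(id), clone(fields));
}

// thread1 changes "from", thread2 changes "name", both starting from the same read.
function run(useSave) {
  const store = new Map([["u1", clone(base)]]);
  const thread1 = clone(store.get("u1"));
  const thread2 = clone(store.get("u1"));
  thread1.from = ["u2", "u3"];
  thread2.name = "Bob";
  if (useSave) {
    save(store, thread1);
    save(store, thread2); // overwrites thread1's change to "from"
  } else {
    updateSet(store, "u1", { from: thread1.from });
    updateSet(store, "u1", { name: thread2.name });
  }
  return store.get("u1");
}

console.log(run(true));  // "from" change lost
console.log(run(false)); // both changes kept
```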


Concurrency
       Atomic Commutative Operators

                               db.users.update({_id: u1}, {$pull: {to: u2}})

                           db.posts.update({_id: pId}, {$inc: {likesCount: 1}})




                      When updating lists and counters, instead of
                                 using $set, rely on
                               $inc, $addToSet, $pull

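Why these operators help: each one describes a change relative to whatever is stored, not a final value. Concurrent requests therefore end in the same state whatever order they arrive in. The sketch applies every ordering of a few concurrent operations with a tiny in-memory interpreter and checks that only one outcome occurs, then contrasts that with a read-modify-write `$set` that loses an increment. One caveat: `$addToSet` and `$pull` on the *same* value do not commute, so the example uses different values. The interpreter is an illustration, not MongoDB's implementation.

```javascript
// Minimal stand-ins for three MongoDB update operators on a plain object.
function apply(doc, op) {
  const [name, spec] = Object.entries(op)[0];
  const [field, value] = Object.entries(spec)[0];
  if (name === "$inc") doc[field] = (doc[field] ?? 0) + value;
  else if (name === "$addToSet") { if (!doc[field].includes(value)) doc[field].push(value); }
  else if (name === "$pull") doc[field] = doc[field].filter((v) => v !== value);
  else throw new Error(`unsupported operator ${name}`);
  return doc;
}

function permutations(xs) {
  if (xs.length <= 1) return [xs];
  return xs.flatMap((x, i) =>
    permutations([...xs.slice(0, i), ...xs.slice(i + 1)]).map((rest) => [x, ...rest]));
}

// Concurrent requests arriving in an unknown order.
const ops = [
  { $inc: { likesCount: 1 } },
  { $inc: { likesCount: 1 } },
  { $addToSet: { likes: "u7" } },
  { $pull: { likes: "u2" } },
];

const outcomes = new Set(
  permutations(ops).map((order) => {
    const doc = order.reduce(apply, { likesCount: 0, likes: ["u2"] });
    return JSON.stringify({ likesCount: doc.likesCount, likes: [...doc.likes].sort() });
  })
);
console.log(outcomes.size); // 1: every arrival order ends in the same state

// Contrast: read-modify-write with $set, where both writers read likesCount = 0.
const stored = { likesCount: 0 };
const seenByA = stored.likesCount;
const seenByB = stored.likesCount;
stored.likesCount = seenByA + 1; // A sends {$set: {likesCount: 1}}
stored.likesCount = seenByB + 1; // B sends {$set: {likesCount: 1}}; A's like is lost
console.log(stored.likesCount); // 1, not 2
```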


Concurrency
                                No Transactions

          user1: { _id: u1, to: [u2, u3], from: [...], ... }
          user2: { _id: u2, to: [...], from: [u1, ...], ... }

          User1 wants to unsubscribe from user2.
          Ideally we would update both users in one transaction.

          Implement it in your code!

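At the time of this talk MongoDB had no multi-document transactions (they arrived in version 4.0). Only single-document updates were atomic. One way to "implement it in your code" is to split the operation into single-document steps that are safe to re-run:

1. Record the intent on user1.
2. Apply both `$pull`s, which are idempotent.
3. Clear the intent.

A recovery job can then finish any unsubscribe that was interrupted. This is a sketch of that idea, not the speaker's design. The `pendingUnsub` field is hypothetical, and it only tracks one pending operation per user.

```javascript
// In-memory users; each step below touches one document, which MongoDB
// updates atomically. "pendingUnsub" is a hypothetical field name.
const users = new Map([
  ["u1", { _id: "u1", to: ["u2", "u3"], from: [], pendingUnsub: null }],
  ["u2", { _id: "u2", to: [], from: ["u1"], pendingUnsub: null }],
]);

const pull = (doc, field, value) => {
  doc[field] = doc[field].filter((v) => v !== value);
};

// Steps 2 and 3 are idempotent, so re-running them after a crash is safe.
function finishUnsubscribe(userId) {
  const user = users.get(userId);
  const target = user.pendingUnsub;
  if (target === null) return;
  pull(user, "to", target);                // {$pull: {to: target}} on the user
  pull(users.get(target), "from", userId); // {$pull: {from: userId}} on the target
  user.pendingUnsub = null;                // {$set: {pendingUnsub: null}}
}

function unsubscribe(userId, targetId, crashAfterIntent = false) {
  users.get(userId).pendingUnsub = targetId; // step 1: record the intent
  if (crashAfterIntent) return;              // simulate a crash mid-operation
  finishUnsubscribe(userId);
}

// Recovery job: complete every unsubscribe that was interrupted.
function repair() {
  for (const id of users.keys()) finishUnsubscribe(id);
}

unsubscribe("u1", "u2", true); // crash: only the intent was written
repair();
console.log(users.get("u1").to, users.get("u2").from); // [ 'u3' ] []
```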
Reducing collection size
                                   Name your fields with short
                                           names!

     post: {
                      owner: ObjectId,
                      messageText: "loving Jetlore",
                      mediaUrl: "www.jetlore.com",
                      mediaTitle: "Jetlore is a user analytics & search platform for social content"
                }
                                                       vs
     post: {
                      o: ObjectId,
                      t: "loving Jetlore",
                      mu: "www.jetlore.com",
                      mt: "Jetlore is a user analytics & search platform for social content"
                }

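BSON stores every field name inside every document, so long names are paid for once per document. Later storage engines compress data on disk, which narrows but does not remove the gap. To keep code readable, you can translate names at the boundary. This is a sketch with a hypothetical name map and helpers. The owner id is a placeholder string, and JSON length is used only as a rough proxy for BSON size.

```javascript
// Hypothetical mapping between readable names in code and short stored names.
const SHORT = new Map([
  ["owner", "o"],
  ["messageText", "t"],
  ["mediaUrl", "mu"],
  ["mediaTitle", "mt"],
]);
const LONG = new Map([...SHORT].map(([full, abbr]) => [abbr, full]));

const rename = (doc, names) =>
  Object.fromEntries(Object.entries(doc).map(([k, v]) => [names.get(k) ?? k, v]));

const toStored = (doc) => rename(doc, SHORT);  // before insert/update
const fromStored = (doc) => rename(doc, LONG); // after find

const post = {
  owner: "507f1f77bcf86cd799439011", // placeholder id
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: "Jetlore is a user analytics & search platform for social content",
};

const longBytes = JSON.stringify(post).length;
const shortBytes = JSON.stringify(toStored(post)).length;
console.log(longBytes - shortBytes); // 28 characters of key names saved per document
```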

Outline
I. Schema design
II. Lessons learned for schema design
III. Things to remember about MongoDB
     ‣   Single lock

     ‣   ($or + sort) query doesn’t use indexes properly

     ‣   Indexes with 2 list fields

     ‣   Record iterators + update
$or & sort query doesn’t use the proper index
            db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})


                            db.posts.ensureIndex({recipients: 1, date: -1})

                              db.posts.ensureIndex({privacy: 1, date: -1})


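One workaround, assumed here rather than taken from the talk, is to run each `$or` branch as its own query. Each branch can then use its own compound index and return results already sorted by date. The application merges the two sorted lists and drops posts that matched both branches. A sketch of the merge step, with hypothetical branch results:

```javascript
// Merge two result lists that are each already sorted by date descending
// (one per $or branch), dropping duplicates by _id and stopping at `limit`.
function mergeByDateDesc(a, b, limit) {
  const out = [];
  const seen = new Set();
  let i = 0;
  let j = 0;
  while (out.length < limit && (i < a.length || j < b.length)) {
    const takeA = j >= b.length || (i < a.length && a[i].date >= b[j].date);
    const next = takeA ? a[i++] : b[j++];
    if (!seen.has(next._id)) {
      seen.add(next._id);
      out.push(next);
    }
  }
  return out;
}

// Hypothetical results of db.posts.find({recipients: uId}).sort({date: -1}).limit(3)
const forUser = [{ _id: "p5", date: 50 }, { _id: "p3", date: 30 }, { _id: "p1", date: 10 }];
// Hypothetical results of db.posts.find({privacy: Public}).sort({date: -1}).limit(3)
const publicPosts = [{ _id: "p4", date: 40 }, { _id: "p3", date: 30 }, { _id: "p2", date: 20 }];

console.log(mergeByDateDesc(forUser, publicPosts, 3).map((p) => p._id)); // [ 'p5', 'p4', 'p3' ]
```

Fetching `limit` results from each branch is enough, because every post in the top `limit` of the union is also in the top `limit` of the branch it came from.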

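                         Indexes with 2 list fields

       post: { _id: ObjectId(...),
              recipients: [...],
              links: [...],
             ... }

       db.posts.ensureIndex({recipients: 1, links: 1})
       MongoDB refuses to index a document in which both fields are arrays ("cannot index parallel arrays").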



Record iterators + updating
      var posts = db.posts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        db.posts.update({_id: post._id}, {$set: {text: NewText}})
      }

                      Sort by a field that will not change
                         or rename the old collection

      var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)

      db.posts.renameCollection("oldPosts")
      var posts = db.oldPosts.find().skip(n).limit(t)
      while (posts.hasNext()) {
        var post = posts.next()
        post.text = NewText
        db.posts.insert(post)   // write the updated copy into a fresh "posts" collection
      }

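The first loop can misbehave because, under the storage engine of that era, an update that grows a document may relocate it. A cursor paging through in natural order can then revisit some documents and miss others. The simulation below makes that relocation an explicit assumption: a grown document moves to the end of the collection. It compares `skip`/`limit` paging in natural order with paging sorted by an unchanging `date` field. All names are hypothetical.

```javascript
// Natural order is modeled as array position. Assumption for illustration:
// a document that grows on update is relocated to the end of the collection.
function growAndMove(store, id) {
  const i = store.findIndex((d) => d._id === id);
  const [doc] = store.splice(i, 1);
  doc.text = doc.text + " (a much longer new text)";
  store.push(doc);
}

// Page through with skip/limit, updating each document as we go.
function migrate(store, pageSize, orderBy) {
  const visited = [];
  for (let skip = 0; skip < store.length; skip += pageSize) {
    const view = orderBy ? [...store].sort(orderBy) : store;
    const page = view.slice(skip, skip + pageSize).map((d) => d._id);
    for (const id of page) {
      visited.push(id);
      growAndMove(store, id);
    }
  }
  return visited;
}

const makeStore = () =>
  [1, 2, 3, 4, 5, 6].map((n) => ({ _id: n, date: n, text: "old" }));

console.log(migrate(makeStore(), 2));                            // [ 1, 2, 5, 6, 5, 6 ]
console.log(migrate(makeStore(), 2, (a, b) => a.date - b.date)); // [ 1, 2, 3, 4, 5, 6 ]
```

In natural order, documents 3 and 4 are never updated and 5 and 6 are updated twice. Sorting by a field the loop never modifies keeps page boundaries stable.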
The takeaways

    I. What is more important?

        •      Writes: Optimize for easy inserts/updates

        •      Reads: Optimize for easy querying

    II. Denormalize to enable the most selective index

    III. Concurrency: design to leverage commutative
      operators


Thank you!
                      Try our tech


                               powered by




Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)

  • 1. MongoDB Schema Design: Insights and Tradeoffs Montse Medina COO, Saturday, May 5, 12
  • 2. Social content is useful in context
  • 3. Social context is useful in context
  • 4. Algorithms + Infrastructure
  • 5. Technology Stack: Apache Kafka
  • 6. Outline
      I. Schema design
          ‣ Relational vs. Document-oriented
          ‣ Schema-less design
          ‣ Case study: Publishers & Subscribers
      II. Lessons learned for schema design
      III. Things to remember about MongoDB
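  • 8. Relational vs. Document-oriented
      Relational: a Users table (id, name) plus a Graph table of (from, to) edges
          Users: (1, Robert), (2, Monica), (3, Lucas), ...
          Graph: (1, 5), (1, 20), (2, 1), (2, 5), ...
      vs. document-oriented: one Users document per user, with its edges stored as lists
          { id: 1, name: "Robert", from: [2], to: [5, 20] }
          { id: 2, name: "Monica", from: [23], to: [1, 5] }
          ...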
  • 9. Find all the "to" edges for user 5
      Relational: the Graph rows (1, 5), (1, 20), (2, 1), (2, 5), (3, 4), (3, 23), (3, 12), (4, 5), ... are spread across disk blocks, so this can take potentially as many disk seeks as "to" edges!
      Document-oriented: { id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] } is a single record, so 1 disk seek is guaranteed!
  • 10. Advantages of a document-oriented schema
      • Avoid joins
      • Disk locality when fetching relations (everything is stored within one document)
      Considerations for schema design
      • N-to-many relations == lists
      • Denormalization is more common
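
The "N-to-many relations == lists" idea from slides 8 to 10 can be sketched as a conversion from the relational edge rows into one document per user. This is a minimal in-memory illustration, not MongoDB code. The `toDocuments` helper is hypothetical, and the rows are the ones visible on slide 8.

```javascript
// Relational side: a Users table and a Graph table of (from, to) edge rows.
const users = [
  { id: 1, name: "Robert" },
  { id: 2, name: "Monica" },
  { id: 3, name: "Lucas" },
];
const edges = [[1, 5], [1, 20], [2, 1], [2, 5]];

// Hypothetical helper: fold the edge rows into per-user "from"/"to" lists.
function toDocuments(users, edges) {
  // Start every user document with empty edge lists.
  const docs = new Map(users.map(u => [u.id, { ...u, from: [], to: [] }]));
  for (const [src, dst] of edges) {
    // The outgoing edge goes into the source's "to" list...
    if (docs.has(src)) docs.get(src).to.push(dst);
    // ...and the same edge goes into the destination's "from" list.
    if (docs.has(dst)) docs.get(dst).from.push(src);
  }
  return [...docs.values()];
}

const docs = toDocuments(users, edges);
console.log(JSON.stringify(docs[0])); // {"id":1,"name":"Robert","from":[2],"to":[5,20]}
```

Each edge is written twice, once in each endpoint's document. That is the denormalization slide 10 mentions, and it buys the single-record read from slide 9.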
  • 12. Schema-less design
      { id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" }
      { id: 2, network: "Facebook", name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] }
      ...
      Leverage the schemaless nature of Mongo, but put protection with types in your code!
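
Slide 12's advice ("put protection with types in your code") can be sketched as an application-side validator. MongoDB stores both document shapes without complaint. The field-to-type table below is an assumption inferred from the two example documents, and `validateUser` is a hypothetical name.

```javascript
// Fields every user document is assumed to have, with their expected types.
const commonFields = { id: "number", network: "string", name: "string", from: "array", to: "array" };
// Network-specific fields (assumed from the two example documents).
const networkFields = {
  Twitter: { screenName: "string" },
  Facebook: { likes: "array" },
};

// typeof reports "object" for arrays, so distinguish them explicitly.
function typeOf(value) {
  return Array.isArray(value) ? "array" : typeof value;
}

// Throws a TypeError on an unknown network or a missing/mistyped field.
function validateUser(doc) {
  if (!Object.prototype.hasOwnProperty.call(networkFields, doc.network)) {
    throw new TypeError(`unknown network: ${doc.network}`);
  }
  const expected = { ...commonFields, ...networkFields[doc.network] };
  for (const [field, type] of Object.entries(expected)) {
    const actual = typeOf(doc[field]);
    if (actual !== type) {
      throw new TypeError(`${field} should be ${type}, got ${actual}`);
    }
  }
  return doc;
}

const robert = validateUser({ id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" });
console.log(robert.screenName); // robertE
```

Calling `validateUser` just before every write keeps the flexibility of per-network fields while catching, for example, a `likes` value stored as a plain string.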
  • 14. Case Study: Publishers & Subscribers (Read-Friendly)
  • 15. Read-Friendly Approach
      Diagram: the "Hi!" post stored once per recipient (three copies)
      Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }
  • 16. Read-Friendly Approach
      db.posts.find({recipient: uid})
      Sharding key: recipient
      Pros: fast retrieval, easy sharding
      Cons: slow writes, enormous amount of storage
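
The read-friendly trade-off on slides 15 and 16 can be sketched in memory as "fan-out on write". This is plain JavaScript standing in for the collection, not the MongoDB API. `publishReadFriendly` and `inbox` are hypothetical helpers. Giving each copy its own composite `_id` is an assumption, since copies of one post cannot all share `_id: postId`.

```javascript
// Stand-in for the posts collection: one document per (post, recipient) pair.
const posts = [];
let nextId = 1;

// Hypothetical write path: one message costs one document per recipient.
function publishReadFriendly(ownerId, recipientIds, text) {
  const postId = nextId++;
  for (const recipientId of recipientIds) {
    posts.push({ _id: `${postId}:${recipientId}`, postId, owner: ownerId, recipient: recipientId, text });
  }
}

// Read path: a single equality filter, like db.posts.find({recipient: uid}).
function inbox(uid) {
  return posts.filter(p => p.recipient === uid);
}

publishReadFriendly("u1", ["u2", "u3", "u5"], "Hi!");
console.log(posts.length);                 // 3 documents for one message
console.log(inbox("u3").map(p => p.text)); // [ 'Hi!' ]
```

Reads touch only one recipient's documents, which is why `recipient` works as a shard key. The cost is visible in `posts.length`: storage and write work grow with the number of subscribers.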
  • 17. Case Study: Publishers & Subscribers (Write-Friendly)
  • 18. Write-Friendly Approach
      Diagram: the "Hi!" post stored once
      Post: { _id: postId, owner: oId, text: "message", ... }
  • 19. Write-Friendly Approach
      db.posts.find({owner: {$in: user.from}})
      Sharding key: ?
      Pros: fast writes, slim storage
      Cons: slow reads, harder queries
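
The write-friendly layout on slides 18 and 19 moves the work to read time. The in-memory sketch below imitates `db.posts.find({owner: {$in: user.from}})` with a filter over the owners a user receives posts from. It also adds a newest-first sort, which is not on the slide. `feed` and the sample data are hypothetical. One plausible reading of "Sharding key: ?" is this: if posts were sharded by `owner`, a single feed query would hit the shards of every followed owner.

```javascript
// One document per post, whatever the number of readers.
const posts = [
  { _id: "p1", owner: "u2", text: "Hi!", date: 1 },
  { _id: "p2", owner: "u3", text: "Hello", date: 3 },
  { _id: "p3", owner: "u9", text: "Not followed", date: 2 },
];
// "from" lists the users whose posts u1 receives, as on slide 8.
const user = { _id: "u1", from: ["u2", "u3"] };

// Equivalent of find({owner: {$in: user.from}}) plus a newest-first sort.
function feed(user) {
  const followed = new Set(user.from);
  return posts
    .filter(p => followed.has(p.owner))
    .sort((a, b) => b.date - a.date);
}

console.log(feed(user).map(p => p._id)); // [ 'p2', 'p1' ]
```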
  • 20. Case Study: Publishers & Subscribers (Hybrid Approach)
  • 21. Hybrid Approach
      Diagram: the "Hi!" post stored once, listing all of its recipients
      Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }
  • 22. Hybrid Approach
      db.posts.find({recipients: uId})
      Sharding key: random :)
      Pros: fast writes, slim storage, reasonable read speed
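
The hybrid query on slide 22 works because MongoDB matches `{recipients: uId}` when any element of the `recipients` array equals `uId`. An index on `recipients` is multikey, with one index entry per array element. The in-memory sketch below imitates that matching rule with the hypothetical `arrayMatches` helper. It is a simulation, not the server's implementation.

```javascript
// One document per post, with the recipient list embedded.
const posts = [];

// Hypothetical write path: one write per message, however many recipients.
function publishHybrid(postId, ownerId, recipients, text) {
  posts.push({ _id: postId, owner: ownerId, recipients, text });
}

// Imitates MongoDB's equality match on an array field: any element may match.
function arrayMatches(doc, field, value) {
  const v = doc[field];
  return Array.isArray(v) ? v.includes(value) : v === value;
}

// Equivalent of db.posts.find({recipients: uId}).
function inbox(uId) {
  return posts.filter(p => arrayMatches(p, "recipients", uId));
}

publishHybrid("p1", "u4", ["u1", "u2", "u3", "u5"], "Hi!");
publishHybrid("p2", "u1", ["u2"], "Hello");
console.log(posts.length);                // 2 documents for 2 messages
console.log(inbox("u2").map(p => p._id)); // [ 'p1', 'p2' ]
```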
  • 23. Random sharding is not random!
      Minimize the number of disk seeks per shard!
      Diagram labels: "Best -- impossible for our data", "Worse", "Optimal solution"
  • 24. Outline
      I. Schema design
      II. Lessons learned for schema design
          ‣ Indexes
          ‣ Concurrency
          ‣ Reducing collection size
      III. Things to remember about MongoDB
  • 26. Indexes: Primary Key
      link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
      vs
      link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
      If you use the default ObjectId PK, you waste the natural primary key of your data.
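
Slide 26's point can be illustrated with an in-memory stand-in for a collection keyed by `_id`. MongoDB always maintains the `_id` index. When the URL is the `_id`, that index doubles as the lookup and uniqueness index for links. With a generated id, saving the same link twice yields two documents, and finding a link by URL would need a separate index. `makeCollection` and `newObjectId` are hypothetical; the counter is only a stand-in for `ObjectId()`.

```javascript
// Minimal stand-in for a collection whose only index is on _id.
function makeCollection() {
  const byId = new Map();
  return {
    // Save keyed on _id: a second save with the same _id replaces the first.
    save(doc) { byId.set(doc._id, { ...doc }); },
    findById(id) { return byId.get(id) ?? null; },
    count() { return byId.size; },
  };
}

let counter = 0;
// Hypothetical stand-in for ObjectId(): a fresh, meaningless id on every call.
const newObjectId = () => `oid-${++counter}`;

const withObjectIds = makeCollection();
const withNaturalKey = makeCollection();
const link = { url: "www.jetlore.com", title: "Jetlore is a search platform for social content" };

// The same link is saved twice (e.g. shared by two users).
for (let i = 0; i < 2; i++) {
  withObjectIds.save({ _id: newObjectId(), ...link });       // new id each time: a duplicate
  withNaturalKey.save({ _id: link.url, title: link.title }); // URL is the key: still one document
}

console.log(withObjectIds.count(), withNaturalKey.count()); // 2 1
```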
  • 27. Indexes: augment your schema to enable the most selective index
      Want all posts that a user can view, sorted by the number of likes
      post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ... }
      Add a new "likesCount" field!
      db.posts.ensureIndex({recipients: 1, likesCount: -1})
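
Slide 27 adds `likesCount` because a MongoDB index cannot be built on the length of the `likes` array. A stored counter can be indexed together with `recipients`. The in-memory sketch below shows the query that index serves and one way to keep the counter in sync. In the shell, the sync step would be an update filtered on `{likes: {$ne: uid}}` with `$push` and `$inc`. `like` and `mostLiked` are hypothetical names.

```javascript
const posts = [
  { _id: "p1", recipients: ["u1", "u2"], likes: ["u2"], likesCount: 1 },
  { _id: "p2", recipients: ["u1"], likes: ["u1", "u2", "u3"], likesCount: 3 },
  { _id: "p3", recipients: ["u2"], likes: [], likesCount: 0 },
];

// Only bump the counter when the like is new, so likesCount == likes.length.
function like(post, uid) {
  if (post.likes.includes(uid)) return false;
  post.likes.push(uid);
  post.likesCount += 1;
  return true;
}

// What {recipients: 1, likesCount: -1} serves: posts a user can see, most liked first.
function mostLiked(uid) {
  return posts
    .filter(p => p.recipients.includes(uid))
    .sort((a, b) => b.likesCount - a.likesCount);
}

like(posts[0], "u3"); // p1 now has 2 likes
like(posts[0], "u3"); // a repeated like is ignored
console.log(mostLiked("u1").map(p => [p._id, p.likesCount])); // [ [ 'p2', 3 ], [ 'p1', 2 ] ]
```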
  • 28. Indexes: make sure to use the proper index
      db.posts.find({recipients: uId}).sort({date: -1})
      Candidate indexes:
          db.posts.ensureIndex({recipients: 1})
          db.posts.ensureIndex({date: 1})  (vs. date: -1)
          db.posts.ensureIndex({recipients: 1, date: 1})
      Always test with explain()!
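
Why can `{recipients: 1, date: 1}` answer `find({recipients: uId}).sort({date: -1})` without a separate sort step? The in-memory sketch below models the index as an array of entries ordered by (recipient, date), with one entry per array element as in a multikey index. A sorted array stands in for the real B-tree, and `newestFor` is hypothetical. The equality on `recipients` selects one contiguous range, which is already in date order, so walking it backwards yields newest first. That is also why `date: 1` vs `date: -1` does not matter here. On a real server, `explain()` is what confirms which index the query used.

```javascript
const posts = [
  { _id: "p1", recipients: ["u1", "u2"], date: 10 },
  { _id: "p2", recipients: ["u1"], date: 30 },
  { _id: "p3", recipients: ["u2"], date: 20 },
  { _id: "p4", recipients: ["u1"], date: 20 },
];

// Multikey compound index: one entry per recipient, ordered by (recipient, date).
const index = posts
  .flatMap(p => p.recipients.map(r => ({ recipient: r, date: p.date, id: p._id })))
  .sort((a, b) =>
    a.recipient < b.recipient ? -1 : a.recipient > b.recipient ? 1 : a.date - b.date);

function newestFor(uId) {
  // Locate the contiguous range for uId (a B-tree would seek instead of scanning).
  const start = index.findIndex(e => e.recipient === uId);
  if (start === -1) return [];
  let end = start;
  while (end + 1 < index.length && index[end + 1].recipient === uId) end++;
  // Walk the range backwards: descending date order, no extra sort needed.
  const ids = [];
  for (let i = end; i >= start; i--) ids.push(index[i].id);
  return ids;
}

console.log(newestFor("u1")); // [ 'p2', 'p4', 'p1' ]
```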
  • 30. Concurrency: try to avoid "save()" in drivers
      thread1: { _id: u1, name: "Robert", from: [u2, u3] }
      thread2: { _id: u1, name: "Bob", from: [] }
      Instead, $set only the field each thread changed:
          db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
          db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})
      …but! db.users.update({_id: u1}, {$set: {_id: u1, name: ...}}, true, false)
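
The lost update behind slide 30 can be reproduced in memory. Both threads load `u1`, change different fields, and write back. A driver-style `save()` replaces the whole document, so the later write silently undoes the earlier one. A `$set` of just the changed field keeps both changes. `run`, `save` and `setField` are hypothetical stand-ins, not driver APIs, and the initial stored document is assumed.

```javascript
function clone(x) { return JSON.parse(JSON.stringify(x)); }

function run(writeBack) {
  const store = new Map([["u1", { _id: "u1", name: "Robert", from: [] }]]);
  // Both threads read the same version of the document.
  const thread1 = clone(store.get("u1"));
  const thread2 = clone(store.get("u1"));
  thread1.from = ["u2", "u3"]; // thread 1 changes the subscription list
  thread2.name = "Bob";        // thread 2 renames the user
  writeBack(store, thread1, "from");
  writeBack(store, thread2, "name");
  return store.get("u1");
}

// Like save(): overwrite the stored document with the local copy.
const save = (store, doc) => store.set(doc._id, clone(doc));
// Like update({_id}, {$set: {field: value}}): touch only one field.
const setField = (store, doc, field) => { store.get(doc._id)[field] = clone(doc[field]); };

console.log(run(save));     // { _id: 'u1', name: 'Bob', from: [] }  (thread 1's change lost)
console.log(run(setField)); // { _id: 'u1', name: 'Bob', from: [ 'u2', 'u3' ] }
```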
  • 31. Concurrency: atomic commutative operators
      db.users.update({_id: u1}, {$pull: {to: u2}})
      db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
      When updating lists and counters, rely on $inc, $addToSet and $pull instead of $set.
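
Why slide 31 prefers these operators: each one describes a change relative to the current value. In the example below, two clients' updates give the same result whichever order the server applies them. A `$set` computed from a stale read does not. The sketch uses toy re-implementations of `$inc`, `$addToSet` and `$pull` on plain objects, not MongoDB's own code. In general, array element order can still depend on update order.

```javascript
// Toy versions of three update operators, applied to plain objects.
const ops = {
  $inc: (doc, field, n) => { doc[field] = (doc[field] ?? 0) + n; },
  $addToSet: (doc, field, v) => {
    doc[field] = doc[field] ?? [];
    if (!doc[field].includes(v)) doc[field].push(v);
  },
  $pull: (doc, field, v) => { doc[field] = (doc[field] ?? []).filter(x => x !== v); },
};

// Apply a list of [operator, field, value] updates to a copy of doc.
function apply(doc, updates) {
  const copy = JSON.parse(JSON.stringify(doc));
  for (const [op, field, value] of updates) ops[op](copy, field, value);
  return copy;
}

const start = { _id: "u1", to: ["u2", "u3"], likesCount: 5 };
const clientA = [["$pull", "to", "u2"], ["$inc", "likesCount", 1]];
const clientB = [["$addToSet", "to", "u7"], ["$inc", "likesCount", 1]];

// The server may apply the two clients' updates in either order.
const ab = apply(start, [...clientA, ...clientB]);
const ba = apply(start, [...clientB, ...clientA]);
console.log(ab, ba); // each is { _id: 'u1', to: [ 'u3', 'u7' ], likesCount: 7 }

// Read-modify-write with $set: both clients read likesCount = 5 and each writes 6.
const staleRead = start.likesCount;
const viaSet = { ...start, likesCount: staleRead + 1 }; // client A's $set
viaSet.likesCount = staleRead + 1;                      // client B's $set overwrites it
console.log(viaSet.likesCount); // 6, one like lost
```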
  • 32. Concurrency: no transactions
      user1: { _id: u1, to: [u2, u3], from: [...], ... }
      user2: { _id: u2, to: [...], from: [u1, ...], ... }
      User1 wants to unsubscribe from user2. Ideally we would update both users in one transaction.
      Implement it in your code!
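
One way to "implement it in your code" for slide 32, sketched under assumptions: perform the two single-document updates as idempotent `$pull`-style steps. If the process dies between them, re-running the whole operation repairs the half-finished state, because repeating a step that already happened is a no-op. This is an in-memory simulation. `pull` and `unsubscribe` are hypothetical, and whatever triggers the retry (e.g. a job queue) is left out.

```javascript
const users = new Map([
  ["u1", { _id: "u1", to: ["u2", "u3"], from: [] }],
  ["u2", { _id: "u2", to: [], from: ["u1"] }],
]);

// Equivalent of db.users.update({_id: id}, {$pull: {[field]: value}}).
function pull(id, field, value) {
  const doc = users.get(id);
  doc[field] = doc[field].filter(x => x !== value);
}

function unsubscribe(subscriber, publisher, { crashAfterFirstStep = false } = {}) {
  pull(subscriber, "to", publisher);   // step 1: subscriber stops following
  if (crashAfterFirstStep) throw new Error("simulated crash between steps");
  pull(publisher, "from", subscriber); // step 2: publisher loses the follower
}

try {
  unsubscribe("u1", "u2", { crashAfterFirstStep: true });
} catch (e) {
  // Now u1 no longer lists u2, but u2 still lists u1: retry the whole operation.
  unsubscribe("u1", "u2"); // step 1 is a no-op, step 2 completes the change
}
console.log(users.get("u1").to, users.get("u2").from); // [ 'u3' ] []
```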
  • 34. Reducing collection size: name your fields with short names!
      post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" }
      vs
      post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
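
BSON stores every field name inside every document, so the saving on slide 34 is repeated per document. A common way to keep code readable anyway is to translate names at the persistence boundary. The mapping below is the one from the slide. The `rename` helper and the `"owner-1"` value are hypothetical, and JSON length is only a rough proxy for BSON size.

```javascript
// Long (code) name -> short (stored) name, as on the slide.
const toShort = { owner: "o", messageText: "t", mediaUrl: "mu", mediaTitle: "mt" };
const toLong = Object.fromEntries(Object.entries(toShort).map(([long, short]) => [short, long]));

// Rename top-level keys using the given map; unmapped keys pass through.
const rename = (doc, map) =>
  Object.fromEntries(Object.entries(doc).map(([k, v]) => [map[k] ?? k, v]));

const post = {
  owner: "owner-1",
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: "Jetlore is a user analytics & search platform for social content",
};
const stored = rename(post, toShort); // what would be written
const loaded = rename(stored, toLong); // what the application works with

// Characters saved per document, all of them in the keys.
console.log(JSON.stringify(post).length - JSON.stringify(stored).length); // 28
```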
  • 35. Outline
      I. Schema design
      II. Lessons learned for schema design
      III. Things to remember about MongoDB
          ‣ Single lock
          ‣ ($or + sort) query doesn't use indexes properly
          ‣ Indexes with 2 list fields
          ‣ Record iterators + update
  • 36. $or & sort query doesn't use the proper index
      db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})
      db.posts.ensureIndex({recipients: 1, date: -1})
      db.posts.ensureIndex({privacy: 1, date: -1})
    Indexes with 2 list fields
      post: { _id: ObjectId(...), recipients: [...], links: [...], ... }
      db.posts.ensureIndex({recipients: 1, links: 1})
      (a compound index cannot hold a document in which both indexed fields are arrays)
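
One possible workaround for the `$or` + sort problem on slide 36 is to skip the single `$or` query. Instead, run each branch as its own query, each served by its own `{field: 1, date: -1}` index and limited to the page size. Then merge the two newest-first lists on the client, dropping posts that match both branches. This is a sketch under that assumption, not the talk's stated fix. `mergeNewestFirst` is hypothetical, and the sample arrays stand in for the two query results.

```javascript
// Merge two lists sorted by date descending, de-duplicating by _id.
function mergeNewestFirst(left, right, limit) {
  const out = [];
  const seen = new Set();
  let i = 0, j = 0;
  while (out.length < limit && (i < left.length || j < right.length)) {
    // Take from whichever list currently has the newer post.
    const takeLeft = j >= right.length || (i < left.length && left[i].date >= right[j].date);
    const post = takeLeft ? left[i++] : right[j++];
    if (!seen.has(post._id)) { seen.add(post._id); out.push(post); }
  }
  return out;
}

// Results of the two branch queries, each already sorted by date descending.
const forRecipient = [{ _id: "p5", date: 50 }, { _id: "p3", date: 30 }, { _id: "p1", date: 10 }];
const publicPosts = [{ _id: "p4", date: 40 }, { _id: "p3", date: 30 }, { _id: "p2", date: 20 }];

console.log(mergeNewestFirst(forRecipient, publicPosts, 4).map(p => p._id)); // [ 'p5', 'p4', 'p3', 'p2' ]
```

Each branch must fetch at least `limit` documents for the merged page to be complete. Duplicates such as `p3`, a post that is both addressed to the user and public, appear in both results and are emitted once.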
  • 37. Record iterators + updating
      Problem: updating documents while iterating over the same collection can move them, so the cursor may return a document twice or skip one.
      var posts = db.posts.find().skip(n).limit(t)
      while (posts.hasNext()) {
          var post = posts.next()
          db.posts.update({_id: post._id}, {$set: {text: NewText}})
      }
      Fix 1: sort by a field that will not change
      var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)
      Fix 2: rename the old collection, read from it, and write the updated documents into a fresh "posts" collection
      db.posts.renameCollection("oldPosts")
      var posts = db.oldPosts.find().skip(n).limit(t)
      while (posts.hasNext()) {
          var post = posts.next()
          post.text = NewText
          db.posts.insert(post)
      }
  • 38. The takeaways
      I. What is more important?
          • Writes: optimize for easy inserts/updates
          • Reads: optimize for easy querying
      II. Denormalize to enable the most selective index
      III. Concurrency: design to leverage commutative operators
  • 39. Thank you! Try our tech powered by