A presentation for the MongoDB WICS Summit.
If you're comfortable with relational databases but not so comfortable with document databases like MongoDB, this is the session for you! We'll walk through an example of how you would store the same data in both a relational database and a document database. Then we'll discuss how common relational database terms and concepts map to terms in document databases. You'll leave feeling comfortable with the basics of how to store data in document databases.
Additional Resources
• SQL to MongoDB Blog (blog series)
• Quick Start: MongoDB and Node.js (blog series)
• Advanced Schema Design Patterns (webinar)
• Building with Patterns: A Summary (blog series)
• M320: Data Modeling (MongoDB University Course)
• JSON Schema Validation – Locking down your model the smart way (blog)
@Lauren_Schaefer
Hey everyone! I don’t know about you, but the pandemic has me eating a lot of comfort food and watching a lot of comfort tv lately. My favorite comfort tv is the show Parks and Recreation. One of the things that makes the show so amazing is the characters. I want to introduce you to Ron Swanson.
Ron is an old school guy. He’s a bit set in his ways, he likes his privacy, and he likes to stay off the grid.
Now in Season 6, episode 14, Ron discovers Yelp. He loves the idea of being able to review places he’s been. However, Yelp is way too “on-the-grid” for Ron. So Ron uses the Internet to look up the physical addresses of where he wants to snail-mail his reviews and then pulls out his big old typewriter to start typing his reviews.
Ron writes some pretty great reviews. Here is one of my favorites.
Dear Frozen Yogurt,
You are the celery of desserts
http://www.breathtakingandinappropriate.com/2014/09/ron-swanson-has-no-time-for-frozen.html
Be ice cream or be nothing
Zero stars
Now this is a pretty great review. But I see 3 problems with his approach.
Snail mail is way slower than posting the review to Yelp, where it will be instantly available.
The businesses he’s reviewing may never open the review because they may just assume it’s junk mail.
No one else will benefit from the review. I don’t know about you, but I live for these kinds of reviews online.
Ron was inspired by Yelp and he saw the value in the technology, but he didn’t change his old school mindset in order to really get the value out of it.
This is what we see sometimes as people move from tabular/relational databases to document databases like MongoDB. People see the value of document databases and are inspired by the technology, but they bring with them their old mindsets, so they don’t get the full value of document databases.
I don’t want this to happen to you. I want to see you be really successful as you work with document databases.
Before we dive in, let me introduce myself. My name is Lauren Schaefer. I took a database course in college that was all about best practices for relational databases. I began my career at IBM where I spent 8 years as a software engineer. For a lot of that time I used DB2. Toward the end of my time there, my team started getting flexibility in what database we wanted to use, so I started trying out NoSQL databases. To be honest, I didn’t really get the hype. Without a doubt, NoSQL databases were easy to get started using. But I brought with me my relational database mindset, so I kept thinking about my data in rows and columns even though I wasn’t using rows and columns anymore. It worked, but it wasn’t great. I joined MongoDB about a year and a half ago, and I’ve worked through the process of changing my mindset in how I think about storing data. And I’m so happy I did, because it’s really easy to work with data in the apps that I build now. Today, I’m going to share with you what I’ve learned as I’ve gone on the journey of moving from tables to documents.
I’d love to connect with you on social. You can find me on both Twitter and TikTok with the handle Lauren_Schaefer. I don’t really know what I’m doing on TikTok, but I’m having a lot of fun doing it.
Today we’re going to be talking about moving from tables to documents. When I say tables, I mean a tabular or relational database. For example, you might have experience using MySQL or Oracle. We’ll dive into what document databases are in just a few minutes.
If you’re here, I’m going to assume:
You have experience with relational databases
You have minimal to no experience with document databases
Today, we’re going to go on a mental journey from tables to documents.
I’m going to kick things off by working through an example of how you would model the same data in both tables and documents.
Then I’ll map the terms and concepts you’re familiar with in tabular databases to similar terms and concepts in document databases.
Then we’ll wrap up with some Q&A. I’m happy to answer questions about what I’m talking about here today, what life is like here at MongoDB, how I got to where I am in my career, how I balance being a working mom, or whatever else is on your mind.
Let’s jump right in
Let’s talk about documents.
[CLICK] No, I’m not talking about Word documents.
I’m talking about JSON documents. JSON stands for JavaScript Object Notation.
If you’ve used any of the C-family of programming languages such as C, C#, Go, Java, JavaScript, PHP, or Python, documents will probably feel pretty comfortable to you.
Documents typically store information about one object as well as any information related to that object.
Every document begins and ends with curly braces.
Inside of those curly braces are field/value pairs. The great thing about documents is that they can be incredibly rich. Values can be a variety of types including strings, numbers, arrays, dates, timestamps, or even objects. So you can have objects within a document. You’ll see what that looks like in just a moment.
When people talk about document databases, they’ll often use the term nonrelational. But that doesn’t mean document databases don’t store relationships. That was a double negative. Stick with me. Document databases store relationships really well – they just do it differently than relational databases do.
Let’s walk through an example of how you would model the same data in a relational, tabular database vs a document database.
Let's say we need to store information about a user named Leslie.
Let’s begin with her contact information. In a relational database, we'll create a table named Users. We can create columns for each piece of contact information we need to store: first name, last name, cell phone number, and city. To ensure we have a unique way to identify each row, we'll include an ID column.
Now let's store that same information in a document. We can create a new document for Leslie where we'll add field/value pairs for each piece of contact information we need to store. As you can see we have field/value pairs for first name, last name, cell and city. We'll use _id to uniquely identify each document. We'll store this document in a collection named Users.
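To make the comparison concrete, here is a minimal sketch of the same contact information stored both ways. It uses Python's built-in sqlite3 module as a stand-in for a relational database and a plain dict as a stand-in for the document. The column and field names, the last name, the phone number, and the city are illustrative placeholders, not the values from the slides.

```python
import sqlite3

# Relational: one row in a Users table (in-memory SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users ("
    "ID INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, cell TEXT, city TEXT)"
)
conn.execute(
    "INSERT INTO Users VALUES (?, ?, ?, ?, ?)",
    (1, "Leslie", "Knope", "555-0100", "Pawnee"),  # placeholder values
)
row = conn.execute("SELECT * FROM Users WHERE ID = 1").fetchone()

# Document: the same data as field/value pairs, destined for a Users collection.
leslie = {
    "_id": 1,  # plays the role of the ID column
    "first_name": "Leslie",
    "last_name": "Knope",
    "cell": "555-0100",
    "city": "Pawnee",
}

print(row)
print(leslie)
```

Each column in the row lines up with one field in the document, which is why the two models look so similar at this stage.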
Now that we've stored Leslie's contact information, let's store the coordinates of her current location.
When using a relational database, we'll need to split the latitude and longitude between two columns.
Document databases support arrays, so we can store the latitude and longitude together in a single field. We’ll call that field location.
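A rough sketch of the difference, again with SQLite standing in for the relational side and a dict standing in for the document. The coordinates are made-up placeholders. The array here lists longitude first, which is the order MongoDB expects for coordinate pairs if you later run geospatial queries on the field.

```python
import sqlite3

# Relational: latitude and longitude live in two separate columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users ("
    "ID INTEGER PRIMARY KEY, first_name TEXT, latitude REAL, longitude REAL)"
)
conn.execute("INSERT INTO Users VALUES (1, 'Leslie', 39.17, -86.536)")
latitude, longitude = conn.execute(
    "SELECT latitude, longitude FROM Users WHERE ID = 1"
).fetchone()

# Document: both numbers travel together in a single array field.
leslie = {
    "_id": 1,
    "first_name": "Leslie",
    "location": [-86.536, 39.17],  # [longitude, latitude], placeholder values
}

print(latitude, longitude)
print(leslie["location"])
```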
We're successfully storing Leslie's contact information and current location. Now let's store her hobbies.
When using a relational database, we could choose to add more columns to the Users table.
However, since a single user could have many hobbies (meaning we need to represent a one-to-many relationship), we're more likely to create a separate table just for hobbies. Each row in the table will contain information about one hobby for one user. When we need to retrieve Leslie's hobbies, we'll join the Users table and our new Hobbies table.
Since document databases support arrays, we can simply add a new field named "hobbies" to our existing document for Leslie. The array can contain as many or as few hobbies as we need. When we need to retrieve Leslie's hobbies, we don't need to do an expensive join to bring the data together; we can simply retrieve her document in the Users collection.
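Here is a small sketch of that one-to-many relationship in both models. SQLite stands in for the relational database, and the table layout, column names, and the hobbies themselves are assumptions made up for illustration.

```python
import sqlite3

# Relational: hobbies live in their own table and are joined back to Users.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE Hobbies (
        ID INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES Users(ID),
        hobby TEXT
    );
    INSERT INTO Users VALUES (1, 'Leslie');
    INSERT INTO Hobbies VALUES
        (10, 1, 'scrapbooking'),
        (11, 1, 'eating waffles'),
        (12, 1, 'working');
    """
)
joined = conn.execute(
    "SELECT u.first_name, h.hobby "
    "FROM Users u JOIN Hobbies h ON h.user_id = u.ID "
    "WHERE u.ID = 1 ORDER BY h.ID"
).fetchall()
hobbies_from_join = [hobby for _, hobby in joined]

# Document: the hobbies are an array inside Leslie's own document.
leslie = {
    "_id": 1,
    "first_name": "Leslie",
    "hobbies": ["scrapbooking", "eating waffles", "working"],
}
hobbies_from_document = leslie["hobbies"]

print(hobbies_from_join)
print(hobbies_from_document)
```

Both approaches return the same list. The document version gets there with a single lookup instead of a join across two tables.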
Let's say we also need to store Leslie's job history.
Just as we did with hobbies, we're likely to create a separate table just for job history information. Each row in the table will contain information about one job for one user.
So far, we've used arrays to store geolocation data and a list of Strings. Arrays can contain values of any type, including objects.
Let's create an object for each job Leslie has held and store those objects in an array. As you can see we have a job history field that stores an array. Inside of that array, we have an object for when she was the Deputy Director, an object for when she was a City Councillor, and an object for when she was director of the National Parks Service’s Midwest branch.
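As a sketch, the job history array could look something like the dict below. The three job titles come from the example above. The field names and the organization values are hypothetical, chosen only to show that each array element is a full object with its own fields.

```python
# Leslie's document with an array of embedded job objects.
leslie = {
    "_id": 1,
    "first_name": "Leslie",
    "job_history": [
        # Field names and organizations are illustrative assumptions.
        {"title": "Deputy Director", "organization": "Parks and Recreation"},
        {"title": "City Councillor", "organization": "City Council"},
        {"title": "Director", "organization": "National Parks Service, Midwest branch"},
    ],
}

# Reading the embedded objects needs no join: they arrive with the document.
titles = [job["title"] for job in leslie["job_history"]]
print(titles)
```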
Now that we've decided how we'll store information about our users in both tables and documents, let's store information about Ron. Ron will have almost all of the same information as Leslie. However, Ron does his best to stay off the grid, so he will not be storing his location in the system.
Let's begin by examining how we would store Ron's information in the same tables that we used for Leslie's. When using a relational database, we are required to input a value for every cell in the table. We will represent Ron's lack of location data with NULL. The problem with using NULL is that it's unclear whether the data does not exist or if the data is just unknown, so many people discourage the use of NULL.
In document databases, we have the option of representing Ron's lack of location data in two ways: we can omit the location field from the document or we can set location to null. Best practices suggest that we omit the location field to save space. You can choose if you want omitted fields and fields set to null to represent different things in your applications.
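The two options can be sketched with plain Python dicts, where None plays the role of null. The helper function name is hypothetical. It only shows how an application could tell the two cases apart if it wants them to mean different things.

```python
import json

# Option 1: omit the field entirely.
ron_omitted = {"_id": 2, "first_name": "Ron", "last_name": "Swanson"}

# Option 2: keep the field and set it to null (None in Python).
ron_null = {"_id": 2, "first_name": "Ron", "last_name": "Swanson", "location": None}


def location_status(doc):
    """Hypothetical helper: distinguish a missing field from a null one."""
    if "location" not in doc:
        return "omitted"
    if doc["location"] is None:
        return "null"
    return "present"


print(location_status(ron_omitted))
print(location_status(ron_null))

# The omitted version is also the smaller of the two when serialized.
print(len(json.dumps(ron_omitted)), len(json.dumps(ron_null)))
```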
Ron has some hobbies and job history, so we’ll add his information to those tables.
And we can add that information to his document as well. The structure of Ron’s document looks pretty similar to Leslie’s.
Let's say we are feeling pretty good about our data models and decide to launch our apps using them.
Then we discover we need to store information about a new user: Lauren Burhug. She's a fourth grade student who Ron teaches about government. We need to store a lot of the same information about Lauren as we did with Leslie and Ron: her first name, last name, city, and hobbies. However, Lauren doesn't have a cell phone, location data, or job history. We also discover that we need to store a new piece of information: her school.
Let's begin by storing Lauren's information in the tables as they already exist.
We can create a new document for Lauren and include the data we have for her in it.
Now let’s talk about how to store information about Lauren’s school in our tables. We have two options. We can choose to add a column to the existing Users table, or we can create a new table named Schools.
Let's say we choose to add a column named "school" to the Users table. Depending on our access rights to the database, we may need to talk to the DBA and convince them to add the column.
Maybe we have to do a little begging to get our DBA to add the column. Maybe we have to do a little bribing – maybe we bring our DBA their favorite donut. Or maybe we bring our manager along to pressure the DBA into agreeing.
If our DBA agrees, the database will likely need to be taken down, the "school" column will need to be added, NULL values will be stored in every row in the Users table where a user does not have a school, and the database will need to be brought back up. It’s doable but it can be a little painful.
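A minimal sketch of what adding the column does to the existing rows, using in-memory SQLite as a stand-in. SQLite itself does not need any downtime for this, so the sketch only illustrates the NULL values that appear, not the operational cost on a production system. The school name is a made-up placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO Users VALUES (1, 'Leslie'), (2, 'Ron'), (3, 'Lauren');

    -- The schema change applies to every row in the table.
    ALTER TABLE Users ADD COLUMN school TEXT;

    -- Only Lauren actually has a school (placeholder name).
    UPDATE Users SET school = 'Pawnee Elementary' WHERE ID = 3;
    """
)
rows = conn.execute(
    "SELECT ID, first_name, school FROM Users ORDER BY ID"
).fetchall()

for row in rows:
    print(row)  # Leslie and Ron now carry a NULL (None) school value
```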
Now let’s talk about how to store Lauren’s school in documents.
We can simply add a new field named "school" to Lauren's document. We do not need to make any modifications to Leslie's document or Ron's document when we add the new "school" field to Lauren's document. Document databases have a flexible schema, so every document in a collection does not need to have the same fields.
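Sketched with plain dicts standing in for the three documents in the Users collection, it could look like this. All field names and values are illustrative placeholders.

```python
# Three documents in the same collection, each with its own set of fields.
leslie = {
    "_id": 1,
    "first_name": "Leslie",
    "cell": "555-0100",
    "city": "Pawnee",
    "location": [-86.536, 39.17],
    "hobbies": ["scrapbooking"],
}
ron = {
    "_id": 2,
    "first_name": "Ron",
    "cell": "555-0101",
    "city": "Pawnee",
    "hobbies": ["woodworking"],  # no location field at all
}
lauren = {
    "_id": 3,
    "first_name": "Lauren",
    "city": "Pawnee",
    "hobbies": ["soccer"],
    "school": "Pawnee Elementary",  # new field, added to this document only
}

users = [leslie, ron, lauren]
users_with_school = [doc["first_name"] for doc in users if "school" in doc]
print(users_with_school)
```

Adding the school field touched one document. Leslie's and Ron's documents are exactly as they were.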
Some of you might be starting to panic at the idea of a flexible schema. (I know I started to panic a little when I was introduced to the idea.)
http://gph.is/28MrIOY
Don't panic! This flexibility can be hugely valuable as your application's requirements evolve and change.
Also, some document databases like MongoDB provide schema validation so you can lock down your schema as much or as little as you’d like when you’re ready.
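As a rough idea of what locking down a schema can look like, here is a sketch of a MongoDB-style validator written as a Python dict. The `$jsonSchema`, `bsonType`, `required`, and `properties` keywords are part of MongoDB's schema validation syntax. The specific rules, the field names, and the small `missing_required` helper are assumptions for illustration. The helper only mimics the `required` check locally and is not how the server evaluates the schema.

```python
# A validator that requires names but leaves every other field flexible.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["first_name", "last_name"],
        "properties": {
            "first_name": {"bsonType": "string"},
            "last_name": {"bsonType": "string"},
            "hobbies": {"bsonType": "array", "items": {"bsonType": "string"}},
        },
    }
}


def missing_required(doc, validator):
    """Hypothetical local check: which required fields does doc lack?"""
    required = validator["$jsonSchema"].get("required", [])
    return [name for name in required if name not in doc]


print(missing_required({"first_name": "Lauren", "school": "Pawnee Elementary"}, user_validator))
print(missing_required({"first_name": "Ron", "last_name": "Swanson"}, user_validator))
```

A validator like this would typically be attached to the collection when it is created or modified, so the database rejects documents that break the rules while still allowing extra fields such as school.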
Now that we’re starting to get the idea of how tables and documents are similar and different, let’s do some explicit term mapping. On the left side of the screen you’ll see tabular or relational database terms and on the right side of the screen you’ll see document database terms.
First up, we saw this a bit in our earlier example: a row maps to a document.
Or, depending on how you’ve normalized your data, rows from multiple tables could map to a single document.
A column maps roughly to a field.
In a relational database, groups of rows are stored in tables.
In a document database, groups of documents are stored in collections.
So tables map to collections.
The next few terms will probably feel pretty comfortable to those of you with relational database backgrounds as the terminology is basically the same between the two.
Just like you store groups of tables in a relational database, you store groups of collections in a document database.
Indexes are fairly similar between the two. Indexes help speed up your read queries.
Views are pretty similar in both.
There are a few different ways to handle joins in document databases.
The general recommendation is that, if you have related information that you would put in a separate table in a relational database, you should embed that information in a single document when working in a document database. The rule of thumb is that data that is accessed together should be stored together. Let me say that again: data that is accessed together should be stored together. So, if you’ll be frequently accessing information together that you would have put in separate tables, you should likely just embed it in a document.
Depending on the document database you’re using, there are other options for joins as well. MongoDB supports references between the documents similar to how you would use a foreign key. MongoDB also has an operation called $lookup to support a left outer join. I’m not going to go any deeper into those today, but I want you to know that the options exist.
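For the curious, here is a sketch of what a reference plus a `$lookup` stage can look like. The stage is written as a Python dict using the `from`, `localField`, `foreignField`, and `as` options that `$lookup` takes. The collection names, field names, and data are made up, and `emulate_lookup` is a hypothetical pure-Python stand-in that mimics the left outer join behavior without a running database.

```python
# Hobbies kept in a separate collection, each referencing a user by _id.
users = [
    {"_id": 1, "first_name": "Leslie"},
    {"_id": 2, "first_name": "Ron"},
]
hobbies = [
    {"_id": 10, "user_id": 1, "hobby": "scrapbooking"},
    {"_id": 11, "user_id": 1, "hobby": "eating waffles"},
]

# The aggregation stage you would run against the Users collection.
lookup_stage = {
    "$lookup": {
        "from": "Hobbies",
        "localField": "_id",
        "foreignField": "user_id",
        "as": "hobbies",
    }
}


def emulate_lookup(local_docs, foreign_docs, stage):
    """Hypothetical stand-in: left outer join in plain Python."""
    spec = stage["$lookup"]
    joined = []
    for doc in local_docs:
        matches = [
            other
            for other in foreign_docs
            if other.get(spec["foreignField"]) == doc.get(spec["localField"])
        ]
        # Every local document is kept, even when nothing matches.
        joined.append({**doc, spec["as"]: matches})
    return joined


result = emulate_lookup(users, hobbies, lookup_stage)
for doc in result:
    print(doc["first_name"], [h["hobby"] for h in doc["hobbies"]])
```

Ron has no matching hobbies, so he comes back with an empty array rather than being dropped, which is what makes this a left outer join.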
Finally, let’s talk about ACID transactions. Transactions group database operations together so they either all succeed or none succeed. If you did some research online about relational databases vs document databases before coming here today, you probably saw something about document databases not supporting transactions as a major drawback. If you care about data integrity—and really, who doesn’t?—that’s a pretty scary sounding drawback.
Some document databases support ACID transactions while others do not. In relational databases, we call these multi-record ACID transactions. In document databases, we call these multi-document ACID transactions.
However, when you model your data for document databases, you’ll find that most of the time you don’t actually need to use a transaction. So don’t get freaked out if you’re looking at drawbacks of document databases and see “no transactions” listed. MongoDB supports transactions, but chances are good that you won’t actually need them.
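Here is one way to picture why. In the sketch below, SQLite stands in for a relational database where a user and their hobbies live in two tables, so the two inserts need a transaction to stay consistent. In the document model the same data is a single document, and in MongoDB a write to a single document is atomic on its own. The table layout, names, and values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (ID INTEGER PRIMARY KEY, first_name TEXT)")
conn.execute("CREATE TABLE Hobbies (user_id INTEGER NOT NULL, hobby TEXT NOT NULL)")
conn.commit()

# Relational: two tables, so both inserts must succeed or fail together.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO Users VALUES (1, 'Leslie')")
        conn.execute("INSERT INTO Hobbies VALUES (1, NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass

# The failed hobby insert rolled back the user insert as well.
user_count = conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]
print(user_count)

# Document: the user and the hobbies are one document, so saving them is
# one write. There is no second write that could fail halfway through.
leslie = {"_id": 1, "first_name": "Leslie", "hobbies": ["scrapbooking"]}
print(leslie)
```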
To wrap up this section, I created this term mapping summary for you. It’s way too much information for you to read now. But you can take a screenshot and tweet it. Or you can print it and hang it up at your desk. Whatever you need to do.
The first three are the most important
A row maps to a document
A column maps to a field
A table maps to a collection
If I had to sum up this presentation in one idea, I would say this: don’t be Ron Swanson. Only in this particular case, because Ron Swanson is amazing in so many other ways. But don’t be Ron Swanson.
http://gph.is/XK6p3t
Change your mindset and get the full value of document databases
At the end of these slides, I’ve included a list of additional resources. The top resource listed here is a blog series that I wrote that has content very similar to what I presented today. If later you want to reference the material I covered, that’s a great place to start.
One of the best ways to learn a new technology is by trying it out. MongoDB Atlas is a fully-managed document database service. It has a perpetually free tier that is great for learning if you’d like to try it out. If you hop back a slide, I have a link to a Quick Start blog series that will walk you through trying it out.
If you’d like to get a copy of my slides, check out my Twitter page. I’m Lauren_Schaefer.
Now I want to open the floor for Q&A. You should be able to ask questions in the chat. I’m happy to answer questions about what I talked about today. I’m also happy to answer any other questions you may have like what life is like here at MongoDB or how I got to where I am in my career. I’m married with a 4 year old, so I’m also happy to talk about how I balance being a working mom. Please ask whatever is on your mind.