Unlike relational databases, document databases like CouchDB and MongoDB do not directly support entity relationships. This talk will explore patterns of modeling one-to-many and many-to-many entity relationships in a document database. These patterns include using an embedded JSON array, relating documents using identifiers, using a list of keys, and using relationship documents. We'll also see how each of these patterns corresponds to the way entities are joined in a relational database, and take a look at the relevant differences between document databases and relational databases. For example, document databases do not have tables, each document can have its own schema, there is no built-in concept of relationships between documents, views/indexes are queried directly instead of being used to optimize more generalized queries, a column within a result set can contain a mix of logical data types, and there is typically no support for transactions across document boundaries.
2. When to Choose a Document Database
You’re using a relational database, but have been relying heavily on denormalization to optimize read performance
You would like to give up consistency in exchange for a high level of concurrency
Your data model is a “fit” for documents (e.g. a CMS)
3. When Not to Choose a Document Database
Your data fits better in a relational model: SQL is a powerful and mature language for working with relational data sets
Consistency is critical to your application
You haven’t bothered exploring scalability options for your current database
4. Incremental Map/Reduce
"How fucked is my NoSQL database?" howfuckedismydatabase.com. 2009. http://howfuckedismydatabase.com/nosql/ (24 October 2012).
7. SQL Query Joining Publishers and Books
SELECT
  `publisher`.`id`,
  `publisher`.`name`,
  `book`.`title`
FROM `publisher`
FULL OUTER JOIN `book`
  ON `publisher`.`id` = `book`.`publisher_id`
ORDER BY
  `publisher`.`id`,
  `book`.`title`;
8. Joined Result Set
Publisher (“left”)                Book (“right”)
publisher.id  publisher.name      book.title
oreilly       O'Reilly Media      Building iPhone Apps with HTML, CSS, and JavaScript
oreilly       O'Reilly Media      CouchDB: The Definitive Guide
oreilly       O'Reilly Media      DocBook: The Definitive Guide
oreilly       O'Reilly Media      RESTful Web Services
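For comparison with the CouchDB views that follow, here is a minimal JavaScript sketch of what the FULL OUTER JOIN above computes. The `publishers` and `books` arrays are hypothetical in-memory stand-ins for the two tables, and the nested-loop approach is only meant to illustrate the semantics, not how a database engine executes a join.

```javascript
// Hypothetical in-memory stand-ins for the `publisher` and `book` tables.
const publishers = [{ id: "oreilly", name: "O'Reilly Media" }];
const books = [
  { title: "CouchDB: The Definitive Guide", publisher_id: "oreilly" },
  { title: "RESTful Web Services", publisher_id: "oreilly" },
  { title: "DocBook: The Definitive Guide", publisher_id: "oreilly" },
  { title: "Building iPhone Apps with HTML, CSS, and JavaScript", publisher_id: "oreilly" },
];

// Compares two possibly-null values; NULLs sort first here (SQL dialects differ).
function compareNullable(a, b) {
  if (a === b) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a < b ? -1 : 1;
}

// FULL OUTER JOIN ... ON publisher.id = book.publisher_id:
// matched pairs, plus unmatched rows from either side padded with null.
function fullOuterJoin(publishers, books) {
  const rows = [];
  const matched = new Set();
  for (const p of publishers) {
    const related = books.filter((b) => b.publisher_id === p.id);
    if (related.length === 0) rows.push({ id: p.id, name: p.name, title: null });
    for (const b of related) {
      matched.add(b);
      rows.push({ id: p.id, name: p.name, title: b.title });
    }
  }
  for (const b of books) {
    if (!matched.has(b)) rows.push({ id: null, name: null, title: b.title });
  }
  // ORDER BY publisher.id, book.title
  return rows.sort((x, y) => compareNullable(x.id, y.id) || compareNullable(x.title, y.title));
}

for (const row of fullOuterJoin(publishers, books)) {
  console.log(row.id, row.name, row.title);
}
```

The four rows printed match the joined result set above; a book whose `publisher_id` matches no publisher would still appear, with null publisher columns.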
9. Collated Result Set
key            id         value
["oreilly",0]  "oreilly"  "O'Reilly Media"                                       (Publisher)
["oreilly",1]  "oreilly"  "Building iPhone Apps with HTML, CSS, and JavaScript"  (Books)
["oreilly",1]  "oreilly"  "CouchDB: The Definitive Guide"                        (Books)
["oreilly",1]  "oreilly"  "DocBook: The Definitive Guide"                        (Books)
["oreilly",1]  "oreilly"  "RESTful Web Services"                                 (Books)
10. View Result Sets
Made up of columns and rows
Every row has the same three columns:
• key
• id
• value
Columns can contain a mixture of logical data types
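The “join” in a collated result set comes entirely from how CouchDB orders view keys. Below is a simplified JavaScript sketch of that ordering: values are ranked by type (null, false, true, numbers, strings, arrays, objects), arrays are compared element by element with a shorter prefix sorting first, and objects are compared by their key/value pairs in order. As a stated simplification, strings are compared by UTF-16 code unit, whereas CouchDB actually uses ICU Unicode collation, so mixed-case strings can sort differently.

```javascript
// Simplified sketch of CouchDB view key collation. Assumption: strings are
// compared by UTF-16 code unit; CouchDB itself uses ICU Unicode collation.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

// Element-by-element comparison; a shorter prefix sorts first.
function compareSequences(xs, ys) {
  const n = Math.min(xs.length, ys.length);
  for (let i = 0; i < n; i++) {
    const c = collate(xs[i], ys[i]);
    if (c !== 0) return c;
  }
  return xs.length - ys.length;
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb; // different types: order by type rank
  switch (ra) {
    case 3: return a - b;
    case 4: return a < b ? -1 : a > b ? 1 : 0;
    case 5: return compareSequences(a, b);
    case 6: return compareSequences(Object.entries(a).flat(), Object.entries(b).flat());
    default: return 0; // null, false, true: equal within their own rank
  }
}

// Keys shaped like those emitted by the map functions in this talk, shuffled.
const keys = [["walsh", 0], ["oreilly", 1], ["oreilly", {}], ["oreilly", 0], ["oreilly"]];
console.log(JSON.stringify([...keys].sort(collate)));
// [["oreilly"],["oreilly",0],["oreilly",1],["oreilly",{}],["walsh",0]]
```

Because `0` sorts before `1` in the second array element, each “one” row lands directly ahead of its “many” rows, which is what makes a single view query read like a join.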
13. Embedded Entities
A single document represents the “one” entity
Nested entities (a JSON array) represent the “many” entities
Simplest way to create a one to many relationship
14. Example: Publisher with Nested Books
{
  "_id":"oreilly",
  "collection":"publisher",
  "name":"O'Reilly Media",
  "books":[
    { "title":"CouchDB: The Definitive Guide" },
    { "title":"RESTful Web Services" },
    { "title":"DocBook: The Definitive Guide" },
    { "title":"Building iPhone Apps with HTML, CSS, and JavaScript" }
  ]
}
15. Map Function
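function(doc) {
  if ("publisher" == doc.collection) {
    // The "one" entity: key [publisher ID, 0] sorts ahead of its books
    emit([doc._id, 0], doc.name);
    // The "many" entities: one row per nested book, keyed [publisher ID, 1]
    for (var i in doc.books) {
      emit([doc._id, 1], doc.books[i].title);
    }
  }
}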
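To see which rows this map function produces, the JavaScript sketch below runs it outside CouchDB. The global `emit` is a hypothetical stand-in for the view server that simply records `{key, id, value}` rows; the map function itself is unchanged. CouchDB would additionally sort the rows by key (then document ID) when building the index, which this sketch does not do.

```javascript
// Hypothetical stand-in for the CouchDB view server: a global emit() that
// records a row for the document currently being mapped.
const rows = [];
let currentDocId = null;
function emit(key, value) {
  rows.push({ key: key, id: currentDocId, value: value });
}

// The map function from the slide, unchanged.
const map = function (doc) {
  if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
    for (var i in doc.books) {
      emit([doc._id, 1], doc.books[i].title);
    }
  }
};

// The publisher document with nested books from the example above.
const publisher = {
  _id: "oreilly",
  collection: "publisher",
  name: "O'Reilly Media",
  books: [
    { title: "CouchDB: The Definitive Guide" },
    { title: "RESTful Web Services" },
    { title: "DocBook: The Definitive Guide" },
    { title: "Building iPhone Apps with HTML, CSS, and JavaScript" },
  ],
};

currentDocId = publisher._id;
map(publisher);
for (const row of rows) console.log(JSON.stringify(row.key), row.id, row.value);
```

Every row carries the same `id` ("oreilly") because a single document emits them all; that is the defining trait of the embedded pattern.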
16. Result Set
key            id         value
["oreilly",0]  "oreilly"  "O'Reilly Media"
["oreilly",1]  "oreilly"  "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1]  "oreilly"  "CouchDB: The Definitive Guide"
["oreilly",1]  "oreilly"  "DocBook: The Definitive Guide"
["oreilly",1]  "oreilly"  "RESTful Web Services"
17. Limitations
Only works if there aren’t a large number of related entities:
• Too many nested entities can result in very large documents
• Slow to transfer between client and server
• Unwieldy to modify
• Time-consuming to index
19. Related Documents
A document representing the “one” entity
Separate documents for each “many” entity
Each “many” entity references its related “one” entity by the “one” entity’s document identifier
Makes for smaller documents
Reduces the probability of document update conflicts
21. Example: Related Book
{
  "_id":"9780596155896",
  "collection":"book",
  "title":"CouchDB: The Definitive Guide",
  "publisher":"oreilly"
}
22. Map Function
function(doc) {
  if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  if ("book" == doc.collection) {
    // Key each book by its publisher's ID so it collates right after the publisher
    emit([doc.publisher, 1], doc.title);
  }
}
23. Result Set
key            id               value
["oreilly",0]  "oreilly"        "O'Reilly Media"
["oreilly",1]  "9780596155896"  "CouchDB: The Definitive Guide"
["oreilly",1]  "9780596529260"  "RESTful Web Services"
["oreilly",1]  "9780596805791"  "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1]  "9781565925809"  "DocBook: The Definitive Guide"
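A client receives this result set as a flat list of rows. The JavaScript sketch below, assuming the rows have already been fetched and parsed into a hypothetical `rows` array (values copied from the result set above), folds them back into one object per publisher. Because nothing enforces that a book’s `publisher` refers to an existing document, a group without a `[id, 0]` row simply keeps `name: null`.

```javascript
// Rows from the related-documents view, as a hypothetical parsed array.
const rows = [
  { key: ["oreilly", 0], id: "oreilly", value: "O'Reilly Media" },
  { key: ["oreilly", 1], id: "9780596155896", value: "CouchDB: The Definitive Guide" },
  { key: ["oreilly", 1], id: "9780596529260", value: "RESTful Web Services" },
  { key: ["oreilly", 1], id: "9780596805791", value: "Building iPhone Apps with HTML, CSS, and JavaScript" },
  { key: ["oreilly", 1], id: "9781565925809", value: "DocBook: The Definitive Guide" },
];

// Folds collated rows into one object per publisher: a [publisherId, 0] row
// supplies the name, and each [publisherId, 1] row adds a book.
function groupByPublisher(rows) {
  const groups = new Map();
  for (const row of rows) {
    const [publisherId, kind] = row.key;
    if (!groups.has(publisherId)) {
      groups.set(publisherId, { id: publisherId, name: null, books: [] });
    }
    const group = groups.get(publisherId);
    if (kind === 0) {
      group.name = row.value;
    } else {
      group.books.push({ id: row.id, title: row.value });
    }
  }
  return Array.from(groups.values());
}

console.log(JSON.stringify(groupByPublisher(rows), null, 2));
```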
24. Limitations
When retrieving the entity on the “right” side of the relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query
Only works for one to many relationships
27. List of Keys
A document representing each “many” entity on the “left” side of the relationship
Separate documents for each “many” entity on the “right” side of the relationship
Each “many” entity on the “right” side of the relationship maintains a list of document identifiers for its related “many” entities on the “left” side of the relationship
37. Map Function
function(doc) {
  if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
    for (var i in doc.books) {
      // Emit {"_id": ...} so that include_docs=true returns the linked book
      emit([doc._id, 1], {"_id":doc.books[i]});
    }
  }
}
38. Result Set
key id value
["muellner",0] "muellner" "Leonard Muellner"
["muellner",1] "muellner" {"_id":"9781565925809"}
["walsh",0] "walsh" "Norman Walsh"
["walsh",1] "walsh" {"_id":"9780596805029"}
["walsh",1] "walsh" {"_id":"9781565920514"}
["walsh",1] "walsh" {"_id":"9781565925809"}
39. Including Docs
include_docs=true
key id value doc (truncated)
["muellner",0] "muellner" … {"name":"Leonard Muellner"}
["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"}
["walsh",0] "walsh" … {"name":"Norman Walsh"}
["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"}
["walsh",1] "walsh" … {"title":"Making TeX Work"}
["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}
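With include_docs=true, each row gains a `doc`: normally the document that emitted the row, but when the emitted value is an object with an `_id` field (as in the map function above), the document with that `_id` instead. The JavaScript sketch below emulates that lookup over a hypothetical in-memory `Map` of documents. The author documents’ `books` arrays are inferred from the map function and result set, and returning `null` for a missing linked document is this sketch’s own choice.

```javascript
// Hypothetical in-memory documents keyed by _id. The author shape (name plus
// a books array of book IDs) is inferred from the list-of-keys map function.
const docs = new Map(
  [
    { _id: "muellner", collection: "author", name: "Leonard Muellner", books: ["9781565925809"] },
    { _id: "walsh", collection: "author", name: "Norman Walsh",
      books: ["9780596805029", "9781565920514", "9781565925809"] },
    { _id: "9780596805029", collection: "book", title: "DocBook 5: The Definitive Guide" },
    { _id: "9781565920514", collection: "book", title: "Making TeX Work" },
    { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" },
  ].map((d) => [d._id, d])
);

// Rows as produced by the view (copied from the result set without docs).
const rows = [
  { key: ["muellner", 0], id: "muellner", value: "Leonard Muellner" },
  { key: ["muellner", 1], id: "muellner", value: { _id: "9781565925809" } },
  { key: ["walsh", 0], id: "walsh", value: "Norman Walsh" },
  { key: ["walsh", 1], id: "walsh", value: { _id: "9780596805029" } },
  { key: ["walsh", 1], id: "walsh", value: { _id: "9781565920514" } },
  { key: ["walsh", 1], id: "walsh", value: { _id: "9781565925809" } },
];

// Emulates include_docs=true: follow an {"_id": ...} value to the linked
// document, otherwise include the document that emitted the row.
function includeDocs(rows, docs) {
  return rows.map((row) => {
    const v = row.value;
    const linked = v !== null && typeof v === "object" && typeof v._id === "string";
    const docId = linked ? v._id : row.id;
    return Object.assign({}, row, { doc: docs.has(docId) ? docs.get(docId) : null });
  });
}

for (const row of includeDocs(rows, docs)) {
  console.log(JSON.stringify(row.key), row.doc && (row.doc.name || row.doc.title));
}
```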
45. Example: Book
{
  "_id":"9781565925809",
  "collection":"book",
  "title":"DocBook: The Definitive Guide",
  "authors":[
    "muellner",
    "walsh"
  ]
}
46. Map Function
function(doc) {
  if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  if ("book" == doc.collection) {
    // One row per listed author, keyed by that author's ID
    for (var i in doc.authors) {
      emit([doc.authors[i], 1], doc.title);
    }
  }
}
47. Result Set
key id value
["muellner",0] "muellner" "Leonard Muellner"
["muellner",1] "9781565925809" "DocBook: The Definitive Guide"
["walsh",0] "walsh" "Norman Walsh"
["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide"
["walsh",1] "9781565920514" "Making TeX Work"
["walsh",1] "9781565925809" "DocBook: The Definitive Guide"
48. Limitations
Queries from the “right” side of the relationship cannot include any data from entities on the “left” side of the relationship (without the use of include_docs)
A document representing an entity with lots of relationships could become quite large
50. Relationship Documents
A document representing each “many” entity on the “left” side of the relationship
Separate documents for each “many” entity on the “right” side of the relationship
Neither the “left” nor the “right” side of the relationship contains any direct references to the other
For each distinct relationship, a separate document includes the document identifiers for both the “left” and “right” sides of the relationship
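A minimal JavaScript sketch of the pattern, assuming the field names used by the relationship-document map function later in this talk (`collection: "book-author"`, `book`, `author`) and the book/author pairings from its result set. The helper names `relationshipDocs`, `booksBy`, and `authorsOf` are hypothetical, and `_id` is omitted on the assumption that CouchDB assigns one.

```javascript
// Builds one relationship document per [book, author] pair. _id is omitted,
// assuming CouchDB will assign a UUID when the document is created.
function relationshipDocs(pairs) {
  return pairs.map(([book, author]) => ({ collection: "book-author", book: book, author: author }));
}

const rels = relationshipDocs([
  ["9781565925809", "muellner"],
  ["9780596805029", "walsh"],
  ["9781565920514", "walsh"],
  ["9781565925809", "walsh"],
]);

// Books and authors never reference each other; both directions of the
// relationship are answered from the relationship documents alone.
const booksBy = (author) => rels.filter((r) => r.author === author).map((r) => r.book);
const authorsOf = (book) => rels.filter((r) => r.book === book).map((r) => r.author);

console.log(booksBy("walsh"));
console.log(authorsOf("9781565925809"));
```

In CouchDB the linear scans would be replaced by a view over the relationship documents, as in the map function that follows.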
51. Example: Book
{
  "_id":"9780596805029",
  "collection":"book",
  "title":"DocBook 5: The Definitive Guide"
}
52. Example: Book
{
  "_id":"9781565920514",
  "collection":"book",
  "title":"Making TeX Work"
}
53. Example: Book
{
  "_id":"9781565925809",
  "collection":"book",
  "title":"DocBook: The Definitive Guide"
}
65. Map Function
function(doc) {
  if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  if ("book-author" == doc.collection) {
    // Key the relationship by author; link to the book for include_docs=true
    emit([doc.author, 1], {"_id":doc.book});
  }
}
66. Result Set
key id value
["muellner",0] "muellner" "Leonard Muellner"
["muellner",1] "44006720" {"_id":"9781565925809"}
["walsh",0] "walsh" "Norman Walsh"
["walsh",1] "44005f2c" {"_id":"9780596805029"}
["walsh",1] "44005f72" {"_id":"9781565920514"}
["walsh",1] "44006b0d" {"_id":"9781565925809"}
67. Including Docs
include_docs=true
key id value doc (truncated)
["muellner",0] … … {"name":"Leonard Muellner"}
["muellner",1] … … {"title":"DocBook: The Definitive Guide"}
["walsh",0] … … {"name":"Norman Walsh"}
["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"}
["walsh",1] … … {"title":"Making TeX Work"}
["walsh",1] … … {"title":"DocBook: The Definitive Guide"}
68. Limitations
Queries can only contain data from the “left” or “right” side of the relationship (without the use of include_docs)
Maintaining relationship documents may require more work
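To illustrate that extra work, here is a JavaScript sketch that computes the changes needed when a book’s author list changes: obsolete relationship documents are deleted and missing ones are created, packaged as a request body for CouchDB’s `_bulk_docs` endpoint (`{"docs": [...]}`, where a deletion carries `_id`, `_rev`, and `_deleted: true`). The function name, document IDs, and revisions are hypothetical. Note that the bulk API saves round trips but is not a transaction; in current CouchDB versions each document in the batch succeeds or fails independently.

```javascript
// Computes a hypothetical _bulk_docs request body that brings one book's
// relationship documents in line with the desired list of author IDs.
function syncBookAuthors(bookId, existing, desiredAuthors) {
  const desired = new Set(desiredAuthors);
  const current = new Set(existing.map((rel) => rel.author));
  const docs = [];
  // Delete relationships that are no longer wanted (requires the current _rev).
  for (const rel of existing) {
    if (!desired.has(rel.author)) {
      docs.push({ _id: rel._id, _rev: rel._rev, _deleted: true });
    }
  }
  // Create relationships that don't exist yet; CouchDB assigns their _id.
  for (const author of desired) {
    if (!current.has(author)) {
      docs.push({ collection: "book-author", book: bookId, author: author });
    }
  }
  return { docs: docs };
}

// Hypothetical existing relationship documents for one book.
const existing = [
  { _id: "rel-1", _rev: "1-aaaa", collection: "book-author", book: "9781565925809", author: "muellner" },
  { _id: "rel-2", _rev: "1-bbbb", collection: "book-author", book: "9781565925809", author: "walsh" },
];

// Replace "muellner" with a hypothetical author ID and keep "walsh".
const body = syncBookAuthors("9781565925809", existing, ["walsh", "new-author"]);
console.log(JSON.stringify(body));
```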
71. Doctrine CouchDB ODM Features
Includes a CouchDB client library and ODM
Maps documents using Doctrine’s persistence semantics
Maps CouchDB views to PHP objects
Document conflict resolution support
Includes a write-behind feature for increased performance
73. Persisting an Entity[1]
$blogPost = new BlogPost();
$blogPost->setHeadline("Hello World!");
$blogPost->setText("This is a blog post going to be saved into CouchDB");
$blogPost->setPublishDate(new DateTime("now"));
$dm->persist($blogPost);
$dm->flush();
1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture
74. Querying an Entity[1]
// $dm is an instance of Doctrine\ODM\CouchDB\DocumentManager
$blogPost = $dm->find("MyApp\Document\BlogPost", $theUUID);
1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#querying
88. Document Databases Compared to Relational Databases
Document databases have no tables (and therefore no columns)
Indexes (views) are queried directly, instead of being used to optimize more generalized queries
Result set columns can contain a mix of logical data types
No built-in concept of relationships between documents
Related entities can be embedded in a document, referenced from a document, or both
89. Caveats
No referential integrity
No atomic transactions across document boundaries
Some patterns may involve denormalized (i.e. redundant) data
Data inconsistencies are inevitable (i.e. eventual consistency)
Consider the implications of replication: what may seem consistent within one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)
90. Additional Techniques
Use the startkey and endkey parameters to retrieve one entity and its related entities:
startkey=["9781565925809"]&endkey=["9781565925809",{}]
Define a reduce function and use grouping levels
Use UUIDs rather than natural keys for better performance
Use the bulk document API when writing Relationship Documents
When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results
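As a sketch of the first technique, the JavaScript below builds such a range query with the standard `URL` API, which takes care of percent-encoding the JSON keys. The host, database (`books`), design document (`app`), and view (`by_book`) names are hypothetical placeholders. The `{}` in `endkey` works because objects sort after numbers and strings, so the range covers the `[id, 0]` row and every `[id, 1]` row.

```javascript
// Builds a view URL selecting one entity and its related entities.
// The view URL passed in (host, database, design document, view) is hypothetical.
function relatedRangeUrl(viewUrl, id, includeDocs) {
  const url = new URL(viewUrl);
  url.searchParams.set("startkey", JSON.stringify([id]));
  // {} sorts after numbers and strings, closing the range after all [id, n] keys
  url.searchParams.set("endkey", JSON.stringify([id, {}]));
  if (includeDocs) url.searchParams.set("include_docs", "true");
  return url.toString();
}

const url = relatedRangeUrl("http://localhost:5984/books/_design/app/_view/by_book", "9781565925809", false);
console.log(url);
```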
91. Cheat Sheet
                  Embedded  Related    List of  Relationship
                  Entities  Documents  Keys     Documents
One to Many       ✓         ✓
Many to Many                           ✓        ✓
<= N* Relations   ✓                    ✓
> N* Relations              ✓                   ✓
* where N is a large number for your system