2. About Me
Me: Bioinformatics, help scientists with big data
Lots of data in lots of formats
Ruby and Ruby on Rails
But that doesn’t matter for CouchDB!
Initially interested in CouchDB for AWS deployment
6. Key-Value Databases
Datastore of values indexed by keys (duh!)
Must provide the ID for all operations
Index is a hash or a B-tree
Hash is FAST, but only allows single-value lookups
B-tree is slower, but allows range queries
Horizontally scalable
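The hash versus B-tree trade-off can be sketched in a few lines. This is a toy illustration, not how CouchDB stores data: a Map stands in for the hash index, a sorted array with binary search stands in for the B-tree, and the date keys and document names are made up.

```javascript
// Hash index: exact-key lookup; keys are not kept in sorted order.
const hashIndex = new Map([
  ["2009-12-24", "doc-c"],
  ["2009-12-01", "doc-a"],
  ["2009-12-07", "doc-b"],
]);

// Ordered index (stand-in for a B-tree): the same keys, kept sorted.
const sortedKeys = [...hashIndex.keys()].sort();

// Binary search for the first position whose key is >= target.
function lowerBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query: walk the sorted keys from startKey up to and including endKey.
function rangeQuery(startKey, endKey) {
  const values = [];
  let i = lowerBound(sortedKeys, startKey);
  while (i < sortedKeys.length && sortedKeys[i] <= endKey) {
    values.push(hashIndex.get(sortedKeys[i]));
    i++;
  }
  return values;
}

console.log(hashIndex.get("2009-12-07")); // doc-b
console.log(rangeQuery("2009-12-01", "2009-12-10")); // [ 'doc-a', 'doc-b' ]
```

The hash answers "give me this key" in one step, but "everything between two dates" needs the ordered structure.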
8. CouchDB
Schema-free, document-oriented database
JavaScript Object Notation (JSON)
HTTP protocol using REST operations
No direct native language drivers *
JavaScript is the lingua franca
ACID & MVCC guarantees on a per-document basis
Map-Reduce model for indexing and views
Back-ups and replication are simple
* Hovercraft: http://github.com/jchris/hovercraft/
16. REST
Representational State Transfer
Client-server separation with uniform interface
Load-balancing, caching, authorization & authentication, proxies
Stateless - client is responsible for creating a self-sufficient request
Resources are cacheable - servers must mark non-cacheable resources as such
Only 5 HTTP verbs are commonly used
GET, PUT, POST, DELETE, HEAD
17. CouchDB
REST/CRUD
GET     read
PUT     create or update
DELETE  delete something
POST    bulk operations
18. CouchDB assumes:
Each document is completely independent and should be self-sufficient
An operation on a document is ACID compliant
Operations across documents are not ACID
Built for distributed applications
You can live with slightly stale data being served to clients
19. MVCC: Row/Table Lock vs. CouchDB
Multi-Version Concurrency Control
RDBMS enforces consistency using read/write locks
Instead of locks, CouchDB just serves up old data
Multi-document (multi-row) transactional semantics must be handled by the application
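A rough sketch of the "no locks, just revisions" idea, assuming a made-up in-memory store (TinyMvccStore is hypothetical, as are the "N-sim" revision strings and the sample document). It only illustrates the revision check; it is not CouchDB's storage engine.

```javascript
// Hypothetical in-memory stand-in for one database; not CouchDB's real storage.
class TinyMvccStore {
  constructor() {
    this.docs = new Map();
  }

  // Readers get a private copy of whatever revision is current. No locks.
  get(id) {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : null;
  }

  // A write succeeds only if it names the current revision.
  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      return { error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const rev = `${generation}-sim`; // real revisions look like "1-062af1c4..."
    this.docs.set(doc._id, { ...doc, _rev: rev });
    return { ok: true, id: doc._id, rev };
  }
}

const store = new TinyMvccStore();
store.put({ _id: "j_doe", city: "Boston" });

// Two clients read the same revision.
const copyA = store.get("j_doe");
const copyB = store.get("j_doe");

// Client A writes first and wins.
copyA.city = "Chicago";
const resultA = store.put(copyA);

// Client B still holds old data; its write is rejected instead of blocking.
copyB.city = "Denver";
const resultB = store.put(copyB);

console.log(resultA); // { ok: true, id: 'j_doe', rev: '2-sim' }
console.log(resultB); // { error: 'conflict' }
```

Client B was never blocked while A was writing. It simply has to re-read the document and try again, which is the part the application must handle.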
24. Database API
$ curl -X PUT http://127.0.0.1:5984/friendbook
{"ok":true}
URL = protocol (http) + CouchDB server (127.0.0.1:5984) + DB name (friendbook)
Try it again: {"error":"db_exists"}
$ curl -X DELETE http://localhost:5984/friendbook
{"ok":true}
26. Backups & Replication
Backup: simply copy the database files
Replicate: send a POST request with a source and target database
Source and target DBs can be either local (just the db name) or remote (full URL)
"continuous": true option will register the target to the source’s _changes notification API
$ curl -X POST http://localhost:5984/_replicate \
  -d '{"source":"db", "target":"db-replica", "continuous":true}'
Takes up a port
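The replication request can also be assembled programmatically. The sketch below only builds the request and does not send it; buildReplicationRequest is a hypothetical helper, and backup.example.com is a made-up host used to show the remote (full URL) form.

```javascript
// Build (but do not send) the request for POST /_replicate.
// `server` is the CouchDB instance that will run the replication.
function buildReplicationRequest(server, source, target, continuous = false) {
  const body = { source, target };
  if (continuous) body.continuous = true;
  return {
    method: "POST",
    url: `${server}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Local to local: plain database names, as in the curl example.
const local = buildReplicationRequest("http://localhost:5984", "db", "db-replica", true);

// Local to remote: the target is a full URL (hypothetical host name).
const remote = buildReplicationRequest(
  "http://localhost:5984",
  "db",
  "http://backup.example.com:5984/db-replica"
);

console.log(local.url, local.body);
console.log(remote.body);
```

The returned object can be handed to whichever HTTP client the application already uses.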
28. Inserting a document
$ curl -X PUT http://localhost:5984/friendbook/j_doe \
  -d @j_doe.json
{"ok":true,
 "id":"j_doe",
 "rev":"1-062af1c4ac73287b7e07396c86243432"}
CouchDB can provide you with unique IDs:
$ curl -X GET http://localhost:5984/_uuids
{"uuids":["d1dde0996a4db7c1ebc78fb89c01b9e6"]}
$ curl -X GET 'http://localhost:5984/_uuids?count=10'
* POSTing a new document to the database URL will auto-generate a UUID for the ID
34. Update is a full write
http://horicky.blogspot.com/2009/11/nosql-patterns.html
35. Deleting a document
DELETE requires the revision as a URL parameter or in the If-Match HTTP header (the document's ETag value).
$ curl -X DELETE 'http://localhost:5984/friendbook/j_doe?rev=2-0629239b53a8d146a3a3c4c63e2dbfd0'
{"ok":true,"id":"j_doe",
 "rev":"3-57673a4b7b662bb916cc374a92318c6b"}
Returns a revision number for the delete, used for synchronization and the changes API
$ curl -X GET http://localhost:5984/friendbook/j_doe
{"error":"not_found","reason":"deleted"}
36. Notables
MVCC != version control system
POST to /db/_compact deletes all older versions
Deletes only keep metadata around for synchronization and merge conflict resolution
To “roll back a transaction” you must:
Retrieve all related records, cache these
Insert any updates to records.
On failure, use the returned revision numbers to re-insert the older record as a new one
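The three roll-back steps can be sketched against a made-up in-memory store. Everything here is hypothetical (makeStore, applyOrRollBack, the "N-sim" revisions, the alice and bob documents); it only shows the order of operations an application would follow.

```javascript
// Hypothetical in-memory store with CouchDB-style revision checks.
function makeStore() {
  const docs = new Map();
  return {
    get(id) {
      return { ...docs.get(id) };
    },
    put(doc) {
      const current = docs.get(doc._id);
      if ((current ? current._rev : undefined) !== doc._rev) {
        return { error: "conflict" };
      }
      const rev = `${current ? parseInt(current._rev, 10) + 1 : 1}-sim`;
      docs.set(doc._id, { ...doc, _rev: rev });
      return { ok: true, id: doc._id, rev };
    },
  };
}

// Insert the updates; on failure, re-insert each cached original on top of
// the revision we just created, using the revision numbers returned by put().
function applyOrRollBack(store, cachedDocs, mutate) {
  const applied = [];
  for (const original of cachedDocs) {
    const result = store.put(mutate({ ...original }));
    if (result.error) {
      for (const done of applied) {
        store.put({ ...done.original, _rev: done.rev });
      }
      return false;
    }
    applied.push({ original, rev: result.rev });
  }
  return true;
}

const store = makeStore();
store.put({ _id: "alice", balance: 10 });
store.put({ _id: "bob", balance: 10 });

// Retrieve all related records and cache them.
const cached = ["alice", "bob"].map((id) => store.get(id));

// Another client updates bob in the meantime, so our cached copy is stale.
store.put({ ...store.get("bob"), balance: 99 });

const committed = applyOrRollBack(store, cached, (doc) => ({
  ...doc,
  balance: doc.balance + 5,
}));
console.log(committed, store.get("alice"), store.get("bob"));
```

Note that alice ends up at a third revision holding the old content rather than back at revision 1, and other clients could have read the intermediate state. The roll-back is a compensating write, not a real transaction.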
42. Our Example Problem
Hello world? Blog? Twitter clone?
Let’s store all human proteins instead
LOCUS       YP_003024029             227 aa            linear   PRI 09-JUL-2009
DEFINITION  cytochrome c oxidase subunit II [Homo sapiens].
ACCESSION   YP_003024029
VERSION     YP_003024029.1  GI:251831110
DBLINK      Project:30353
DBSOURCE    REFSEQ: accession NC_012920.1
KEYWORDS    .
SOURCE      mitochondrion Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
FEATURES             Location/Qualifiers
     source          1..227
                     /organism="Homo sapiens"
                     /organelle="mitochondrion"
                     /isolation_source="caucasian"
                     /db_xref="taxon:9606"
                     /tissue_type="placenta"
                     /country="United Kingdom: Great Britain"
                     /note="this is the rCRS"
     Protein         1..227
                     /product="cytochrome c oxidase subunit II"
                     /calculated_mol_wt=25434
http://www.ncbi.nlm.nih.gov/
46. Futon : A Couchapp
This one is going to be a bit tougher
47. Design Documents
The key to using CouchDB as more than a key-value store
Just another JSON document
Contains JavaScript functions
Functions are executed within CouchDB
Map-reduce views, data validation, alternate formatting, ...
JS libraries & data (PNG images)
48. {
  "_id" : "_design/gb",
  "language" : "javascript",
  "views" : {
    "gi" : {
      "map" : "function(doc) { emit(doc.gi, doc._id) }"
    },
    "dbXref" : {
      "map" : "function(doc) {
        var ftLen = doc.features.length;
        for (var i = 0; i < ftLen; i++) {
          var ft = doc.features[i];
          var qLen = ft.qualifiers.length;
          for (var j = 0; j < qLen; j++) {
            var ql = ft.qualifiers[j];
            if (ql.qualifier.match('db_xref')) {
              emit(ql.value, doc._id);
            }
          }
        }
      }"
    }
  }
}
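To see what the dbXref view would index, the map function can be run outside CouchDB with a stand-in emit(). The sample document below is an assumption: its field layout (features, qualifiers, qualifier, value) is inferred from the map function, not taken from the real database. Also note that the design document shows the function across several lines for readability; JSON strings cannot contain raw line breaks, so an uploaded design document needs it on one line or with escaped newlines.

```javascript
// Rows collected by our stand-in for CouchDB's emit().
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The dbXref map function from the design document, same logic.
function dbXrefMap(doc) {
  var ftLen = doc.features.length;
  for (var i = 0; i < ftLen; i++) {
    var ft = doc.features[i];
    var qLen = ft.qualifiers.length;
    for (var j = 0; j < qLen; j++) {
      var ql = ft.qualifiers[j];
      if (ql.qualifier.match("db_xref")) {
        emit(ql.value, doc._id);
      }
    }
  }
}

// Hypothetical document; the shape is inferred from the map function.
const sampleDoc = {
  _id: "YP_003024029",
  gi: "251831110",
  features: [
    {
      key: "source",
      qualifiers: [
        { qualifier: "organism", value: "Homo sapiens" },
        { qualifier: "db_xref", value: "taxon:9606" },
      ],
    },
    {
      key: "Protein",
      qualifiers: [
        { qualifier: "product", value: "cytochrome c oxidase subunit II" },
      ],
    },
  ],
};

dbXrefMap(sampleDoc);
console.log(rows); // [ { key: 'taxon:9606', value: 'YP_003024029' } ]
```

Each emitted row becomes one entry in the view index, keyed by the cross-reference and pointing back at the document ID.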
55. GET by the indexed key
GET /refseq_human/_design/gb/_view/dbXref?key="GeneID:10"
{ "total_rows":7, "offset":2,
  "rows":[
    { "id":"NP_000006",
      "key":"GeneID:10",
      "value":"NP_000006"
    }
  ]
}
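View keys are JSON values, so the quotes in key="GeneID:10" are part of the key and have to be URL-encoded by the client. A small sketch, assuming a hypothetical helper named viewPath; key, startkey and endkey are real CouchDB view parameters, while the "GeneID:19" end key is just an example value.

```javascript
// Build the path and query string for a view request.
// Each parameter value is JSON-encoded first, then URL-encoded.
function viewPath(db, designDoc, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  const path = `/${db}/_design/${designDoc}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}

// Exact-key lookup, as on the slide.
const exact = viewPath("refseq_human", "gb", "dbXref", { key: "GeneID:10" });

// Range lookup over the B-tree index.
const range = viewPath("refseq_human", "gb", "dbXref", {
  startkey: "GeneID:10",
  endkey: "GeneID:19",
});

console.log(exact);
console.log(range);
```

The range form is what the B-tree index buys you; the keys are compared as strings under the view's collation order, not as numbers.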
58. ReReduce
Map function:
function(doc) {
  if (doc.foodz) {
    doc.foodz.forEach(function(food) {
      emit(food, 1);
    });
  }
}
Reduce function (rereduce is true / false):
function(keys, values, rereduce) {
  if (rereduce) {
    return sum(values);
  } else {
    return values.length;
  }
}
Same result, but this lets us put some useful value in the map, as opposed to 1 repeated ad nauseam
Could also output null to save space since indexes store the emitted values
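A quick way to convince yourself the rereduce branch is right is to call the reduce function by hand. This sketch assumes a local stand-in for CouchDB's built-in sum(), and the chunk boundaries and document IDs are invented. In CouchDB the first pass receives [key, docId] pairs, and the rereduce pass receives null keys plus earlier reduce results as values.

```javascript
// CouchDB's view server provides sum(); this stand-in lets the sketch run on its own.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// The reduce function from the slide.
function reduce(keys, values, rereduce) {
  if (rereduce) {
    return sum(values); // values are counts from earlier reduce calls
  } else {
    return values.length; // values are whatever the map emitted
  }
}

// First pass: two separate chunks of rows for the key "pizza".
const firstChunk = reduce([["pizza", "doc1"], ["pizza", "doc2"]], [1, 1], false);

// The emitted value is never read here, so null works just as well as 1.
const secondChunk = reduce([["pizza", "doc3"]], [null], false);

// Rereduce pass: combine the partial counts.
const total = reduce(null, [firstChunk, secondChunk], true);

console.log(firstChunk, secondChunk, total); // 2 1 3
```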
59. “Joins”
A reduce function could create a virtual doc by collating different doc types, but I don’t recommend it
Map function:
function(doc) {
  if (doc.type == "post") {
    emit([doc._id, 0], doc);
  } else if (doc.type == "comment") {
    emit([doc.post, doc.created_at], doc);
  }
}
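The trick behind this kind of join is key ordering: the post's [id, 0] key sorts ahead of its comments' [id, timestamp] keys, so one range query returns the post followed by its comments. The sketch below simulates that with a simplified comparator (numbers before strings, plain string comparison), which is an assumption and much cruder than CouchDB's real collation. The sample documents are made up.

```javascript
// Rows collected by our stand-in for CouchDB's emit().
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The map function from the slide.
function mapFn(doc) {
  if (doc.type == "post") {
    emit([doc._id, 0], doc);
  } else if (doc.type == "comment") {
    emit([doc.post, doc.created_at], doc);
  }
}

// Simplified collation for this example only: numbers sort before strings.
function compareScalars(a, b) {
  const rank = (v) => (typeof v === "number" ? 0 : 1);
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  return a < b ? -1 : a > b ? 1 : 0;
}

// Array keys compare element by element; shorter arrays sort first on a tie.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const c = compareScalars(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

// Hypothetical documents, deliberately out of order.
const docs = [
  { _id: "c2", type: "comment", post: "p1", created_at: "2009-12-07T10:05:00Z" },
  { _id: "p1", type: "post", title: "Hello" },
  { _id: "c1", type: "comment", post: "p1", created_at: "2009-12-07T10:00:00Z" },
];

docs.forEach(mapFn);
rows.sort((a, b) => compareKeys(a.key, b.key));

const order = rows.map((row) => row.value._id);
console.log(order); // [ 'p1', 'c1', 'c2' ]
```

The client then walks the rows in order and attaches each comment to the post that precedes it.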
61. The CAP theorem: applies when business logic is separate from storage
Consistency vs. Availability vs. Partition tolerance
RDBMS = enforced consistency
PAXOS = quorum consistency
CouchDB (and others) = eventual consistency and horizontally scalable
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
62. Considerations
Server & Data replication
Load balancing and fail-over
Data partitioning and distribution
Query distribution and results collation
75. Data Partitioning
Partition data using URI components
CouchDB-Lounge’s dumbproxy module (nginx module)
HAProxy URI
[Diagram: three partitions labeled A, B, C]
http://tv.com/shows/1234 -> A
http://tv.com/shows/34671 -> B
But wait, they weren’t synchronizing?!?
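Partitioning by URI boils down to hashing the path and using the hash to pick a node. The sketch below is an assumption-heavy illustration: it uses an FNV-1a hash with plain modulo, which is not necessarily what dumbproxy or HAProxy do internally, and the node names simply mirror the A, B, C labels. Which node a given URI lands on depends entirely on the hash, so it will not necessarily match the slide.

```javascript
// 32-bit FNV-1a string hash (chosen for brevity; any stable hash would do).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Route on the path only, so the same resource always lands on the same node.
function pickNode(uri, nodes) {
  const path = new URL(uri).pathname;
  return nodes[fnv1a(path) % nodes.length];
}

const nodes = ["A", "B", "C"];
for (const uri of ["http://tv.com/shows/1234", "http://tv.com/shows/34671"]) {
  console.log(uri, "->", pickNode(uri, nodes));
}
```

Plain modulo reshuffles most keys whenever a node is added or removed; consistent hashing exists to limit that reshuffling.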
78. Conflicts
Conflicting documents are tagged with a _conflicts attribute
Conflicts are resolved deterministically from the revision history, so every replica picks the same winner
The “winning” document becomes the most current version
The loser becomes the version previous to the winner
This talk was given Dec 7, 2009, Pearl Harbor day.
- If the key is a DateTime, then B-tree is a much better choice
Highlighted words covered later in order that they appear
Other stuff, but this is the most relevant for the discussion
Most users’ browsers only support GET and POST, but that is changing
CRUD = Create Read Update Delete
In a perfect world, the documents should be self-sufficient, but sometimes reality gets in the way and documents will have to relate to each other. See GAE foreign key references
Keep in mind we are talking theory here. Most RDBMS today use MVCC as well for row level read while a write is happening. Optimistic locking is another technique to enable concurrent data access and writes.
MySQL MyISAM is a notable exception in that it does table level locks on write. Use InnoDB.
Next is the API discussion
Append-only file structure ensures that your DB is always valid, even during mid-write server failures.
You must provide an ID for the insert. This is in contrast to RDBMS auto-generated primary keys.
UUIDs are good for distributed systems, since duplicate ID likelihood is small
Typically you GET the full document, revise it within the application, then submit the entire JSON document back as a PUT operation
You cannot delete a specific revision! The revision number is only there so that the server can definitively say you are talking about the most recent record.
You need delete rev for replication of delete operations on other servers that are being synced to this one.
Might also be able to delete a particular version. Will have to check that.
Note: I could’ve made GI a number, but did not in this case
Zipcodes would be a bad thing to turn into numbers, b/c of possible leading zeros
Best practice = One design document per application or set of requirements
Next: Map-Reduce Views
Edit this slide. Maybe just show a full design document.
See the CouchDB book for more information on rereduce and how it takes advantage of the B-tree index
Reduce functions create an index with the emitted values. You would be duplicating all of your data (Not sure about map indexes)
Instead emit a collection of docs and collate them on the client.
Brewer’s CAP Theorem http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
Partition tolerance encompasses both business logic and data partitioning.
PAXOS will override more recent updates to a disconnected resource if it did not vote on a previous transaction.
Load balancing and failover are separate concerns; you don’t want your failover to be dependent on servers that are part of your load-balancing infrastructure.
We’ll handle the easy stuff first: data replication and load balancing
HAProxy added consistent hashing in version 1.3.21 but use 1.3.22