2. Facts about Map/Reduce
Programming paradigm, popularized and patented by Google
Well suited to parallel jobs
No joins between documents
In CouchDB: Map/Reduce functions are written in JavaScript (default)
Other languages are also possible
Workflow
1. The map function builds a list of key/value pairs
2. The reduce function reduces that list (to a single value)
Oliver Kurowski, @okurow
3. Simple Map Example
A List of Cars
Id  make  model  year  price
1   Audi  A3     2000   5.400
2   Audi  A4     2009  16.000
3   VW    Golf   2009  15.000
4   VW    Golf   2008   9.000
5   VW    Polo   2010  12.000
Step 1: Make a list, ordered by price
function (doc) {
  emit(doc.price, doc._id);  // key: doc.price, value: doc._id
}
Step 2: Result
Key    , Value
 5.400 , 1
 9.000 , 4
12.000 , 5
15.000 , 3
16.000 , 2
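The map step above can be imitated in plain JavaScript outside CouchDB. This is only a sketch under stated assumptions: `runMap` and the local `emit` are hypothetical stand-ins for what the view server does, the prices are written as plain integers (assuming "5.400" means 5400), and the documents carry their id in `_id`.

```javascript
// The five car documents from the slide (integer prices are an assumption).
const cars = [
  { _id: "1", make: "Audi", model: "A3", year: 2000, price: 5400 },
  { _id: "2", make: "Audi", model: "A4", year: 2009, price: 16000 },
  { _id: "3", make: "VW", model: "Golf", year: 2009, price: 15000 },
  { _id: "4", make: "VW", model: "Golf", year: 2008, price: 9000 },
  { _id: "5", make: "VW", model: "Polo", year: 2010, price: 12000 },
];

// Hypothetical helper: runs a map function over all docs and
// returns the emitted rows sorted by their numeric key.
function runMap(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  return rows.sort((a, b) => a.key - b.key);
}

const byPrice = runMap(cars, (doc, emit) => emit(doc.price, doc._id));
console.log(byPrice.map((row) => `${row.key}, ${row.value}`));
```

The printed rows follow the same order as the result list: cheapest car first.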
4. Querying Maps
Original map
Key    , Value
 5.400 , 1
 9.000 , 4
12.000 , 5
15.000 , 3
16.000 , 2
startkey=10.000&endkey=15.500: all keys from 10.000 to 15.500
Key    , Value
12.000 , 5
15.000 , 3
key=10.000: exact key; no row has this key, so no result
endkey=10.000: all keys up to 10.000
Key    , Value
 5.400 , 1
5. Map Function
Takes one document as input
Can emit any JSON type as key and as value:
- Special values: null, false, true
- Numbers: 1e-17, 1.5, 200
- Strings: "+", "1", "Ab", "Audi"
- Arrays: [1], [1,2], [1,"Audi",true]
- Objects: {"price":1300,"sold":true}
Results are ordered by key (or in reverse order)
(mixed types sort in the order of the list above)
In CouchDB: each result row also carries the doc._id
{"total_rows":5,"offset":0,
 "rows":[
  {"id":"1","key":"Audi","value":1},
  {"id":"2","key":"Audi","value":1},
  {"id":"3","key":"VW","value":1},
  {"id":"4","key":"VW","value":1},
  {"id":"5","key":"VW","value":1}
 ]}
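The ordering of mixed-type keys can be sketched as a comparator. This is a simplification with hypothetical names (`typeRank`, `compareKeys`): it ranks the type classes (null, false, true, numbers, strings, arrays, objects) and compares numbers and strings, while CouchDB itself uses ICU rules for strings and compares arrays and objects element by element, which is skipped here.

```javascript
// Rank of each JSON type in view key order:
// null < false < true < numbers < strings < arrays < objects.
function typeRank(value) {
  if (value === null) return 0;
  if (value === false) return 1;
  if (value === true) return 2;
  if (typeof value === "number") return 3;
  if (typeof value === "string") return 4;
  if (Array.isArray(value)) return 5;
  return 6; // plain object
}

function compareKeys(a, b) {
  const rankA = typeRank(a);
  const rankB = typeRank(b);
  if (rankA !== rankB) return rankA - rankB;
  if (rankA === 3) return a - b;
  if (rankA === 4) return a.localeCompare(b); // rough stand-in for ICU collation
  return 0; // arrays and objects are not compared further in this sketch
}

const mixedKeys = [{ sold: true }, "Audi", [1, 2], 200, true, null, 1.5, false];
const sortedKeys = [...mixedKeys].sort(compareKeys);
console.log(JSON.stringify(sortedKeys));
```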
6. Reduce Function
Takes arrays of keys and values as input
Should reduce the result of a map to a single value
Written in JavaScript (other languages possible)
In CouchDB: some simple built-in functions, implemented natively in Erlang
(_sum, _count, _stats)
Is called automatically after the map function has finished
Can be skipped with "reduce=false"
Is required for grouping
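As a rough illustration of what the three built-ins compute for numeric values: the real functions run in Erlang inside CouchDB, so the JavaScript below only sketches their results, and `builtinSum`, `builtinCount` and `builtinStats` are hypothetical names. The input is the list of car prices, written as integers (assuming "5.400" means 5400).

```javascript
// Values as a map function emitting doc.price would produce them.
const prices = [5400, 16000, 15000, 9000, 12000];

// _sum: adds up all values.
const builtinSum = (values) => values.reduce((a, b) => a + b, 0);

// _count: counts the rows, whatever the values are.
const builtinCount = (values) => values.length;

// _stats: sum, count, minimum, maximum and sum of squares.
const builtinStats = (values) => ({
  sum: builtinSum(values),
  count: values.length,
  min: Math.min(...values),
  max: Math.max(...values),
  sumsqr: values.reduce((a, b) => a + b * b, 0),
});

console.log(builtinSum(prices));
console.log(builtinCount(prices));
console.log(builtinStats(prices));
```

In a design document a built-in is selected by writing its name in place of the function source, for example "reduce": "_sum".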
7. Simple Map/Reduce Example
A List of Cars
Id  make  model  year  price
1   Audi  A3     2000   5.400
2   Audi  A4     2009  16.000
3   VW    Golf   2009  15.000
4   VW    Golf   2008   9.000
5   VW    Polo   2010  12.000
Step 1: Make a map, ordered by make
function (doc) {
  emit(doc.make, 1);  // key: doc.make, value: always 1
}
Result:
Key  , Value
Audi , 1
Audi , 1
VW   , 1
VW   , 1
VW   , 1
8. Simple Map/Reduce Example
Result of the map:
Key  , Value
Audi , 1
Audi , 1
VW   , 1
VW   , 1
VW   , 1
Step 2: Write a "sum" reduce
function (keys, values) {
  return sum(values);
}
Result:
Key  , Value
null , 5
9. Simple Map/Reduce Example
Step 3: Querying
- key="Audi"
Key  , Value
null , 2
Step 4: Grouping by keys
- group=true
Key  , Value
Audi , 2
VW   , 3
Step 5: Use only the map function
- reduce=false (like having no reduce function)
Key  , Value
Audi , 1
Audi , 1
VW   , 1
VW   , 1
VW   , 1
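Steps 2 to 5 can be replayed on an in-memory list of map rows. The sketch below is an assumption-laden imitation, not CouchDB code: `queryView` is a hypothetical helper, `sum` is defined locally (inside CouchDB's JavaScript view server it is provided for you), and the keys passed to the reduce function are simplified to plain key values.

```javascript
// Rows emitted by emit(doc.make, 1), sorted by key.
const makeRows = [
  { key: "Audi", value: 1 },
  { key: "Audi", value: 1 },
  { key: "VW", value: 1 },
  { key: "VW", value: 1 },
  { key: "VW", value: 1 },
];

const sum = (values) => values.reduce((a, b) => a + b, 0);
const reduceFn = (keys, values) => sum(values);

// Hypothetical helper imitating key=..., group=true and reduce=false.
function queryView(rows, { key, group = false, reduce = true } = {}) {
  const selected = key === undefined ? rows : rows.filter((r) => r.key === key);
  if (!reduce) return selected; // map rows only
  if (selected.length === 0) return []; // nothing to reduce
  if (!group) {
    const keys = selected.map((r) => r.key);
    const values = selected.map((r) => r.value);
    return [{ key: null, value: reduceFn(keys, values) }];
  }
  // group=true: one reduce call per distinct key.
  const groups = new Map();
  for (const row of selected) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row.value);
  }
  return [...groups].map(([k, values]) => ({ key: k, value: reduceFn([k], values) }));
}

console.log(queryView(makeRows));
console.log(queryView(makeRows, { key: "Audi" }));
console.log(queryView(makeRows, { group: true }));
console.log(queryView(makeRows, { reduce: false }).length);
```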
10. Array-Key Map/Reduce Example
A List of cars (again)
Id  make  model  year  price
1   Audi  A3     2000   5.400
2   Audi  A4     2009  16.000
3   VW    Golf   2009  15.000
4   VW    Golf   2008   9.000
5   VW    Polo   2010  12.000
Step 1: Make a map, with an array as key
function (doc) {
  emit([doc.make, doc.model, doc.year], 1);
}
Result (with group=true):
Key                    , Value
["Audi", "A3", 2000]   , 1
["Audi", "A4", 2009]   , 1
["VW", "Golf", 2008]   , 1
["VW", "Golf", 2009]   , 1
["VW", "Polo", 2010]   , 1
14. Examples:
Get all car makes:
- group_level=1
Key                  , Value
["Audi"]             , 2
["VW"]               , 3
Get all models from VW:
- startkey=["VW"]&endkey=["VW",{}]&group_level=2
Key                  , Value
["VW", "Golf"]       , 2
["VW", "Polo"]       , 1
Get all years of VW Golf:
- startkey=["VW","Golf"]&endkey=["VW","Golf",{}]&group_level=3
Key                  , Value
["VW", "Golf", 2008] , 1
["VW", "Golf", 2009] , 1
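The three queries above can be imitated in memory. In this sketch `groupLevel` is a hypothetical helper: instead of a real startkey/endkey range it keeps the rows whose key starts with a given prefix, which is what a range such as ["VW"] to ["VW",{}] selects here, because an object sorts after every other type.

```javascript
// Rows emitted by emit([doc.make, doc.model, doc.year], 1), sorted by key.
const arrayRows = [
  { key: ["Audi", "A3", 2000], value: 1 },
  { key: ["Audi", "A4", 2009], value: 1 },
  { key: ["VW", "Golf", 2008], value: 1 },
  { key: ["VW", "Golf", 2009], value: 1 },
  { key: ["VW", "Polo", 2010], value: 1 },
];

// Hypothetical helper: keep rows whose key starts with `prefix`,
// then sum the values per key truncated to `level` elements.
function groupLevel(rows, level, prefix = []) {
  const totals = new Map();
  for (const { key, value } of rows) {
    if (!prefix.every((part, i) => key[i] === part)) continue;
    const groupKey = JSON.stringify(key.slice(0, level));
    totals.set(groupKey, (totals.get(groupKey) || 0) + value);
  }
  return [...totals].map(([k, total]) => ({ key: JSON.parse(k), value: total }));
}

console.log(groupLevel(arrayRows, 1));
console.log(groupLevel(arrayRows, 2, ["VW"]));
console.log(groupLevel(arrayRows, 3, ["VW", "Golf"]));
```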
15. Reduce / Rereduce:
A rule for writing reduce functions:
A reduce function must accept not only the result of a map as input,
but also its own output
function (doc) {
  emit(doc.make, 1);
}
Key  , Value
Audi , 2
VW   , 3
function (keys, values) {
  return sum(values);
}
Key  , Value
null , 5
Why?
A reduce function can run more than once
If the map is too large, it is split and each part runs
through the reduce function; finally all the partial results run through
the same reduce function again.
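The rule matters as soon as a reduce function does something other than summing. CouchDB passes a third argument, `rereduce`, that tells the function which phase it is in. The sketch below counts rows; `reduceInChunks` is a hypothetical driver that imitates the splitting, and it passes null for the keys, which this example does not use.

```javascript
// A count-style reduce that handles both phases.
// First pass (rereduce = false): `values` are the emitted map values.
// Later passes (rereduce = true): `values` are results of earlier reduce calls.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((a, b) => a + b, 0); // add up partial counts
  }
  return values.length; // count the map rows
}

// Hypothetical driver: split the map output into chunks, reduce each chunk,
// then feed the partial results back into the same function.
function reduceInChunks(values, chunkSize, reduceFn) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(null, values.slice(i, i + chunkSize), false));
  }
  return reduceFn(null, partials, true);
}

const mapValues = [1, 1, 1, 1, 1];
const total = reduceInChunks(mapValues, 2, countReduce);
console.log(total);
```

A reduce that returned values.length in both phases would count the three partial results instead of the five rows, which is why the second phase has to be handled separately.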
20. Where does Map/Reduce live ?
Map/Reduce functions are stored in a design document
under the "views" key:
{
  "_id": "_design/example",
  "views": {
    "simplereduce": {
      "map": "function(doc) { emit(doc.make, 1); }",
      "reduce": "function(keys, values) { return sum(values); }"
    }
  }
}
Map/Reduce functions run when a view is called:
http://localhost:5984/mapreduce/_design/example/_view/simplereduce
http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key="Audi"
http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key="VW"&group=true
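View parameters such as key are JSON values, so a string key keeps its double quotes and should be URL-encoded when the request is built in code. A minimal sketch, assuming the database and design document names from the URLs above; `viewUrl` is a hypothetical helper, not part of any CouchDB client library.

```javascript
// Hypothetical helper: build a view URL with JSON-encoded, URL-escaped parameters.
function viewUrl(base, db, designDoc, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  const path = `${base}/${db}/_design/${designDoc}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}

const groupedVwUrl = viewUrl("http://localhost:5984", "mapreduce", "example", "simplereduce", {
  key: "VW",
  group: true,
});
console.log(groupedVwUrl);
```

The same helper also covers array keys, since JSON.stringify turns ["VW", {}] into the text that startkey and endkey expect.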
21. View calling
On the first call of a view, the map function is run once for every document in the database
On later calls, only new and changed docs are passed to the function
The results are stored in CouchDB's internal B+tree
The result you receive is the stored B+tree result
That means: the first call of a view can take some time, because the tree has to be built
before you get the results.
If no docs have changed, the next call returns the result
immediately
Key queries like startkey and endkey are performed on the stored B+tree result; no
rebuild is needed
There are several parameters for calling a view:
limit, skip, include_docs=true, key, startkey, endkey, descending, stale (ok, update_after),
group, group_level, reduce (=false)
22. View calling parameters
limit: limits the number of rows returned
skip: skips a number of rows
include_docs=true: without a reduce, each row of the map list also carries its document
key, startkey, endkey: covered in the earlier examples
startkey_docid=x: among rows whose key equals startkey, start at doc id x
endkey_docid=x: among rows whose key equals endkey, stop at doc id x
descending=true: reverse order. When used with startkey/endkey, the two values must be
swapped
stale=ok: do not start indexing, just deliver the stored result
stale=update_after: deliver the stored result, then start indexing
group, group_level, reduce=false: covered in the earlier examples
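The interplay of descending with startkey and endkey is easy to get wrong, so here is a small in-memory imitation. `queryRange` is a hypothetical helper, prices are integers (assuming "5.400" means 5400), and returning an empty list for bounds in the wrong order is a choice of this sketch, not a statement about CouchDB's behaviour.

```javascript
// Rows of the price view in ascending key order.
const viewRows = [
  { key: 5400, value: "1" },
  { key: 9000, value: "4" },
  { key: 12000, value: "5" },
  { key: 15000, value: "3" },
  { key: 16000, value: "2" },
];

// Hypothetical helper imitating startkey/endkey, descending, skip and limit.
function queryRange(rows, { startkey, endkey, descending = false, skip = 0, limit }) {
  const ordered = descending ? [...rows].reverse() : rows;
  const selected = ordered.filter((row) =>
    descending
      ? row.key <= startkey && row.key >= endkey // walk downwards from startkey
      : row.key >= startkey && row.key <= endkey // walk upwards from startkey
  );
  const end = limit === undefined ? undefined : skip + limit;
  return selected.slice(skip, end);
}

const ascending = queryRange(viewRows, { startkey: 9000, endkey: 15000 });
const swapped = queryRange(viewRows, { startkey: 15000, endkey: 9000, descending: true });
const notSwapped = queryRange(viewRows, { startkey: 9000, endkey: 15000, descending: true });
const paged = queryRange(viewRows, { startkey: 9000, endkey: 15000, skip: 1, limit: 2 });
console.log(ascending, swapped, notSwapped, paged);
```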