6. Example Documents:
{
FirstName:"Jonathan",
Address:"15 Wanamassa Point Road",
Children:[
{Name:"Michael",Age:10},
{Name:"Jennifer", Age:8},
{Name:"Samantha", Age:5},
{Name:"Elena", Age:2}
]
}
{
FirstName:"Bob",
Address:"5 Oak St.",
Hobby:"sailing"
}
There are no empty 'fields', and a document does
not need to state explicitly that other
pieces of information are left out
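The point about absent fields can be illustrated with a minimal Python sketch. The two documents are the ones from the slide, written as plain dictionaries; the helper name `field_names` is hypothetical and no database driver is involved.

```python
# The two example documents, modelled as plain Python dicts (no database involved).
jonathan = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}
bob = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}


def field_names(doc):
    """Hypothetical helper: return the set of top-level fields a document carries."""
    return set(doc)


# Each document lists only the fields it actually has; nothing is padded with nulls.
only_jonathan = field_names(jonathan) - field_names(bob)
only_bob = field_names(bob) - field_names(jonathan)
print(sorted(only_jonathan))  # fields Bob's document simply omits
print(sorted(only_bob))       # fields Jonathan's document simply omits
```

Neither document declares the other's extra field as empty; the field is just not there.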
14. Single Node
SSD vs HDD
[Figure: two bar charts, Map Reduce and Aggregation Framework, comparing HDD and SSD on a 0 to 60 scale for three queries: Most User Mentioned, Most Hashed-Tags, Most Shared URLs.]
19. Replica-Set
Single Node vs Replica Set:
Aggregation Framework
[Figure: bar chart of Time (minutes), 0 to 25, for Single Node and Replica Set on three queries: User Mentioned, Hashed-Tags, Shared URLs.]
20. Replica-Set
Single Node vs Replica Set:
Map Reduce
[Figure: bar chart of Time (minutes), 0 to 80, for Single Node and Replica Set on three queries: User Mentioned, Hashed-Tags, Shared URLs.]
21. Replicating Data did not improve
query performance
(MapReduce and Aggregation Framework) . . .
so we tried
Sharding
22. Sharding
One BIG stack
Two or more smaller stacks
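As a rough illustration of splitting one big stack into smaller ones, the following Python sketch routes documents to shards by hashing a key. This is a simplification and not MongoDB's actual chunk-based balancing; the functions `shard_for` and `split_into_shards`, the key `user` and the sample data are all hypothetical.

```python
import hashlib
from collections import defaultdict


def shard_for(key, num_shards):
    """Hypothetical router: map a shard-key value to a shard index via a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def split_into_shards(docs, key_field, num_shards):
    """Distribute documents into num_shards smaller stacks by hashed key."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc[key_field], num_shards)].append(doc)
    return shards


# One "big stack" of made-up documents ...
docs = [{"user": f"user{i}", "n": i} for i in range(1000)]
# ... split into two smaller stacks.
shards = split_into_shards(docs, "user", 2)
for index in sorted(shards):
    print(index, len(shards[index]))
```

Every document lands on exactly one shard, so a query that touches all documents can, in principle, be worked on by each shard in parallel.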
23. democratic
uses a
hierarchical scaling
approach
Master-Slave
Replication
24. But Deployment is an
issue with MongoDB . . .
If you want to shard . . .
26. The investment to go from
single node --> sharded cluster
is very HIGH!
27. By omitting Fail-Safe features
For our tests, we kind of cheated and used:
Note: X is variable
Component                  Fail-safe setup      Our test setup
Shards                     2 → 6 nodes          X → 2 nodes
mongos (load balancer)     2 → 2 nodes          1 → 0.5 node
mongod (config servers)    3 → 3 nodes          1 → 0.5 node
-------------------------------------------
Total:                     11 nodes             3 nodes
NOT RECOMMENDED
IN PRODUCTION
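The node arithmetic on this slide can be checked with a short Python sketch. The per-component figures are read from the slide; the variable names, and the reading that a fractional value means processes sharing one machine, are assumptions.

```python
# Machines per component, as read from the slide (fail-safe layout vs. the test layout).
recommended = {"shards": 6, "mongos": 2, "config servers": 3}
used_in_tests = {"shards": 2, "mongos": 0.5, "config servers": 0.5}

total_recommended = sum(recommended.values())
total_used = sum(used_in_tests.values())

print(f"fail-safe layout: {total_recommended} nodes")
print(f"test layout:      {total_used:g} nodes")
print(f"machines saved:   {total_recommended - total_used:g}")
```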
31. We couldn't perform queries using
the Aggregation Framework on the
shards, because it was limited by:
- Output at every stage of the
aggregation can contain at most 16MB
- RAM usage may not exceed 10%
MongoDB would
fail if either limit
was reached
32. 4 Cores
8GB RAM
Sharded Cluster
Map Reduce: On 3 modest machines
[Figure: Time (minutes), 0 to 30, against Number of Shards (2, 4, 8) for three queries: Most hash tags, Most user mentions, Most shared urls. One shard count is annotated "Most Optimal".]
33. 4 Cores
8GB RAM
Sharded Cluster
Map Reduce: On 3 modest machines
[Figure: two panels, both Time (minutes), 0 to 60. Single Node: HDD vs SSD. Three Nodes: Number of Shards (2, 4, 8). Queries: Most Hashed-Tags, Most User Mentioned, Most Shared URLs.]
34. 4 Cores
8GB RAM
Sharded Cluster
Map Reduce: On 3 modest machines
[Figure: the Single Node (HDD, SSD) and Three Nodes (2, 4, 8 shards) charts shown together on a common Time (minutes) scale, 0 to 60. Queries: Most Hashed-Tags, Most User Mentioned, Most Shared URLs.]
35. Performance of Single Node (SSD)
is very comparable to 3 nodes (HDD)
with different numbers of shards!
(Single Node with SSD is
2-3 minutes slower than 3
nodes . . . [40 GB corpus])
Perhaps it's worth purchasing a good
SSD over investing in more nodes to
improve aggregate performance . . .
37. 32 Cores
128GB RAM
Sharded Cluster
Map Reduce: On a Big Box
[Figure: Time (minutes), 0 to 12, against Number of Shards (2, 4, 8, 16, 24, 28, 30) for three queries: Most hash tags, Most user mentions, Most shared urls. One shard count is annotated "Most Optimal".]
39. On a SINGLE NODE, while
Couchbase
is FASTER than
MongoDB's MapReduce,
MongoDB's Aggregation Framework
is FASTER than
Couchbase
43. Other users have found similar issues.
Couchbase Support says:
“One known issue with the current developer
preview of Couchbase is that it slows down
on smaller clusters. The views are optimal
on very large clusters.” [1]
But they did not specify what counts as “very large”.
[1] Couchbase performance - real tests - it's horrible - or am I doing somethin wrong?
47. Simple flat file aggregation with Java
took approx 60 mins
On the same machine, MongoDB took
approx
30 mins (MapReduce)
10 mins (Aggregation Framework)
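For a sense of what a flat-file aggregation involves, here is a minimal Python sketch (the slide's version was written in Java, and that code is not shown in the source). It assumes one message per line and uses simple regular expressions to count hashtags, user mentions and URLs; the sample lines, the patterns and the name `aggregate` are hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def aggregate(lines):
    """Count hashtags, mentions and URLs over an iterable of text lines."""
    hashtags, mentions, urls = Counter(), Counter(), Counter()
    for line in lines:
        hashtags.update(tag.lower() for tag in HASHTAG.findall(line))
        mentions.update(name.lower() for name in MENTION.findall(line))
        urls.update(URL.findall(line))
    return hashtags, mentions, urls


# Made-up sample standing in for a flat file read line by line.
sample = [
    "@alice loving #MongoDB http://example.com/a",
    "#mongodb vs #couchbase, thoughts @alice @bob?",
    "read this http://example.com/a #nosql",
]
hashtags, mentions, urls = aggregate(sample)
print(hashtags.most_common(1))
print(mentions.most_common(1))
print(urls.most_common(1))
```

A single pass like this keeps only the counters in memory, but it runs on one core and rereads the whole file for every new question, which is one plausible reason a database with a purpose-built aggregation path can be faster.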
48. Deployment Options
[Figure: Total Execution Time (mins), 0 to 140, and Approximate Total Cost (GBP), 0 to 2500, for each deployment option. Option labels: 1 Node HDD (Map Reduce), 1 Node SSD (MR), 3 Nodes HDD (MR), Big Box HDD (MR), 1 Node #2 Fast SSD (MR and Aggregate Framework).]
49. But!
Has problems with:
- Full Text Search
- Social Network Graphs
- Aggregation
50. 's Full Text Search
is slow
1 simple query on a single
node, 44GB corpus took
approx
10 mins!
51. Solution:
A Fast Full Text Search Engine
The same search took only a
fraction of a second
53. Solution:
A Popular Graph Database
Where you can obtain a
network graph
with VERY FEW queries
54. Example:
START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof
The above query
fetches all nodes that
are friends of friends
of “John”
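To make the traversal concrete, here is a small Python sketch of the same friends-of-friends lookup over an in-memory adjacency map. It does not use the graph database; the graph data and the function name `friends_of_friends` are hypothetical. Like the query's pattern, it follows two outgoing `friend` edges, but it returns each node once rather than one row per path.

```python
def friends_of_friends(graph, start):
    """Return the distinct nodes reached by following two outgoing 'friend' edges."""
    result = set()
    for friend in graph.get(start, ()):
        for fof in graph.get(friend, ()):
            result.add(fof)
    return result


# Hypothetical directed 'friend' edges: person -> people they list as friends.
graph = {
    "John": ["Sara", "Joe"],
    "Sara": ["Maria"],
    "Joe": ["Steve", "Maria"],
    "Maria": [],
    "Steve": [],
}
print(sorted(friends_of_friends(graph, "John")))
```

In a document store the same lookup would typically need one query per hop, which is the contrast the slide draws with a graph database.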