Benchmarking MongoDB and CouchBase

Benchmarking
No-SQL Databases

Alex Voss
Chris Choi

TOP 2 Questions

Should a social scientist buy
MORE or UPGRADE
computers?

Which DATABASE(s)?

Document Oriented
Databases

The central concept of a
document-oriented database
is the notion of a Document

Both MongoDB and CouchBase
use JSON to store these
Documents

Example Documents:

{
  FirstName:"Bob",
  Address:"5 Oak St.",
  Hobby:"sailing"
}

{
  FirstName:"Jonathan",
  Address:"15 Wanamassa Point Road",
  Children:[
    {Name:"Michael", Age:10},
    {Name:"Jennifer", Age:8},
    {Name:"Samantha", Age:5},
    {Name:"Elena", Age:2}
  ]
}

There are no empty 'fields', and a document
does not need to state explicitly which
pieces of information are left out
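
As a minimal sketch of what "no empty fields" means in practice, the two example documents can be written as Python dictionaries. The helper `children_names` is a hypothetical name introduced only for this illustration; it treats an absent key as "no information" instead of requiring a placeholder value.

```python
# Two documents in the same collection; neither declares a schema.
bob = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}
jonathan = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}
people = [bob, jonathan]


def children_names(doc):
    # A missing "Children" key simply means the document says nothing
    # about children; no NULL column or empty field is stored.
    return [child["Name"] for child in doc.get("Children", [])]


names = {doc["FirstName"]: children_names(doc) for doc in people}
```

Here `names["Bob"]` is an empty list even though Bob's document never mentions children.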

Database Test
Aggregate/Count: Typical questions

Most User Mentioned

Most Hashed Tags

Most Shared URLs

MongoDB has two query
approaches:

Map Reduce &
Aggregation Framework

Map Reduce

For Aggregation

Uses JavaScript
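
The Map Reduce idea behind a "most mentioned users" count can be sketched as follows. This is an illustration in Python, not MongoDB's JavaScript API: the tweet layout (a `mentions` list) and the helper names `map_tweet`, `reduce_counts` and `map_reduce` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical tweets; the "mentions" field is an assumed, simplified schema.
tweets = [
    {"text": "hi @alice and @bob", "mentions": ["alice", "bob"]},
    {"text": "thanks @alice", "mentions": ["alice"]},
    {"text": "no mentions here", "mentions": []},
]


def map_tweet(tweet):
    # Map step: emit one (key, 1) pair per mentioned user.
    for user in tweet["mentions"]:
        yield user, 1


def reduce_counts(key, values):
    # Reduce step: combine all values emitted for one key.
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    # Group emitted values by key, then reduce each group.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


mention_counts = map_reduce(tweets, map_tweet, reduce_counts)
most_mentioned = max(mention_counts, key=mention_counts.get)
```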

Aggregation Framework

Similar to Map Reduce

But Uses C++ (Faster!)
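
A "most used hashtags" query could be expressed as a list of pipeline stages roughly like the one below. This is a sketch under assumptions: the field path `entities.hashtags.text` is an assumed tweet layout, and the function `top_hashtags` is a hypothetical pure-Python stand-in that mirrors what the stages compute, so the example runs without a database.

```python
from collections import Counter

# Pipeline stages as they might be passed to an aggregate call.
# The field path "entities.hashtags.text" is an assumed tweet schema.
pipeline = [
    {"$unwind": "$entities.hashtags"},
    {"$group": {"_id": "$entities.hashtags.text", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 10},
]


def top_hashtags(tweets, limit=10):
    # Pure-Python equivalent of the stages above, for illustration only.
    counts = Counter(
        tag["text"]  # one item per hashtag: $unwind, then the $group key
        for tweet in tweets
        for tag in tweet["entities"]["hashtags"]
    )
    return counts.most_common(limit)  # $sort by count, then $limit


sample = [
    {"entities": {"hashtags": [{"text": "nosql"}, {"text": "mongodb"}]}},
    {"entities": {"hashtags": [{"text": "nosql"}]}},
    {"entities": {"hashtags": []}},
]
top = top_hashtags(sample)
```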

Single Node
SSD vs HDD

[Figure: two bar charts, Map Reduce and Aggregation Framework, each comparing HDD and SSD for the Most User Mentioned, Most Hashed-Tags and Most Shared URLs queries; y-axis 0 to 60]

Single Node

Aggregation Framework
is roughly 2 times faster than

Map Reduce

When we have more than 1
node ~ Scaling

Generally . . .

Scaling
Vertically
BIGGER and FATTER
machine!

Scaling
Horizontally
MORE machines!

Replica-Set
Master-Slave
Replication

Replicates data
from master to
slave

Only one node is in
charge (the Master), and
data only travels in one
direction (to the Slave)

Replica-Set
Single Node vs Replica Set:

[Figure: bar chart comparing Single Node and Replica Set for the User Mentioned, Hashed-Tags and Shared URLs queries; y-axis: Time (minutes), 0 to 25]

Replica-Set
Single Node vs Replica Set:
Map Reduce

[Figure: bar chart comparing Single Node and Replica Set for the User Mentioned, Hashed-Tags and Shared URLs queries; y-axis: Time (minutes), 0 to 80]

Replicating Data did not improve
query performance
(MapReduce and Aggregation Framework) . . .
so we tried
Sharding

Sharding

One BIG stack

Two or more smaller stacks
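
Splitting one big stack into smaller stacks can be sketched as below. This is a minimal illustration of the general idea, assuming hash-based partitioning on a `user` field; it is not a description of how MongoDB itself assigns data to shards, and the names `shard_for` and `split_into_shards` are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    # Hash the shard key so the same key always lands on the same shard.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def split_into_shards(docs, key_field, num_shards):
    # One BIG stack in, num_shards smaller stacks out.
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc[key_field], num_shards)].append(doc)
    return shards


docs = [{"user": f"user{i}"} for i in range(100)]
shards = split_into_shards(docs, "user", 4)
```

Each shard then holds only part of the corpus, so a query such as a count can run on every shard in parallel and the partial results can be combined afterwards.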

MongoDB uses a
hierarchical, not democratic,
scaling approach:

Master-Slave
Replication

But Deployment is an
issue with MongoDB . . .

If you want to shard . . .

MongoDB's documentation suggests a
minimum of 11 nodes to build a
fail-safe sharded cluster

2 Shards → 6 nodes (3 each)

2 mongos (load balancer) → 2 nodes

3 mongod (config servers) → 3 nodes
-------------------------------------------

Total: 11 nodes
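
The tally above is simple arithmetic, sketched here so the effect of adding shards is easy to see. The function name `total_nodes` and the constant names are hypothetical; the figures are the ones from the slide.

```python
SHARDS = 2
NODES_PER_SHARD = 3       # "3 each" on the slide
MONGOS_NODES = 2          # load balancers
CONFIG_SERVER_NODES = 3   # config servers


def total_nodes(shards, nodes_per_shard, mongos, config_servers):
    # Every shard needs its own group of nodes; mongos and config
    # servers are counted once for the whole cluster.
    return shards * nodes_per_shard + mongos + config_servers


fail_safe_total = total_nodes(
    SHARDS, NODES_PER_SHARD, MONGOS_NODES, CONFIG_SERVER_NODES
)
```

Under the same assumptions, each extra shard adds three more nodes to the total.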

The investment to go from

single node --> sharded cluster

is very HIGH!

By omitting Fail-Safe features

For our tests, we kind of cheated and used
(note: X is variable):

X Shards → 2 nodes (instead of 2 Shards → 6 nodes)

1 mongos (load balancer) → 0.5 node (instead of 2 mongos → 2 nodes)

1 mongod (config server) → 0.5 node (instead of 3 mongod → 3 nodes)
-------------------------------------------

Total: 3 nodes (instead of 11)
NOT RECOMMENDED
IN PRODUCTION

Which looks something like this:

4 Shard configuration

Or this:

8 Shard configuration

Another problem is with
querying in a sharded
environment . . .

We couldn't perform queries using
the Aggregation Framework in the
shards, because it was limited by:

- Output at every stage of the
aggregation can contain at most 16MB
- An aggregation cannot use more
than 10% of RAM

MongoDB would
fail if the limits
were reached

4 Cores
8GB RAM
Sharded Cluster
Map Reduce: On 3 modest machines

[Figure: bar chart of Time (minutes, 0 to 30) against Number of Shards (2, 4, 8) for the most hash tags, most user mentions and most shared urls queries, with one configuration marked "Most Optimal"]

4 Cores
8GB RAM
Sharded Cluster

[Figure: Single Node bar charts (HDD vs SSD) beside a Three Nodes bar chart (2, 4 and 8 shards) for the Most Hashed-Tags, Most User Mentioned and Most Shared URLs queries; y-axis: Time (minutes), 0 to 60]

Performance of Single Node (SSD)
is very comparable to 3 nodes (HDD)
with different numbers of shards!
(Single Node with SSD is
2-3 minutes slower than 3
nodes . . . [40 GB corpus])

Perhaps it's worth purchasing a good
SSD over investing in more nodes to
improve aggregate performance . . .

What happens when we scale vertically

with ONE BIG BOX?

we get --->

32 Cores
128GB RAM
Sharded Cluster
Map Reduce: On a Big Box

[Figure: chart of Time (minutes, 0 to 12) against Number of Shards (2, 4, 8, 16, 24, 28, 30) for the most user mentions and most shared urls queries, with one configuration marked "Most Optimal"]

On a SINGLE NODE, while
CouchBase
is FASTER than
MongoDB's MapReduce,

MongoDB's Aggregation Framework
is FASTER than
CouchBase

That said, adding more
Couchbase nodes was
disappointing. . .

1 Node VS 2 Nodes VS 3 Nodes

[Figure: bar chart comparing One Node, Two Nodes and Three Nodes for the Most user mentioned, Most hashed tags and Most shared urls queries; y-axis 0 to 90]

Adding more nodes did not
improve
query performance . . .

Other users have found similar issues,
Couchbase Support says:

“One known issue with the current developer
preview of Couchbase is that it slows down
on smaller clusters. The views are optimal
on very large clusters.” [1]

But they did not specify what is “very large”

[1] Couchbase performance - real tests - it's horrible - or am I doing somethin wrong?

The obvious answer is MongoDB,

from a performance standpoint.

Simple flat file aggregation with Java
took approx 60 mins

On the same machine, MongoDB took
approx

30 mins (MapReduce)
10 mins (Aggregation Framework)
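
The approximate timings above translate into speedups over the flat file run as follows. This is a small worked calculation using only the figures quoted on the slide; the variable names are hypothetical.

```python
# Approximate timings quoted above, all on the same machine (minutes).
FLAT_FILE_JAVA_MINS = 60
MONGODB_MINS = {"MapReduce": 30, "Aggregation Framework": 10}

# Speedup = flat file time divided by MongoDB time.
speedups = {
    approach: FLAT_FILE_JAVA_MINS / mins
    for approach, mins in MONGODB_MINS.items()
}
```

With these rounded figures, MapReduce comes out about 2 times faster than the flat file run and the Aggregation Framework about 6 times faster.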

Deployment Options

[Figure: combined chart of Total Execution Time (mins, 0 to 140) and Approximate Total Cost (GBP, 0 to 2500) per deployment option. Labels recoverable from the chart: 1 Node, 3 Nodes, Big Box, HDD, SSD, Fast SSD, MR (Map Reduce), AF (Aggregate Framework)]

But!

MongoDB has problems with:

- Full Text Search
- Social Network Graphs
- Aggregation

MongoDB's Full Text Search
is slow

1 simple query on a single
node, 44GB corpus took
approx
10 mins!

Solution:

A Fast Full Text Search Engine
The same search took only a
fraction of a second

To create a
Social
Network
Graph with MongoDB

can be complex,
since you query multiple
times to obtain
relationships

Solution:

A Popular Graph Database
Where you can obtain a
network graph
with VERY FEW queries

Example:

START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof

The above query
fetches all nodes that
are friends of friends
of “John”
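
To see why the same question needs several lookups without a graph database, here is a minimal sketch in Python. The `friends` mapping and the function `friends_of_friends` are hypothetical stand-ins for stored relationships; each `graph.get(...)` call corresponds to what would be a separate query against a document store.

```python
# Hypothetical directed "friend" edges, standing in for stored relationships.
friends = {
    "John": ["Sara", "Joe"],
    "Sara": ["Maria"],
    "Joe": ["Steve", "Maria"],
    "Maria": [],
    "Steve": [],
}


def friends_of_friends(graph, start):
    # One lookup for the start node, then one lookup per direct friend.
    result = []
    for friend in graph.get(start, []):
        for fof in graph.get(friend, []):
            if fof not in result:  # report each person once
                result.append(fof)
    return result


fof = friends_of_friends(friends, "John")
```

Unlike this sketch, the graph query above returns one row per matching path, so a person reachable through two different friends would appear twice.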

With Aggregation

Is perhaps not the best,
we have yet to play with

Or even

An SQL language to
build MapReduce
queries

With

Moral of the story

Use the right tools
for the right job

Future work
We aim to experiment with:

With the possibility of integrating both
with
