This talk is a short introduction to MongoDB sharding and covers what a beginner needs to know to get started with it.
2. Jordi Soucheiron - @jordixou
Barcelona MongoDB User Group – 2015-06-29
About me
• Product engineer at ServerDensity
• Working with mongoDB in production for more than 4 years
• Python and php programmer
• Pybcn co-organizer
• FOSDEM volunteer
What is sharding?
It’s the system MongoDB uses to:
• Distribute writes
• Distribute primary reads
• Distribute data
• Or, in other words, grow horizontally and scale
What does it look like?
[Two diagrams of a sharded cluster]
Nomenclature:
• Shard:
• Logical data partition
• Each shard is handled by a server or replica set
• Shard key:
• Key that all documents MUST have
• Decided by the user
• Chunk:
• Logical data partition inside a shard
• They can be split into 2 smaller chunks
• They can be moved to another shard for balancing
What does each component do?
• Mongos processes route data
• Config servers hold metadata:
• What chunks are there
• What shard holds each chunk
• Which chunks are being migrated
• The shard servers hold the actual data
How does it work?
Whenever you read/write data this happens:
1. You run your query in your shell/driver
2. Your driver contacts the mongos process (a proxy)
3. The mongos process retrieves metadata from the config servers
4. Based on the metadata, mongos asks the shards affected by the query to run their part of the job
5. Mongos returns the result
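The routing step can be sketched in plain JavaScript, assuming a toy in-memory chunk table that stands in for the config server metadata. The names `chunkTable` and `targetShards` are hypothetical and not part of MongoDB; the real mongos logic is considerably more involved.

```javascript
// Toy stand-in for the config server metadata (hypothetical structure):
// each chunk covers the half-open shard key range [min, max) on one shard.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard0" },
];

// Return the shards a router would contact for a query on the shard key
// range [lo, hi]. If the query has no shard key (lo is undefined),
// every shard has to be asked.
function targetShards(chunks, lo, hi) {
  const shards = new Set();
  for (const chunk of chunks) {
    const overlaps = lo === undefined || (lo < chunk.max && hi >= chunk.min);
    if (overlaps) shards.add(chunk.shard);
  }
  return [...shards].sort();
}

console.log(targetShards(chunkTable, 1500, 1500)); // targeted query
console.log(targetShards(chunkTable));             // query without the shard key
```

A query that includes the shard key touches only the shards owning matching chunks, while a query without it has to be sent to every shard.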
Data partitioning
Your data will be split into chunks based on your shard key.
Choosing a good shard key
A good shard key has to:
• Be used in ALL queries
• Allow a huge number of possible values:
• Sha1 hash -> good
• Phone number -> not bad
• Zip code -> bad
• Boolean -> awful
• Have values evenly distributed across all the key space
If your shard key has high cardinality but is not evenly distributed
across the key space, use a hashed shard key
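To illustrate why hashing helps, here is a small sketch in plain JavaScript (Node.js). It assumes an ever-increasing key such as a counter and uses MD5 from Node's `crypto` module purely as a stand-in hash; the helper names `hash32` and `countPerRange` are hypothetical, and MongoDB's own hashed index function is not reproduced here.

```javascript
const nodeCrypto = require("crypto");

// Map a key to a 32-bit unsigned integer using MD5 (illustrative hash only).
function hash32(key) {
  return nodeCrypto.createHash("md5").update(String(key)).digest().readUInt32BE(0);
}

// Count how many values fall into each of `n` equal ranges of [0, space).
function countPerRange(values, space, n) {
  const counts = new Array(n).fill(0);
  for (const v of values) {
    counts[Math.min(n - 1, Math.floor((v / space) * n))] += 1;
  }
  return counts;
}

// The 100 most recent values of an ever-increasing key.
const recentKeys = [];
for (let i = 9900; i < 10000; i++) recentKeys.push(i);

// Ranged on the raw key: every recent insert lands in the last range.
const ranged = countPerRange(recentKeys, 10000, 4);
// Ranged on the hash of the key: recent inserts spread over the ranges.
const hashed = countPerRange(recentKeys.map(hash32), 2 ** 32, 4);

console.log("ranged:", ranged);
console.log("hashed:", hashed);
```

In the mongo shell, a hashed shard key is requested with `sh.shardCollection("test.testdata", {i: "hashed"})`.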
Chunk partitioning
Whenever a chunk reaches a certain size, the mongos process will try to
split it into two.
The split will fail if all documents in the chunk have the same shard key value.
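The failure case is easier to see in a small sketch. The function below is a hypothetical plain JavaScript model, not MongoDB's actual split logic: it takes the shard key values of the documents in a chunk and looks for a split point near the median.

```javascript
// Try to split a chunk, given the shard key values of its documents.
// Returns { splitAt, left, right }, or null when no split point exists.
function splitChunk(keys) {
  const sorted = [...keys].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  // The split point must be greater than the smallest key,
  // otherwise the left half would be empty.
  const splitAt = median > sorted[0] ? median : sorted.find((k) => k > sorted[0]);
  if (splitAt === undefined) return null; // all documents share one key value
  return {
    splitAt,
    left: sorted.filter((k) => k < splitAt),
    right: sorted.filter((k) => k >= splitAt),
  };
}

console.log(splitChunk([1, 2, 3, 4, 5, 6])); // splits at 4
console.log(splitChunk([7, 7, 7, 7]));       // null: nothing to split on
```

Documents with the same shard key value always end up on the same side of a split, which is why a low-cardinality key leads to chunks that can no longer be divided.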
Balancing
• Inevitably, some shards will get more chunks than others
• The sharded cluster will automatically move chunks from crowded
shards to under-populated shards:
• It’s possible to start/stop and customize the balancing algorithm
• It’s possible to manually move chunks around
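The idea behind balancing can be sketched as a simple greedy loop in plain JavaScript. This is a simplified model under stated assumptions: the function name `balance` and the fixed `threshold` are hypothetical, and the real balancer's thresholds and migration policy differ.

```javascript
// Move chunks one at a time from the most loaded shard to the least loaded
// one until their chunk counts differ by less than `threshold`.
function balance(chunksPerShard, threshold = 2) {
  if (threshold < 2) throw new RangeError("threshold must be at least 2");
  const counts = { ...chunksPerShard };
  const moves = [];
  for (;;) {
    const names = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    if (names.length < 2) break; // nothing to balance with a single shard
    const least = names[0];
    const most = names[names.length - 1];
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    moves.push({ from: most, to: least });
  }
  return { counts, moves };
}

const result = balance({ shard0: 18, shard1: 2 });
console.log(result.counts, result.moves.length);
```

In the mongo shell, the balancer is switched on and off with `sh.startBalancer()` and `sh.stopBalancer()`, and a single chunk can be moved by hand with `sh.moveChunk()`.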
HA in a sharded cluster
In order to achieve HA in a sharded cluster you’ll need:
• 3 config servers:
• As long as 1 is up you’ll be able to read from and write to your collections
• If a config server is down the metadata will be read-only, so you won’t be able to:
• Split chunks
• Balance the cluster
• Add shards
• N shards; each one with, at least:
• 2 data-bearing nodes
• An arbiter or another data-bearing node
Demo time!
Creating a new demo sharded cluster:
# Stop the system mongod so that mongos can use the default port 27017
sudo service mongod stop
mkdir shard0
mkdir shard1
mkdir config
# Start the config server
mongod --fork --syslog --configsvr --dbpath config --port 27019
# Start the shard servers
mongod --fork --syslog --dbpath shard0 --port 30000
mongod --fork --syslog --dbpath shard1 --port 30001
# Start the mongos process
mongos --fork --syslog --configdb localhost:27019
# Add shards
mongo initSharding.js
Creating a new demo sharded cluster:
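// Register both shard servers with the cluster
sh.addShard("localhost:30000");
sh.addShard("localhost:30001");
// Insert test data into the default "test" database
for (var i = 0; i < 10000; i++) {
    db.testdata.insert({"i": i});
}
// Create the index that backs the shard key
db.testdata.createIndex({"i": 1});
// Enable sharding for the database, then shard the collection on "i"
sh.enableSharding("test");
sh.shardCollection("test.testdata", {"i": 1});
// Manually split the data into chunks of 500 documents each
for (var j = 1; j < 20; j++) {
    sh.splitAt("test.testdata", {"i": j * 500});
}
// Show the cluster status with every chunk listed (sh.status() prints its own output)
sh.status(true);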
Questions?
We’re hiring!
We’re looking for awesome engineers!
Talk to me after the presentation or go to:
https://www.serverdensity.com/jobs/
Code
https://github.com/jsoucheiron/mongodb-barcelona-sharding-introduction
Slides
http://www.slideshare.net/jordixou (soon)