2. Who am I?
Juan Antonio Roy Couto
Financial Software Developer
Twitter: @juanroycouto
LinkedIn: https://www.linkedin.com/in/juanroycouto
Personal blog: http://www.juanroy.es
Contributor at: http://www.mongodbspain.com
Charrosfera member: http://www.charrosfera.com
Email: juanroycouto@gmail.com
MongoDB User Group
Tag-based sharding
3. Table of Contents
❏ Cluster overview
❏ Definitions
❏ Steps for balancing
❏ Steps to split a chunk
❏ Migration steps
❏ Normal MongoDB operation
❏ Pre-splitting
❏ Commands to split a chunk
❏ Tag-based sharding overview
❏ Tag your shards
❏ Tag your chunk ranges
4. Cluster overview
❏ Replica set
❏ Shards
❏ Config servers
❏ config database
❏ mongos
5. Cluster overview
Replica Set
● High availability
● Data safety
● Disaster recovery
[Diagram: a replica set with one Primary and two Secondaries]
6. Cluster overview
Shards
● Scale out
● Even data distribution across all of the shards, based on a shard key
● A shard key range belongs to only one shard
● More efficient queries
[Diagram: a cluster with three shards. Shard 0 holds A-I, Shard 1 holds J-Q, Shard 2 holds R-Z]
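The rule that a shard key range belongs to only one shard can be sketched in plain JavaScript, with no cluster required. This is a minimal illustration, not how MongoDB stores its metadata: the `ranges` table and the `shardFor` function are hypothetical names, and the mapping of letters to shards is assumed from the diagram.

```javascript
// Hypothetical chunk table (assumed from the diagram): each range of first
// letters belongs to exactly one shard. Both bounds are inclusive here.
const ranges = [
  { min: "A", max: "I", shard: "Shard 0" },
  { min: "J", max: "Q", shard: "Shard 1" },
  { min: "R", max: "Z", shard: "Shard 2" },
];

// Return the shard that owns the given shard key value,
// or null if no range covers it.
function shardFor(key) {
  const first = key.charAt(0).toUpperCase();
  const owner = ranges.find((r) => first >= r.min && first <= r.max);
  return owner ? owner.shard : null;
}

console.log(shardFor("Garcia")); // Shard 0
console.log(shardFor("Roy"));    // Shard 2
```

Because the ranges do not overlap, a lookup returns at most one shard, which is what lets a query on the shard key go to a single shard.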
8. Cluster overview
Config servers
● config database
● Identical information (consistency check).
● Metadata:
○ Cluster shards list
○ Data per shard (chunk ranges)
○ ...
● Don’t sync from each other.
● Default Config server (All mongos read it)
9. Cluster overview
config database
Collections:
● changelog: splits and migration information
● chunks *
● collections * (only sharded)
● databases *
● lockpings
● locks
● mongos
● settings
● shards *
● system.indexes
● tags
● version *
10. Cluster overview
mongos
● Receives client requests and returns results.
● Reads the metadata and sends the query to the necessary shard or shards.
● Does not store data.
● Keeps a cached copy of the metadata. We can refresh it by:
○ mongos> db.adminCommand( { flushRouterConfig : 1 } )
○ or restarting the mongos process
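How a router might use its cached metadata to pick target shards can be sketched as follows. This is a simplified model in plain JavaScript, not the mongos implementation: `cachedChunks`, `targetShards`, and the query shape (`x` as a number or as `{ from, to }`) are all hypothetical.

```javascript
// Hypothetical cached metadata: chunk ranges on shard key x,
// with min inclusive and max exclusive.
const cachedChunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" },
];

// Decide which shards must receive a query.
// query.x may be absent (no shard key), a single number,
// or a range { from, to } with from inclusive and to exclusive.
function targetShards(query, chunks = cachedChunks) {
  let hits;
  if (query.x === undefined) {
    // Without the shard key the router cannot narrow the target.
    hits = chunks;
  } else if (typeof query.x === "number") {
    hits = chunks.filter((c) => query.x >= c.min && query.x < c.max);
  } else {
    const { from, to } = query.x;
    // Two half-open ranges overlap when each starts before the other ends.
    hits = chunks.filter((c) => from < c.max && to > c.min);
  }
  // Several chunks may live on the same shard, so remove duplicates.
  return [...new Set(hits.map((c) => c.shard))];
}

console.log(targetShards({ x: 1500 }));                  // [ 'shard1' ]
console.log(targetShards({ x: { from: 500, to: 1500 } })); // [ 'shard0', 'shard1' ]
console.log(targetShards({}).length);                    // 3
```

A query that includes the shard key reaches only the shards that own matching ranges, while a query without it has to be sent to every shard.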
11. Definitions
● Range: A division of the data based on the values of the shard key.
● Chunk: Not physical data, just a logical grouping of documents into shard key ranges (64 MB by default).
● Split: The division of a chunk. No data is moved.
● Migration: The movement of a chunk between shards in order to get an even distribution. Only one chunk is moved at a time.
● Balanced system: Every shard holds the same number of chunks.
● Balancer: Checks whether a migration is needed and starts it.
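The balancer and migration definitions can be illustrated with a small simulation in plain JavaScript. This is only a sketch under stated assumptions: `nextMigration`, `balance`, and the `THRESHOLD` value are hypothetical and are not the thresholds or logic of the real balancer.

```javascript
// Hypothetical threshold: the difference in chunk counts between the most
// and least loaded shards that triggers a migration (assumed, not from MongoDB).
const THRESHOLD = 2;

// One balancing check: return the migration to perform,
// or null when the cluster is balanced enough.
function nextMigration(chunkCounts, threshold = THRESHOLD) {
  if (threshold < 2) throw new RangeError("threshold must be at least 2");
  const shards = Object.keys(chunkCounts);
  if (shards.length === 0) return null;
  let most = shards[0];
  let least = shards[0];
  for (const s of shards) {
    if (chunkCounts[s] > chunkCounts[most]) most = s;
    if (chunkCounts[s] < chunkCounts[least]) least = s;
  }
  if (chunkCounts[most] - chunkCounts[least] < threshold) return null;
  return { from: most, to: least };
}

// Repeat the check, moving only one chunk at a time.
function balance(chunkCounts, threshold = THRESHOLD) {
  const counts = { ...chunkCounts };
  const migrations = [];
  let m;
  while ((m = nextMigration(counts, threshold)) !== null) {
    counts[m.from] -= 1;
    counts[m.to] += 1;
    migrations.push(m);
  }
  return { counts, migrations };
}

const result = balance({ shard0: 5, shard1: 0, shard2: 1 });
console.log(result.counts);            // { shard0: 2, shard1: 2, shard2: 2 }
console.log(result.migrations.length); // 3
```

Each round moves a single chunk from the most loaded shard to the least loaded one, so the counts converge step by step instead of in one bulk transfer.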
16. Pre-splitting
● Useful for loading data directly into the shards (massive data loads).
● Avoids bottlenecks.
● MongoDB does not need to split or migrate chunks during the load.
● After the split, the migration must finish before data loading starts.
[Diagram: a cluster with Shard 0, Shard 1 and Shard 2; Chunks 1 to 5 are spread across the shards]
17. Commands to split a chunk
Splitting a chunk:
mongos> for (var i = 0; i < 20; i++) {
  sh.splitAt( "testdb.presplit", { x : 1000*i } );
}
Querying existing chunks:
mongos> use config
mongos> db.chunks.find( { ns : "testdb.presplit" } )
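To see what the split loop above should produce, the resulting chunk boundaries can be computed in plain JavaScript without a cluster. This is a sketch that assumes the collection starts as a single chunk covering the whole key space; `chunkRanges` is a hypothetical helper, and the strings "MinKey" and "MaxKey" merely stand in for MongoDB's boundary values.

```javascript
// Same split points as the loop above: x = 0, 1000, ..., 19000.
const splitPoints = [];
for (let i = 0; i < 20; i++) {
  splitPoints.push(1000 * i);
}

// Turn sorted split points into chunk ranges [min, max).
// "MinKey" and "MaxKey" are plain-string stand-ins for the open ends.
function chunkRanges(points) {
  const bounds = ["MinKey", ...points, "MaxKey"];
  const ranges = [];
  for (let i = 0; i < bounds.length - 1; i++) {
    ranges.push({ min: bounds[i], max: bounds[i + 1] });
  }
  return ranges;
}

const ranges = chunkRanges(splitPoints);
console.log(ranges.length);             // 21
console.log(ranges[0]);                 // { min: 'MinKey', max: 0 }
console.log(ranges[ranges.length - 1]); // { min: 19000, max: 'MaxKey' }
```

Under that assumption, 20 split points yield 21 chunks, which is the count to expect from the `config.chunks` query for this namespace.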
18. Tag-based sharding overview
Tags are used when you want to pin ranges to a specific shard.
[Diagram: shard0 tagged EMEA, shard1 tagged APAC, shard2 tagged LATAM, shard3 tagged NORAM]
20. Tag your chunk ranges
mongos> sh.addTagRange( namespace, minimum, maximum, tag )
mongos> sh.addTagRange( "testdb.tagrange",
  { "x" : 0 },
  { "x" : 1000 },
  "EMEA" )
minimum: the minimum value (inclusive) of the shard key range to include in the tag.
maximum: the maximum value (exclusive) of the shard key range to include in the tag.
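The inclusive minimum and exclusive maximum can be checked with a small model in plain JavaScript. It is a sketch, not MongoDB's placement logic: `shardTags`, `tagRanges`, `tagFor`, and `shardsFor` are hypothetical names, the shard-to-tag mapping is assumed from the overview diagram, and treating untagged values as free to live on any shard is an assumption of this model.

```javascript
// Hypothetical shard-to-tag assignment, assumed from the overview diagram.
const shardTags = {
  shard0: "EMEA",
  shard1: "APAC",
  shard2: "LATAM",
  shard3: "NORAM",
};

// The tag range from the example: x in [0, 1000) is tagged EMEA.
const tagRanges = [{ ns: "testdb.tagrange", min: 0, max: 1000, tag: "EMEA" }];

// Return the tag covering shard key value x, or null if none does.
// The minimum is inclusive and the maximum is exclusive.
function tagFor(ns, x) {
  const r = tagRanges.find((t) => t.ns === ns && x >= t.min && x < t.max);
  return r ? r.tag : null;
}

// Return the shards allowed to hold value x.
// Assumption of this model: untagged values may live on any shard.
function shardsFor(ns, x) {
  const tag = tagFor(ns, x);
  const all = Object.keys(shardTags);
  return tag === null ? all : all.filter((s) => shardTags[s] === tag);
}

console.log(shardsFor("testdb.tagrange", 0));           // [ 'shard0' ]
console.log(shardsFor("testdb.tagrange", 999));         // [ 'shard0' ]
console.log(shardsFor("testdb.tagrange", 1000).length); // 4
```

The value 1000 falls outside the EMEA range because the maximum is exclusive, so a neighbouring range can start at exactly 1000 without overlapping.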