8. ● Pick something and go with it
● Make mistakes along the way
● Correct the mistakes you can
● Work around the ones you can’t
How a Startup Gets Started
26. Router Architecture
Single mongos per client problems we encountered:
● thousands of connections to config servers
● config server CPU load
● configdb propagation delays
31. Router Architecture
Separate mongos tier advantages:
● greatly reduced number of connections to each mongod
● far fewer hosts talking to the config servers
● much faster configdb propagation
Disadvantages:
● additional network hop
● fewer, but more impactful, points of failure (each router now serves many clients)
50. The Balancer and Me
Why wouldn’t you run the balancer in the first place?
● great question
● for us, it’s because we deleted a ton of data at one point, and left a
bunch of holes
○ we turned it off while deleting this data
○ and then were unable to turn it back on
● but maybe you start without it
● or maybe you need to turn it off for maintenance and forget to turn
it back on
Obviously, don’t do this. But if you do, here’s what happens...
51. The Balancer and Me
Fresh, new, empty cluster… But no balancer running.
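For reference, the balancer's state can be checked and toggled from any mongos shell; these are the standard helpers (an operational fragment, run against a live cluster):

```javascript
// Run from a mongos shell.
sh.getBalancerState();      // true if the balancer is enabled
sh.isBalancerRunning();     // true if a balancing round is in progress right now
sh.setBalancerState(false); // disable (e.g., before bulk deletes or maintenance)
sh.setBalancerState(true);  // re-enable when done -- don't forget this step
```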
74. So what can we do?
1. add IOPS
2. make sure your config servers have plenty of CPU (and IOPS)
3. slowly move chunks manually
4. approach a balanced state
5. hold your breath
6. try re-enabling the balancer
The Balancer and Me
76. How to manually balance:
1. determine a chunk on a hot shard
2. monitor effects on both the source and target shards
3. move the chunk
4. allow the system to settle
5. repeat
The Balancer and Me
77. How to manually balance:
1. determine a chunk on a hot shard
mongos> use config
mongos> db.chunks.find({"shard" : "<shard_name>",
"ns" : "<db_name>.<collection>"}).limit(1).pretty()
You’ll get back a single chunk document, with its min and max bounds; note the
shard key value and ObjectId under min.
The Balancer and Me
78. How to manually balance:
1. determine a chunk on a hot shard
"min" : {
"unsymbolized_hash" :
"1572663b72e87[...]",
"_id" : ObjectId("50b97db98238[...]")
},
The Balancer and Me
79. How to manually balance:
1. determine a chunk on a hot shard
2. monitor effects on both the source and target shards
iostat -xhm 1
mongostat
The Balancer and Me
80. How to manually balance:
1. determine a chunk on a hot shard
2. monitor effects on both the source and target shards
3. move the chunk
mongos> sh.moveChunk("<db_name>.<collection>", {
"unsymbolized_hash" : "1572663b72e87[...]",
"_id" : ObjectId("50b97db98238[...]") },
"<target_shard>")
The Balancer and Me
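The procedure above can be sketched as a single pass in the mongos shell. This is a hedged sketch, not a turnkey script: the namespace and shard names are placeholders, and you still need to watch iostat/mongostat between moves.

```javascript
// Move one chunk off a hot shard, then let the cluster settle before repeating.
var ns = "<db_name>.<collection>";   // placeholder namespace
var hot = "<shard_name>";            // placeholder source (hot) shard
var target = "<target_shard>";       // placeholder destination shard

// Find any one chunk currently living on the hot shard.
var chunk = db.getSiblingDB("config").chunks.findOne({ "shard" : hot, "ns" : ns });
if (chunk) {
    // chunk.min (the shard key bound) identifies the chunk to move.
    sh.moveChunk(ns, chunk.min, target);
    sleep(60 * 1000);                // settle for a minute; check monitoring
}
```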
83. ● Design ahead of time
o “NoSQL” lets you play it by ear
o but some of these decisions will bite you later
● Be willing to correct past mistakes
o dedicate time and resources to adapting
o learn how to live with the mistakes you can’t correct
Summary
84. References
● MongoDB blog post: http://blog.mongodb.org/post/77278906988/crittercism-scaling-to-billions-of-requests-per-day-on
● MongoDB documentation on mongos routers: http://docs.mongodb.org/master/core/sharded-cluster-query-routing/
● MongoDB documentation on the balancer: http://docs.mongodb.org/manual/tutorial/manage-sharded-cluster-balancer/
● MongoDB documentation on shard keys: http://docs.mongodb.org/manual/core/sharding-shard-key/
Crittercism: http://www.crittercism.com/
I’m going to tell you the story of how we’ve scaled to handle over 30k req/s using a storage strategy based on MongoDB
Between proposing this talk and now, we’ve actually grown some more, and now top 40-45k req/s on a daily basis
This is about 3.5B requests per day
this is a preview of a talk I’ll be giving at MongoDB World, June 23-25 in NYC
you can still register
and of course Crittercism will be there
some advice from our experience about things to do and things not to do
I’ll be sure to leave time for Q&A
I’ll tell you how Crittercism got started, some of the lessons we’ve learned along the way, and some advice we can share based on those experiences
September 2010 (from Wayback Machine)
Started as a “feedback widget”
Enable mobile app developers to allow their users to provide “criticism” of their apps (outside of the app store)
Not just a star rating
this is pretty easy -
set up a (mongo) db, put an api in front of it, collect user feedback from our SDK
added more types of data we collect
volume starts getting large, so let’s count app loads in a memory-based data store (redis), and persist it to mongo
then we added user metadata as well, but that’s a different kind of data and a different volume and access pattern, so let’s add dynamodb into the mix
our volume keeps going up, so let’s cache this app data to make our responses faster
then we added APM, which introduced a lot of different data types and structures
so we added another ingest API and postgres into the mix
(but obviously we’re not going to talk about that part here…)
today (2014) - what it’s evolved into
collecting tons of detailed analytics data - crash reports, groupings
Geo data launched in 2013 (just kidding, this is stored in postgres)
iPad app launched in 2014 - more aggregations of performance data (more ways to view it)
lots to deal with...
so we started as a way for people to “criticize” your apps
then we helped you catch bugs, so we’re the ones doing the “criticism”
so how do we handle 40k/s on mongodb?
we don’t, but that’s our ingest rate, and most of it ends up in mongodb
the takeaway here is to be willing to use whatever works
2-year period
went from 700/s (60M/day)
to 40-45k/s (3.8B/day)
one of the biggest things we did to help ourselves scale was to consolidate the mongos routers
default, first-pass architecture (for a sharded cluster): one mongos per client machine
each client process connects to a local mongos router
each mongos routes queries and returns results
could mean your application is reading stale data, or can’t find the data it needs when it needs it (and maybe it has to retry, which means it’s now slower)
move the mongos routers to their own tier
be smart about how you route to them
(we use chef to keep it within the same AZ)
be aware that this does introduce some disadvantages, too
This is a fundamental design decision that will have huge implications for a long time, so think about it carefully.
Hard (impossible) to change after the fact!
Say you have 4 shards. Let’s say each of the NHL teams that made the playoffs this year has an app, and we shard by app_id.
Let’s distribute them evenly, as is likely to be the case (assuming a sufficiently randomly-generated app_id)
this looks nice and even, right?
So now it’s time for the Western Conference Finals, and the Blackhawks are playing the Kings
So those 2 apps are going to get heavy use, but they’re on the same shard, so uh-oh...
Now this shard isn’t happy
Higher load, slower response time for queries to this shard (which are your most common queries due to these apps’ popularity)
so let’s add another shard
That might help if we have more teams’ apps to add
Those new apps had somewhere to go, to keep our cluster balanced
But this hasn’t helped our uneven access pattern at all
Only option now is to vertically scale the problem shard
and hopefully that cools it off, but now we have an uneven cluster to manage.
and what happens next year, when it’s two different teams in the conference finals?
maybe we get lucky and they’re on different shards… but even then, maybe the access is uneven enough that those 2 shards still get hot.
so maybe you just live with this and have heterogeneous shard servers. (this is probably a much lesser evil than trying to re-shard.)
lesson: you’re going to have to live with the shard key you choose, so choose wisely!
another option might’ve been to spread data for each app_id across all shards, but then your queries would likely be slower (due to having to read from many or all shards).
it’s a trade-off.
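As an illustration of that alternative (not what Crittercism did), sharding on a hashed _id spreads every app's documents evenly across all shards; the namespace below is a placeholder:

```javascript
// Hypothetical: hash-shard so no single app_id concentrates on one shard.
sh.enableSharding("<db_name>");
sh.shardCollection("<db_name>.<collection>", { "_id" : "hashed" });
// Writes now distribute evenly, but any query filtered only by app_id
// fans out to every shard (scatter-gather), so reads get slower.
```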
The balancer is a super-important part of a sharded mongo cluster… You should love it.
Start with an empty cluster, and start filling it with data
(we’ll denote “fullness” by going from green to red)
This is an example of what can happen when the balancer is not running
Okay, so now we have a very unbalanced cluster. 3 of our replica sets are very full, one is pretty full, and the newest one is hardly in use.
(remember that the balancer isn’t running in this scenario)
The balancer will see the full shards and one near-empty one, and will want to move a ton of chunks all at once, causing severe I/O strain on the system.
(no way to tell the balancer to chill)
remember that all of these chunk moves cause updates to your configdb, place load on your config servers, and have to propagate to all mongos routers, too
you’re going to be adding a lot of I/O to the system when you move chunks, and it still has to be able to perform its normal functions, so over-provision
we’re in AWS so we just go for PIOPS… but if you’re on physical hardware, consider RAIDing wider, or upgrading your SAN, or...
updating the configdb (when you move chunks) puts load on your config servers, so make sure they’re ready to handle it
this is tedious and will take a LONG time (more detail in a minute)
gradually you’ll get to a happier place
take a deep breath before you...
be ready to turn it off and return to step 3 if needed, then try again
(this was step 3)
here’s an example from our “rawcrashlog” collection (hash and _id truncated)
start both commands running on both the source and target
don’t need to specify source shard, since your shard key (unsymbolized_hash in our case) and _id are sufficient for mongo to know where it’s coming from
watch your monitoring (iostat/mongostat) -- look for spikes in page faults, queued reads/writes, database lock percentages.
obviously look at your application monitoring too, to ensure no adverse effects.
use MMS as well (e.g., lock %, page faults)
if everything looks good, keep going. if not, you need to start over with more IOPS, more config server capacity, etc.
seems obvious, but not always the case.
and if you’re not running it, you can embark on this tedious journey to get it running again.
best-case scenario is to make all of the right choices up front… but you’re probably not going to do that. (though hopefully you can learn a bit from our experience and minimize the wrong choices you make).
the good news is MongoDB is still working for us, despite the headaches we’ve had to deal with.
reminder that MongoDB World is right around the corner
along with all of these great presenters, I’ll be giving a version of this talk there, and would love to meet you