7. How a Startup Gets Started
● Pick something and go with it
● Make mistakes along the way
● Correct the mistakes you can
● Work around the ones you can’t
61. Router Architecture
Single mongos per client problems we encountered:
● thousands of connections to config servers
● config server CPU load
● configdb propagation delays
72. Router Architecture
Separate mongos tier advantages:
● greatly reduced number of connections to each mongod
● far fewer hosts talking to the config servers
● much faster configdb propagation
Disadvantages:
● additional network hop
● host failure has a larger effect
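For illustration, here is a minimal sketch of what that change looks like from the driver's point of view (pymongo; the hostnames and database names are hypothetical, not from the deck):

```python
# Sketch of the client-side difference between the two router architectures.
# Hostnames and namespaces are hypothetical.
from pymongo import MongoClient

# mongos-per-host: every application server talks to its own local router
local_client = MongoClient("mongodb://localhost:27017/")

# dedicated mongos tier: point the driver at several routers; the driver
# spreads requests across the reachable mongos hosts and fails over if one dies
tier_client = MongoClient(
    "mongodb://mongos-a1.internal:27017,"
    "mongos-a2.internal:27017,"
    "mongos-a3.internal:27017/"
)

# Queries look identical either way; only the routing path changes.
print(tier_client.appdata.crashes.find_one())
```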
82. Router Architecture - Evolve!
[Diagram: client application servers, each running a local mongos process, connected to a MongoDB cluster of replica sets of mongod servers]
Maybe at first, doing the mongos-per-host architecture is fine.
83. Router Architecture - Evolve!
[Same diagram: application servers with local mongos processes connected to the MongoDB cluster]
Maybe at first, doing the mongos-per-host architecture is fine. And it will probably remain fine for quite a while.
84. Router Architecture - Evolve!
[Diagram: client application servers connected to a dedicated mongos router tier, which connects to the MongoDB cluster of replica sets]
This is an area where you can and should be willing to adapt as you go (and as needed).
147. The Balancing Act
Why wouldn’t you run the balancer in the first place?
● great question
● for us, it’s because we deleted some old data at one point, and left a bunch of holes
○ we turned it off while deleting this data
○ and then were unable to turn it back on
● but maybe you start without it
● or maybe you need to turn it off for maintenance and forget to turn it back on
Obviously, don’t do this. But if you do, here’s what happens...
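For reference, here is roughly how the balancer gets toggled off and back on. This sketch uses the config.settings flag that 2.x-era clusters like ours relied on; newer versions also offer the balancerStop/balancerStart admin commands and the sh.stopBalancer()/sh.startBalancer() shell helpers. The hostname is hypothetical.

```python
# Sketch of turning the balancer off and back on via the config database flag
# (the approach used on 2.x-era clusters). Hostname is hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-a1.internal:27017/")

def stop_balancer():
    client.config.settings.update_one(
        {"_id": "balancer"}, {"$set": {"stopped": True}}, upsert=True
    )

def start_balancer():
    client.config.settings.update_one(
        {"_id": "balancer"}, {"$set": {"stopped": False}}, upsert=True
    )

def balancer_enabled():
    doc = client.config.settings.find_one({"_id": "balancer"}) or {}
    return not doc.get("stopped", False)

stop_balancer()    # ...do your deletes / maintenance...
start_balancer()   # and don't forget this part
print(balancer_enabled())
```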
190. The Balancing Act
So what can we do?
1. add IOPS
2. make sure your config servers have plenty of CPU (and IOPS)
3. slowly move chunks manually
4. approach a balanced state
5. hold your breath
6. try re-enabling the balancer
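A quick way to gauge how far from balanced you are is to count chunks per shard in the config database. A rough sketch (namespace and hostname are hypothetical; the "ns" field matches the older config schema):

```python
# Sketch: count chunks per shard to see how unbalanced the cluster is.
from collections import Counter
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-a1.internal:27017/")

counts = Counter(
    chunk["shard"]
    for chunk in client.config.chunks.find({"ns": "appdata.crashes"}, {"shard": 1})
)
for shard, n in counts.most_common():
    print(shard, n)
```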
191. The Balancing Act
How to manually balance:
1. determine a chunk on a hot shard
2. monitor effects on both the source and target shards
3. move the chunk
4. allow the system to settle
5. repeat
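A single manual move looks roughly like this (the shell helper is sh.moveChunk(); the namespace, shard-key value, and destination shard name below are hypothetical):

```python
# Sketch of one manual chunk move (shell equivalent: sh.moveChunk()).
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-a1.internal:27017/")

result = client.admin.command(
    "moveChunk",
    "appdata.crashes",                 # the sharded namespace
    find={"app_id": "some-hot-app"},   # any document inside the chunk to move
    to="rs4",                          # the (less loaded) destination shard
)
print(result)
# Watch iostat / mongostat on both shards and let things settle before repeating.
```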
193. Summary
● Design ahead of time
○ “NoSQL” lets you play it by ear
○ but some of these decisions will bite you later
● Be willing to correct past mistakes
○ dedicate time and resources to adapting
○ learn how to live with the mistakes you can’t correct
194. References
● MongoDB blog post (details on shard migration): http://blog.mongodb.org/post/77278906988/crittercism-scaling-to-billions-of-requests-per-day-on
● MongoDB webinar (details on manual chunk migrations): http://www.mongodb.com/presentations/webinar-back-basics-3-scaling-30000-requests-second-mongodb
● Documentation on mongos routers: http://docs.mongodb.org/master/core/sharded-cluster-query-routing/
● Documentation on the balancer: http://docs.mongodb.org/manual/tutorial/manage-sharded-cluster-balancer/
● Documentation on shard keys: http://docs.mongodb.org/manual/core/sharding-shard-key/
Crittercism: http://www.crittercism.com/ to learn more,
and http://www.crittercism.com/careers/ if you want to help us!
I’m Mike, I run Ops at Crittercism.
I’m going to tell you the story of how we’ve scaled to handle over 30k req/s using a storage strategy based on MongoDB
Between proposing this talk and now, we’ve actually grown some more, and now top 40-45k req/s on a daily basis
This is about 3.5B requests per day
This is really the story of learning as you go
I’ll tell you how Crittercism got started, some of the lessons we’ve learned along the way, and some advice we can share based on those experiences
some advice from our experience about things to do and things not to do
I’ll give you a brief overview of what we’re doing
some advice based on what we’ve learned related to router architecture
I’ll talk about some sharding considerations and the issues that can arise
I’ll tell you a story about the Mongo Balancer
I’ll be sure to leave time for Q&A
First let me tell you a bit about who we are and the problem we’re trying to solve
so they made a dating app, which shall remain unnamed
and it went over about as well as the dating scene in The Social Network
poor star rating, and they didn’t know why
So they made a “feedback widget”, and pivoted
September 2010 (from Wayback Machine)
Enable mobile app developers to allow their users to provide “criticism” of their apps (outside of the app store)
Not just a star rating
October 2011
added crash reports to help improve ratings
now we’re the ones helping you self-criticize
added live stats to see app performance in real-time
now they’re happy
the dating app didn’t pan out, but in the process of making it better, we’ve come to provide something that helps everybody improve their apps
today (2014) - what it’s evolved into
collecting tons of detailed analytics data - crash reports, groupings
Geo data launched in 2013 (just kidding, this is stored in postgres)
API & iPad app launched in 2014 - more aggregations of performance data (more ways to view it)
this guy feels overwhelmed at times
so how do we deal with all of this?
so what do we do with all of this data?
we started by setting up a db (mongo, of course)
we’ve used mongo from the start
why mongo? it has RDBMS-like characteristics, both OLTP and warehouse-like properties, lots of flexibility, and it scales
put an ingest API in front of it
collect user feedback from our feedback widget SDK
then we start storing crash data in mongodb, too
but what makes crash data more useful is when you have app load data as well
-> crash rate (which is a differentiating feature for us)
you start catching more errors, but you still want to know about them
so let’s add handled exceptions as well
we realized crash reporting was really the product, so we discontinued the feedback widget
and our volume kept going up, especially app loads
app loads are the highest-volume component here, so let’s count them in a memory-based data store (redis), and batch up the writes before persisting the data to mongo
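very roughly, the counting-and-batching idea looks like this (key names, collection names, and hosts are all hypothetical):

```python
# Rough shape of "count app loads in Redis, batch the writes into Mongo".
import redis
from pymongo import MongoClient

r = redis.Redis(host="redis.internal")
mongo = MongoClient("mongodb://mongos-a1.internal:27017/")

def record_app_load(app_id):
    # cheap in-memory increment on the hot ingest path
    r.incr("app_loads:%s" % app_id)

def flush_app_loads():
    # run periodically: fold the accumulated counts into MongoDB, one $inc per app
    for key in r.scan_iter("app_loads:*"):
        count = int(r.getset(key, 0) or 0)
        if count:
            app_id = key.decode().split(":", 1)[1]
            mongo.appdata.app_loads.update_one(
                {"app_id": app_id}, {"$inc": {"count": count}}, upsert=True
            )
```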
add user metadata as well, to help support desks
but that’s a different kind of data and a different volume and access pattern, so let’s add dynamodb into the mix
our volume keeps going up, so let’s cache this app data to make our responses faster
added APM, which introduced a lot of different data types and structures
so we added another ingest API and postgres into the mix (but obviously that’s not going to be part of this talk…)
so we’ve scaled to 40k/s by being willing to adapt incrementally, and willing to use whatever works / whatever it takes
2-year period
went from 700/s (60M/day)
to 40-45k/s (3.8B/day)
one of the biggest things we did to help ourselves scale was to consolidate the mongos routers
start with a sharded mongodb cluster
add your application servers
each application server has a local mongos process
each client process connects to a local mongos router
mongos routers talk to mongods to read and write data
mongos routes queries and returns results
the mongos knows where data resides thanks to the config servers, which keep track of the shard topology
(location of data throughout the cluster)
mongos routers talk to config servers as well, to maintain an updated version of the configdb
and the config servers also talk to the mongods
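if you want to see what the config servers are tracking, you can peek at the config database through any mongos; a rough sketch (hostname hypothetical, collection layout per the older config schema):

```python
# Sketch: peek at what the config servers track, through any mongos.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-a1.internal:27017/")

print(list(client.config.shards.find()))   # the shards (replica sets) in the cluster
print(client.config.chunks.find_one())     # one chunk: namespace, min/max bounds, owning shard
print(client.admin.command("listShards"))  # the same shard list via an admin command
```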
now let’s zoom out a bit...
and you’re going to grow, so you’re going to add more and more application servers
and they’re all maintaining these connections between
their local mongos, the config servers, and the shard servers
(not showing all the lines here, but you get the idea)
all of this could mean your application is reading stale data, or can’t find the data it needs when it needs it (and maybe it has to retry, which means it’s now slower)
so we went from this...
to this
closer view
move the mongos routers to their own tier
be smart about how you route to them
(we use chef to keep it within the same AZ)
due to connection re-use from mongos to mongod
due to far fewer mongos processes
far fewer nodes for it to propagate to
be aware that this does introduce some disadvantages, too
we reduce this by keeping it in the same availability zone / data center
let’s look at what that implies
in the mongos-per-app-server setup, if one fails...
only that one application server is affected
but with a separate mongos tier, if one mongos router fails...
all app servers connected to it will be affected
so be aware of this, and take it into account
so maybe increase the number of mongos routers
(but still far fewer than you had before)
account for which % of your app servers going down you can tolerate
(also depends on what your driver allows you to do and how it behaves)
So it’s great to have aspects of your architecture that you can change over time.
But some things you can’t...
This is a fundamental design decision that will have huge implications for a long time, so think about it carefully.
Say you have 4 shards. Let’s say each of the World Cup teams has an app, and we shard by app_id.
Let’s distribute them evenly, as is likely to be the case.
Now, tomorrow the US and Germany are going to play each other
So those 2 apps are going to get heavy use, but they happen to be on the same shard, so uh-oh...
Now this shard isn’t happy
Higher load, more lock contention, slower response time for queries to this shard (which are your most common queries due to these apps’ popularity at this time)
So let’s add another shard (scale horizontally)...
That might help if we had more teams’ apps to add
Those new apps had somewhere to go, which is nice.
But this hasn’t helped our uneven access pattern at all.
So what else can we do? We can try scaling that shard vertically - by performing a migration procedure (see my blog post for details).
And hopefully it now cools off
But the next day there will be a different game... will those two teams’ apps be on different shards?
even if so, maybe now we have 2 pretty-hot shards instead of 1 super-hot one
so maybe you decide to just live with heterogeneous shard servers to manage (probably a much lesser evil than trying to re-shard)
We could shard on something other than app_id (for us, maybe that’d be crash_id, which is a randomly-generated hash)
and spread the data for each app across all shards
So now when the US vs Germany game happens tomorrow...
Now we’re reading a bit from many shards, rather than a lot from few shards
but now our queries will be a bit slower (due to having to read from many more shards)
so understand the trade-off
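for concreteness, declaring each of those two shard keys looks roughly like this (database, collection, and field names are hypothetical, and you’d pick exactly one per collection):

```python
# Sketch of the two shard-key choices discussed above.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-a1.internal:27017/")
client.admin.command("enableSharding", "appdata")

# Option 1: shard by app_id -- each app's data stays together, so per-app
# queries hit a single shard, but one hot app means one hot shard.
client.admin.command("shardCollection", "appdata.crashes", key={"app_id": 1})

# Option 2: shard by a hashed, randomly distributed id -- load spreads evenly,
# but per-app queries now scatter-gather across every shard.
client.admin.command("shardCollection", "appdata.crashes", key={"crash_id": "hashed"})
```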
All of this is assuming that your cluster is balanced...
The balancer is a super-important part of a sharded mongo cluster… You should love it.
Start with an empty cluster, and start filling it with data
(we’ll denote “fullness” by going from green to red)
This is an example of what can happen when the balancer is not running
Okay, so now we have a very unbalanced cluster. 3 of our replica sets are very full, one is pretty full, and the newest one is hardly in use.
The balancer will see the full shards and one near-empty one, and will want to move a ton of chunks all at once, causing severe I/O strain on the system.
you’re going to be adding a lot of I/O to the system when you move chunks, and it still has to be able to perform its normal functions, so over-provision
updating the configdb (when you move chunks) puts load on your config servers, so make sure they’re ready to handle it
this is tedious and will take a LONG time (more detail in a minute)
gradually you’ll get to a happier place
take a deep breath before you...
be ready to turn it off and return to step 3 if needed, then try again
See MongoDB webinar I gave (in references) for details on this procedure
seems obvious, but not always the case
best-case scenario is to make all of the right choices up front… but you’re probably not going to do that. (though hopefully you can learn a bit from our experience and minimize the wrong choices you make).
the good news is MongoDB is still working for us, despite the headaches we’ve had to deal with.