Chris Merz, Manager of Operations, MapMyFitness
The MMF user base more than doubled in 2011, beginning an era of rapid data growth. With Big Data come Big Data Headaches. The traditional MySQL solution for our suite of web applications had hit its ceiling. MongoDB was chosen as the candidate for exploration into NoSQL implementations, and now serves as our go-to data store for rapid application deployment. This talk will detail several of the MongoDB use cases at MMF, from serving 2TB+ of geolocation data, to time-series data for live tracking, to user sessions, app logging, and beyond. Topics will include migration patterns, indexing practices, backend storage choices, application access patterns, monitoring, and more.
2. Introduction
• MapMyFitness founded in 2007
• Offices in Denver, CO & Austin, TX (w/ associates in SF, Boston, New York, LA, and Chicago)
• Over 11 million registered users
• ~60 million geo-data routes (runs, rides, walks, hikes, etc.)
• Core sites, mobile apps, API, white-label (MapMyRun, MapMyRide, MapMyWalk, MapMyTri, MapMyHike, MapMyFitness, MapMyRace)
3. Platform Overview and Background
• Origins in the LAMP stack (Linux-Apache-MySQL-PHP)
• Scaled well to ~2 million users
• Redesigned in Python/Django
• MySQL backend not sufficient
“How to scale from 2.5 to 6 million users?”
4. Functional Scaling
• Identify high-growth / large-data collections
• Must be able to live outside the existing relational schema
• Integrate via remote resource mapping tables in the RDBMS
• Functional Scaling can facilitate movement towards a Service Oriented Architecture
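The mapping-table idea above can be sketched roughly as follows. This is an illustration only, not MMF's actual schema: the `RemoteRef` row, the `resolve` function, the store name `routes_mongo`, and the in-memory dictionaries standing in for the RDBMS table and the external store are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class RemoteRef:
    """One row of a hypothetical mapping table kept in the RDBMS."""
    local_id: int    # primary key of the relational record
    store: str       # name of the external store holding the payload
    remote_key: str  # key of the payload inside that store


def resolve(ref: RemoteRef, stores: Dict[str, Dict[str, Any]]) -> Any:
    """Follow a mapping row to the payload that lives outside the schema."""
    return stores[ref.store][ref.remote_key]


# In-memory stand-ins for the relational table and the external data store.
mapping_table = {42: RemoteRef(local_id=42, store="routes_mongo", remote_key="r-42")}
stores = {"routes_mongo": {"r-42": {"points": [[39.74, -104.99], [39.75, -104.98]]}}}

route = resolve(mapping_table[42], stores)
```

The relational side keeps only a small pointer row, so the large collection can move to another store (or another service) without touching the rest of the schema.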
5. Use Case 1: Route Data Store
• Geo-location data stored in JSON blocks
• MySQL → S3 → File Server → MongoDB
• Initial size of ~500GB, ~18 million objects
• 3 member replica set
• Dedicated iron servers with 24GB RAM
8. Solution Summary
Migration Pattern:
• RESTful API modified to use Mongo PHP driver
• Implemented a 'pass thru' migration function
• Batch 'backfill' migrations via pass-thru
• Data transform handled in PHP code
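A pass-thru migration of this kind might look roughly like the sketch below. The original was implemented in PHP; this version is Python for illustration, and the names `fetch_route`, `backfill`, and `to_document`, as well as the plain dictionaries standing in for MongoDB and the legacy file store, are assumptions rather than MMF's code.

```python
import json
from typing import Any, Callable, Dict, Iterable, MutableMapping, Optional


def to_document(raw: str) -> Dict[str, Any]:
    """Hypothetical transform: parse a legacy JSON block and tag it as migrated."""
    doc = json.loads(raw)
    doc["migrated"] = True
    return doc


def fetch_route(route_id: int,
                new_store: MutableMapping[int, Dict[str, Any]],
                legacy_store: MutableMapping[int, str],
                transform: Callable[[str], Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Read from the new store; on a miss, migrate the legacy record on the way through."""
    doc = new_store.get(route_id)
    if doc is not None:
        return doc
    raw = legacy_store.get(route_id)
    if raw is None:
        return None
    doc = transform(raw)
    new_store[route_id] = doc  # the record is migrated as a side effect of the read
    return doc


def backfill(route_ids: Iterable[int], new_store, legacy_store, transform) -> int:
    """Batch migration that reuses the pass-thru path; returns how many records moved."""
    moved = 0
    for route_id in route_ids:
        if route_id in new_store:
            continue
        if fetch_route(route_id, new_store, legacy_store, transform) is not None:
            moved += 1
    return moved


legacy = {1: '{"name": "Cherry Creek loop"}', 2: '{"name": "Wash Park"}'}
new = {}
first = fetch_route(1, new, legacy, to_document)  # migrated on first read
moved = backfill([1, 2, 3], new, legacy, to_document)  # 1 already moved, 3 does not exist
```

Because live reads and the batch job share one code path, the transform only has to be correct in one place.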
9. SAN storage and MongoDB
• Needed to quickly expand available disk
• Implemented high-end SAN subsystem
• Impressive I/O performance with MongoDB
• Migration to SAN painless thanks to OpLog
• Easily expandable due to the use of XFS
• Over 100 million objects, ~7TB of data
10. “Gotchas”
a.k.a. Lessons Learned
• Pay attention to potential document size (Utilize GridFS for larger objects)
• Allocate enough RAM for indexes! (Especially important for large data collections)
• File dump backups may not scale for TB+ size datasets. (Utilize delayed and 'hidden' member for DR)
• Evaluate filesystem choice carefully (hint: XFS)
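The document-size point can be checked before a write. The sketch below is only an approximation under stated assumptions: it uses the JSON-encoded length as a stand-in for the BSON size (the two differ), and the `choose_storage` function and its `headroom` parameter are hypothetical. The 16 MB figure is MongoDB's per-document limit in current releases.

```python
import json
from typing import Any, Dict

MAX_BSON_BYTES = 16 * 1024 * 1024  # per-document size limit in current MongoDB releases


def choose_storage(doc: Dict[str, Any], headroom: float = 0.5) -> str:
    """Route a payload to a regular document or to GridFS based on approximate size.

    JSON length is used as a rough proxy for BSON size, so the threshold is set
    well below the hard limit (headroom is an assumed safety factor).
    """
    approx_bytes = len(json.dumps(doc).encode("utf-8"))
    if approx_bytes > MAX_BSON_BYTES * headroom:
        return "gridfs"
    return "document"


small_route = {"points": [[39.74, -104.99]] * 10}
huge_route = {"blob": "x" * (9 * 1024 * 1024)}

small_choice = choose_storage(small_route)
huge_choice = choose_storage(huge_route)
```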
11. Use Case 2: Django Session Store
• Django sessions not scaling in MySQL
• Modified core methods to use MongoDB
• Cutover of new data (Test for Mongo data, fallback to MySQL)
• Migration of data via export/import (Simple python transform script using pymongo)
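The transform step of such an export/import could be sketched as below. The column layout follows Django's standard `django_session` table (`session_key`, `session_data`, `expire_date`); the target document shape, the function names, and the choice to skip already-expired sessions are assumptions, not the script used at MMF.

```python
from datetime import datetime
from typing import Dict, Iterable, Iterator, Tuple

# One exported row: (session_key, session_data, expire_date)
SessionRow = Tuple[str, str, datetime]


def row_to_document(row: SessionRow) -> Dict[str, object]:
    """Map a relational session row onto a document keyed by the session key."""
    session_key, session_data, expire_date = row
    return {"_id": session_key, "session_data": session_data, "expire_date": expire_date}


def transform(rows: Iterable[SessionRow], now: datetime) -> Iterator[Dict[str, object]]:
    """Yield documents for sessions that have not yet expired (an assumed filter)."""
    for row in rows:
        if row[2] > now:
            yield row_to_document(row)


rows = [
    ("abc123", "ZGF0YQ==", datetime(2012, 6, 1)),
    ("old999", "c3RhbGU=", datetime(2011, 1, 1)),
]
docs = list(transform(rows, now=datetime(2012, 1, 1)))
# With pymongo, the resulting list could then be loaded with collection.insert_many(docs).
```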
12. Use Case 3: Athletic Live Tracking
• Beta feature utilized TT + MySQL (did not scale for large events)
• Required to be “burstable” for Live Events (deployable in 'The Cloud')
• Data size relatively small (compared to Routes DB)
• “Live” data, no archiving required
13. Use Case 3: Athletic Live Tracking
• RS Cloud, 3+n MongoDB replica set
• Quickly scalable via MongoDB replication
• Highly optimized, indexes for every query
• Low administration overhead (vs MySQL)
“Gotchas”
• Know your application (tune indexes and 'find()' ops accordingly)
• Know your driver (python pooling driver defaults way too
14. As a DBA: Ease of Administration
• Replication made elegant (as compared with MySQL)
• Ridiculously simple to add add'l members
• Be sure to run InitialSync from a secondary
rs.add({ "host" : "livetrack_db09", "initialSync" : { "state" : 2 } })
15. Use Case 4: Micro-Messaging Framework
• Initial use case providing 'micro-goals' (user-defined stats aggregation)
• MongoDB for persistence of aggregates
• Python server + RabbitMQ (AMQP)
• Implemented between Django and MySQL (service subscribes to 'interesting' stats)
• Horizontally scalable into the cloud, with base capacity on dedicated iron
• Messaging system expanded to handle real-time course analysis and push notifications
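The aggregation side of such a service might be sketched as below. The message shape (`user_id`, `stat`, `value`), the `GoalAggregator` class, and the in-memory totals are hypothetical; in the setup described above the messages would arrive over AMQP and the aggregates would be persisted in MongoDB.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class GoalAggregator:
    """Keeps running totals for the stats a service has subscribed to."""

    def __init__(self, interesting: Iterable[str]) -> None:
        self.interesting = set(interesting)
        # (user_id, stat) -> running total; stands in for a persisted aggregate
        self.totals: Dict[Tuple[int, str], float] = defaultdict(float)

    def handle(self, message: Dict[str, object]) -> bool:
        """Process one stat message; return True if it was aggregated."""
        stat = message["stat"]
        if stat not in self.interesting:
            return False  # not subscribed to this stat, so ignore it
        self.totals[(message["user_id"], stat)] += float(message["value"])
        return True


aggregator = GoalAggregator(interesting=["distance_km"])
handled = [
    aggregator.handle({"user_id": 7, "stat": "distance_km", "value": 5.0}),
    aggregator.handle({"user_id": 7, "stat": "distance_km", "value": 3.5}),
    aggregator.handle({"user_id": 7, "stat": "calories", "value": 410}),
]
total = aggregator.totals[(7, "distance_km")]
```

With pymongo, the in-memory increment would typically become an `update_one` call with `$inc` and `upsert=True`, so each message maps to a single atomic write.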
16. Indexing Patterns or “Know Your App”
• Proper indexing critical to performance at scale
• MongoDB is ultimately flexible, being schemaless (mongo gives you enough rope to hang yourself)
• Avoid un-indexed queries at all costs (no. really. quickest way to crater your app)
• Onus on DevOps to match application to indexes (know your query profile, never assume)
• Shoot for 'covered queries' wherever possible (answer can be obtained from indexes only)
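The covered-query rule can be expressed as a simple check. The sketch below is a simplification under stated assumptions: it treats an index as a flat list of field names and ignores multikey indexes and embedded documents, and the `is_covered` helper is hypothetical, not a MongoDB or driver API. It does reflect the fact that `_id` is returned by default unless the projection excludes it.

```python
from typing import Dict, Iterable


def is_covered(filter_fields: Iterable[str],
               projection: Dict[str, int],
               index_fields: Iterable[str]) -> bool:
    """Return True if a query could be answered from the index alone.

    filter_fields: fields used in the query filter
    projection:    MongoDB-style projection, e.g. {"_id": 0, "user_id": 1}
    index_fields:  fields that make up the index
    """
    indexed = set(index_fields)
    returned = {field for field, keep in projection.items() if keep}
    if not returned:
        return False  # no inclusion projection means whole documents are returned
    if projection.get("_id", 1):
        returned.add("_id")  # _id comes back unless explicitly excluded
    return set(filter_fields) <= indexed and returned <= indexed


index = ["user_id", "created"]
covered = is_covered(["user_id"], {"_id": 0, "user_id": 1, "created": 1}, index)
id_leaks = is_covered(["user_id"], {"user_id": 1, "created": 1}, index)
extra_field = is_covered(["user_id"], {"_id": 0, "user_id": 1, "distance": 1}, index)
```

The second case is the common trap: forgetting `"_id": 0` forces a document fetch even though every requested field is indexed.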
17. Use Case 5: API Logging DB
• MongoDB is great for logging (especially if you log in JSON format!)
• Good application for capped collections (cap by data size, or TTL)
• Running with 'safe mode' off for speed (fire-n-forget logging can reduce latency)
• Cloud servers are a good fit for logging apps
18. Capped Collections
• Used for retaining a fixed amount of data (based on data size, not number of rows)
• Utilizes FIFO method for pruning collection (Especially useful for data that devalues with age)
• TTL Collections (2.2) age out data based on a retention date limit (useful for a variety of data types)
Gotcha!
Explicitly create the capped collection before any data is put into the system to avoid auto-creation of collection
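The gotcha above can be guarded against at application start-up. In this sketch, `ensure_capped` is a hypothetical helper that expects an object behaving like a pymongo `Database` (it calls `list_collection_names()` and `create_collection(name, capped=True, size=...)`); the `FakeDatabase` class is only a stand-in so the example runs without a server, and the collection name and size are made up.

```python
from typing import Any, Dict, List


def ensure_capped(db: Any, name: str, size_bytes: int) -> bool:
    """Create the capped collection up front; return True if it was created.

    If the collection already exists it is left alone, so this does not
    convert an existing uncapped collection.
    """
    if name in db.list_collection_names():
        return False
    db.create_collection(name, capped=True, size=size_bytes)
    return True


class FakeDatabase:
    """Minimal stand-in for a pymongo Database, for demonstration only."""

    def __init__(self) -> None:
        self.created: Dict[str, Dict[str, Any]] = {}

    def list_collection_names(self) -> List[str]:
        return list(self.created)

    def create_collection(self, name: str, **options: Any) -> None:
        self.created[name] = options


db = FakeDatabase()
first = ensure_capped(db, "api_log", 512 * 1024 * 1024)   # creates the collection
second = ensure_capped(db, "api_log", 512 * 1024 * 1024)  # already there, no-op
```

Running this before the first insert means the first log write cannot implicitly create an ordinary, unbounded collection.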
19. Monitoring MongoDB at MMF
• Monitor for real-time system events (Faster response time = less impact)
• Track historical performance data trends (Useful for predictive failure analysis and scaling need projections)
• MMS – MongoDB Monitoring Service (Now our default visual metrics system)
• Zabbix open source monitoring
• Mikoomi Zabbix plugins for MongoDB
• Mongostat – realtime troubleshooting godsend
20. Conclusion
• MongoDB is extremely versatile, and can help your application scale, even if you don't design your app with MongoDB from the start.
• MongoDB fits well into both dedicated and virtual architecture environments.
• Low maintenance overhead compared to traditional RDBMS.
• Provides the horizontal scaling path required for Internet Sized applications.