SlideShare a Scribd company logo
1 of 32
Download to read offline
Optimizing MongoDB:
Lessons Learned at Localytics

          Andrew Rollins
            June 2011
           MongoNYC
Me

•   Email: my first name @ localytics.com
•   twitter.com/andrew311
•   andrewrollins.com
•   Founder, Chief Software Architect at Localytics
Localytics

• Real time analytics for mobile applications
• Built on:
  –   Scala
  –   MongoDB
  –   Amazon Web Services
  –   Ruby on Rails
  –   and more…
Why I‟m here: brain dump!

• To share tips, tricks, and gotchas about:
  –   Documents
  –   Indexes
  –   Fragmentation
  –   Migrations
  –   Hardware
  –   MongoDB on AWS
• Basic to more advanced, a compliment to
  MongoDB Perf Tuning at MongoSF 2011
MongoDB at Localytics

• Use cases:
  – Anonymous loyalty information
  – De-duplication of incoming data
• Requirements:
  – High throughput
  – Add capacity without long down-time
• Scale today:
  – Over 1 billion events tracked in May
  – Thousands of MongoDB operations a second
Why MongoDB?

•   Stability
•   Community
•   Support
•   Drivers
•   Ease of use
•   Feature rich
•   Scale out
OPTIMIZE YOUR DATA
Documents and indexes
Shorten names

Bad:
{
    super_happy_fun_awesome_name: “yay!”
}

Good:
{
    s: “yay!”
}
Use BinData for UUIDs/hashes

Bad:
{
    u: “21EC2020-3AEA-1069-A2DD-08002B30309D”,
    // 36 bytes plus field overhead
}

 Good:
{
    u: BinData(0, “…”),
    // 16 bytes plus field overhead
}
Override _id

Turn this
{
    _id : ObjectId("47cc67093475061e3d95369d"),
    u: BinData(0, “…”) // <- this is uniquely indexed
 }
 into
{
    _id : BinData(0, “…”) // was the u field
}

Eliminated an extra index, but be careful about
locality... (more later, see Further Reading at end)
Pack „em in

• Look for cases where you can squish multiple
  “records” into a single document.
• Why?
  – Decreases number of index entries
  – Brings documents closer to the size of a page,
    alleviating potential fragmentation
• Example: comments for a blog post.
Prefix Indexes
Suppose you have an index on a large field, but that field doesn‟t have
many possible values. You can use a “prefix index” to greatly decrease
index size.

find({k: <kval>})
{
    k: BinData(0, “…”),   // 32 byte SHA256, indexed
 }
into find({p: <prefix>, k: <kval>})
{
    k: BinData(0, “…”),   // 28 byte SHA256 suffix, not indexed
    p: <32-bit integer>   // first 4 bytes of k packed in integer, indexed
}

Example: git commits
FRAGMENTATION AND MIGRATION
Hidden evils
Fragmentation

• Data on disk is memory mapped into RAM.
• Mapped in pages (4KB usually).
• Deletes/updates will cause memory
  fragmentation.


    Disk                        RAM
    doc1                        doc1
                 find(doc1)                 Page
   deleted                     deleted
     …                           …
New writes mingle with old data

                     Data
                     doc1
                                  Page
Write docX           docX
                     doc3
                     doc4         Page
                     doc5

find(docX) also pulls in old doc1, wasting RAM
Dealing with fragmentation

• “mongod --repair” on a secondary, swap with
  primary.
• 1.9 has in-place compaction, but this still holds a
  write-lock.
• MongoDB will auto-pad records.
• Pad records yourself by including and then
  removing extra bytes on first insert.
   – Alternative offered in SERVER-1810.
The Dark Side of Migrations

• Chunks are a logical construct, not physical.
• Shard keys have serious implications.
• What could go wrong?
  – Let‟s run through an example.
Suppose the following

     Chunk 1     • K is the shard key
     k: 1 to 5
                 • K is random
     Chunk 2
     k: 6 to 9


     Shard 1
    {k: 3, …}     1st write
    {k: 9, …}     2nd write
    {k: 1, …}     and so on
    {k: 7, …}
    {k: 2, …}
    {k: 8, …}
Migrate

     Chunk 1                 Chunk 1
     k: 1 to 5               k: 1 to 5

     Chunk 2
     k: 6 to 9


    Shard 1
                             Shard 2
    {k: 3, …}
                             {k: 3, …}
    {k: 9, …}    Random IO
                             {k: 1, …}
    {k: 1, …}
                             {k: 2, …}
    {k: 7, …}
    {k: 2, …}
    {k: 8, …}
Shard 1 is now heavily fragmented

     Chunk 1                  Chunk 1
     k: 1 to 5                k: 1 to 5

     Chunk 2
     k: 6 to 9


     Shard 1
                              Shard 2
     {k: 3, …}
                              {k: 3, …}
     {k: 9, …}
                              {k: 1, …}
     {k: 1, …}   Fragmented
                              {k: 2, …}
     {k: 7, …}
     {k: 2, …}
     {k: 8, …}
Why is this scenario bad?

• Random reads
• Massive fragmentation
• New writes mingle with old data
How can we avoid bad migrations?

• Pre-split, pre-chunk
• Better shard keys for better locality
   – Ideally where data in the same chunk tends to be in
     the same region of disk
Pre-split and move

• If you know your key distribution, then pre-create
  your chunks and assign them.
• See this:
  – http://blog.zawodny.com/2011/03/06/mongodb-pre-
    splitting-for-faster-data-loading-and-importing/
Better shard keys

• Usually means including a time prefix in your
  shard key (e.g., {day: 100, id: X})
• Beware of write hotspots
• How to Choose a Shard Key
  – http://www.snailinaturtleneck.com/blog/2011/01/04/ho
    w-to-choose-a-shard-key-the-card-game/
OPTIMIZING HARDWARE/CLOUD
Working Set in RAM
• EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes.
• Workers hammering MongoDB with this loop, growing data:
   – Loop { insert 500 byte record; find random record }
• Thousands of ops per second when in RAM
• Much less throughput when working set (in this case, all data
  and index) grows beyond RAM.
                     Ops per second over time
                                                           In RAM



                                                           Not In RAM
Pre-fetch

• Updates hold a lock while they fetch the original
  from disk.
• Instead do a read to warm the doc in RAM under
  a shared read lock, then update.
Shard per core

• Instead of a shard per server, try a shard per
  core.
• Use this strategy to overcome write locks when
  writes per second matter.
• Why? Because MongoDB has one big write lock.
Amazon EC2

• High throughput / small working set
  – RAM matters, go with high memory instances.
• Low throughput / large working set
  –   Ephemeral storage might be OK.
  –   Remember that EBS IO goes over Ethernet.
  –   Pay attention to IO wait time (iostat).
  –   Your only shot at consistent perf: use the biggest
      instances in a family.
• Read this:
  – http://perfcap.blogspot.com/2011/03/understanding-
    and-using-amazon-ebs.html
Amazon EBS

• ~200 seeks per second per EBS on a good day
• EBS has *much* better random IO perf than
  ephemeral, but adds a dependency
• Use RAID0
• Check out this benchmark:
  – http://orion.heroku.com/past/2009/7/29/io_performanc
    e_on_ebs/
• To understand how to monitor EBS:
  – https://forums.aws.amazon.com/thread.jspa?messag
    eID=124044
Further Reading
•   MongoDB Performance Tuning
     – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
•   Monitoring Tips
     – http://blog.boxedice.com/mongodb-monitoring/
•   Markus‟ manual
     – http://www.markus-gattol.name/ws/mongodb.html
•   Helpful/interesting blog posts
     – http://nosql.mypopescu.com/tagged/mongodb/
•   MongoDB on EC2
     – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs
•   EC2 and Ephemeral Storage
     – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-
       ec2.html
•   MongoDB Strategies for the Disk Averse
     – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/
•   MongoDB Perf Tuning at MongoSF 2011
     – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
Thank you.

• Check out Localytics for mobile analytics!
• Reach me at:
  – Email: my first name @ localytics.com
  – twitter.com/andrew311
  – andrewrollins.com

More Related Content

What's hot

Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAACuneyt Goksu
 
OpenSPARC T1 Processor
OpenSPARC T1 ProcessorOpenSPARC T1 Processor
OpenSPARC T1 ProcessorDVClub
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflixVinay Kumar Chella
 
Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)kakugawa
 
Exadata I/O Resource Manager (Exadata IORM)
Exadata I/O Resource Manager (Exadata IORM)Exadata I/O Resource Manager (Exadata IORM)
Exadata I/O Resource Manager (Exadata IORM)Monowar Mukul
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
LF_DPDK_Mellanox bifurcated driver model
LF_DPDK_Mellanox bifurcated driver modelLF_DPDK_Mellanox bifurcated driver model
LF_DPDK_Mellanox bifurcated driver modelLF_DPDK
 
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...Amazon Web Services
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
 
Oracle rac资源管理算法与cache fusion实现浅析
Oracle rac资源管理算法与cache fusion实现浅析Oracle rac资源管理算法与cache fusion实现浅析
Oracle rac资源管理算法与cache fusion实现浅析frogd
 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesSean Chittenden
 
Druid in Spot Instances
Druid in Spot InstancesDruid in Spot Instances
Druid in Spot InstancesImply
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdfRAHULRAHU8
 
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...Ludovico Caldara
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
SSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLSSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLYoshinori Matsunobu
 

What's hot (20)

Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
 
OpenSPARC T1 Processor
OpenSPARC T1 ProcessorOpenSPARC T1 Processor
OpenSPARC T1 Processor
 
Looking towards an official cassandra sidecar netflix
Looking towards an official cassandra sidecar   netflixLooking towards an official cassandra sidecar   netflix
Looking towards an official cassandra sidecar netflix
 
Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)
 
Exadata I/O Resource Manager (Exadata IORM)
Exadata I/O Resource Manager (Exadata IORM)Exadata I/O Resource Manager (Exadata IORM)
Exadata I/O Resource Manager (Exadata IORM)
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Cost Based Oracle Fundamentals
Cost Based Oracle FundamentalsCost Based Oracle Fundamentals
Cost Based Oracle Fundamentals
 
FIREWALLD
FIREWALLDFIREWALLD
FIREWALLD
 
LF_DPDK_Mellanox bifurcated driver model
LF_DPDK_Mellanox bifurcated driver modelLF_DPDK_Mellanox bifurcated driver model
LF_DPDK_Mellanox bifurcated driver model
 
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
AWS re:Invent 2016: Workshop: Using the Database Migration Service (DMS) for ...
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
Oracle rac资源管理算法与cache fusion实现浅析
Oracle rac资源管理算法与cache fusion实现浅析Oracle rac资源管理算法与cache fusion实现浅析
Oracle rac资源管理算法与cache fusion实现浅析
 
Treinamento Oracle GoldenGate 19c
Treinamento Oracle GoldenGate 19cTreinamento Oracle GoldenGate 19c
Treinamento Oracle GoldenGate 19c
 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practices
 
Druid in Spot Instances
Druid in Spot InstancesDruid in Spot Instances
Druid in Spot Instances
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdf
 
Data servers
Data serversData servers
Data servers
 
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
Oracle RAC, Data Guard, and Pluggable Databases: When MAA Meets Multitenant (...
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
SSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLSSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQL
 

Viewers also liked

PEO 101: What You Need To Know!
PEO 101: What You Need To Know!PEO 101: What You Need To Know!
PEO 101: What You Need To Know!ADP, LLC
 
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesThe Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesHealth Catalyst
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Legacy Typesafe (now Lightbend)
 
A Celebration Of Women In Marketing
A Celebration Of Women In MarketingA Celebration Of Women In Marketing
A Celebration Of Women In MarketingAdobe
 
Leading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareLeading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareHealth Catalyst
 
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...Health Catalyst
 
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...Health Catalyst
 
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to FailHealth Catalyst
 
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk
 
The 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemThe 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemHealth Catalyst
 
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHow to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHealth Catalyst
 
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesPatient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesHealth Catalyst
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewHealth Catalyst
 
Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Health Catalyst
 

Viewers also liked (14)

PEO 101: What You Need To Know!
PEO 101: What You Need To Know!PEO 101: What You Need To Know!
PEO 101: What You Need To Know!
 
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving OutcomesThe Top Six Early Detection and Action Must-Haves for Improving Outcomes
The Top Six Early Detection and Action Must-Haves for Improving Outcomes
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)
 
A Celebration Of Women In Marketing
A Celebration Of Women In MarketingA Celebration Of Women In Marketing
A Celebration Of Women In Marketing
 
Leading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in HealthcareLeading Adaptive Change to Create Value in Healthcare
Leading Adaptive Change to Create Value in Healthcare
 
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
How To Avoid The 3 Most Common Healthcare Analytics Pitfalls And Related Inef...
 
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
From Installed to Stalled: Why Sustaining Outcomes Improvement Requires More ...
 
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
6 Proven Strategies for Engaging Physicians—and 4 Ways to Fail
 
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat HuntingSplunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
Splunk Forum Frankfurt - 15th Nov 2017 - Threat Hunting
 
The 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management SystemThe 3 Must-Have Qualities of a Care Management System
The 3 Must-Have Qualities of a Care Management System
 
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical StepsHow to Sustain Healthcare Quality Improvement in 3 Critical Steps
How to Sustain Healthcare Quality Improvement in 3 Critical Steps
 
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare OutcomesPatient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
Patient Flight Path Analytics: From Airline Operations to Healthcare Outcomes
 
Database vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative ReviewDatabase vs Data Warehouse: A Comparative Review
Database vs Data Warehouse: A Comparative Review
 
Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?Quality Improvement In Healthcare: Where Is The Best Place To Start?
Quality Improvement In Healthcare: Where Is The Best Place To Start?
 

Similar to Optimizing MongoDB: Lessons Learned at Localytics

Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit
 
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...npinto
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010jbellis
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDBRick Copeland
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsServer Density
 
The Right Data for the Right Job
The Right Data for the Right JobThe Right Data for the Right Job
The Right Data for the Right JobEmily Curtin
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLHyderabad Scalability Meetup
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Burak TUNGUT
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalabilityjbellis
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...MongoDB
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS Chris Harris
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesTanel Poder
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsBenjamin Darfler
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsmarkgrover
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...Glenn K. Lockwood
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesRose Toomey
 

Similar to Optimizing MongoDB: Lessons Learned at Localytics (20)

Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
IAP09 CUDA@MIT 6.963 - Guest Lecture: Out-of-Core Programming with NVIDIA's C...
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
The Right Data for the Right Job
The Right Data for the Right JobThe Right Data for the Right Job
The Right Data for the Right Job
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
MongoDB Best Practices in AWS
MongoDB Best Practices in AWS MongoDB Best Practices in AWS
MongoDB Best Practices in AWS
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
 
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
The Proto-Burst Buffer: Experience with the flash-based file system on SDSC's...
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
 

Recently uploaded

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Recently uploaded (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Optimizing MongoDB: Lessons Learned at Localytics

  • 1. Optimizing MongoDB: Lessons Learned at Localytics Andrew Rollins June 2011 MongoNYC
  • 2. Me • Email: my first name @ localytics.com • twitter.com/andrew311 • andrewrollins.com • Founder, Chief Software Architect at Localytics
  • 3. Localytics • Real time analytics for mobile applications • Built on: – Scala – MongoDB – Amazon Web Services – Ruby on Rails – and more…
  • 4. Why I‟m here: brain dump! • To share tips, tricks, and gotchas about: – Documents – Indexes – Fragmentation – Migrations – Hardware – MongoDB on AWS • Basic to more advanced, a compliment to MongoDB Perf Tuning at MongoSF 2011
  • 5. MongoDB at Localytics • Use cases: – Anonymous loyalty information – De-duplication of incoming data • Requirements: – High throughput – Add capacity without long down-time • Scale today: – Over 1 billion events tracked in May – Thousands of MongoDB operations a second
  • 6. Why MongoDB? • Stability • Community • Support • Drivers • Ease of use • Feature rich • Scale out
  • 8. Shorten names Bad: { super_happy_fun_awesome_name: “yay!” } Good: { s: “yay!” }
  • 9. Use BinData for UUIDs/hashes Bad: { u: “21EC2020-3AEA-1069-A2DD-08002B30309D”, // 36 bytes plus field overhead } Good: { u: BinData(0, “…”), // 16 bytes plus field overhead }
  • 10. Override _id Turn this { _id : ObjectId("47cc67093475061e3d95369d"), u: BinData(0, “…”) // <- this is uniquely indexed } into { _id : BinData(0, “…”) // was the u field } Eliminated an extra index, but be careful about locality... (more later, see Further Reading at end)
  • 11. Pack „em in • Look for cases where you can squish multiple “records” into a single document. • Why? – Decreases number of index entries – Brings documents closer to the size of a page, alleviating potential fragmentation • Example: comments for a blog post.
  • 12. Prefix Indexes Suppose you have an index on a large field, but that field doesn‟t have many possible values. You can use a “prefix index” to greatly decrease index size. find({k: <kval>}) { k: BinData(0, “…”), // 32 byte SHA256, indexed } into find({p: <prefix>, k: <kval>}) { k: BinData(0, “…”), // 28 byte SHA256 suffix, not indexed p: <32-bit integer> // first 4 bytes of k packed in integer, indexed } Example: git commits
  • 14. Fragmentation • Data on disk is memory mapped into RAM. • Mapped in pages (4KB usually). • Deletes/updates will cause memory fragmentation. Disk RAM doc1 doc1 find(doc1) Page deleted deleted … …
  • 15. New writes mingle with old data Data doc1 Page Write docX docX doc3 doc4 Page doc5 find(docX) also pulls in old doc1, wasting RAM
  • 16. Dealing with fragmentation • “mongod --repair” on a secondary, swap with primary. • 1.9 has in-place compaction, but this still holds a write-lock. • MongoDB will auto-pad records. • Pad records yourself by including and then removing extra bytes on first insert. – Alternative offered in SERVER-1810.
  • 17. The Dark Side of Migrations • Chunks are a logical construct, not physical. • Shard keys have serious implications. • What could go wrong? – Let‟s run through an example.
  • 18. Suppose the following Chunk 1 • K is the shard key k: 1 to 5 • K is random Chunk 2 k: 6 to 9 Shard 1 {k: 3, …} 1st write {k: 9, …} 2nd write {k: 1, …} and so on {k: 7, …} {k: 2, …} {k: 8, …}
  • 19. Migrate Chunk 1 Chunk 1 k: 1 to 5 k: 1 to 5 Chunk 2 k: 6 to 9 Shard 1 Shard 2 {k: 3, …} {k: 3, …} {k: 9, …} Random IO {k: 1, …} {k: 1, …} {k: 2, …} {k: 7, …} {k: 2, …} {k: 8, …}
  • 20. Shard 1 is now heavily fragmented Chunk 1 Chunk 1 k: 1 to 5 k: 1 to 5 Chunk 2 k: 6 to 9 Shard 1 Shard 2 {k: 3, …} {k: 3, …} {k: 9, …} {k: 1, …} {k: 1, …} Fragmented {k: 2, …} {k: 7, …} {k: 2, …} {k: 8, …}
  • 21. Why is this scenario bad? • Random reads • Massive fragmentation • New writes mingle with old data
  • 22. How can we avoid bad migrations? • Pre-split, pre-chunk • Better shard keys for better locality – Ideally where data in the same chunk tends to be in the same region of disk
  • 23. Pre-split and move • If you know your key distribution, then pre-create your chunks and assign them. • See this: – http://blog.zawodny.com/2011/03/06/mongodb-pre- splitting-for-faster-data-loading-and-importing/
  • 24. Better shard keys • Usually means including a time prefix in your shard key (e.g., {day: 100, id: X}) • Beware of write hotspots • How to Choose a Shard Key – http://www.snailinaturtleneck.com/blog/2011/01/04/ho w-to-choose-a-shard-key-the-card-game/
  • 26. Working Set in RAM • EC2 m2.2xlarge, RAID0 setup with 16 EBS volumes. • Workers hammering MongoDB with this loop, growing data: – Loop { insert 500 byte record; find random record } • Thousands of ops per second when in RAM • Much less throughput when working set (in this case, all data and index) grows beyond RAM. Ops per second over time In RAM Not In RAM
  • 27. Pre-fetch • Updates hold a lock while they fetch the original from disk. • Instead do a read to warm the doc in RAM under a shared read lock, then update.
  • 28. Shard per core • Instead of a shard per server, try a shard per core. • Use this strategy to overcome write locks when writes per second matter. • Why? Because MongoDB has one big write lock.
  • 29. Amazon EC2 • High throughput / small working set – RAM matters, go with high memory instances. • Low throughput / large working set – Ephemeral storage might be OK. – Remember that EBS IO goes over Ethernet. – Pay attention to IO wait time (iostat). – Your only shot at consistent perf: use the biggest instances in a family. • Read this: – http://perfcap.blogspot.com/2011/03/understanding- and-using-amazon-ebs.html
  • 30. Amazon EBS • ~200 seeks per second per EBS on a good day • EBS has *much* better random IO perf than ephemeral, but adds a dependency • Use RAID0 • Check out this benchmark: – http://orion.heroku.com/past/2009/7/29/io_performanc e_on_ebs/ • To understand how to monitor EBS: – https://forums.aws.amazon.com/thread.jspa?messag eID=124044
  • 31. Further Reading • MongoDB Performance Tuning – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning • Monitoring Tips – http://blog.boxedice.com/mongodb-monitoring/ • Markus‟ manual – http://www.markus-gattol.name/ws/mongodb.html • Helpful/interesting blog posts – http://nosql.mypopescu.com/tagged/mongodb/ • MongoDB on EC2 – http://www.slideshare.net/jrosoff/mongodb-on-ec2-and-ebs • EC2 and Ephemeral Storage – http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws- ec2.html • MongoDB Strategies for the Disk Averse – http://engineering.foursquare.com/2011/02/09/mongodb-strategies-for-the-disk-averse/ • MongoDB Perf Tuning at MongoSF 2011 – http://www.scribd.com/doc/56271132/MongoDB-Performance-Tuning
  • 32. Thank you. • Check out Localytics for mobile analytics! • Reach me at: – Email: my first name @ localytics.com – twitter.com/andrew311 – andrewrollins.com