This presentation covers best practices for running MongoDB on AWS. We also discuss how to use the automation features of MMS (MongoDB Management Service) to spin up new clusters on AWS in minutes.
3. MongoDB
• Flexible document data model
• Rich ad-hoc queries and in-place updates
• Real-time aggregation
• Geospatial support
• Text search
• Built-in support for
– Redundancy and High Availability
– Auto-partitioning and scale out
10. Storage Configurations
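• Provisioned IOPS (PIOPS) EBS volumes or instance store are the best choices
• Instance store offers the best price per IOPS
– Storage is ephemeral: data is lost when the instance stops, terminates, or its host fails
– Must be used with MongoDB replica sets, so other members hold copies of the data
• You can mix and match in a single deployment
– E.g., some secondary nodes on EBS
– …but you’ll need several EBS volumes to reach reasonable IOPS parity with instance-store nodes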
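The IOPS-parity point above is simple arithmetic. The sketch below estimates how many EBS volumes an EBS-backed secondary needs to match an instance-store node, assuming that striping volumes (e.g., RAID 0) adds their IOPS roughly linearly. The function name and the example figures are hypothetical, not AWS limits.

```python
import math


def volumes_for_iops(target_iops: int, iops_per_volume: int) -> int:
    """Estimate how many EBS volumes to stripe together to reach target_iops.

    Assumes striped IOPS add up linearly across volumes, which is an
    idealization; real throughput also depends on instance bandwidth.
    """
    if target_iops <= 0 or iops_per_volume <= 0:
        raise ValueError("IOPS values must be positive")
    # Round up: any shortfall still means one more volume to provision.
    return math.ceil(target_iops / iops_per_volume)


# Hypothetical figures: an instance-store node sustaining 20,000 IOPS,
# matched by EBS volumes each provisioned at 4,000 IOPS.
print(volumes_for_iops(20_000, 4_000))  # 5
```

Rounding up is deliberate: provisioning one volume too few leaves the EBS-backed secondary slower than the rest of the replica set.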
11. Instance Configuration
• Use ext4 or XFS with appropriate mount options (e.g., noatime)
• Tune block device read-ahead
• Tune TCP keepalive
• Disable NUMA (e.g., start mongod with numactl --interleave=all)
• Disable zone-reclaim mode
• Increase ulimits for processes and open files
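Several of these settings can be checked programmatically. The sketch below audits a dictionary of host settings against assumed thresholds: 64,000 for the nofile and nproc ulimits, a 300-second TCP keepalive ceiling, and read-ahead of at most 32 sectors. These numbers are assumptions loosely based on MongoDB's production notes, so confirm them for your version and storage engine. The dictionary keys are hypothetical, and read_linux_settings collects only the values readable without root on Linux.

```python
import resource

# Assumed thresholds, loosely based on MongoDB production-notes guidance;
# confirm them for your MongoDB version and storage engine.
MIN_ULIMIT = 64000            # soft limit for open files (nofile) and processes (nproc)
MAX_KEEPALIVE_SECONDS = 300   # upper bound for net.ipv4.tcp_keepalive_time
MAX_READAHEAD_SECTORS = 32    # 512-byte sectors on the data volume
GOOD_FILESYSTEMS = {"ext4", "xfs"}


def audit(settings: dict) -> list:
    """Return human-readable problems; keys absent from `settings` are skipped."""
    problems = []
    if "filesystem" in settings and settings["filesystem"] not in GOOD_FILESYSTEMS:
        problems.append("filesystem should be ext4 or xfs")
    if settings.get("readahead_sectors", 0) > MAX_READAHEAD_SECTORS:
        problems.append("block device read-ahead is too high")
    if settings.get("tcp_keepalive_time", 0) > MAX_KEEPALIVE_SECONDS:
        problems.append("tcp_keepalive_time is too long")
    if settings.get("numa_interleave") is False:
        problems.append("start mongod with numactl --interleave=all")
    if settings.get("zone_reclaim_mode", 0) != 0:
        problems.append("vm.zone_reclaim_mode should be 0")
    for key in ("nofile", "nproc"):
        if settings.get(key, MIN_ULIMIT) < MIN_ULIMIT:
            problems.append(f"ulimit {key} should be at least {MIN_ULIMIT}")
    return problems


def read_linux_settings() -> dict:
    """Collect the values readable without root on Linux; other keys are left out."""
    found = {}
    sysctls = {
        "tcp_keepalive_time": "/proc/sys/net/ipv4/tcp_keepalive_time",
        "zone_reclaim_mode": "/proc/sys/vm/zone_reclaim_mode",
    }
    for key, path in sysctls.items():
        try:
            with open(path) as fh:
                found[key] = int(fh.read().strip())
        except (OSError, ValueError):
            pass  # file missing (non-Linux, some containers) or unreadable
    for key, which in (("nofile", resource.RLIMIT_NOFILE), ("nproc", resource.RLIMIT_NPROC)):
        soft, _hard = resource.getrlimit(which)
        found[key] = float("inf") if soft == resource.RLIM_INFINITY else soft
    return found


print(audit({"filesystem": "ext4", "zone_reclaim_mode": 1, "nofile": 1024}))
# ['vm.zone_reclaim_mode should be 0', 'ulimit nofile should be at least 64000']
```

On a live host, audit(read_linux_settings()) checks the subset of settings that can be read directly; filesystem, read-ahead, and NUMA values would have to be supplied separately.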
21. MongoDB Management Service
• MMS is a web-based tool that supports you from the very start of your MongoDB deployment lifecycle
• Use MMS to build and maintain your deployment, and to manage its lifecycle (monitoring and backup)
22. MMS Changes
• Previously, MMS was used only to monitor and back up deployments
• But MMS was “late to the party”: mistakes or misconfigurations had often already been made in the initial deployment
• Monitoring was helpful, but it did not set users on the right path from the start
• Upgrade and maintenance tasks were non-trivial and very involved
30. Monitoring
• Charting
– MongoDB-specific metrics and measurements
– View complete cluster topology and metrics for each component
– Create custom dashboards for key metrics and nodes
• Alerting
– Create alerts for just about any metric value change
– Target some or all hosts
– Customizable notifications, including SMS, HipChat, PagerDuty
• Proactive Support
– Our engineers monitor your deployment and make suggestions
– Offered to subscription customers
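Conceptually, a threshold alert targeted at some or all hosts reduces to a filter over the latest metric samples. The sketch below is a generic illustration, not MMS's implementation; AlertRule, evaluate, and the metric name replication_lag_s are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class AlertRule:
    metric: str                              # hypothetical metric name
    threshold: float                         # fire when the value exceeds this
    hosts: Optional[FrozenSet[str]] = None   # None targets every host


def evaluate(rule: AlertRule, samples: Dict[str, Dict[str, float]]) -> List[str]:
    """Return, in sorted order, the targeted hosts whose latest value exceeds the threshold."""
    firing = []
    for host in sorted(samples):
        if rule.hosts is not None and host not in rule.hosts:
            continue  # host is outside this rule's target set
        value = samples[host].get(rule.metric)
        if value is not None and value > rule.threshold:
            firing.append(host)
    return firing


samples = {
    "db1": {"replication_lag_s": 5.0},
    "db2": {"replication_lag_s": 40.0},
    "db3": {"replication_lag_s": 90.0},
}
print(evaluate(AlertRule("replication_lag_s", 30.0), samples))  # ['db2', 'db3']
print(evaluate(AlertRule("replication_lag_s", 30.0, frozenset({"db1", "db2"})), samples))  # ['db2']
```

A full alerting system would then route each firing host to the configured notification channels (SMS, HipChat, PagerDuty); that dispatch step is omitted here.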
33. Backup
| | mongodump | File system | MMS Backup |
|---|---|---|---|
| Initial complexity | Medium | High | Low |
| Confidence in backups | Medium | Medium | High |
| Point-in-time recovery of replica set | Sort of ☺ | No | Yes |
| System overhead | High | Can be low | Low |
| Scalable | No | With work | Yes |
| Consistent snapshot of sharded system | Difficult | Difficult | Yes |
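The point-in-time recovery row rests on a general technique: restore the latest snapshot taken at or before the target time, then replay the operation log up to that time. The sketch below shows that technique on a toy key-value store; the simplified oplog tuple format and restore_to are hypothetical and do not reflect MMS Backup's internals.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Simplified oplog entry (hypothetical format): (timestamp, action, key, value)
Op = Tuple[int, str, str, object]
Snapshot = Tuple[int, Dict[str, object]]


def restore_to(target_ts: int, snapshots: List[Snapshot], oplog: List[Op]) -> Dict[str, object]:
    """Rebuild state at target_ts from sorted snapshots plus a sorted oplog."""
    times = [ts for ts, _ in snapshots]
    idx = bisect_right(times, target_ts) - 1  # latest snapshot at or before target_ts
    if idx < 0:
        raise ValueError("no snapshot at or before the target time")
    snap_ts, snap_state = snapshots[idx]
    state = dict(snap_state)  # copy so the stored snapshot stays untouched
    for ts, action, key, value in oplog:
        # Replay only operations after the snapshot, up to and including target_ts.
        if snap_ts < ts <= target_ts:
            if action == "set":
                state[key] = value
            elif action == "delete":
                state.pop(key, None)
    return state


snapshots = [(100, {"a": 1}), (200, {"a": 2, "b": 1})]
oplog = [(150, "set", "a", 5), (210, "set", "c", 3), (220, "delete", "b", None), (300, "set", "a", 9)]
print(restore_to(220, snapshots, oplog))  # {'a': 2, 'c': 3}
```

Restoring to t=150 would instead start from the t=100 snapshot and replay a single operation; the closer a snapshot sits to the target time, the less log has to be replayed.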
36. Elastic MapReduce
• Background
– Quickly deploy and run Hadoop in AWS
– Tuned distributions to run on top of EC2
– Provision deployments with any number of nodes
– Supports spot and reserved pricing to minimize cost
• MongoDB
– MongoDB Connector for Hadoop
– https://github.com/mongodb/mongo-hadoop
– Bi-directional access: read from and write to MongoDB
– MapReduce, Hive, Pig, Streaming, Spark
– Works with live MongoDB deployments or BSON backup files
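Hadoop Streaming, one of the job types listed above, runs any executable that reads records on stdin and writes tab-separated key/value lines to stdout. The sketch below counts documents per value of a hypothetical status field. It assumes the input is MongoDB documents exported one JSON object per line (as mongoexport produces by default); the connector can instead feed data to a job directly, which this sketch does not cover. mapper and reducer are plain generator functions so the pipeline can be checked locally, with sorting standing in for Hadoop's shuffle.

```python
import json
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'status<TAB>1' for each JSON document; 'status' is a hypothetical field."""
    for line in lines:
        if line.strip():
            doc = json.loads(line)
            yield f"{doc.get('status', 'unknown')}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per key; relies on input being sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Local dry run: sorting stands in for Hadoop's shuffle-and-sort phase.
exported = ['{"status": "A"}', '{"status": "B"}', '{"status": "A"}', "{}"]
print(list(reducer(sorted(mapper(exported)))))  # ['A\t2', 'B\t1', 'unknown\t1']
```

In an actual Streaming job, each function would be wrapped in a small script that iterates over sys.stdin and prints its output lines.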
38. Redshift
• Fully managed, petabyte-scale data warehouse as a service
• MongoDB is not natively supported as an input data source
• Use Data Pipeline and EMR to move data
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html
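Because Redshift loads flat, relational rows, nested MongoDB documents usually need flattening before they are staged (for example, in S3) for loading. The sketch below shows one way to perform that step. The underscore naming scheme, the JSON encoding of arrays, and the flatten/to_csv helpers are assumptions, not part of Data Pipeline or EMR. Driver-specific values such as ObjectId or dates would need converting to strings first.

```python
import csv
import io
import json
from typing import Dict, Iterable, List


def flatten(doc: Dict, prefix: str = "") -> Dict:
    """Flatten nested sub-documents into underscore-joined column names.

    Arrays have no natural relational column, so they are JSON-encoded as text.
    """
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            row[name] = json.dumps(value)
        else:
            row[name] = value
    return row


def to_csv(docs: Iterable[Dict], columns: List[str]) -> str:
    """Render documents as headerless CSV rows; missing columns become empty fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="",
                            extrasaction="ignore", lineterminator="\n")
    for doc in docs:
        writer.writerow(flatten(doc))
    return buf.getvalue()


print(flatten({"_id": 1, "addr": {"city": "NYC", "geo": {"lat": 40.7}}, "tags": ["a", "b"]}))
# {'_id': 1, 'addr_city': 'NYC', 'addr_geo_lat': 40.7, 'tags': '["a", "b"]'}
```

Passing an explicit column list keeps the CSV aligned with a fixed Redshift table schema even when individual documents have missing or extra fields.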
39. Elastic Beanstalk
• Deploy and manage applications
• Handles provisioning, scaling, load balancing
• Built on EC2, S3, SNS, Auto Scaling
• Customize and configure software that your app needs
– Install packages, create files
– Execute commands
– Control system services
Diagram: three app servers behind an Elastic Load Balancer, within an Auto Scaling group and a security group, with three mongos routers connecting them to MongoDB.
40. Route53
• Highly available and scalable DNS service
• Hostnames can be assigned to
– EC2 instances, Elastic Load Balancers, S3 buckets
• DNS load balancing with weighted round robin
• Supports hostnames for non-AWS infrastructure
• Use hostnames for all MongoDB components
• With replica sets, hostnames can ease machine replacement
• With sharded clusters, hostnames can simplify config server maintenance
• Or use MMS Automation!
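Hostnames ease machine replacement because a replica set's configuration stores each member as a host:port string. The sketch below uses hypothetical hostnames and a dict standing in for Route53 records. It builds a configuration document with the shape passed to rs.initiate(), then shows that replacing a machine changes only the DNS record, not the replica set configuration.

```python
from typing import Dict, List


def replica_set_config(name: str, hostnames: List[str], port: int = 27017) -> dict:
    """Build a replica set config document whose members are addressed by hostname."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": f"{h}:{port}"} for i, h in enumerate(hostnames)],
    }


def replace_machine(dns: Dict[str, str], hostname: str, new_ip: str) -> Dict[str, str]:
    """Return a new DNS table (a stand-in for Route53 records) pointing hostname at new_ip."""
    updated = dict(dns)
    updated[hostname] = new_ip
    return updated


hosts = ["db1.example.com", "db2.example.com", "db3.example.com"]  # hypothetical names
dns = {"db1.example.com": "10.0.0.11", "db2.example.com": "10.0.0.12", "db3.example.com": "10.0.0.13"}
config = replica_set_config("rs0", hosts)
print(config["members"][1])  # {'_id': 1, 'host': 'db2.example.com:27017'}

# db2's machine is swapped out: only the DNS record changes; `config` is untouched.
dns = replace_machine(dns, "db2.example.com", "10.0.0.22")
print(dns["db2.example.com"])  # 10.0.0.22
```

Had the members been configured with raw IP addresses, the same swap would also require an rs.reconfig() on the replica set.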