This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live Migrate, mongomirror, and dump/restore are presented as options for migrating between replica sets or sharded clusters. Post-migration steps such as monitoring and backups are also discussed. Finally, migration from other data stores such as AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases is briefly covered.
4. #MDBLocal
Why MongoDB? A: Next Gen Multi-Model data platform
Mobile
Apps
MongoDB is the most powerful data management platform in the market today
JSON
Flexible Multi-Structured Schema is designed to adapt to changes
GeoSpatial: GeoJSON, 2D & 2DSphere
Relational: Left-Outer Join, Views, Schema Validation
Key/Value: Horizontal Scale, In-Memory
Binaries: Files & Metadata, Encrypted
Search: Text Search, Multiple Languages, Faceted Search
Graph: Graph & Hierarchical, Recursive Lookups
Document: Rich JSON Data Structures, Flexible Schema
8. #MDBLocal
Prep Items: Atlas Cluster Sizing
What is the current cluster hardware like?
RAM
Disk (size & speed)
CPUs
What is the workload like?
Reads / Sec?
Writes / Sec?
Docs / Sec?
Peak Connections?
APM: DataDog, NewRelic, ?
cmd line: mongostat, mongotop, iostat, top, free, vmstat, etc.
MongoDB Shell:
db.serverStatus().connections
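The workload questions above (reads/sec, writes/sec, connections) can be approximated from two db.serverStatus() snapshots taken some seconds apart. The following is a minimal sketch that works on plain dictionaries shaped like the serverStatus output (opcounters and connections sections). The snapshot values are made up, and the grouping of query + getmore as "reads" and insert + update + delete as "writes" is an assumption of this sketch, not something the slides define. Counters reset when the server restarts, so both snapshots should come from the same process lifetime.

```python
from typing import Dict

# Counter names as reported in the opcounters section of db.serverStatus()
OP_KEYS = ("insert", "query", "update", "delete", "getmore", "command")


def ops_per_second(before: Dict, after: Dict, elapsed_s: float) -> Dict[str, float]:
    """Per-second rate of each opcounter between two serverStatus snapshots."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return {
        key: (after["opcounters"][key] - before["opcounters"][key]) / elapsed_s
        for key in OP_KEYS
    }


def summarize(before: Dict, after: Dict, elapsed_s: float) -> Dict[str, float]:
    """Collapse the raw counters into the figures asked for on the sizing slide."""
    rates = ops_per_second(before, after, elapsed_s)
    return {
        # Assumption: reads = query + getmore, writes = insert + update + delete
        "reads_per_s": rates["query"] + rates["getmore"],
        "writes_per_s": rates["insert"] + rates["update"] + rates["delete"],
        "connections_now": after["connections"]["current"],
    }


# Made-up snapshots, 60 seconds apart, for illustration only
snap_a = {
    "opcounters": {"insert": 1000, "query": 5000, "update": 200,
                   "delete": 50, "getmore": 300, "command": 9000},
    "connections": {"current": 120},
}
snap_b = {
    "opcounters": {"insert": 1600, "query": 8000, "update": 500,
                   "delete": 110, "getmore": 600, "command": 12000},
    "connections": {"current": 135},
}

summary = summarize(snap_a, snap_b, 60.0)
print(summary)
```

Sampling at peak hours matters more than the average: the cluster has to be sized for the busiest window, not the typical one.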
9. #MDBLocal
Prep Items: Atlas Cluster Sizing
On-Prem or Cloud Reserved Instances
Most likely overprovisioned
Let Atlas Auto-Scale figure it out!
Match the current hardware
Run performance tests for hours or days
Upscale: CPU or RAM > 75% (1 hr)
Downscale: CPU and RAM < 50% (72 hrs)
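The two auto-scale thresholds above can be read as a simple decision rule. This is a minimal sketch of that rule only, assuming the inputs are average utilization percentages over the stated windows; the function and parameter names are hypothetical, and it does not reproduce how Atlas actually measures or schedules scaling.

```python
def scaling_decision(cpu_1h: float, ram_1h: float,
                     cpu_72h: float, ram_72h: float) -> str:
    """Apply the slide's thresholds to average utilization percentages.

    cpu_1h / ram_1h:   averages over the last hour
    cpu_72h / ram_72h: averages over the last 72 hours
    """
    # Upscale when EITHER resource has been hot for the last hour
    if cpu_1h > 75 or ram_1h > 75:
        return "upscale"
    # Downscale only when BOTH resources have been quiet for 72 hours
    if cpu_72h < 50 and ram_72h < 50:
        return "downscale"
    return "hold"


print(scaling_decision(cpu_1h=82, ram_1h=40, cpu_72h=45, ram_72h=38))  # upscale
print(scaling_decision(cpu_1h=30, ram_1h=35, cpu_72h=28, ram_72h=41))  # downscale
print(scaling_decision(cpu_1h=60, ram_1h=55, cpu_72h=58, ram_72h=44))  # hold
```

The asymmetry is deliberate: scaling up reacts quickly to either resource, while scaling down waits much longer and requires both to be low.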
10. #MDBLocal
Prep Items: Expert Atlas Cluster Sizing
#Shards by Storage = Total Storage ÷ Max Storage Per Shard
#Shards by RAM = Total RAM ÷ Max RAM Per Shard
#Shards by Cores = Total Cores ÷ Max Cores Per Shard
#Shards by IOPS = Total IOPS ÷ Max IOPS Per Shard
#Shards by Network Bandwidth = Peak Gbps ÷ Gbps Capacity Per Shard
#Shards by Disk Bandwidth = Peak Mbps ÷ Mbps Capacity Per Shard
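The per-dimension formulas above each give a shard count. A reasonable way to combine them, assumed here rather than stated on the slide, is to round each ratio up and take the largest, since the cluster must satisfy the most demanding dimension. All figures in the example are invented for illustration.

```python
import math
from typing import Dict, Tuple


def shards_needed(totals: Dict[str, float],
                  per_shard_max: Dict[str, float]) -> Tuple[int, Dict[str, int]]:
    """Return (overall shard count, shard count per dimension).

    Each dimension follows the slide's formula: total / max per shard,
    rounded up because a fraction of a shard cannot be provisioned.
    """
    by_dimension = {
        name: math.ceil(totals[name] / per_shard_max[name])
        for name in totals
    }
    # Assumption: the binding constraint is the largest per-dimension count
    return max(1, max(by_dimension.values())), by_dimension


# Hypothetical workload totals and per-shard limits
totals = {"storage_gb": 12000, "ram_gb": 900, "cores": 200, "iops": 30000}
limits = {"storage_gb": 4096, "ram_gb": 256, "cores": 64, "iops": 16000}

count, detail = shards_needed(totals, limits)
print(count, detail)
```

In this made-up case RAM and cores drive the result, not storage, which is why checking every dimension is worthwhile.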
Complete MongoDB Atlas Sizing Talk from MDBW19:
https://www.slideshare.net/mongodb/mongodb-world-2019-finding-the-right-mongodb-atlas-cluster-size-does-this-instance-make-my-app-look-fast
Work with your local MongoDB Solution Architect
11. #MDBLocal
Prep Items: Version, Driver & Retries
Ensure your current driver is 3.6+ compatible
As of Feb 2020 Atlas is 3.6+
You can still migrate from 2.6+!!
3.6 Retryable Writes
4.2 Retryable Reads
Fault Resiliency
12. #MDBLocal
Prep Items: Connectivity
● IP Whitelist | VPC Peer | Private Endpoint
● Create Users & Permissions
● Use SRV connection strings (3.6+)
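To illustrate the difference between an SRV connection string and a standard one, here is a minimal sketch that builds both forms. The user name, password, database name, and the SRV host name cluster0-tlsla.mongodb.net are hypothetical placeholders; the member host names and replica set name are borrowed from the mongomirror example elsewhere in this deck. The query options shown are common defaults, assumed here for illustration.

```python
from typing import List
from urllib.parse import quote_plus


def srv_uri(user: str, password: str, cluster_host: str, db: str = "test") -> str:
    """SRV form: one host name; members and options are resolved through DNS."""
    return (f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
            f"@{cluster_host}/{db}?retryWrites=true&w=majority")


def standard_uri(user: str, password: str, hosts: List[str],
                 replica_set: str, db: str = "test") -> str:
    """Standard form: every member and the replica set name are spelled out."""
    host_list = ",".join(f"{h}:27017" for h in hosts)
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host_list}/{db}?ssl=true&replicaSet={replica_set}"
            f"&authSource=admin&retryWrites=true&w=majority")


# Hypothetical credentials; special characters must be percent-encoded
short = srv_uri("appUser", "p@ss", "cluster0-tlsla.mongodb.net")
long_form = standard_uri(
    "appUser", "p@ss",
    ["cluster0-shard-00-00-tlsla.mongodb.net",
     "cluster0-shard-00-01-tlsla.mongodb.net",
     "cluster0-shard-00-02-tlsla.mongodb.net"],
    "Cluster0-shard-0",
)
print(short)
print(long_form)
```

The SRV form stays valid if cluster members change, because the driver looks the members up at connect time; the standard form has to be edited by hand.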
13. #MDBLocal
Prep Items: Test Basic Ops
mgeneratejs '{
"_id": "$objectid",
"dateTime": "$date",
"createdAt": "$date",
"Action": "$string",
"severityLevel": "$integer",
"source": "$string",
"display": "$string",
"deviceServerIp": "$ip",
"details": {
"ipAddress": "$ip",
"macAddress": "$string",
"userId": "SYSTEM",
"method": "method"
}}' --jsonArray -n 1000000 | mongoimport --jsonArray --port 27017 --upsert -d atlas -c iot
Test, Test, Test
● Simulate Production Traffic
● Your own test suite
● POCDriver
> https://github.com/johnlpage/POCDriver
● mgeneratejs
> https://github.com/rueckstiess/mgeneratejs
14. #MDBLocal
Prep Items: Increase OpLog on Source Cluster
Initial Sync
Scans every document
Replicates to target cluster
Source OpLog
Must be large enough to cover the entire initial sync window, so that data changes made during the initial sync can still be replicated once it finishes
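A rough way to check whether the source oplog is large enough is to compare its window with the expected initial sync duration. The sketch below assumes an oplog churn rate (GB per hour) and an initial sync estimate are already known, for example from rs.printReplicationInfo() on the source and from a dry run. The safety factor of 2 and all numbers are assumptions for illustration only.

```python
def required_oplog_gb(oplog_gb_per_hour: float,
                      initial_sync_hours: float,
                      safety_factor: float = 2.0) -> float:
    """Oplog size needed to retain every change made during the initial sync."""
    return oplog_gb_per_hour * initial_sync_hours * safety_factor


def oplog_window_hours(oplog_size_gb: float, oplog_gb_per_hour: float) -> float:
    """How many hours of changes the current oplog can hold."""
    return oplog_size_gb / oplog_gb_per_hour


# Hypothetical figures: 1.5 GB of oplog written per hour at peak,
# and an initial sync measured at roughly 8 hours
needed = required_oplog_gb(oplog_gb_per_hour=1.5, initial_sync_hours=8)
window = oplog_window_hours(oplog_size_gb=10, oplog_gb_per_hour=1.5)

print(f"needed: {needed} GB, current window: {window:.1f} h")
if window < 8:
    print("oplog window is shorter than the initial sync: increase the oplog")
```

Using the peak churn rate rather than the average keeps the estimate on the safe side, since the oplog rolls over fastest during write bursts.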
15. #MDBLocal
Prep Items: Upscale Target Cluster
Recommend upscaling by at least one tier
Consider higher IOPS too
Increasing disk size is a lower-cost alternative to provisioned IOPS.
Turn off Auto-Scale
Force Failover before migration
17. #MDBLocal
Comparing Options
Live Migrate: RS or Sharded; built-in cutover; great for most customers; built into the Atlas UI; must temporarily allow network access (hop)
mongomirror: RS only (Sharded: Professional Services); can avoid network hop; works with network peering; user-controlled cutover
dump/restore or import: all deployments; downtime proportional to data size; Sharded -> RS
18. #MDBLocal
Behind the scenes
1. initial sync - copying documents
and building indexes that already
exist on the source deployment.
2. oplog sync - tailing and applying
entries from the oplog (delta).
○ “CDC” - Continues replicating
as live data is changing
○ resumable from here
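The two phases above can be illustrated with a small in-memory simulation. This is a toy model, not how Live Migrate or mongomirror are implemented: the Source class, the oplog entry layout, and all names are hypothetical. It shows why replaying the oplog from the timestamp recorded before the initial sync makes the target converge even when documents change mid-copy, and why the last applied timestamp serves as the resume point.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical oplog entry: (timestamp, op, _id, document); op is "u" (upsert) or "d" (delete)
OplogEntry = Tuple[int, str, str, dict]


class Source:
    """Toy source deployment: a document store plus an append-only oplog."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.oplog: List[OplogEntry] = []
        self.last_ts = 0

    def write(self, op: str, _id: str, doc: Optional[dict] = None) -> None:
        self.last_ts += 1
        if op == "d":
            self.docs.pop(_id, None)
        else:
            self.docs[_id] = dict(doc or {})
        self.oplog.append((self.last_ts, op, _id, dict(doc or {})))


def oplog_sync(source: Source, target: Dict[str, dict], after_ts: int) -> int:
    """Apply every oplog entry newer than after_ts; return the new resume point."""
    resume_ts = after_ts
    for ts, op, _id, doc in source.oplog:
        if ts <= after_ts:
            continue
        if op == "d":
            target.pop(_id, None)
        else:
            target[_id] = dict(doc)   # upserts are idempotent, so replays are safe
        resume_ts = ts
    return resume_ts


src = Source()
src.write("u", "a", {"v": 1})
src.write("u", "b", {"v": 1})

# Phase 1: initial sync. Record the oplog position first, then copy documents.
start_ts = src.last_ts
target: Dict[str, dict] = {}
for i, _id in enumerate(list(src.docs)):
    if _id in src.docs:
        target[_id] = dict(src.docs[_id])
    if i == 0:
        # Live traffic keeps changing the source while the copy is running
        src.write("u", "a", {"v": 2})
        src.write("u", "c", {"v": 1})
        src.write("d", "b")

# Phase 2: oplog sync. Replay the delta accumulated since start_ts.
resume_ts = oplog_sync(src, target, start_ts)
in_sync_after_delta = target == src.docs

# Replication continues (and can resume) from the last applied timestamp
src.write("u", "d", {"v": 9})
resume_ts = oplog_sync(src, target, resume_ts)
print(in_sync_after_delta, target == src.docs, resume_ts)
```

If the oplog had already discarded entries older than start_ts, the delta could not be replayed, which is why the source oplog has to cover the whole initial sync.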
19. #MDBLocal
Migration Dry Run
Prod ⇒ Staging/QA Atlas Cluster
Dry-run:
Connectivity & Security
Time to perform initial sync
Restart App(s) with new Connection
Run initial sync at least 2 times
1) Build Staging site with Initial Sync but w/o Cutover
a) Measure time
2) Repeat w/Cutover
a) Let Live Migrate / mongomirror reach 0s replication lag
b) Restart Apps pointing to the new Cluster
c) Test, Test, Test
32. #MDBLocal
Let’s choose a few
MongoDB “compatible”: AWS DocumentDB, Azure CosmosDB
Key-value stores: AWS DynamoDB
Relational DBMS
33. #MDBLocal
AWS DocumentDB
● Compatible with MongoDB 3.6
● Use the same MongoDB Drivers/SDKs, Tools and Applications with Amazon DocumentDB
● Automatic Patching, Failover and Recovery
● Integrated with AWS services (CloudWatch, etc.)
● Functional Differences: https://docs.aws.amazon.com/documentdb/latest/developerguide/functional-differences.html
34. #MDBLocal
AWS DocumentDB Feature Gap vs. MongoDB
Fails > 60%* of MongoDB correctness tests
• Extensive testing, debugging & refactoring required to migrate to DocumentDB
Lags mainline features by 5 years
• No retryable reads + writes
• No transactions
• No support for storage or index compression
• Missing many aggregation stages that allow expressive data handling
• No lossless decimal type
• No search and geospatial queries
• Indexes are not copied over via the utilities (mongodump and mongorestore)
• No materialized views
MongoDB’s most important value is developer productivity. These limitations can significantly reduce that value.
*60% for 3.6, 64% for 4.2* https://www.mongodb.com/atlas-vs-amazon-documentdb/compatibility
35. #MDBLocal
AWS DocumentDB Feature Gap vs. MongoDB
Not based on the MongoDB server
Emulates the MongoDB API
Does not provide complete functionality
Yet developers are directed to use official MongoDB Drivers, Documentation and University to learn how to connect and develop?
What is this experience like? ...
36. #MDBLocal
Possible Migration Options
Method | Considerations
Offline: mongodump / mongorestore | Does not dump admin database; recreate user(s) (DocumentDB does not provide RBAC*)
Online: build-your-own | Does not support Kinesis Streams, Data Pipeline, etc.; Change Streams (limited) could be used (likely very fragile)
*https://docs.aws.amazon.com/documentdb/latest/developerguide/functional-differences.html#functional-differences.mongodump-mongorestore
37. #MDBLocal
[ec2-user@ip-172-31-1-79 dump]$ mongodump --host sigsdocdb.caexbcw7y6up.us-west-2.docdb.amazonaws.com:27017 --username snarvaez --ssl --sslCAFile /home/ec2-user/rds-combined-ca-bundle.pem
2020-02-24T05:01:23.523+0000 writing SigsTest.coll to
2020-02-24T05:01:23.525+0000 done dumping SigsTest.coll (1 document)
[ec2-user@ip-172-31-1-79 bin]$ ./mongomirror --host rs0/sigsdocdb.caexbcw7y6up.us-west-2.docdb.amazonaws.com:27017 --username snarvaez --ssl --sslCAFile /home/ec2-user/rds-combined-ca-bundle.pem --destination Cluster0-shard-0/cluster0-shard-00-00-tlsla.mongodb.net:27017,cluster0-shard-00-01-tlsla.mongodb.net:27017,cluster0-shard-00-02-tlsla.mongodb.net:27017 --destinationUsername snarvaez
mongomirror version: 0.9.1
git version: 0bc45282784aa74bc25c336412efca7f84749aa4
Go version: go1.12.13
os: linux
arch: amd64
compiler: gc
2020-02-24T05:02:56.564+0000 Error initializing mongomirror: could not initialize source connection: could not connect to server: server selection error: server selection timeout
current topology: Type: Single
Servers:
Addr: sigsdocdb.caexbcw7y6up.us-west-2.docdb.amazonaws.com:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection(sigsdocdb.caexbcw7y6up.us-west-2.docdb.amazonaws.com:27017[-121]) connection is closed
38. #MDBLocal
Azure CosmosDB
Advertised Strengths
1. Globally Distributed
2. Linearly Scalable
3. Schema-Agnostic Indexing
4. Multi-Model
5. Multi-API and Multi-Language Support
6. Multi-Consistency Support
7. Indexes Data Automatically
8. High Availability
9. Guaranteed Low Latency
10. Multi-Master Support
39. #MDBLocal
Azure CosmosDB Feature Gap vs. MongoDB
Also not based on the MongoDB server - It emulates the MongoDB API
Large feature gaps vs. mainline
● No multi-document ACID Transactions, Materialized Views, Retryable Writes, Lossless Decimals, Text Search, Schema Validation, etc.
● 3.2 and 3.6 modes. 3.2 clusters cannot be upgraded to 3.6 at this time (Feb 2020)
● Numerous Incompatibilities
Many operations work differently and are not documented - left to developers to figure out
Scalability needs Handling + Rapid Cost Escalations
● RUs determine scalability - developers need error handling when max RUs exceeded
Azure Only - Lock-in
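The point about error handling when max RUs are exceeded usually comes down to retrying with a growing delay. The following is a minimal sketch of exponential backoff, assuming a hypothetical RateLimitedError that stands in for whatever exception the driver raises when the request-unit budget is exhausted. The attempt count and delays are illustrative assumptions.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for the driver error raised when RUs are exceeded."""


def with_backoff(operation: Callable[[], T],
                 max_attempts: int = 5,
                 base_delay_s: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying on rate limiting with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise                      # budget of attempts used up: give up
            sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


# Demo: an operation that is throttled twice before succeeding.
calls = {"n": 0}
delays: List[float] = []


def flaky_insert() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimitedError("request rate is large")
    return "inserted"


# The sleep function is injected so the demo records delays instead of waiting
result = with_backoff(flaky_insert, sleep=delays.append)
print(result, calls["n"], delays)
```

With retryable writes unavailable on this platform, this kind of wrapper ends up in application code around every write path, which is part of the developer burden the slide refers to.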
40. #MDBLocal
Possible migration options
Method | Considerations
Offline: mongodump / mongorestore | Not an option - backups cannot be restored to another target
Offline: via Azure Data Factory* or Azure DocumentDB Data Migration Tool* | ETL: export to JSON / mongoimport
Online: build-your-own | Via Change Feed*; similar to using Change Streams + Azure Functions to write to Atlas
* https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db-mongodb-api
* https://www.microsoft.com/en-us/download/details.aspx?id=46436
* https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed
41. #MDBLocal
AWS DynamoDB
DynamoDB is a wide-column key/value store. Each
entry is called Item and consists of Attributes.
Widely used in AWS Ecosystem ⇒ AWS Only
Migration may be required due to
● Increased / Unpredictable Cost
● Functionality insufficient for Business or Dev Productivity - App has outgrown the data store
● etc. https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/
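Since each DynamoDB Item is a set of typed Attributes, one step in any migration is flattening the type-tagged DynamoDB JSON into a plain document. The sketch below handles the common type tags (S, N, BOOL, NULL, M, L, SS, NS); binary types are left out, and numbers with a fractional part become floats, which can lose precision. The sample item, the choice of deviceId as the key attribute, and the function names are hypothetical.

```python
from typing import Any, Dict


def from_dynamo(attr: Dict[str, Any]) -> Any:
    """Convert one type-tagged DynamoDB attribute value to a plain Python value."""
    tag, value = next(iter(attr.items()))
    if tag == "S":
        return value
    if tag == "N":
        # DynamoDB transmits numbers as strings
        return int(value) if value.lstrip("-").isdigit() else float(value)
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: from_dynamo(v) for k, v in value.items()}
    if tag == "L":
        return [from_dynamo(v) for v in value]
    if tag == "SS":
        return list(value)
    if tag == "NS":
        return [from_dynamo({"N": n}) for n in value]
    raise ValueError(f"unsupported DynamoDB type tag: {tag}")


def item_to_document(item: Dict[str, Dict[str, Any]], id_attr: str) -> Dict[str, Any]:
    """Flatten an Item and promote its key attribute to the document _id."""
    doc = {name: from_dynamo(attr) for name, attr in item.items()}
    doc["_id"] = doc.pop(id_attr)
    return doc


# Hypothetical item in DynamoDB JSON form
item = {
    "deviceId": {"S": "dev-42"},
    "severityLevel": {"N": "3"},
    "temperature": {"N": "21.5"},
    "active": {"BOOL": True},
    "tags": {"SS": ["iot", "edge"]},
    "details": {"M": {"ipAddress": {"S": "10.0.0.7"},
                      "ports": {"L": [{"N": "80"}, {"N": "443"}]}}},
}

document = item_to_document(item, id_attr="deviceId")
print(document)
```

The resulting dictionary is an ordinary nested document, so it can be written out as JSON for mongoimport or inserted through a driver.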