Migrating from MySQL to MongoDB at Wordnik
MongoSF, 4/30/2010
From MySQL to MongoDB: Migrating a Live Application
Tony Tam

What is Wordnik?
  • Project to track language
    • Like GPS for English
  • Dictionary is a road block to the language
    • Roughly 200 new words created daily
    • Language is not static
  • Capture information about all words
    • Meaning is often undefined in the traditional sense
    • Machines can determine meaning through analysis
    • Needs LOTS of data

Why should you care?
  • Every developer can use a robust language API!
  • Wordnik migrated to MongoDB
    • > 5 billion documents
    • > 1.2 TB
    • Zero application downtime
  • Learn from our experience

Wordnik
  • Not just a website! (But we have one)
  • Launched Wordnik entirely on MySQL
  • Hit road bumps with insert speed at ~4B rows on MyISAM tables
    • Tables locked for tens of seconds during inserts
  • But we need more data!
    • Created elaborate update schemes to work around it
    • Lost lots of sleep babysitting servers while researching a long-term (LT) solution

Wordnik + MongoDB
  • What are our storage needs?
    • Database vs. application logic
    • No PK/FK constraints
    • No stored procedures
    • Consistency?
  • Lots of R&D
    • Tried almost all NoSQL solutions

Migrating Storage Engines
  • Many parts to this effort
    • Setup & administration
    • Software design
    • Optimization
  • Many types of data at Wordnik
    1. Corpus
    2. Structured hierarchical data
    3. User data
  • Migrated #1 & #2

Server Infrastructure
  • Wordnik is heavily read-only
    • Master / slave deployment
    • Looking at replica pairs
  • MongoDB loves system resources
    • Wordnik runs dedicated boxes to avoid other apps being sent to disk (aka time-out)
    • Memory + disk = happy Mongo
  • Many times the disk space of MySQL
    • Easy pill to swallow until…

Server Infrastructure
  • Physical hardware
    • 2 x 4-core CPU, 32 GB RAM, FC SAN
  • Had bad luck on VMs (you might not)
  • Disk speed => performance

Software Design
  • Two distinct use cases for MongoDB
  • Identical structure, different storage engine
    • Same underlying objects, same storage fidelity (largely key/value)
  • Hierarchical data structure
    • Same underlying objects, document-oriented storage

Software Design
  • Create BasicDBObjects from POJOs and use collection methods
      BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence())
          .append("rating", s.getRating()).append(...);
  • ID generation to manage unique _id values
    • Analogous to MySQL AutoIncrement behavior
    • Compatible with MySQL IDs (more later)
      dbo.append("_id", getId());
      collection.save(dbo);
  • Implemented all CRUD methods in DAO
    • Swappable between MongoDB and MySQL at runtime

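The ID-generation point above can be illustrated with a process-local counter. This is a minimal sketch and not Wordnik's actual generator: the class name `SequentialIdGenerator` and the idea of seeding it with the highest ID already used in MySQL are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: hands out increasing IDs, similar in spirit to
// MySQL AUTO_INCREMENT, starting after the highest ID already in use.
final class SequentialIdGenerator {
    private final AtomicLong last;

    SequentialIdGenerator(long highestExistingId) {
        this.last = new AtomicLong(highestExistingId);
    }

    // Thread-safe: each caller receives a distinct, strictly increasing value.
    long nextId() {
        return last.incrementAndGet();
    }
}
```

A counter like this only works inside one process; several writer processes would need some shared coordination, which the slides do not describe.
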
Software Design
  • Key-value storage use case
  • Easy as implementing new DAOs
      SentenceHandler h = new MongoDbSentenceHandler();
  • Save methods construct BasicDBObject and call save() on collection
  • Implement same interface
    • Same methods against DAO between MySQL and MongoDB versions
  • Data Abstraction 101

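The "same interface, different storage engine" idea can be sketched as follows. Everything here is hypothetical: `SentenceStore`, the two in-memory classes, and `SentenceStores.select` are stand-ins for illustration, and the maps replace the real JDBC and MongoDB driver calls so the example runs without a database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical DAO contract shared by both storage back ends.
interface SentenceStore {
    void save(long id, String sentence);
    String find(long id);
}

// Stand-in for a MySQL-backed DAO; a real one would issue SQL via JDBC.
final class InMemoryMySqlStore implements SentenceStore {
    private final Map<Long, String> rows = new HashMap<>();
    public void save(long id, String sentence) { rows.put(id, sentence); }
    public String find(long id) { return rows.get(id); }
}

// Stand-in for a MongoDB-backed DAO; a real one would build a document
// and call save() on the collection.
final class InMemoryMongoStore implements SentenceStore {
    private final Map<Long, String> docs = new HashMap<>();
    public void save(long id, String sentence) { docs.put(id, sentence); }
    public String find(long id) { return docs.get(id); }
}

final class SentenceStores {
    // Callers depend only on the interface, so the back end is a runtime choice.
    static SentenceStore select(boolean useMongoDb) {
        return useMongoDb ? new InMemoryMongoStore() : new InMemoryMySqlStore();
    }
}
```

Because callers never name a concrete class, flipping the flag is the only change needed to move a code path from one store to the other.
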
Software Design
  • What about bulk inserts?
    • Fire-and-forget (FAF) queued approach
    • Add objects to queue, return to caller
    • Every X seconds, process queue
    • All objects from same collection are appended to a single List<DBObject>
    • Call collection.insert(…) before 2M characters
  • Reduces network overhead
  • Very fast inserts

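The queued approach above might look roughly like this. It is a sketch under stated assumptions: `BatchingWriter` and its methods are hypothetical names, the bulk-insert call is passed in as a callback instead of using the MongoDB driver, and the batch is capped by item count, whereas the slide describes a cap of 2M characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical fire-and-forget writer: callers enqueue and return at once;
// flush() is meant to be called periodically, e.g. by a scheduled task.
final class BatchingWriter<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final int maxBatchSize;
    private final Consumer<List<T>> bulkInsert;

    BatchingWriter(int maxBatchSize, Consumer<List<T>> bulkInsert) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.bulkInsert = bulkInsert;
    }

    // Returns immediately; the item is written on a later flush().
    void asyncWrite(T item) {
        queue.add(item);
    }

    // Drains the queue, issuing one bulk insert per full batch plus one for
    // any remainder. Returns the number of items handed to bulkInsert.
    int flush() {
        int written = 0;
        List<T> batch = new ArrayList<>();
        T item;
        while ((item = queue.poll()) != null) {
            batch.add(item);
            if (batch.size() == maxBatchSize) {
                bulkInsert.accept(batch);
                written += batch.size();
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            bulkInsert.accept(batch);
            written += batch.size();
        }
        return written;
    }
}
```

The trade-off of fire-and-forget is that the caller never learns whether a particular write succeeded, so this suits data that can be re-derived or re-sent.
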
Software Design
  • Hierarchical data done more elegantly
  • Wordnik dictionary model
    • Java POJOs already had JAXB annotations
    • Part of public REST API
  • Used MySQL
    • 12+ tables
    • 13 DAOs
    • 2500 lines of code
    • 50 requests/second uncached
    • Memcache needed to maintain reasonable speed

Software Design
  • TMGO

Software Design
  • MongoDB’s document storage let us…
    • Turn the objects into JSON via Jackson Mapper (fasterxml.com)
    • Call save
    • Support all fetch types, enhanced filters
  • 1000 requests/second
    • No explicit caching
    • No less scary code

Software Design
  • Saving a complex object
      String rawJSON = getMapper().writeValueAsString(veryComplexObject);
      collection.save(new BasicDBObject(getId(), JSON.parse(rawJSON)));
  • Fetching a complex object
      BasicDBObject dbo = (BasicDBObject) cursor.next();
      ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class);
  • No joins, 20x faster

Migrating Data
  • Migrating => existing data logic
  • Use logic to select DAOs appropriately
  • Read from old, write with new
  • Great system test for MongoDB
      SentenceHandler mysqlSh = new MySQLSentenceHandler();
      SentenceHandler mongoSh = new MongoDbSentenceHandler();
      while (hasMoreData) {
          mongoSh.asyncWrite(mysqlSh.next());
          ...
      }

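The "read from old, write with new" loop can be written generically so it is easy to test without either database. This is an illustrative sketch only: `Migrator.migrate` is a hypothetical helper, and the returned count is an addition that makes it simple to compare against the source row count afterwards.

```java
import java.util.Iterator;
import java.util.function.Consumer;

final class Migrator {
    // Copies every record from the old store's iterator to the new store's
    // writer and returns how many records were moved.
    static <T> long migrate(Iterator<T> oldStore, Consumer<T> newStore) {
        long moved = 0;
        while (oldStore.hasNext()) {
            newStore.accept(oldStore.next());
            moved++;
        }
        return moved;
    }
}
```
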
Migrating Data
  • Wordnik moved 5 billion rows from MySQL
    • Sustained 100,000 inserts/second
    • Migration tool was CPU bound
    • ID generation logic, among others
  • Wordnik reads MongoDB fast
    • Read + create Java objects @ 250k/second (!)

Going live to Production
  • Choose your use case carefully if migrating incrementally
  • Scary no matter what
  • Test your perf monitoring system first!
  • Use your DAOs from migration
  • Turn on MongoDB on one server, monitor, tune (rollback, repeat)
  • Full switch over when comfortable

Going live to Production
  • Really?
      SentenceHandler h = null;
      if (useMongoDb) {
          h = new MongoDbSentenceHandler();
      } else {
          h = new MySQLSentenceHandler();
      }
      return h.find(...);

Optimizing Performance
  • Home-grown connection pooling
    • Master only
        ConnectionManager.getReadWriteConnection()
    • Slave only
        ConnectionManager.getReadOnlyConnection()
    • Round-robin all servers, bias on slaves
        ConnectionManager.getConnection()

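One way to get "round-robin all servers, bias on slaves" is a weighted rotation. The sketch below is an assumption about how such a pool could choose hosts, not Wordnik's ConnectionManager: `HostPicker`, its method names, and the `slaveWeight` parameter are all hypothetical, and it returns host names where a real pool would return connections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical host picker mirroring the three access modes on the slide.
final class HostPicker {
    private final String master;
    private final List<String> slaves;
    private final List<String> rotation = new ArrayList<>();
    private final AtomicInteger readOnlyCursor = new AtomicInteger();
    private final AtomicInteger anyCursor = new AtomicInteger();

    // slaveWeight: how many times each slave appears in the rotation for
    // every single appearance of the master.
    HostPicker(String master, List<String> slaves, int slaveWeight) {
        if (slaves.isEmpty() || slaveWeight < 1) {
            throw new IllegalArgumentException("need at least one slave and a weight >= 1");
        }
        this.master = master;
        this.slaves = new ArrayList<>(slaves);
        rotation.add(master);
        for (int i = 0; i < slaveWeight; i++) {
            rotation.addAll(slaves);
        }
    }

    // Master only: use for writes.
    String readWriteHost() {
        return master;
    }

    // Slaves only, in round-robin order.
    String readOnlyHost() {
        return slaves.get(next(readOnlyCursor, slaves.size()));
    }

    // All servers in round-robin order, with slaves over-represented.
    String anyHost() {
        return rotation.get(next(anyCursor, rotation.size()));
    }

    // floorMod keeps the index non-negative even after the counter overflows.
    private static int next(AtomicInteger cursor, int size) {
        return Math.floorMod(cursor.getAndIncrement(), size);
    }
}
```

With one master, two slaves, and a weight of 2, the rotation has five slots, so the master serves one in five calls to `anyHost()`.
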
Optimizing Performance
  • Caching
    • Had complex logic to handle cache invalidation
    • Out-of-process caches are not free
    • MongoDB loves your RAM
    • Let it do your LRU cache (it will anyway)
  • Hardware
    • Do not skimp on your disk or RAM
  • Indexes
    • Schema-less design
    • Even if no values in any document, needs to read document schema to check

Optimizing Performance
  • Disk space
    • Schemaless => schema per document (row)
    • Choose your mappings wisely
    • ({veryLongAttributeName:true}) => more disk space than ({vlan:true})

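To put a number on the attribute-name point: because each document stores its own field names, a longer key costs extra bytes in every document. The sketch below is a back-of-the-envelope estimate, with `KeyOverhead.extraBytes` as a hypothetical helper; it counts only the UTF-8 bytes of the key and assumes every document carries the field.

```java
import java.nio.charset.StandardCharsets;

final class KeyOverhead {
    // Extra bytes stored when every document carries longKey instead of
    // shortKey. Only the key bytes are counted; other overhead is ignored.
    static long extraBytes(String longKey, String shortKey, long documents) {
        int perDocument = longKey.getBytes(StandardCharsets.UTF_8).length
                - shortKey.getBytes(StandardCharsets.UTF_8).length;
        return perDocument * documents;
    }
}
```

For the two names on the slide the difference is 21 - 4 = 17 bytes per document, so `KeyOverhead.extraBytes("veryLongAttributeName", "vlan", 5_000_000_000L)` returns 85,000,000,000, roughly 85 GB across 5 billion documents for a single field.
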
Optimizing Performance
  • A typical day at the office for MongoDB
  • API call rate: 47.7 calls/sec

Other Tips
  • Data types
    • Use caution when changing
        DBObject obj = cur.next();
        long id = (Long) obj.get("IWasAnIntOnce");
  • Attribute names
    • Don’t change w/o migrating existing data!
    • WTFDMDG????

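The cast on this slide fails with a ClassCastException when the stored value is an Integer, since an Integer cannot be cast to Long. A defensive read goes through `Number` instead. This is a minimal sketch: `NumericFields.readLong` is a hypothetical helper, and a plain `Map<String, Object>` stands in for the driver's document type so the example runs on its own.

```java
import java.util.Map;

final class NumericFields {
    // Reads a numeric field as long whether it was stored as Integer or
    // Long. A direct (Long) cast would throw for a value stored as Integer.
    static long readLong(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException(
                    "Field '" + field + "' is missing or not numeric");
        }
        return ((Number) value).longValue();
    }
}
```
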
What’s next?
  • GridFS
    • Store audio files on disk
    • Requires clustered file system for shared access
  • Capped collections (rolling out this week)
  • UGC from MySQL => MongoDB
  • Beg/bribe 10gen for some features

Questions?
