DATA SERVICES AT RACKSPACE SERIES
Pre-aggregated analytics and social feeds using MongoDB

JON HYMAN: Co-Founder and CIO (Appboy)
GREG AVOLA: CTO, Co-Founder and Developer (Untappd)
KENNY GORMAN: Founder of ObjectRocket, Chief Architect (Rackspace)
J.R. ARREDONDO: Director, Product Marketing (Rackspace)
Data Services at Rackspace

• 2 acquisitions for MongoDB and Redis apps
• 2 offerings in partnership with Hortonworks for Hadoop-based applications
• Strong portfolio of traditional offerings

RACKSPACE® HOSTING

|

WWW.RACKSPACE.COM

The right tool for the right job
Relational: Data Integrity; SQL
Documents: Flexible Schema; Scale (MongoDB)
Key-value: Fast Retrieval; Data structures (Redis)
Distributed large sets: Distributed Processing; Big Data (Hadoop)
What is ObjectRocket?
Think about 4 “S”

A fully managed
MongoDB service that
is purpose-built to be
highly available,
automatically sharded,
and very fast.

ObjectRocket high level summary
Think about 4 “S”
SCALABILITY

Automatically sharded (all you have to do is select a shard key)
Multiple plans, starting at 1GB with no sharding to custom plans
Choice of regions: US East, US West, London (Sydney and Hong Kong coming up)
AWS Direct Connect or Rackspace ServiceNet

SPEED

Purpose-built design optimized for MongoDB
Fusion-IO storage for high throughput
Container-based architecture

SAFETY

Smart provisioning to reduce the possibility of downtime
Redundancy and automatic backups
Security: ACLs, MongoDB authentication, SSL termination, encrypted replication links

SUPPORT

Deep MongoDB expertise
Monitoring
Migration Services

Who is speaking to you today?

Jon Hyman (Appboy)

Appboy addresses the rising need of apps to keep audiences engaged. By bringing data-driven marketing automation, segmentation, multichannel messaging and mobile marketing experts together, Appboy has become the pioneer of Mobile Relationship Management. Co-Founder and CIO Jon Hyman is responsible for the infrastructure behind the tools designed for app developers to segment users by behavior and deliver personalized, relevant content to them via multi-channel messaging, helping to increase engagement and understanding within the app.

Greg Avola (Untappd)

Untappd is a new way to socially share and explore the world of beer with your friends and the world. Living in the craft beer haven of New York City, Greg is the backend developer for Untappd. After experiencing Rare Vos for the first time, he instantly fell in love with craft beer. While some people enjoy reading books or watching movies, Greg's passion is to code. That being said, after Tim and Greg came up with the idea of Untappd, Greg had a working prototype the next day. Being able to combine his passion for development and craft beer allowed Untappd to be born.
Appboy and MongoDB
Jon Hyman
MongoDB & ObjectRocket Webinar, Feb. 12, 2014

@appboy

@jon_hyman
A LITTLE BIT ABOUT
US & APPBOY
(who we are and what we do)

Appboy is an app marketing
automation platform for apps
Jon Hyman
CIO :: @jon_hyman

Harvard
Bridgewater
Appboy improves
engagement by helping you
understand your app users
• IDENTIFY - Understand demographics, social and behavioral data
• SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location
• ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
Use Case: Customer engagement begins with onboarding

Urban Outfitters

textPlus

Shape Magazine
Agenda
• How flexible schemas make storing arbitrary data super easy
• How to quickly store time series data in MongoDB using flexible schemas
• Learn how flexible schemas can easily provide breakdowns across dimensions
Flexible Schemas, Part 1:

EXTENSIBLE USER PROFILES
Extensible User Profiles
Appboy creates a rich user
profile on every user who opens
one of our customers’ apps
Extensible User Profiles
We also let our customers add their own custom
attributes
Extensible User Profiles
Let’s talk schema
{
first_name: "Jovan",
email: "jovan+demo@appboy.com",
dob: ISODate("1994-10-24"),
gender: "F",
country: "DE",
...
}
Extensible User Profiles
Custom attributes can go alongside other fields!
{
first_name: "Jovan",
email: "jovan+demo@appboy.com",
dob: ISODate("1994-10-24"),
gender: "F",
custom: {
has_avatar: true,
highest_score: 1000,
visited_website: false,
...
},
...
}
db.users.update(…, {$set: {"custom.has_avatar": true}})
Extensible User Profiles
Pros
• Easily extensible to add any number of fields
• Don’t need to worry about type (bool, string, integer, float, etc.): MongoDB handles it all
• Can do atomic operations like $inc easily
• Easily queryable, no need to do complicated joins against the right value column

Cons
• Can take up a lot of space: "this_is_my_really_long_custom_attribute_name_weeeeeee"
• Can end up with mismatched types across documents
{ visited_website: true }
{ visited_website: "yes" }
Extensible User Profiles - How to Improve the Cons

Space Concern
Tokenize attribute names, use a field map:
{
has_avatar: 0,
highest_score: 1,
visited_website: 2
}
{
first_name: "Jovan",
email: "jovan+demo@appboy.com",
dob: ISODate("1994-10-24"),
gender: "F",
custom: {
0: true,
1: 1000,
2: false,
...
},
...
}
You should also limit the length of values
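The field-map idea above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the helper names (`tokenFor`, `buildCustomSet`, `expandCustom`) and the `favorite_color` value are hypothetical, and a real system would need to persist the map and assign new tokens atomically.

```javascript
// Hypothetical field map: attribute name -> short integer token.
const fieldMap = { has_avatar: 0, highest_score: 1, visited_website: 2 };

// Return the token for a name, assigning the next unused integer if needed.
function tokenFor(map, name) {
  if (!(name in map)) {
    map[name] = Object.keys(map).length;
  }
  return map[name];
}

// Build a $set update that stores custom attributes under their tokens.
function buildCustomSet(map, attributes) {
  const set = {};
  for (const [name, value] of Object.entries(attributes)) {
    set[`custom.${tokenFor(map, name)}`] = value;
  }
  return { $set: set };
}

// Translate a stored `custom` sub-document back to readable names.
function expandCustom(map, custom) {
  const names = {};
  for (const [name, token] of Object.entries(map)) names[token] = name;
  const readable = {};
  for (const [token, value] of Object.entries(custom)) {
    readable[names[token]] = value;
  }
  return readable;
}

// `update` has the shape expected as the second argument of db.users.update(...).
const update = buildCustomSet(fieldMap, { has_avatar: true, favorite_color: "red" });
```

The long attribute name is stored once in the map, and each user document only carries the short token.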
Extensible User Profiles - How to Improve the Cons

Type Constraints
Handle in the client, store expected types in a map
and coerce/reject bad values
{
has_avatar: Boolean,
highest_score: Integer,
favorite_color: String
}
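A minimal sketch of client-side type handling, assuming the expected types are kept in a plain map like the one above. The `coerce` function and its acceptance rules (for example treating "yes" as true) are illustrative assumptions, not Appboy's actual rules.

```javascript
// Hypothetical expected-type map, mirroring the one on the slide.
const expectedTypes = {
  has_avatar: "boolean",
  highest_score: "integer",
  favorite_color: "string",
};

// Coerce a raw value to the expected type, or throw if it cannot be coerced.
function coerce(name, value) {
  switch (expectedTypes[name]) {
    case "boolean":
      if (value === true || value === "true" || value === "yes") return true;
      if (value === false || value === "false" || value === "no") return false;
      break;
    case "integer": {
      const numeric =
        typeof value === "number" ||
        (typeof value === "string" && value.trim() !== "");
      if (numeric && Number.isInteger(Number(value))) return Number(value);
      break;
    }
    case "string":
      if (typeof value === "string") return value;
      if (typeof value === "number" || typeof value === "boolean") return String(value);
      break;
    default:
      throw new Error(`unknown attribute: ${name}`);
  }
  throw new Error(`bad value for ${name}: ${JSON.stringify(value)}`);
}
```

Running every custom attribute through a function like this before the write keeps `{ has_avatar: true }` and `{ has_avatar: "yes" }` from ending up as different types in different documents.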
Extensible User Profiles Summary
• MongoDB is probably the best tool you can use for custom attributes
• Just as simple to CRUD as any other field on the document
• Be sure to handle space and type concerns
Flexible Schemas, Part 2:

PRE-AGGREGATED ANALYTICS
What kinds of analytics does Appboy track?
• Lots of time series data
– App opens over time
– Events over time
– Revenue over time
– Marketing campaign stats and efficacy over time
What kinds of analytics does Appboy track?
• Breakdowns*
– Device types
– Device OS versions
– Screen resolutions
– Revenue by product

* We also care about this over time!
Typical time series collection
Log a new row for each open received
{
timestamp: 2013-11-14 00:00:00 UTC,
app_id: App identifier
}
db.app_opens.find({app_id: A, timestamp: {$gte: date}})

Pro: Really, really simple. Easy to add attribution to users.
Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages
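To make the "Con" concrete, here is a small in-memory sketch of the aggregation that has to happen at read time with one document per open. The sample events and the `opensPerHour` helper are hypothetical; hours are taken in UTC as a simplifying assumption.

```javascript
// Hypothetical raw events: one document per app open.
const rawOpens = [
  { app_id: "A", timestamp: new Date("2013-11-14T00:05:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T00:40:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T13:15:00Z") },
  { app_id: "B", timestamp: new Date("2013-11-14T13:20:00Z") },
];

// Count opens per UTC hour for one app. Every matching event has to be read,
// which is the cost that pre-aggregation avoids.
function opensPerHour(events, appId, since) {
  const counts = new Array(24).fill(0);
  for (const event of events) {
    if (event.app_id === appId && event.timestamp >= since) {
      counts[event.timestamp.getUTCHours()] += 1;
    }
  }
  return counts;
}

const histogram = opensPerHour(rawOpens, "A", new Date("2013-11-14T00:00:00Z"));
```

The work grows with the number of opens, not with the number of points on the chart.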
Fewer documents with pre-aggregation iteration 1
Create a document that groups by the time period
{
app_id: App identifier,
date: Date of the document,
hour: 0-23 based hour this document represents,

opens: Number of opens this hour
}
db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens:1}})

Pro: Really easy to draw histograms
Con: We never care about an hour by itself. We lose attribution.
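A sketch of how the arguments for the update above might be built from an incoming open. The helper name `hourlyOpenUpdate` is hypothetical, dates are bucketed in UTC as an assumption, and the `upsert: true` option is added here so that the first open of an hour creates the document.

```javascript
// Truncate a timestamp to midnight UTC of the same day.
function utcDay(when) {
  return new Date(Date.UTC(when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate()));
}

// Build filter, update and options for one app open (one document per hour).
function hourlyOpenUpdate(appId, when) {
  return {
    filter: { app_id: appId, date: utcDay(when), hour: when.getUTCHours() },
    update: { $inc: { opens: 1 } },
    options: { upsert: true },
  };
}

const u = hourlyOpenUpdate("A", new Date("2013-11-14T13:45:00Z"));
// In the shell: db.app_opens.update(u.filter, u.update, u.options)
```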
Fewer documents with pre-aggregation iteration 2
Create a document by day and have each hour be a field
{
app_id: App identifier,
date: Date of the document,
total_opens: Total number of opens this day,
0: Number of opens at midnight,
1: Number of opens at 1am,
...
23: Number of opens at 11pm
}
db.app_opens.update(
{date: D, app_id: A},
{$inc: {"0": 1, total_opens: 1}}
)

Pro: Document count is low, easy to use aggregation framework for
longer spans, fast: document should be in working set
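The one-document-per-day layout can be sketched as follows. The helper names are hypothetical and hours are taken in UTC as an assumption; the field names (`total_opens` and "0" to "23") follow the document above.

```javascript
// Build filter, update and options for one app open (one document per day,
// one counter field per hour).
function dailyOpenUpdate(appId, when) {
  const day = new Date(Date.UTC(when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate()));
  const hourField = String(when.getUTCHours());
  return {
    filter: { app_id: appId, date: day },
    update: { $inc: { [hourField]: 1, total_opens: 1 } },
    options: { upsert: true },
  };
}

// Read the 24 hourly counters out of one daily document; missing hours count as 0.
function hourlyHistogram(doc) {
  const counts = [];
  for (let hour = 0; hour < 24; hour++) {
    counts.push(doc[String(hour)] || 0);
  }
  return counts;
}

const daily = dailyOpenUpdate("A", new Date("2013-11-14T23:10:00Z"));
// In the shell: db.app_opens.update(daily.filter, daily.update, daily.options)
```

Drawing a day's chart then needs a single document, regardless of how many opens were recorded.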
Fewer documents with pre-aggregation iteration 2
• What about looking at different dimensions?
– App opens by device type (e.g., how do iPads compare to iPhones?)
– Demographics (gender, age group)
Solution!

FLEXIBLE SCHEMAS!
Fewer documents with pre-aggregation iteration 3
Dynamically add dimensions in the document
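The example document for this slide did not survive in the transcript, so the following is only one possible layout, offered as an assumption: each dimension becomes a sub-document whose keys are the observed values, incremented in the same update as the hourly counters. The names `dimensionalOpenUpdate`, `safeKey`, `device` and `os_version` are hypothetical.

```javascript
// Make a value safe to use as a field name: in an update path "." is read as
// nesting, and a leading "$" is reserved for operators.
function safeKey(value) {
  return String(value).replace(/\./g, "_").replace(/^\$/, "_");
}

// Build one update that bumps the hourly counter, the daily total, and one
// counter per dimension value, all on the same daily document.
function dimensionalOpenUpdate(appId, when, dimensions) {
  const day = new Date(Date.UTC(when.getUTCFullYear(), when.getUTCMonth(), when.getUTCDate()));
  const inc = { total_opens: 1, [String(when.getUTCHours())]: 1 };
  for (const [dimension, value] of Object.entries(dimensions)) {
    inc[`${safeKey(dimension)}.${safeKey(value)}`] = 1;
  }
  return {
    filter: { app_id: appId, date: day },
    update: { $inc: inc },
    options: { upsert: true },
  };
}

const open = dimensionalOpenUpdate("A", new Date("2013-11-14T08:00:00Z"), {
  device: "iPad",
  os_version: "7.0.4",
});
// In the shell: db.app_opens.update(open.filter, open.update, open.options)
```

Because `$inc` creates missing fields, a new device type or OS version shows up in the document the first time it is seen, with no schema change.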
Pre-aggregated analytics
• Pros
– Easily extensible to add other dimensions
– Still only using one document, therefore you can create charts very quickly
– You get breakdowns over a time period for free

• Cons
– Pre-aggregated data has no attribution
– Have to know questions ahead of time
– Not infinitely scalable
– Timezones are awful problems to deal with
Pre-aggregated analytics summary
• Get started tracking time series data quickly
• You get breakdowns for free
• Adding dimensions is super simple
• No attribution, need to know questions ahead of time
• Don’t just rely on pre-aggregated analytics
Working with ObjectRocket
• Knowledgeable MongoDB experts
• Helped us scale to many billions of data points each month
• Support is top-notch, response times are wicked fast (and they stay on top of MongoDB, Inc. when tickets get escalated)
• Great for cost: 3 full nodes (not 2 and 1 arbiter), backups included, good step-wise pricing model as you add more shards
Thanks! Questions?
jon@appboy.com

@appboy

@jon_hyman
Experiences with MySQL and
MongoDB in a social app
Who am I?
Live in NYC
Day-time Job: Web Developer at ABC News (US)
Night-time Job: CTO, Co-Founder of Untappd
Loves: Data, Beer and JavaScript
What is Untappd?
A social discovery and sharing network for beer drinkers
What is Untappd?
Users “check-in” to their beers, add their location, a photo,
rating, comment and then share it with their friends
What is Untappd?
Using GPS, Untappd will find local and popular bars, beers
and breweries nearby, wherever you are!
What are we going to talk about?
MongoDB and MySQL together

Improving on MySQL with MongoDB
• Faster joins for friends feeds
• Solving contention issues during high traffic times
• Location data, first class citizen in MongoDB
• Data Modeling differences

Scaling with ObjectRocket
MySQL and MongoDB together
It’s not one or the other
• What works best for the workflow?
– MySQL worked best for reference data for us
– Not everything moved to MongoDB

What stayed in MySQL?
Check-ins
Users
Relationships Data
Primary Datastore

What moved to MongoDB?
Activity Feed (Friend’s Graph)
Recommendation Data
Location-based Check-ins
Faster joins in Friends Feed
Background: Relational queries (2-5 seconds per query)
• Worked well when we were small
• Friends became important as we grew
• Growth in user base affected query performance
• Friends feed
– First thing people see in app
– Speed is important
– Two rows for every friendship
– You don’t want to wait Saturday night at the bar!

Today: MongoDB queries (0.3-0.5 seconds per query)
• Friends become important
• “Schema-less” design
• One query to one collection with a shard key
• More consumable by others who develop on top of API
– JSON

Single query, not multiple joins
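As a rough illustration of "one query to one collection with a shard key", the sketch below builds a feed query with simple range-based paging. Everything named here is an assumption: the deck does not show Untappd's schema, so the `feed` collection, the `user_id` shard key and the `created_at` field are hypothetical.

```javascript
// Build the pieces of a single friends-feed query. Filtering on the (assumed)
// shard key user_id lets the query be routed to one shard.
function feedPageQuery(userId, pageSize, before) {
  const filter = { user_id: userId };
  if (before) {
    // Paging: only fetch items older than the last one already shown.
    filter.created_at = { $lt: before };
  }
  return { filter, sort: { created_at: -1 }, limit: pageSize };
}

const firstPage = feedPageQuery(42, 25);
const nextPage = feedPageQuery(42, 25, new Date("2014-02-08T23:00:00Z"));
// In the shell: db.feed.find(q.filter).sort(q.sort).limit(q.limit)
```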
Contention issues in MySQL
Built for the Weekend
• Our activity is compressed to weekends and nights
– Always looking for the next bottleneck
• Keep looking for ways to streamline our process

Need for Speed and Performance
• Two key scenarios
– Loading of friends feed
– Checking on a beer
• Initially found lock contention
– Moved data to MongoDB
– Use MongoDB and MySQL for best performance
• Improving Efficiency with Paging

We were averaging around 7-8k queries per second per box without MongoDB. After migration, we dropped to around 2-3k queries per second per box.
Location data in MongoDB
Query: “Who is nearby?”
Query: “What beers are available close to me?”

• Location data in MongoDB
– Anything with a GPS location is moved to MongoDB (outside of friends)
– Queries such as “Who is nearby?” become easy
• Geospatial index
– Gets really complicated with beer
– LAT/LONG attached
– It is not like a location
– What beers are popular in my area?
– What bars are trending in my area?
– Unique users checking into a location
• Queries look like English
– NEAR
– SQL queries can go bad fast
– Location is a first class citizen in MongoDB
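A "Who is nearby?" query can be sketched with MongoDB's `$near` operator and a GeoJSON point. The `location` field name, the `checkins` collection and the sample coordinates are assumptions for illustration; the query needs a `2dsphere` index on the field.

```javascript
// Build a "who is nearby?" filter. GeoJSON order is [longitude, latitude].
function nearbyFilter(longitude, latitude, maxMeters) {
  if (longitude < -180 || longitude > 180 || latitude < -90 || latitude > 90) {
    throw new Error("coordinates out of range");
  }
  return {
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: maxMeters, // meters, when used with a 2dsphere index
      },
    },
  };
}

const nearby = nearbyFilter(-73.99, 40.73, 500);
// In the shell, assuming the hypothetical collection and field:
//   db.checkins.createIndex({ location: "2dsphere" })
//   db.checkins.find(nearby)
```

Results of `$near` come back sorted from nearest to farthest, which is what makes the query read so close to the question being asked.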
Data Modeling
Modeling with MongoDB
• Struggled initially storing beers
• Use of TTL
– Documents expiration
– You are not looking at what your friends were drinking weeks ago

Handling Updates
• Handling two data sources with volatile data can be difficult
• Identifying which fields MySQL can back-fill
• Deletion Scripts with Message Queues to increase performance
• TTL helps with older records
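Document expiration is configured in MongoDB with a TTL index, created by passing `expireAfterSeconds` on a date field. The sketch below builds those arguments; the `created_at` field, the `feed` collection and the 14-day window are hypothetical choices, not values from the talk.

```javascript
// Build the arguments for a TTL index on a date field.
function ttlIndexSpec(dateField, days) {
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error("days must be a positive integer");
  }
  return {
    keys: { [dateField]: 1 },
    options: { expireAfterSeconds: days * 24 * 60 * 60 },
  };
}

// Client-side mirror of the expiry rule, handy for tests: a document becomes
// eligible for removal once its date field is older than expireAfterSeconds.
function isExpired(doc, spec, now) {
  const field = Object.keys(spec.keys)[0];
  const ageSeconds = (now.getTime() - doc[field].getTime()) / 1000;
  return ageSeconds >= spec.options.expireAfterSeconds;
}

const ttl = ttlIndexSpec("created_at", 14);
// In the shell: db.feed.createIndex(ttl.keys, ttl.options)
```

The server removes expired documents in a background task, so deletion is not instantaneous at the moment of expiry.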
Scaling with MongoDB and ObjectRocket

• History
– Started with 5GB plan
– Two: 100GB and 20GB
– 100GB holds Activity Feed Data, while 20GB
holds recommendations and location data
• Huge performance gains moving to the
ObjectRocket architecture
In Summary
• Use MySQL and MongoDB for the right jobs
• Look for opportunities to improve query speed
• Think of complicated MySQL features that are
easy in MongoDB
– Dynamic schema
– Location data
– Queries

• Scalability
Thank you!

greg@untappd.com
untappd.com/user/gregavola
Q&A

What should you do next?
http://www.rackspace.com/mongodb

I want to watch a demo of ObjectRocket
• http://goo.gl/oxpsbA
I am interested in migrating to ObjectRocket
• Send an email to support@objectrocket.com
I want to learn more about ObjectRocket

• Twitter @objectrocket
• Send an email to support@objectrocket.com
I want to learn more about MongoDB
• Visit www.mongodb.com for great learning and training resources
RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM
© RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES.

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Pre-Aggregated Analytics And Social Feeds Using MongoDB

  • 1. DATA SERVICES AT RACKSPACE SERIES Pre-aggregated analytics and social feeds using MongoDB • Jon Hyman (Co-Founder and CIO, Appboy) • Greg Avola (CTO, Co-Founder and Developer, Untappd) • Kenny Gorman (Founder of ObjectRocket, Chief Architect, Rackspace) • J.R. Arredondo (Director, Product Marketing, Rackspace)
  • 2. Data Services at Rackspace • 2 acquisitions for MongoDB and Redis apps • 2 offerings in partnership with Hortonworks for Hadoop-based applications • Strong portfolio of traditional offerings RACKSPACE® HOSTING | WWW.RACKSPACE.COM
  • 3. The right tool for the right job • Relational: data integrity, SQL • Documents: flexible schema, scale (MongoDB) • Key-value: fast retrieval, data structures (Redis) • Distributed large sets: distributed processing, big data (Hadoop)
  • 4. What is ObjectRocket? Think about 4 “S” A fully managed MongoDB service that is purpose-built to be highly available, automatically sharded, and very fast.
  • 5. ObjectRocket high level summary Think about 4 “S” SCALABILITY: Automatically sharded (all you have to do is select a shard key) • Multiple plans, starting at 1GB with no sharding to custom plans • Choice of regions: US East, US West, London (Sydney and Hong Kong coming up) • AWS Direct Connect or Rackspace ServiceNet SPEED: Purpose-built design optimized for MongoDB • Fusion-IO storage for high throughput • Container-based architecture SAFETY: Smart provisioning to reduce the possibility of downtime • Redundancy and automatic backups • Security: ACLs, MongoDB authentication, SSL termination, encrypted replication links SUPPORT: Deep MongoDB expertise • Monitoring • Migration Services
  • 6. Who is speaking to you today? Jon Hyman (Appboy): Appboy addresses the rising need of apps to keep audiences engaged. By bringing data-driven marketing automation, segmentation, multichannel messaging and mobile marketing experts together, Appboy has become the pioneer of Mobile Relationship Management. Co-Founder and CIO Jon Hyman is responsible for the infrastructure behind the tools designed for app developers to segment users by behavior and deliver personalized, relevant content to them via multi-channel messaging, helping to increase engagement and understanding within the app. Greg Avola (Untappd): Untappd is a new way to socially share and explore the world of beer with your friends and the world. Living in the craft beer haven of New York City, Greg is the backend developer for Untappd. After experiencing Rare Vos for the first time, he instantly fell in love with craft beer. While some people enjoy reading books or watching movies, Greg's passion is to code. That being said, after Tim and Greg came up with the idea of Untappd, Greg had a working prototype the next day. Being able to combine his passion for development and craft beer allowed Untappd to be born.
  • 7. Appboy and MongoDB Jon Hyman MongoDB & ObjectRocket Webinar, Feb. 12, 2014 @appboy @jon_hyman
  • 8. A LITTLE BIT ABOUT US & APPBOY (who we are and what we do) Appboy is a marketing automation platform for apps Jon Hyman CIO :: @jon_hyman Harvard Bridgewater
  • 9. Appboy improves engagement by helping you understand your app users • IDENTIFY - Understand demographics, social and behavioral data • SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location • ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
  • 10. Use Case: Customer engagement begins with onboarding Urban Outfitters textPlus Shape Magazine
  • 11. Agenda • How flexible schemas make storing arbitrary data super easy • How to quickly store time series data in MongoDB using flexible schemas • Learn how flexible schemas can easily provide breakdowns across dimensions
  • 12. Flexible Schemas, Part 1: EXTENSIBLE USER PROFILES
  • 13. Extensible User Profiles Appboy creates a rich user profile on every user who opens one of our customers’ apps
  • 14. Extensible User Profiles We also let our customers add their own custom attributes
  • 15. Extensible User Profiles Let’s talk schema { first_name: “Jovan”, email: “jovan+demo@appboy.com”, dob: 1994-10-24, gender: “F”, country: “DE”, ... }
  • 16. Extensible User Profiles Custom attributes can go alongside other fields! { first_name: “Jovan”, email: “jovan+demo@appboy.com”, dob: 1994-10-24, gender: “F”, custom: { has_avatar: true, highest_score: 1000, visited_website: false, ... }, ... } db.users.find(…).update({$set: {“custom.has_avatar”:true}})
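A minimal sketch of how the `$set` on `custom.has_avatar` above could be built for any set of custom attributes. The helper name `buildCustomAttributeUpdate` is hypothetical and not from the talk; it only constructs the update document and does not talk to a database.

```javascript
// Hypothetical helper (not from the slides): turns { has_avatar: true } into
// { $set: { "custom.has_avatar": true } } so only the named keys are touched.
function buildCustomAttributeUpdate(attributes) {
  const set = {};
  for (const [name, value] of Object.entries(attributes)) {
    // Dots and a leading "$" have special meaning in MongoDB field paths,
    // so this sketch rejects them instead of guessing at an escape scheme.
    if (name.includes(".") || name.startsWith("$")) {
      throw new Error(`Invalid custom attribute name: ${name}`);
    }
    set[`custom.${name}`] = value;
  }
  return { $set: set };
}

const update = buildCustomAttributeUpdate({ has_avatar: true, highest_score: 1000 });
console.log(JSON.stringify(update));
// {"$set":{"custom.has_avatar":true,"custom.highest_score":1000}}
```

Because the update uses dotted paths, the other keys under `custom` are left alone; replacing the whole `custom` sub-document would overwrite them.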
  • 17. Extensible User Profiles Pros • Easily extensible to add any number of fields • Don’t need to worry about type (bool, string, integer, float, etc.): MongoDB handles it all • Can do atomic operations like $inc easily • Easily queryable, no need to do complicated joins against the right value column Cons • Can take up a lot of space, e.g. “this_is_my_really_long_custom_attribute_name_weeeeeee” • Can end up with mismatched types across documents: { visited_website: true } vs. { visited_website: “yes” }
  • 18. Extensible User Profiles - How to Improve the Cons Space Concern: Tokenize values, use a field map: { has_avatar: 0, highest_score: 1, visited_website: 2 } The stored document then becomes: { first_name: “Jovan”, email: “jovan+demo@appboy.com”, dob: 1994-10-24, gender: “F”, custom: { 0: true, 1: 1000, 2: false, ... }, ... } You should also limit the length of values
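The field map idea can be sketched as a small in-memory class. The `FieldMap` name and its methods are assumptions for illustration; a real system would persist the map (for example per app) so tokens stay stable across processes.

```javascript
// Hypothetical in-memory field map: long attribute names are swapped for
// short numeric tokens before storage and swapped back on read.
class FieldMap {
  constructor() {
    this.tokens = new Map(); // attribute name -> token string
  }

  // Returns the existing token for a name, or assigns the next free one.
  tokenFor(name) {
    if (!this.tokens.has(name)) {
      this.tokens.set(name, String(this.tokens.size));
    }
    return this.tokens.get(name);
  }

  // { has_avatar: true } -> { "0": true }
  encode(custom) {
    const out = {};
    for (const [name, value] of Object.entries(custom)) {
      out[this.tokenFor(name)] = value;
    }
    return out;
  }

  // { "0": true } -> { has_avatar: true }; unknown tokens are kept as-is.
  decode(stored) {
    const names = new Map([...this.tokens].map(([name, token]) => [token, name]));
    const out = {};
    for (const [token, value] of Object.entries(stored)) {
      out[names.has(token) ? names.get(token) : token] = value;
    }
    return out;
  }
}

const fieldMap = new FieldMap();
const stored = fieldMap.encode({ has_avatar: true, highest_score: 1000, visited_website: false });
console.log(JSON.stringify(stored)); // {"0":true,"1":1000,"2":false}
console.log(JSON.stringify(fieldMap.decode(stored)));
// {"has_avatar":true,"highest_score":1000,"visited_website":false}
```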
  • 19. Extensible User Profiles - How to Improve the Cons Type Constraints Handle in the client, store expected types in a map and coerce/reject bad values { has_avatar: Boolean, highest_score: Integer, favorite_color: String }
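One way the client-side check could look, as a sketch only. The `expectedTypes` map mirrors the example on the slide; the `coerce` function and its rules (which strings count as booleans, for instance) are assumptions.

```javascript
// Expected type per custom attribute, mirroring the map on the slide.
const expectedTypes = new Map([
  ["has_avatar", "boolean"],
  ["highest_score", "integer"],
  ["favorite_color", "string"],
]);

// Hypothetical coercion rules: return a value of the expected type,
// or throw so the bad write never reaches the database.
function coerce(name, value) {
  const type = expectedTypes.get(name);
  if (type === undefined) {
    throw new Error(`Unknown attribute: ${name}`);
  }
  if (type === "boolean") {
    if (typeof value === "boolean") return value;
    if (value === "true" || value === "yes") return true;
    if (value === "false" || value === "no") return false;
  } else if (type === "integer") {
    // Accept numeric strings such as "1000", but not floats or booleans.
    const n = typeof value === "string" && value.trim() !== "" ? Number(value) : value;
    if (Number.isInteger(n)) return n;
  } else if (type === "string") {
    if (typeof value === "string") return value;
  }
  throw new TypeError(`Cannot store ${JSON.stringify(value)} as ${type} for ${name}`);
}

console.log(coerce("has_avatar", "yes"));     // true
console.log(coerce("highest_score", "1000")); // 1000
```

Rejecting instead of silently storing is what prevents the `{ visited_website: true }` versus `{ visited_website: “yes” }` mismatch from building up across documents.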
  • 20. Extensible User Profiles Summary • MongoDB is probably the best tool you can use for custom attributes • Just as simple to CRUD as any other field on the document • Be sure to handle space and type concerns
  • 21. Flexible Schemas, Part 2: PRE-AGGREGATED ANALYTICS
  • 22. What kinds of analytics does Appboy track? • Lots of time series data • App opens over time • Events over time • Revenue over time • Marketing campaign stats and efficacy over time
  • 23. What kinds of analytics does Appboy track? • Breakdowns* • Device types • Device OS versions • Screen resolutions • Revenue by product * We also care about this over time!
  • 24. Typical time series collection Log a new row for each open received { timestamp: 2013-11-14 00:00:00 UTC, app_id: App identifier } db.app_opens.find({app_id: A, timestamp: {$gte: date}}) Pro: Really, really simple. Easy to add attribution to users. Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages
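To make the read-time cost concrete, here is a sketch of the aggregation step a chart needs when every open is its own document. The `opensPerHour` function and the sample events are illustrative; in practice the events would come from a query like the `find` above.

```javascript
// Counts opens per UTC hour from raw event documents. Every matching
// document has to be read before a single chart point can be drawn.
function opensPerHour(events) {
  const counts = new Array(24).fill(0);
  for (const event of events) {
    counts[event.timestamp.getUTCHours()] += 1;
  }
  return counts;
}

// Sample events shaped like the documents on the slide (values are made up).
const events = [
  { app_id: "A", timestamp: new Date("2013-11-14T00:05:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T00:40:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T13:15:00Z") },
];

const hourly = opensPerHour(events);
console.log(hourly[0], hourly[13]); // 2 1
```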
  • 25. Fewer documents with pre-aggregation iteration 1 Create a document that groups by the time period { app_id: App identifier, date: Date of the document, hour: 0-23 based hour this document represents, opens: Number of opens this hour } db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens:1}}) Pro: Really easy to draw histograms Con: We never care about an hour by itself. We lose attribution.
  • 26. Fewer documents with pre-aggregation iteration 2 Create a document by day and have each hour be a field { app_id: App identifier, date: Date of the document, total_opens: Total number of opens this day, 0: Number of opens at midnight, 1: Number of opens at 1am, ... 23: Number of opens at 11pm } db.app_opens.update( {date: D, app_id: A}, {$inc: {“0”:1, total_opens:1}} ) Pro: Document count is low, easy to use aggregation framework for longer spans, fast: document should be in working set
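A sketch of how the per-day update could be derived from an incoming open. `buildOpenUpdate` is a hypothetical helper; it uses the `total_opens` field name from the document layout, buckets by UTC, and sets `upsert: true` as an assumption so that the first open of a day creates the document.

```javascript
// Builds the filter and $inc update for one app open.
// Bucketing by UTC day and hour is an assumption of this sketch.
function buildOpenUpdate(appId, timestamp) {
  const day = new Date(Date.UTC(
    timestamp.getUTCFullYear(),
    timestamp.getUTCMonth(),
    timestamp.getUTCDate()
  ));
  const hour = String(timestamp.getUTCHours());
  return {
    filter: { app_id: appId, date: day },
    // One atomic write bumps both the hour field and the daily total.
    update: { $inc: { [hour]: 1, total_opens: 1 } },
    options: { upsert: true },
  };
}

const op = buildOpenUpdate("A", new Date("2013-11-14T13:45:00Z"));
console.log(op.filter.date.toISOString()); // 2013-11-14T00:00:00.000Z
console.log(JSON.stringify(op.update));    // {"$inc":{"13":1,"total_opens":1}}
```

The three parts map onto the arguments of an update call such as `updateOne(filter, update, options)` in current drivers.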
  • 27. Fewer documents with pre-aggregation iteration 2 • What about looking at different dimensions? • App opens by device type (e.g., how do iPads compare to iPhones?) • Demographics (gender, age group)
  • 29. Fewer documents with pre-aggregation iteration 3 Dynamically add dimensions in the document
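The transcript does not show the document layout for this iteration, so the following is only one possible shape: each dimension gets a sub-document keyed by value, with its own hourly counters and total. The function name, the `dimension.value.hour` path layout, and the `_` replacement for dots are all assumptions.

```javascript
// Extends the daily $inc with per-dimension counters, e.g.
// "device_type.iPad.13" and "device_type.iPad.total".
function buildDimensionedUpdate(hour, dimensions) {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError("hour must be an integer from 0 to 23");
  }
  const inc = { [String(hour)]: 1, total_opens: 1 };
  for (const [dimension, value] of Object.entries(dimensions)) {
    // Values such as OS version "7.0.4" contain dots, which would be read
    // as path separators, so this sketch replaces them with underscores.
    const safeValue = String(value).replace(/[.$]/g, "_");
    inc[`${dimension}.${safeValue}.${hour}`] = 1;
    inc[`${dimension}.${safeValue}.total`] = 1;
  }
  return { $inc: inc };
}

const dimensioned = buildDimensionedUpdate(13, { device_type: "iPad", os_version: "7.0.4" });
console.log(Object.keys(dimensioned.$inc));
```

New dimension values need no schema change under this layout: the first `$inc` on a path that does not exist yet creates it.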
  • 30. Pre-aggregated analytics • Pros • Easily extensible to add other dimensions • Still only using one document, therefore you can create charts very quickly • You get breakdowns over a time period for free • Cons • Pre-aggregated data has no attribution • Have to know questions ahead of time • Not infinitely scalable • Timezones are awful problems to deal with
  • 31. Pre-aggregated analytics summary • Get started tracking time series data quickly • You get breakdowns for free • Adding dimensions is super simple • No attribution, need to know questions ahead of time • Don’t just rely on pre-aggregated analytics
  • 32. Working with ObjectRocket • Knowledgeable MongoDB experts • Helped us scale to many billions of data points each month • Support is top-notch, response times are wicked fast (and they stay on top of MongoDB, Inc. when tickets get escalated) • Great for cost: 3 full nodes (not 2 and 1 arbiter), backups included, good step-wise pricing model as you add more shards
  • 34. Experiences with MySQL and MongoDB in a social app
  • 35. Who am I? • Live in NYC • Day-time Job: Web Developer at ABC News (US) • Night-time Job: CTO, Co-Founder of Untappd • Loves: Data, Beer and JavaScript
  • 36. What is Untappd? A social discovery and sharing network for beer drinkers
  • 37. What is Untappd? Users “check-in” to their beers, add their location, a photo, rating, comment and then share it with their friends
  • 38. What is Untappd? Using GPS, Untappd will find local and popular bars, beers and breweries nearby, wherever you are!
  • 39. What are we going to talk about? • MongoDB and MySQL together • Improving on MySQL with MongoDB: faster joins for friends feeds; solving contention issues during high traffic times; location data, first class citizen in MongoDB; data modeling differences • Scaling with ObjectRocket
  • 40. MySQL and MongoDB together It’s not one or the other • What works best for the workflow? – MySQL worked best for reference data for us – Not everything moved to MongoDB What stayed in MySQL? Check-ins Users Relationships Data Primary Datastore What moved to MongoDB? Activity Feed (Friend’s Graph) Recommendation Data Location-based Check-ins
  • 41. Faster joins in Friends Feed Background: Relational queries (2-5 seconds per query) • Worked well when we were small • Friends became important as we grew • Growth in user base affected query performance • Friends feed – First thing people see in app – Speed is important – You don’t want to wait Saturday night at the bar! • Friends become important – Two rows for every friendship Today: MongoDB queries (0.3-0.5 seconds per query) • “Schema-less” design – JSON • One query to one collection with a shard key – Single query, not multiple joins • More consumable by others who develop on top of API
  • 42. Contention issues in MySQL Built for the Weekend • Our activity is compressed to weekends and nights – Always looking for the next bottleneck • Keep looking for ways to streamline our process Need for Speed and Performance • Two key scenarios – Loading of friends feed – Checking on a beer • Initially found lock contention – Moved data to MongoDB – Use MongoDB and MySQL for best performance • Improving Efficiency with Paging We were averaging around 7-8k queries per second per box without MongoDB. After migration, we dropped to around 2-3k queries per box.
  • 43. Location data in MongoDB Query: “Who is nearby?” Query: “What beers are available close to me?” • Location data in MongoDB • Geospatial index – Anything with a GPS location is moved to MongoDB (outside of friends) – Queries such as “Who is nearby?” become easy – Gets really complicated with beer – LAT/LONG attached – It is not like a location – What beers are popular in my area? – What bars are trending in my area? – Unique users checking into a location • Queries look like English – NEAR – SQL queries can go bad fast – Location is a first class citizen in MongoDB
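A "Who is nearby?" lookup can be sketched with MongoDB's `$near` operator. The field name `location`, the GeoJSON point format, and the `2dsphere` index are assumptions here; the talk does not say which index type or document layout Untappd used.

```javascript
// Index spec a collection would need for this query form,
// e.g. passed to createIndex on the collection.
const nearbyIndexSpec = { location: "2dsphere" };

// Builds a $near filter for documents within maxMeters of a point.
// GeoJSON coordinates are ordered [longitude, latitude].
function buildNearbyQuery(longitude, latitude, maxMeters) {
  if (!Number.isFinite(longitude) || Math.abs(longitude) > 180) {
    throw new RangeError("longitude must be between -180 and 180");
  }
  if (!Number.isFinite(latitude) || Math.abs(latitude) > 90) {
    throw new RangeError("latitude must be between -90 and 90");
  }
  return {
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: [longitude, latitude] },
        $maxDistance: maxMeters,
      },
    },
  };
}

// Example: check-ins within 500 meters of a point in Manhattan.
const nearbyQuery = buildNearbyQuery(-73.9857, 40.7484, 500);
console.log(JSON.stringify(nearbyQuery));
```

Results of a `$near` query come back sorted from nearest to farthest, so no separate distance sort is needed.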
  • 44. Data Modeling Modeling with MongoDB • Struggled initially storing beers • Handling two data sources with volatile data can be difficult • Use of TTL Handling Updates • Identifying which fields MySQL can back-fill • Deletion Scripts with Message Queues to increase performance • TTL helps with older records – Document expiration – You are not looking at what your friends were drinking weeks ago
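Document expiration in MongoDB is configured with a TTL index, which is a single-field index on a date plus an `expireAfterSeconds` option. The sketch below builds such an index definition; the `created_at` field name and the 14-day window are assumptions, not values from the talk.

```javascript
// Builds the key spec and options for a TTL index on a date field.
function buildTtlIndex(field, days) {
  if (!Number.isInteger(days) || days <= 0) {
    throw new RangeError("days must be a positive integer");
  }
  return {
    keys: { [field]: 1 },
    options: { expireAfterSeconds: days * 24 * 60 * 60 },
  };
}

// Mirrors the rule the server applies: a document is eligible for removal
// once its date field is older than the TTL window.
function isExpired(doc, field, ttlSeconds, now) {
  return now.getTime() - doc[field].getTime() >= ttlSeconds * 1000;
}

const ttl = buildTtlIndex("created_at", 14);
console.log(JSON.stringify(ttl));
// {"keys":{"created_at":1},"options":{"expireAfterSeconds":1209600}}
```

The server removes expired documents in a background task that runs periodically, so an entry can stay visible for a short time after it expires.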
  • 45. Scaling with MongoDB and ObjectRocket • History – Started with 5GB plan – Two: 100GB and 20GB – 100GB holds Activity Feed Data, while 20GB holds recommendations and location data • Huge performance gains moving to the ObjectRocket architecture
  • 46. In Summary • Use MySQL and MongoDB for the right jobs • Look for opportunities to improve query speed • Think of complicated MySQL features that are easy in MongoDB – Dynamic schema – Location data – Queries • Scalability
  • 49. What should you do next? http://www.rackspace.com/mongodb • I want to watch a demo of ObjectRocket: http://goo.gl/oxpsbA • I am interested in migrating to ObjectRocket: send an email to support@objectrocket.com • I want to learn more about ObjectRocket: Twitter @objectrocket, or send an email to support@objectrocket.com • I want to learn more about MongoDB: visit www.mongodb.com for great learning and training resources
  • 50. RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218 | US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES.