CIGNEX Datamatics Confidential www.cignex.com
Scaling MongoDB with
Sharding – A Case Study
Presented by: Nikhil Naib
Title: Lead Consultant – Big Data
For MongoDB and CIGNEX Datamatics Use Only
Who We Are?
• Since 2000, delivering solutions
using Open Source technologies
to
– Address business goals
– Increase business velocity
– Lower the cost of doing business
– Gain competitive advantage
• Dramatically reduce Total Cost of
Ownership (TCO) & deployment
time of IT solutions
• 400+ Implementations
• 450+ Experts
• 200+ Integrations
• 13 Books
• 5000+ Community Contributions
Offices: America | India | UK | Europe | Singapore | Australia
Solutions: Portal Solutions | Content Solutions | Big Data Analytics Solutions
Our Big Data Analytics Practice
Team Size: 110+ | Projects: 10+
• 20+ Big Data, 100+ Analytics & DW/BI
• Partnership – MongoDB, Cloudera, IBM
• Technical expertise – MongoDB, Hadoop, Neo4j, Solr, Pentaho, Talend, Cognos, Business Objects, Tableau, Jasper Reports
• Research & Analytics division with data
scientists
• Connectors/Accelerators, Frameworks
• BIGArchive – Enterprise Scale Archival
• Liferay MongoDB Store
• Drupal MongoDB Connector
Big Data Partners
Business Intelligence Expertise
Agenda
• Use Case & Database Requirements
• Why MongoDB?
• Solution
• To Shard Or Not To Shard
• Scaling with Sharding
– Sharding Basics
– Architecture and Hardware Sizing
– Sharding – Choosing the RIGHT Shard Key
– Benchmarking with Results
• Key Takeaways
Use Case
Users can access the service provider's digital assets across the array of devices they have registered, and can resume a session on another device (session shifting).
• Users: 7 million users across geographies
• Devices: 8 devices per user (home, office, anywhere)
• Load Balancer: a high volume of concurrent CRUD requests is routed to the DB cluster
• Database: MongoDB data storage cluster with sharding, automatic replication for failover, and indexes
Database Requirements
• Agility in Development & Deployment
• High Availability
• Flexibility in Schema
• Enterprise Level Support
• High Performance
Why MongoDB?
• Enterprise Level Support
– Global coverage
– 24x7 support
– Ease of maintenance
• Agility in Development & Deployment
– Programming language drivers
– Shorter dev cycle
– Faster deployment
• High Availability
– Automatic failover
– Redundancy
– ~100% uptime
• Flexibility in Schema
– Easy integration
– Ease of schema design
– Document-oriented storage
• High Performance
– Concurrent CRUD
– Fast updates
– Write distribution with sharding
MongoDB features: Loose Schema | Replication | Driver Support | Strong Community | Indexes & Sharding
Sharding – What is it?
• Distributes a single logical database across multiple mongod nodes
• Advantages:
– Raises the limit on data size beyond what a single node can hold
– Increases write capacity
– Supports larger working sets
– Scales reads: routed requests target specific shards over the distributed data, and a reasonable volume of scatter-gather requests can also be supported if used judiciously
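The difference between a routed request and a scatter-gather request can be sketched in a few lines. This is a simplified illustration and not how mongos is implemented: the chunk boundaries, the shard names, and the `userId` shard key field are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical chunk map: each shard owns one contiguous range of shard-key
# values. The list holds the lower bound of each range (illustrative only).
CHUNK_LOWER_BOUNDS = ["", "h", "p"]
CHUNK_OWNERS = ["shard1", "shard2", "shard3"]


def shards_for_query(query, shard_key="userId"):
    """Return the list of shards that a query has to visit."""
    if shard_key in query:
        # Routed (targeted) request: the shard key value falls in exactly
        # one range, so only the shard owning that range is contacted.
        idx = bisect_right(CHUNK_LOWER_BOUNDS, query[shard_key]) - 1
        return [CHUNK_OWNERS[idx]]
    # Scatter-gather request: without the shard key, every shard is asked
    # and the partial results are merged.
    return list(CHUNK_OWNERS)


routed = shards_for_query({"userId": "mike"})
scattered = shards_for_query({"assetId": 42})
```

Here `routed` names a single shard while `scattered` names all three, which is why queries that include the shard key place less load on the cluster.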
Sharding – When to use?
• Storage (drive): your data set approaches or exceeds the storage capacity of a single node in your system.
• Working set (RAM): the size of your system's active working set will soon exceed the maximum amount of RAM available to your system.
• Write activity (storage drive): your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have failed to reduce contention.
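The three conditions above can be turned into a rough checklist. The sketch below is only an illustration: the function name, the parameters, and the 80% headroom threshold used for "approaches" are assumptions, not figures from this case study.

```python
def sharding_reasons(data_gb, disk_gb, working_set_gb, ram_gb,
                     required_writes_per_sec, single_node_writes_per_sec,
                     headroom=0.8):
    """List which of the three sharding criteria apply.

    headroom is a hypothetical threshold: a resource counts as
    "approaching" its limit once usage reaches this fraction of capacity.
    """
    reasons = []
    if data_gb >= headroom * disk_gb:
        reasons.append("storage")
    if working_set_gb >= headroom * ram_gb:
        reasons.append("working set")
    if required_writes_per_sec > single_node_writes_per_sec:
        reasons.append("write throughput")
    return reasons


# Example: 32 GB of RAM per node, and a single node that sustains about
# 2900 writes/sec against a target of 6000 (disk and data sizes are made up).
reasons = sharding_reasons(500, 2000, 20, 32, 6000, 2900)
```

An empty result suggests that single-node optimizations should be tried first.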
Sharding - Features
• Range-based Data Partitioning
• Automatic Data volume distribution
• Transparent query routing
• Horizontal capacity
– Additional write capacity through distribution
– Right shard key allows expansion of working set
Solution: Approach
• Schema
– Schema design
– Collections and field definitions
• Database Size
– Document size
– Total expected data size
• Concurrent Load
– Frequency of CRUD operations
– Read/write ratio
• Availability
– Replication, backup and automatic failover
– Right Replication Factor (RF)
– Read scaling for the use cases with eventual consistency
• Indexing
– Working set
– Access patterns
• Sharding
– Horizontal scaling
– Read/write scaling
• Hardware Sizing
– Cluster sizing
– RAM and disk storage
To Shard Or Not To Shard?
• Sharding is a very powerful technique that MongoDB provides for scaling, but it should be used only after due diligence; otherwise it proves to be overkill.
• It brings a substantial amount of overhead from an infrastructure and maintenance standpoint.
• It should be used only when all possible optimizations for a single node have been done and the write capacity of the single node still proves to be a bottleneck.
• In production, a sharded cluster needs a minimum of 6 server instances even with no failover capability.
• In production we cannot afford to have no redundancy/failover. Hence a minimum replication factor (RF) of 2 is required, which also brings an arbiter node into the picture.
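The instance counts above can be checked with simple arithmetic. The slides do not break the figure of 6 down, so the split used here (3 config servers, 1 mongos, 2 single-node shards) is an assumption, as are the function and parameter names.

```python
def cluster_instances(shards, data_nodes_per_shard, arbiter_per_shard,
                      config_servers=3, mongos=1):
    """Count the server processes in a sharded cluster.

    The defaults of 3 config servers and 1 mongos are assumptions chosen to
    reproduce the minimum of 6 quoted above.
    """
    per_shard = data_nodes_per_shard + (1 if arbiter_per_shard else 0)
    return config_servers + mongos + shards * per_shard


# Two shards with one mongod each and no failover.
no_failover = cluster_instances(2, 1, arbiter_per_shard=False)

# Two shards with RF 2 (primary + secondary) plus an arbiter per shard.
with_failover = cluster_instances(2, 2, arbiter_per_shard=True)
```

Under these assumptions the count grows from 6 to 10 instances once failover is added, which illustrates the infrastructure overhead that sharding brings.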
[Figure: Inserts and Updates With No Sharding]
Solution: Architecture
[Figure: deployment architecture]
• App Tier: three app servers, a load balancer, and three mongos routers
• Data Tier: three config servers (mongod), plus Shard 1, Shard 2, ... Shard n, each consisting of a mongod primary, a mongod secondary, and a mongod arbiter
• Routed requests flow from mongos to the shards
Shard Keys
• A shard key exists in every document in a collection. MongoDB uses the shard key to distribute documents among the shards. Just like indexes, shard keys can be either a single field or a compound key.
• The ideal shard key:
– Has high cardinality, which makes it easy for MongoDB to split the chunks
– Has higher "randomness"
– Supports targeted queries
– May need to be computed
Choosing Right Shard Key
Different approaches for shard keys:
• Approach 1: Random Key – UserId + AssetId
• Approach 2: Coarsely ascending key + Random Key – YearMonth + UserId + AssetId
• Hashed Shard Keys (not tested, not applicable here)
– New in version 2.4.
– Hashed shard keys use a hashed index of a single field as the shard key to partition data across your sharded cluster.
– The field should have good cardinality.
– Hashed keys work well with fields that increase monotonically.
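The three key shapes above can be written out as `shardCollection` command documents. This is a sketch only: the namespace `media.sessions` and the field names `userId`, `assetId` and `yearMonth` are hypothetical, and the code only builds the documents without connecting to a cluster.

```python
from datetime import date


def year_month(d):
    """Coarsely ascending key component, e.g. 201403 for March 2014."""
    return d.year * 100 + d.month


def shard_collection_command(namespace, fields):
    """Build a shardCollection command document with an ascending ranged key.

    Field order matters for a compound key, and dicts preserve insertion order.
    """
    return {"shardCollection": namespace, "key": {f: 1 for f in fields}}


NAMESPACE = "media.sessions"  # hypothetical database.collection

# Approach 1: random key.
approach_1 = shard_collection_command(NAMESPACE, ["userId", "assetId"])

# Approach 2: coarsely ascending key followed by the random key.
approach_2 = shard_collection_command(NAMESPACE, ["yearMonth", "userId", "assetId"])

# Hashed shard key on a single field (not tested in this case study).
hashed = {"shardCollection": NAMESPACE, "key": {"userId": "hashed"}}

# A document carries the computed yearMonth value alongside the other fields.
doc = {"yearMonth": year_month(date(2014, 3, 15)), "userId": 7, "assetId": 99}
```

With a driver such as PyMongo, a document like `approach_2` could be passed to `client.admin.command(...)` on a connection to mongos, after sharding has been enabled for the database.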
Benchmarking / Load Testing Approach
Automated scripts with varied load
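The slides do not show the load scripts themselves. As a rough idea of what a varied-load script measures, here is a minimal harness under stated assumptions: `fake_insert` is a stand-in for a real driver call, and the load steps are arbitrary.

```python
import time


def measure_throughput(operation, total_ops):
    """Run operation(i) total_ops times and return operations per second."""
    start = time.perf_counter()
    for i in range(total_ops):
        operation(i)
    elapsed = time.perf_counter() - start
    return total_ops / elapsed if elapsed > 0 else float("inf")


def run_load_steps(operation, load_steps):
    """Measure throughput at each load step (number of operations)."""
    return {n: measure_throughput(operation, n) for n in load_steps}


store = {}


def fake_insert(i):
    # Hypothetical stand-in for an insert through a database driver.
    store[i] = {"userId": i % 1000, "assetId": i}


results = run_load_steps(fake_insert, [1_000, 5_000, 10_000])
```

In a real benchmark the operation would write to the cluster through mongos, and each step would run long enough to expose sustained throughput rather than an initial burst.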
Results - INSERTS
• Approach 1: over 80 million documents inserted, with a decreasing threshold over 10 million
• Approach 2: over 225 million documents inserted at a stable rate of 6000 documents/sec
Benchmarks done on bare-metal machines: 2.2 GHz 8-core, 32 GB, 7200 RPM spinning drives with no RAID support
Results - UPDATES
• Approach 1: over 50 million documents updated at an average of 400 documents/sec
• Approach 2: over 100 million documents updated at as high as 4000 documents/sec
Results – INSERT, UPDATE
Approach 2:
• Simultaneous INSERT: >6000 documents/second, >70 million records
• Simultaneous UPDATE: >6000 documents/second, >50 million records
Benchmarking – Sharding Vs Non Sharding
Operation          Sharding (YearMonth + UserId)     Non-Sharding
INSERTS            ~6000 docs/sec                    ~2900 docs/sec
UPDATES            ~4000 docs/sec                    ~620 docs/sec
INSERT & UPDATES   ~6000 docs/sec & ~6100 docs/sec   ~2000 docs/sec & ~600 docs/sec
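The gain from sharding can be read off the benchmark figures as a ratio. The snippet below uses the approximate rates reported above; it assumes that each pair in the combined row is listed as inserts first and updates second.

```python
# (sharded docs/sec, non-sharded docs/sec), approximate figures from the table.
RESULTS = {
    "inserts": (6000, 2900),
    "updates": (4000, 620),
    "simultaneous inserts": (6000, 2000),  # assumed order: inserts first
    "simultaneous updates": (6100, 600),   # assumed order: updates second
}


def speedup(sharded, non_sharded):
    """Ratio of sharded to non-sharded throughput."""
    return sharded / non_sharded


speedups = {op: round(speedup(*rates), 1) for op, rates in RESULTS.items()}
```

On these numbers, updates benefit far more than inserts from sharding.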
Key Takeaways
• MongoDB scales & shines.
– Expected: 690 million CRUD operations per day.
– Achieved: 840 million CRUD operations per day.
• Plan early for sharding.
• Sharding scales inserts and updates compared with a non-sharded deployment.
• There is no magic recipe for finding an ideal shard key.
• DO NOT go to production without benchmarking the shard key. The shard key cannot be changed for the given configuration.
• Use MMS. It is a great tool for assessing the health of the cluster and identifying bottlenecks well in advance.
• Sharding with Approach 2 (coarsely ascending key + random key) provides sustained results and better utilization of RAM (better index locality).
Disclaimer: Suitable shard key depends on your data, so while this Shard Key approach delivers good results for this
use case, it is not a generic approach.
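The daily CRUD totals above are easier to compare with the benchmark rates when converted to operations per second. The conversion below assumes the load is spread evenly over 24 hours, which the slides do not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(ops_per_day):
    """Average operations per second, assuming an even spread over the day."""
    return ops_per_day / SECONDS_PER_DAY


expected_rate = per_second(690_000_000)   # about 7986 ops/sec
achieved_rate = per_second(840_000_000)   # about 9722 ops/sec

# Fraction by which the achieved figure exceeds the expected one.
margin = 840_000_000 / 690_000_000 - 1    # about 0.217, i.e. roughly 22%
```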
Key Takeaways
• Routed requests are always faster than scatter/gather requests.
• Identify the consistency requirements for the read queries. Where eventual consistency is acceptable, the secondaryPreferred read preference can help squeeze out more performance.
• Use a separate set of servers for non-sharded collections.
• Define indexes carefully. A larger number of indexes substantially reduces write throughput.
• Sharded collections should have a minimal number of indexes.
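One way to apply the read preference mentioned above is through the `readPreference` option of the MongoDB connection string. The sketch below only builds such a string; the host names and the database name are hypothetical.

```python
from urllib.parse import urlencode


def mongo_uri(hosts, database, **options):
    """Build a mongodb:// connection string from hosts, database and options."""
    query = urlencode(options)
    uri = f"mongodb://{','.join(hosts)}/{database}"
    return f"{uri}?{query}" if query else uri


# Hypothetical mongos hosts; reads go to secondaries when one is available.
uri = mongo_uri(
    ["mongos1.example.net:27017", "mongos2.example.net:27017"],
    "media",
    readPreference="secondaryPreferred",
)
```

This setting suits only those queries that can tolerate slightly stale data, since secondaries may lag behind the primary.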
Our Success Stories : At a Glance
1. Big Data Analytics for Telecom: optimum network bandwidth management & policy configuration for telecom companies
2. Social Media Research Platform for Legal Firms: leverage social media & unstructured data analytics for collecting supporting evidence for trials
3. US based Advanced GPS Solutions Provider: real-time analysis of data accumulated from 200,000 GPS based devices
4. Global Provider of Risk Management Solutions: collection and analysis of data from external and internal applications delivered to a dashboard
5. US based Networking Equipment Leader: cluster configuration of high volume video uploads including 30 million inserts/hour
6. European Chemical Giant: patent search with a 10x increase in performance and a 20x reduction in TCO
7. US based Social Security e-Benefits System: managing a billion-object repository with enterprise search and retrieval
For queries reach out to us at info@cignex.com
Thank You. Any Questions ?
Making Open Source Work

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Cignex mongodb-sharding-mongodbdays

  • 1. Scaling MongoDB with Sharding – A Case Study
      Presented by: Nikhil Naib
      Title: Lead Consultant – Big Data
      CIGNEX Datamatics Confidential, www.cignex.com
      For MongoDB and CIGNEX Datamatics Use Only
  • 2. Who We Are?
      • Since 2000, delivering solutions using Open Source technologies to
          – Address business goals
          – Increase business velocity
          – Lower the cost of doing business
          – Gain competitive advantage
      • Dramatically reduce Total Cost of Ownership (TCO) & deployment time of IT solutions
      • 400+ Implementations | 450+ Experts | 200+ Integrations | 13 Books | 5000+ Community Contributions
      • Offices: America | India | UK | Europe | Singapore | Australia
      • Portal Solutions | Content Solutions | Big Data Analytics Solutions
  • 3. Our Big Data Analytics Practice
      • Team Size: 110+
      • Projects: 10+
      • 20+ Big Data, 100+ Analytics & DW/BI
      • Partnership – MongoDB, Cloudera, IBM
      • Technical expertise – MongoDB, Hadoop, Neo4j, Solr, Pentaho, Talend, Cognos, Business Objects, Tableau, Jasper Reports
      • Research & Analytics division with data scientists
      • Connectors/Accelerators, Frameworks
      • BIGArchive – Enterprise Scale Archival
      • Liferay MongoDB Store
      • Drupal MongoDB Connector
      • [Logo panels: Big Data Partners; Business Intelligence Expertise]
  • 4. Agenda
      • Use Case & Database Requirements
      • Why MongoDB?
      • Solution
      • To Shard Or Not To Shard
      • Scaling with Sharding
          – Sharding Basics
          – Architecture and Hardware Sizing
          – Sharding – Choosing the RIGHT Shard Key
          – Benchmarking with Results
      • Key Takeaways
  • 5. Use Case
      • [Diagram: Users, Devices, Load Balancer, Database]
      • Users: 7 million users across geography
      • Devices: 8 devices / user; home/office/anywhere
      • Load Balancer: high volume of concurrent CRUD requests routed to the DB cluster
      • Database: MongoDB data storage cluster enabled with sharding, automatic replication for failover, and indexes
      • Ability to access the digital assets of the service provider across the array of devices registered by the user, with the facility of resuming (session shifting).
  • 6. Database Requirements
      • Agility in Development & Deployment
      • High Availability
      • Flexibility in Schema
      • Enterprise Level Support
      • High Performance
  • 7. Why MongoDB?
      • Agility in Development & Deployment (Driver Support)
          – Programming language drivers
          – Shorter dev cycle
          – Faster deployment
      • Availability (Replication)
          – Automatic failover
          – Redundancy
          – ~100% uptime
      • Flexibility in Schema (Loose Schema)
          – Easy integration
          – Ease of schema design
          – Document oriented storage
      • Enterprise Level Support (Strong Community)
          – Global coverage
          – 24x7 support
          – Ease of maintenance
      • High Performance (Indexes & Sharding)
          – Concurrent CRUD
          – Fast updates
          – Write distribution with sharding
  • 8. Sharding – What is it?
      • Distributes a single logical database across multiple mongod nodes
      • Advantages:
          – Raises the limits of data size beyond a single node
          – Increases write capacity
          – Ability to support larger working sets
          – Read scaling, by targeting specific shards through routed requests over distributed data. A good number of scatter-gather requests can also be supported if they are used judiciously.
  • 9. Sharding – When to use?
      • Storage Drive: Your data set approaches or exceeds the storage capacity of a single node in your system
      • Working Set (RAM): The size of your system’s active working set will soon exceed the maximum amount of RAM for your system
      • Storage Drive: Your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention
  • 10. Sharding – Features
      • Range-based data partitioning
      • Automatic data volume distribution
      • Transparent query routing
      • Horizontal capacity
          – Additional write capacity through distribution
          – Right shard key allows expansion of working set
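Range-based partitioning and transparent routing can be pictured with a small sketch. This is only an illustration: the chunk boundaries, shard names and the functions `route` and `shards_for_range` are hypothetical, and a real cluster keeps this metadata on the config servers and routes through mongos.

```python
from bisect import bisect_right

# Hypothetical chunk table: sorted lower bounds of shard-key ranges and the
# shard that owns each range. The values are made up for illustration.
CHUNK_LOWER_BOUNDS = ["", "g", "n", "t"]
CHUNK_OWNERS = ["shard1", "shard2", "shard1", "shard2"]


def route(shard_key: str) -> str:
    """Return the shard owning the chunk whose range contains shard_key."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, shard_key) - 1
    return CHUNK_OWNERS[index]


def shards_for_range(low: str, high: str) -> set[str]:
    """Return every shard that a range query [low, high] has to touch."""
    first = bisect_right(CHUNK_LOWER_BOUNDS, low) - 1
    last = bisect_right(CHUNK_LOWER_BOUNDS, high) - 1
    return set(CHUNK_OWNERS[first:last + 1])
```

A lookup on one key resolves to a single shard (a targeted request), while a range that spans several chunks touches more than one shard, which is the scatter-gather case.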
  • 11. Solution: Approach
      • Schema: schema design; collections and field definitions
      • Database Size: document size; total expected data size
      • Concurrent Load: frequency of CRUD operations; read/write ratio
      • Availability: replication, backup and automatic failover; right Replication Factor (RF); read scaling for the use cases with eventual consistency
      • Indexing: working set; access patterns
      • Sharding: horizontal scaling; read/write scaling
      • Hardware Sizing: cluster sizing; RAM and disk storage
  • 12. To Shard Or Not To Shard?
      • Sharding is a very powerful technique provided by MongoDB to scale, but it should be used only after due diligence; otherwise it proves to be overkill.
      • It brings a substantial amount of overhead from an infrastructure and maintenance standpoint.
      • It should be used only when you have done all the possible optimizations for the single node and the write capacity of the single node still proves to be a bottleneck.
      • In production, a minimum of 6 server instances is required for a sharded cluster with no failover capability.
      • In production we cannot afford to have no redundancy/failover. Hence a minimum RF of 2 is required, which also brings an arbiter node into the picture.
  • 13. To Shard Or Not To Shard?
      • [Chart: Inserts and Updates with No Sharding]
  • 14. Solution: Architecture
      • [Diagram tiers: App Tier, Data Tier]
      • AppServer (×3)
      • Load Balancer
      • mongos (×3), with routed requests from mongos to shards
      • Config Servers: mongod (×3)
      • Shard 1, Shard 2, … Shard n: each with a mongod Primary, a mongod Secondary and a mongod Arbiter
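As a rough aid for reading the architecture slide, the process count of such a topology can be tallied as below. The function name `cluster_processes` and its defaults (three mongos, three config servers, three members per shard) are assumptions read off the diagram, not a sizing recommendation, and processes may share servers, so this is not a server count.

```python
def cluster_processes(shards: int, mongos: int = 3, config_servers: int = 3,
                      members_per_shard: int = 3) -> dict[str, int]:
    """Count processes for a topology like the one on the architecture slide.

    members_per_shard=3 models primary + secondary + arbiter. The defaults
    for mongos and config servers are assumptions taken from the diagram.
    """
    if shards < 1:
        raise ValueError("a sharded cluster needs at least one shard")
    shard_members = shards * members_per_shard
    return {
        "shard_members": shard_members,
        "config_servers": config_servers,
        "mongos": mongos,
        "total": shard_members + config_servers + mongos,
    }
```

For example, `cluster_processes(2)` counts six shard members plus the config servers and mongos routers, which shows how quickly the infrastructure overhead grows with each added shard.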
  • 15. Shard Keys
      • Shard keys exist in every document in a collection. MongoDB uses the shard key to distribute documents among the shards. Just like indexes, they can be either a single field or a compound key.
      • The ideal shard key:
          – High cardinality, which makes it easy for MongoDB to split the chunks
          – Higher “randomness”
          – Targeted queries
          – May need to be computed
  • 16. Choosing Right Shard Key
      Different approaches for shard keys:
      • Approach 1: Random Key
          – UserId + AssetId
      • Approach 2: Coarsely ascending key + Random Key
          – YearMonth + UserId + AssetId
      • Hashed Shard Keys (not tested/applicable here)
          – New in version 2.4.
          – Hashed shard keys use a hashed index of a single field as the shard key to partition data across your sharded cluster.
          – The field should have good cardinality.
          – Hashed keys work well with fields that increase monotonically.
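The two tested approaches differ only in whether a coarse time prefix leads the compound key. A minimal sketch of building both key documents follows; the field names (`userId`, `assetId`, `yearMonth`) and the `YYYYMM` string encoding are assumptions for illustration, since the slides do not show the actual schema.

```python
from datetime import datetime


def shard_key_approach_1(user_id: str, asset_id: str) -> dict:
    """Approach 1: random key only (UserId + AssetId)."""
    return {"userId": user_id, "assetId": asset_id}


def shard_key_approach_2(created_at: datetime, user_id: str, asset_id: str) -> dict:
    """Approach 2: coarsely ascending prefix (YearMonth) + random key."""
    # Assumed encoding: a "YYYYMM" string that changes once per month.
    year_month = created_at.strftime("%Y%m")
    # Field order matters for a compound key; dicts keep insertion order.
    return {"yearMonth": year_month, "userId": user_id, "assetId": asset_id}
```

With the prefix, documents written in the same month sort next to each other, while the user and asset fields still spread writes within that month.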
  • 17. Benchmarking / Load Testing Approach
      • Automated scripts with varied load
  • 18. Results – INSERTS
      • Approach 1: over 80 million documents inserted with a decreasing threshold over 10 million
      • Approach 2: over 225 million documents inserted at a stable rate of 6000 documents/sec
      • Benchmarks done on bare metal machines: 2.2 GHz 8 core, 32GB, 7200RPM spinning drives with no RAID support
  • 19. Results – UPDATES
      • Approach 1: over 50 million documents updated at avg. 400 documents/sec
      • Approach 2: over 100 million documents updated at as high as 4000 documents/sec
      • Benchmarks done on bare metal machines: 2.2 GHz 8 core, 32GB, 7200RPM spinning drives with no RAID support
  • 20. Results – INSERT, UPDATE (Approach 2)
      • Simultaneous INSERT: >6000 documents/second, >70 million records
      • Simultaneous UPDATE: >6000 documents/second, >50 million records
      • Benchmarks done on bare metal machines: 2.2 GHz 8 core, 32GB, 7200RPM spinning drives with no RAID support
  • 21. Benchmarking – Sharding Vs Non Sharding
      Operation | Sharding (YearMonth + UserId) | Non-Sharding
      INSERTS | ~6000 docs/sec | ~2900 docs/sec
      UPDATES | ~4000 docs/sec | ~620 updates/sec
      INSERT & UPDATES | ~6000 docs/sec & ~6100 docs/sec | ~2000 docs/sec & ~600 docs/sec
      Benchmarks done on bare metal machines: 2.2 GHz 8 core, 32GB, 7200RPM spinning drives with no RAID support
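The comparison is easier to read as ratios. The short calculation below uses only the approximate figures from the slide; the dictionary and function names are illustrative.

```python
# Throughput figures (docs/sec) copied from the comparison slide.
SHARDED = {"inserts": 6000, "updates": 4000}
NON_SHARDED = {"inserts": 2900, "updates": 620}


def speedup(operation: str) -> float:
    """Ratio of sharded to non-sharded throughput for one operation."""
    return SHARDED[operation] / NON_SHARDED[operation]


for op in SHARDED:
    print(f"{op}: {speedup(op):.1f}x")
# inserts: 2.1x
# updates: 6.5x
```

On these numbers, updates gain considerably more from sharding than inserts do.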
  • 22. Key Takeaways
      • MongoDB scales & shines.
          – Expected: 690 million CRUD operations per day.
          – Achieved: 840 million CRUD operations per day.
      • Plan early for sharding.
      • Sharding scales INSERTS/UPDATES compared with a non-sharded setup.
      • There is no magic recipe for finding an ideal shard key.
      • DO NOT go to production without benchmarking the shard key. The shard key cannot be changed for the given configuration.
      • Use MMS. It’s a great tool to assess the health of the cluster and identify bottlenecks well in advance.
      • Sharding with Approach 2 (Coarsely ascending key + Random Key) provides sustained results & better utilization of the RAM (better index locality).
      Disclaimer: Suitable shard key depends on your data, so while this Shard Key approach delivers good results for this use case, it is not a generic approach.
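The daily totals can be put on the same scale as the per-second benchmark figures with a quick conversion. This assumes load spread evenly over 24 hours, which the slides do not state, so the results are day averages rather than peak rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(ops_per_day: float) -> float:
    """Average operations per second for a given daily total."""
    return ops_per_day / SECONDS_PER_DAY


expected = per_second(690_000_000)  # about 7,986 ops/sec
achieved = per_second(840_000_000)  # about 9,722 ops/sec
headroom = achieved / expected - 1  # about 0.217, i.e. roughly 22% above target
```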
  • 23. Key Takeaways
      • Routed requests are always faster than scatter/gather requests.
      • Identify the consistency requirements for the read queries. Where eventual consistency is acceptable, the secondaryPreferred read preference can help you squeeze out more performance.
      • Use a different set of server(s) for non-sharded collections.
      • Define indexes carefully. A larger number of indexes substantially brings down the write throughput.
      • Sharded collections should have a minimal number of indexes.
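One way to apply the read-preference takeaway is through the `readPreference` option of the MongoDB connection string. The helper below only builds such a URI; the function name `mongos_uri`, the host names and the database name are hypothetical.

```python
from urllib.parse import urlencode


def mongos_uri(hosts: list[str], database: str,
               read_preference: str = "secondaryPreferred") -> str:
    """Build a connection string that sets the read preference.

    hosts are the mongos routers to connect to (hypothetical values).
    """
    query = urlencode({"readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/{database}?{query}"


# Example with made-up hosts and database name.
uri = mongos_uri(["mongos1.example.net:27017", "mongos2.example.net:27017"], "assets")
```

Reads that must see the latest write should keep the default primary read preference; only queries that tolerate slightly stale data belong on a URI like this.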
  • 24. Our Success Stories: At a Glance
      • Big Data Analytics for Telecom: optimum network bandwidth management & policy configuration for telecom companies
      • Social Media Research Platform for Legal Firms: leverage social media & unstructured data analytics for collecting supporting evidence for trials
      • US based Advanced GPS Solutions Provider: real time analysis of data accumulated from 200,000 GPS based devices
      • Global Provider of Risk Management Solutions: collection and analysis of data from external and internal applications delivered to a dashboard
      • US based Networking Equipment Leader: cluster configuration of high volume video uploads including 30 million inserts/hour
      • European Chemical Giant: patent search with a 10x increase in performance and 20x reduction in TCO
      • US based Social Security e-Benefits System: managing a billion object repository with enterprise search and retrieval
  • 25. Thank You. Any Questions?
      For queries reach out to us at info@cignex.com
      Making Open Source Work