1© 2017 Rogue Wave Software, Inc. All Rights Reserved. 1
What You Need to Know
Before You Deploy Your Next
MongoDB Implementation
Presenter
Bill Crowell
Enterprise Architect
Open Source Support
Rogue Wave Software
Who am I?
• Enterprise Architect in the Rogue Wave Open Source Software group
• 22+ years of experience encompassing EDI, insurance, retail, entertainment, banking, and
health care sectors (Fortune 500 companies)
• Worked in various software roles related to full stack development including:
– User interface (Java Server Faces, JavaScript, Spring MVC, Node.js, Angular,
some PHP)
– Middleware (MQ/JMS, WebServices/SOA, and REST)
– Databases (NoSQL and RDBMS)
– Big Data (Apache Hadoop and Spark)
– Security (RBAC and SSO)
– DevOps (Jenkins CI with Docker and SCM)
– Infrastructure
– Testing (performance)
– Training and mentorship
• Worked with proprietary and OSS projects
• Contributed to OSS SSO-project Central Authentication Service
• Primary focus is helping others apply open source in the enterprise.
Why should I be in this session?
• Your RDBMS or current database system is not meeting your needs.
• You are considering deploying a NoSQL database.
• You are thinking about deploying MongoDB.
• You are currently developing or are in production with MongoDB.
• You have deployed MongoDB and need help!
This presentation covers real use cases that are common causes of pain for
MongoDB deployments.
This is where I see customers commonly struggle with Mongo.
What is MongoDB?
• Developed by 10gen in 2007 (based in New York).
• Released as open source in 2009 under the GNU Affero General Public
License (AGPL) and Apache License (language drivers)
• Community free edition has been downloaded 30 million times 1
• NoSQL Database: Uses collections of documents instead of rows in a table.
• Written in C++, C and JavaScript
• Dynamic schema design: Provides flexibility and changes usually have
minimal impact on code. Be agile.
1 https://techcrunch.com/2017/09/21/database-provider-mongodb-has-filed-to-go-public/
A bit of history first…
• 10gen changes its name to MongoDB, Inc. in August 2013 to align more closely with
the product name.
• June 2016, MongoDB Atlas is released on the cloud initially on Amazon Web
Services.
• June 2017, MongoDB Atlas adds Microsoft Azure and Google Cloud Platform. MongoDB
Stitch (Beta) is launched, which sits on top of MongoDB and aggregates 3rd-party
REST services.
• Filed to go public on September 21st, 2017 1
• Last Thursday, MongoDB stock began trading on the NASDAQ. The stock
skyrocketed 33% on the first day of trading ($34/share). It is at $30.50 today.
• The company is worth an estimated $1.6 billion.
1 https://www.sec.gov/Archives/edgar/data/1441816/000104746917006014/a2233365zs-1.htm
Who uses MongoDB?
Source: https://www.mongodb.com/who-uses-mongodb
About 4,300 paid licenses1
1 https://www.sec.gov/Archives/edgar/data/1441816/000104746917006396/a2233556zs-1a.htm
MongoDB Use Cases
High-volume data where structure can change…
• Real-time Analytics: Collect data from sensors
“ uses it with special screwdriver drills that can measure the torque of the
screws as they install them in airplanes. By tracking torque, they can make airplanes
safer…”1
• Product Catalogs: Categories and inventory
• Reporting: Website traffic logs
• Stock Trading
• Click-stream Ad Campaigns
• Social Media
• Data Analysis: Call records and data mining
• Demographics and Biometrics 2
• Content Management: News, comment fields, photos
• Metadata and Asset Management: Type ahead searches
• Report Aggregation: Merge data from disparate systems into one record.
https://www.mongodb.com/use-cases
1 http://www.businessinsider.com/people-told-the-mongodb-founders-they-were-completely-crazy-2017-10
2 https://techcrunch.com/2013/12/06/inside-indias-aadhar-the-worlds-biggest-biometrics-database/
How is Data Stored in MongoDB?
• JSON is stored in Mongo in a binary-encoded serialization format called
BSON (which is transparent to the developer).
• BSON adds extra info to documents to allow for easier traversal.
• BSON provides additional data types not part of JSON spec (Date and
BinData types) and ordered fields: http://bsonspec.org
{
"conference": "ZendCon 2017"
}
\x22\x00\x00\x00 // total document size (34 bytes)
\x02 // 0x02 = type String
conference\x00 // field name (null-terminated)
\x0D\x00\x00\x00ZendCon 2017\x00 // field value (4-byte length, then null-terminated string)
\x00 // 0x00 = type EOO ('end of object')
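The byte layout can be checked by building it by hand. The following is a minimal sketch, assuming Node.js and its built-in Buffer; encodeSingleStringDoc is a hypothetical helper that handles only a document with one string field, and it is not how the MongoDB drivers encode BSON internally.

```javascript
// Hand-rolled BSON encoding of a document with exactly one string field.
// Layout per bsonspec.org: int32 total size, then elements, then 0x00 (EOO).
function encodeSingleStringDoc(fieldName, value) {
  const name = Buffer.from(fieldName + "\0", "utf8"); // cstring: bytes + 0x00
  const str = Buffer.from(value + "\0", "utf8");      // string bytes + 0x00
  const strLen = Buffer.alloc(4);
  strLen.writeInt32LE(str.length, 0);                 // length includes the 0x00

  // Element = type byte (0x02 = String) + field name + length-prefixed value.
  const element = Buffer.concat([Buffer.from([0x02]), name, strLen, str]);

  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + element.length + 1, 0);      // size field + element + EOO
  return Buffer.concat([total, element, Buffer.from([0x00])]);
}

const bytes = encodeSingleStringDoc("conference", "ZendCon 2017");
console.log(bytes.length);          // 34
console.log(bytes.toString("hex")); // hex dump of the whole document
```

By this sketch's count the example document is 34 bytes (0x22), and the string length prefix is 13 (0x0D) because it includes the trailing null byte.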
Working with BSON in PHP
http://php.net/manual/en/book.bson.php
MongoDB Versions
Free: Open source: https://github.com/mongodb/mongo
• Command-line shell, database and config server, query router, basic troubleshooting
tools, MongoDB Monitoring Service (MMS)
Paid: GUIs and Cloud-based Management (Atlas)
• On-Premise
– Professional: Ops Manager (monitoring, query optimization, automated
configuration)
– Enterprise Advanced: Ops Manager, additional storage engines (encrypted and
in-memory), Compass, advanced security (LDAP, RBAC, TLS), auditing
• Cloud-based Service: Atlas
– Free: 512MB Elastic Block Storage (EBS), shared RAM, 3-node replica set,
monitoring and alerts, encryption
– Essential: Starts at 8₵/hour, elastic scaling, snapshot backups (1st GB free, then
$2.50/GB/month), performance panel, enhanced monitoring and alerts, uptime
SLA.
– Professional: Compass, proactive issue detection, schema/database design
support, enhanced support, 2-hour support SLA
MongoDB Total Cost of Ownership
A Total Cost of Ownership Comparison of MongoDB & Oracle1
http://s3.amazonaws.com/info-mongodb-com/TCO_MongoDB_vs._Oracle.pdf
Oracle software maintenance and support costs (as analogous as possible to
MongoDB configurations):
• Oracle Database Enterprise Edition ($47,500 per core) plus Oracle RAC pricing
($23,000 per core), for a total of $70,500 per core. Discounts of 0% for small
deployments to 80% for large.
• Demands 50% of one DBA’s time (small) and 1.5 full-time DBAs (large).
• “We assume a conservative 50% discount on the list price for the smaller and
larger projects. Additionally, we apply a further 50% discount on top of that to
account for Oracle's core processor licensing factor. Amounts to $17,625 per core
for both projects.”
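The Oracle per-core figure in the quote can be reproduced with a few lines of arithmetic. This is only an illustrative sketch; the constant names and the discountedPerCore helper are hypothetical and do not come from the TCO paper.

```javascript
// List prices per core as quoted on the slide (USD).
const ORACLE_EE_PER_CORE = 47500;
const ORACLE_RAC_PER_CORE = 23000;

// Apply a sequence of fractional discounts, one after another.
function discountedPerCore(listPrice, discounts) {
  return discounts.reduce((price, d) => price * (1 - d), listPrice);
}

const listPerCore = ORACLE_EE_PER_CORE + ORACLE_RAC_PER_CORE; // 70500

// 50% assumed discount, then a further 50% for the core processor licensing factor.
const effectivePerCore = discountedPerCore(listPerCore, [0.5, 0.5]);
console.log(effectivePerCore); // 17625
```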
MongoDB software maintenance and support costs:
• Smaller projects: $11,990/server/year
• Larger projects: $10,800/server/year (10% discount)
• Demands 25% of one DBA’s time (small) and 75% of one DBA’s time (large).
• Assuming 22% of license costs for Oracle.
1 Thanks to Richard Sherrard (Director Product Management in Product Management) for the link!
Assumes:
Hardware maintenance and support costs of 10% of the
hardware purchase price for both MongoDB and
Oracle.
Development versus Deployment
A Tale of Two Roads
The most common issues arise not in application development but in
deploying the application to production: not understanding the
application’s requirements, misconfiguring the software, or undersizing
the infrastructure.
Usually this surfaces at a time when changing the software configuration (and
the hardware, if on-premise) is very difficult. Code is (usually) simpler to change.
MongoDB Components
• MongoDB Driver: Talks to mongos/mongod
• mongos: Query router for sharded clusters; routing proxy
process.
• mongod: Database including primary/secondary nodes
• mongod: Config server which stores metadata
• mongo: Interactive JavaScript shell (type-ahead)
Diagram: Application (Zend Server) -> Driver -> Query Router (mongos)
• Data Center 1 (3-node replica set): Primary (mongod), Secondary (mongod), Secondary (mongod)
• Data Center 2 (3-node replica set): Primary (mongod), Secondary (mongod), Secondary (mongod)
• Config Servers (3-node replica set): Config Server 1 (Data Center 1), Config Server 2 (Data Center 2), Config Server 3 (Data Center 3)
Why would I use MongoDB?
Example: RDBMS One-To-Many Relationship
SELECT * FROM USERS INNER JOIN EMAIL_ADDRESS ON USERS.USER_ID = EMAIL_ADDRESS.USER_ID;
The schema must be thought through carefully up front to avoid changing it later. This is hard to do,
especially if the tables are shared by other applications.
Users Table
user_id  username  firstname  lastname
1        wcrowell  William    Crowell
Email Address Table
id  user_id  email
10  1        william.crowell@abc.com
11  1        wcrowell@xyz.com
Why would I use MongoDB?
Example: MongoDB One-To-Many Relationship
{
"_id": 1,
"username": "wcrowell",
"firstname": "William",
"lastname": "Crowell",
"email": [
"william.crowell@abc.com",
"wcrowell@xyz.com"
]
}
Document-Based Data Model
• Nested fields allow for a richer data model and require fewer joins than tables.
• Database changes are easily made: Lack of schema can make your data model more fluid.
• You can collapse a multi-table RDBMS model into a single MongoDB collection using arrays
and nested documents.
• Many-to-many relationships can be modeled as arrays in MongoDB.
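To make the collapse from two tables into one document concrete, here is a small sketch in plain JavaScript. It assumes the rows have already been read from the two relational tables; toDocuments is a hypothetical helper, not part of any MongoDB driver.

```javascript
// Rows as they might come back from the two relational tables.
const users = [
  { user_id: 1, username: "wcrowell", firstname: "William", lastname: "Crowell" },
];
const emailAddresses = [
  { id: 10, user_id: 1, email: "william.crowell@abc.com" },
  { id: 11, user_id: 1, email: "wcrowell@xyz.com" },
];

// Fold each user's email rows into an array on a single document,
// so no join is needed when the user is read back.
function toDocuments(userRows, emailRows) {
  return userRows.map((u) => ({
    _id: u.user_id,
    username: u.username,
    firstname: u.firstname,
    lastname: u.lastname,
    email: emailRows
      .filter((e) => e.user_id === u.user_id)
      .map((e) => e.email),
  }));
}

const docs = toDocuments(users, emailAddresses);
console.log(JSON.stringify(docs[0], null, 2));
```

The resulting object has the same shape as the example document on this slide and could be inserted into a collection as-is.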
MongoDB Levels of Granularity
• A database is the top-level named grouping in the system and contains collections of documents.
• A collection is a group of documents similar to a table.
• A document is similar to a row in a table and is the simplest unit of data.
• A chunk is a group of documents clustered by values on a field. (more on this later)
Important limits: https://docs.mongodb.com/manual/reference/limits/
• 16MB limit on documents.
• Maximum document nesting depth is 100.
Database: drinks
• Collection: beers
– Chunk: All documents with field ‘beer’ from a - c
Documents: {"beer": "Bud"}, {"beer": "Blue Moon"}, {"beer": "Bass"}, {"beer": "Corona"}
– Chunk: All documents with field ‘beer’ from d - g
Documents: {"beer": "Dogfish"}, {"beer": "Guinness"}
• Collection: wines
– Chunk: All documents with field ‘wine’ from a - c
Documents: {"wine": "Blush"}, {"wine": "Chardonnay"}, {"wine": "Champagne"}
– Chunk: All documents with field ‘wine’ from d - g
Documents: {"wine": "Dolcetto"}, {"wine": "Eiswein"}, {"wine": "Frascati"}
What manages how documents and collections are stored?
Storage Engines
What is a storage engine?
“A storage engine is the part of a database that is responsible for managing
how data is stored, both in memory and on disk. Many databases support
multiple storage engines, where different engines perform better for specific
workloads. For example, one storage engine might offer better performance
for read-heavy workloads, and another might support a higher throughput for
write operations.”1
Pluggable storage engines are a key feature in many open source projects
(middleware and database) which allows the user to tailor the software to
their application’s needs.
1 https://docs.mongodb.com/manual/faq/storage/#what-is-a-storage-engine
Storage Engines and Locking
Locking in Mongo has come a long way. A single write operation on a
document used to lock the entire Mongo instance. This meant every
database was locked for a single write operation on a document. Very
inefficient. For a Big Data application that is a big deal.
Database locking can directly impact performance. The finer grained a lock
is the better in terms of contention and performance.
All databases, regardless of RDBMS or NoSQL, implement some type of
locking to ensure consistency. The locking mechanism in pluggable storage
engines available with Mongo can differ between implementations.
Storage Engines and Locking
Global: The entire Mongo instance is locked until the lock is released. This includes all
databases, collections, and documents. These locks are very expensive.
Database: Only the database and all collections owned by that database are locked.
Database locks are still expensive.
Collection: Only the documents in the collection (table) are locked. Better.
Document: Individual documents can be locked instead of locking the entire collection
which improves performance for write-heavy applications. Only the WiredTiger storage
engine implements document-level locking.
There are operations in Mongo that can cause locking at each level1.
There are 3 (really 4) storage engines. Each is tailored to different workloads.
1 https://docs.mongodb.com/manual/faq/concurrency/
Storage Engines: MMAPv1 and In-Memory
MMAPv1 1: Storage engine based on memory-mapped files. Used in pre-3.x Mongo.
As of version 3.2, it is no longer the default. Great for high-volume inserts,
reads, and updates to existing documents. Writes to disk every 60 seconds
(customizable) and uses an on-disk journal to maintain durability. Uses all free
memory on the machine for the cache and yields to other processes that need
memory. Swaps to disk as needed.
• Uses lots of disk space
• Implements collection-level locking
• Maximum of 32TB (using a 64-byte key)
In-Memory 2: Enterprise license only. Does not use any disk. If Mongo is
shut down, then the data is lost. High-performance. Real-time analytics.
• Default size: 50% of RAM minus 1GB
1 https://docs.mongodb.com/manual/core/mmapv1/
2 https://docs.mongodb.com/manual/core/inmemory/
Storage Engines: WiredTiger
WiredTiger 1 (default): Features:
• Introduced in March 2015 with MongoDB 3.0. More CPU-intensive.
• WiredTiger uses multi-version concurrency control (MVCC) for write
operations.
• Locks can be taken at the global/instance, database, collection, or
document level. Promises 7-10x better write performance.
• Provides on-disk data compression (indexes and documents). Up to 80%
less storage (snappy or zlib compression).
• Cache size, version 3.4+: the larger of 50% of (RAM minus 1GB) or 256MB
• Cache size, version 3.2: the larger of (60% of RAM minus 1GB) or 1GB
• Encryption at Rest: Enterprise license only. HIPAA-compliant. AES256-CBC
(default). 3.2+ only.
1 https://docs.mongodb.com/manual/core/wiredtiger/
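The default cache sizes can be estimated for a given machine with a short calculation. The sketch below assumes the cache-size bullets mean "the larger of the two values" for each version; defaultWiredTigerCacheMB is a hypothetical helper, and real deployments can override the default in the mongod configuration.

```javascript
const MB_PER_GB = 1024;

// Estimate the default WiredTiger internal cache size in MB
// for a machine with ramMB of RAM. "version" is a plain string here.
function defaultWiredTigerCacheMB(ramMB, version) {
  if (version === "3.2") {
    // 3.2: larger of (60% of RAM minus 1GB) or 1GB.
    return Math.max(Math.floor(0.6 * ramMB) - MB_PER_GB, MB_PER_GB);
  }
  // 3.4+: larger of 50% of (RAM minus 1GB) or 256MB.
  return Math.max(Math.floor(0.5 * (ramMB - MB_PER_GB)), 256);
}

console.log(defaultWiredTigerCacheMB(16 * MB_PER_GB, "3.4")); // 7680
console.log(defaultWiredTigerCacheMB(1 * MB_PER_GB, "3.4"));  // 256
console.log(defaultWiredTigerCacheMB(2 * MB_PER_GB, "3.2"));  // 1024
```

On small machines the floor value (256MB or 1GB) applies, which is worth knowing when sizing development or test VMs.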
MongoDB PHP Drivers
PHP MongoDB Driver Homepage
https://docs.mongodb.com/ecosystem/drivers/php/
http://php.net/manual/en/set.mongodb.php
1) MongoDB Driver for PHP from PHP Extension Community Library (PECL)
https://pecl.php.net/package/mongodb
• Thin (bare-bones) limited functionality driver
• Currently maintained (version 1.3.1 released October 16th, 2017)
2) MongoDB PHP Library
https://docs.mongodb.com/php-library/current/
• Wrapper for the lower-level PHP driver above
• Recommended fully-featured driver
• Requires PHP 5.4+, libbson, libmongoc, and OpenSSL
• Documentation: https://docs.mongodb.com/php-library/current/
MongoDB PHP Drivers
MongoDB Compatibility1
The following compatibility table specifies the recommended version(s) of the
MongoDB PHP driver for use with a specific version of MongoDB.
1 https://docs.mongodb.com/ecosystem/drivers/driver-compatibility-reference/#php-driver-compatibility
PHP Driver                  MongoDB 2.4  MongoDB 2.6  MongoDB 3.0  MongoDB 3.2  MongoDB 3.4
PHPLIB 1.1 + mongodb-1.2*   Yes          Yes          Yes          Yes          Yes
PHPLIB 1.0 + mongodb-1.1*   Yes          Yes          Yes          Yes          -
mongodb-1.1*                Yes          Yes          Yes          Yes          -
mongodb-1.0*                Yes          Yes          Yes          -            -
mongo-1.6**                 Yes          Yes          Yes          -            -
mongo-1.5**                 Yes          Yes          -            -            -
mongo-1.4**                 Yes          Yes          -            -            -
mongo-1.3**                 Yes          -            -            -            -
*New driver
**Legacy driver
MongoDB PHP Drivers
PHP Language Compatibility1
The following compatibility table specifies the recommended version(s) of the MongoDB PHP driver
for use with a specific version of PHP/Zend2.
1 https://docs.mongodb.com/ecosystem/drivers/driver-compatibility-reference/#reference-compatibility-language-php
2 https://framework.zend.com/blog/2017-06-06-zf-php-7-1.html
2 https://zend18.zendesk.com/hc/en-us/articles/217058968-PHP-Versions-and-APIs
3 Clark Everetts
PHP Driver | PHP 5.6 / Zend 8.0 | PHP 5.6 / Zend 8.5LTS | PHP 7.0.15 / Zend 9.0.2 | PHP 7.1.3/7 / Zend 9.1 | HHVM 3.12 | HHVM 3.15
mongodb-1.2* Yes Yes Yes Yes Yes Yes
mongodb-1.1* Yes Yes Yes Yes Yes
mongodb-1.0* Yes Yes Yes
mongo-1.6** Yes Yes
mongo-1.3-1.5** Yes Yes
*New driver
**Legacy driver
Note: Support for PHP 5.6 is available via Zend Server 8.5, not 8.0 (see footnote 3).
HHVM - HipHop Virtual Machine (Facebook)
Demo: PHPLIB with MongoDB
http://php.net/manual/en/mongodb.tutorial.library.php
Install Composer (package manager).
Install the library: composer require mongodb/mongodb
Creates a bootstrap for dependency classes: vendor/autoload.php
Entire library: https://docs.mongodb.com/php-library/current/
Libraries used:
PHP 7.1.10
Apache 2.4.25
MongoDB PHPLIB Extension 1.2.9
http://localhost/phpinfo.php
2 files: list.php and create.php
No schema is defined (e.g.
collections/documents).
Hardware: Memory
The most important part of a Mongo deployment is RAM, not CPU. Mongo uses much less
CPU than an RDBMS. Insufficient RAM is the most common performance issue. RAM will contain
your indexes and working set.
The working set for an application should be able to fit comfortably in memory. The working set
includes:
• Data or collections: Containers for like documents stored in extents. Number of pages accessed
per second by active users on the system.
• Indexes on the collections.
Also consider the following:
• The period of time the data and indexes need to be retained.
• Connection pooling (1MB per active thread).
• Account for fragmentation.
• Operations on the data including sorting and aggregation (map reduce).
This does not mean all of the documents and indexes in the database have to fit within RAM. Only
the documents and collections your application accesses most of the time are part of the working set.
28© 2017 Rogue Wave Software, Inc. All Rights Reserved. 28
Hardware: Memory
For example, you have a year’s worth of data, and assume each month is
1GB of data, totaling 12GB. For every month of data you have 1GB of
indexes, totaling 12GB. If your application is accessing 12 months’ worth of
data, then your working set is: 12GB of data and 12GB of indexes = 24GB.
If you had 8GB RAM and your application started accessing 6 months’
worth of data (6GB data + 6GB indexes), then your working set would
exceed the available RAM and performance would degrade as time progresses (as
more data is accessed). 1
You want to prevent Mongo from paging in and paging out documents in your
working set.
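The working-set arithmetic from this example can be written down as a quick sizing check. This is a deliberately simplified sketch; the function names are hypothetical, and it ignores connections, fragmentation, and operating system overhead.

```javascript
// Working set = data actively accessed + the indexes over that data (in GB).
function workingSetGB(monthsAccessed, dataPerMonthGB, indexPerMonthGB) {
  return monthsAccessed * (dataPerMonthGB + indexPerMonthGB);
}

// Naive check: does the working set fit entirely in RAM?
function fitsInRam(workingSet, ramGB) {
  return workingSet <= ramGB;
}

const fullYear = workingSetGB(12, 1, 1);  // 12GB data + 12GB indexes
const sixMonths = workingSetGB(6, 1, 1);  // 6GB data + 6GB indexes

console.log(fullYear, fitsInRam(fullYear, 8));   // 24 false
console.log(sixMonths, fitsInRam(sixMonths, 8)); // 12 false
```

Even the six-month working set is larger than 8GB, which is why paging starts well before the full year of data is in use.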
1 https://stackoverflow.com/questions/6453584/what-does-it-mean-to-fit-working-set-into-ram-for-mongodb
29© 2017 Rogue Wave Software, Inc. All Rights Reserved. 29
Hardware: Memory
How much RAM do I need for my deployment?
It depends on:
• An application’s typical use cases/access patterns. Every application is
different. Requires understanding your application.
• How well the queries are indexed. If queries are doing full collection
scans, then more memory will be used.
Always allow room to grow.
Best practice: Create automated test scripts (e.g. PHPUnit1) focusing on
repeatability and keep them updated as the application changes. Monitor the
application using the approaches mentioned here. Take time to do this as it
will pay dividends in the long run. Make the case to your management team that
this work is needed for the application’s success.
1 https://zendframework.github.io/zend-test/phpunit/
30© 2017 Rogue Wave Software, Inc. All Rights Reserved. 30
Hardware: Paging
You do not get a notification when Mongo is paging. So, how do I know if I am paging?
Run the following from a mongo prompt:
db.serverStatus().extra_info
{
"page_faults" : 6137
}
Example: a client ran a performance test with 8GB of RAM:
{
"page_faults" : 353
}
After increasing to 16GB of RAM:
{
"page_faults" : 76
}
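A single page_faults reading is hard to interpret on its own, so it can help to compare two readings taken some time apart. The sketch below assumes the counter is cumulative and uses hand-written sample objects shaped like db.serverStatus().extra_info; pageFaultsPerSecond is a hypothetical helper, and the second sample value is made up for illustration.

```javascript
// Rate of page faults between two samples of serverStatus().extra_info.
function pageFaultsPerSecond(before, after, elapsedSeconds) {
  if (elapsedSeconds <= 0) {
    throw new RangeError("elapsedSeconds must be positive");
  }
  return (after.page_faults - before.page_faults) / elapsedSeconds;
}

const sampleBefore = { page_faults: 6137 }; // first reading
const sampleAfter = { page_faults: 6737 };  // hypothetical reading 60 seconds later

console.log(pageFaultsPerSecond(sampleBefore, sampleAfter, 60)); // 10
```

A rate that keeps climbing under steady load is a hint that the working set no longer fits in RAM.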
31© 2017 Rogue Wave Software, Inc. All Rights Reserved. 31
Hardware: Disks
If Mongo must access disk, then prefer RAID10 or an SSD-based VM over HDD,
even though it will cost more.
“Solid state drives (SSDs) can outperform spinning hard disks (HDDs) by
100 times or more for random workloads” 1 and are increasingly
affordable.
Maximizing MongoDB Performance on AWS
https://www.mongodb.com/blog/post/maximizing-mongodb-performance-on-aws
1 https://docs.mongodb.com/manual/core/write-performance/#storage-performance
32© 2017 Rogue Wave Software, Inc. All Rights Reserved. 32
Security
Disabling SELinux
“Problems have been reported when using MongoDB with SELinux enabled.
To avoid issues, disable SELinux when possible.”1
There are ways to work around this without disabling SELinux.
Run audit2why on /var/log/audit/audit.log to view the violations, and build
custom policies with audit2allow 2.
1 https://docs.mongodb.com/manual/tutorial/install-mongodb-on-red-hat/
1 https://docs.mongodb.com/manual/administration/production-notes/#recommended-configuration
2 https://serverfault.com/questions/770227/selinux-setup-for-mongodb
33© 2017 Rogue Wave Software, Inc. All Rights Reserved. 33
Security
Do enable MongoDB security!
Tutorial: Enable Authentication
https://docs.mongodb.com/manual/tutorial/enable-authentication/
Major security alert as 40,000 MongoDB databases left unsecured (February
2015)
https://www.techworm.net/2015/02/major-security-alert-40000-mongodb-databases-left-unsecured.html
This was discovered by 3 students in Germany by checking TCP port 27017.
You can easily go out on Shodan (search engine for IoT) and do a search on
MongoDB:
shodan download --limit -1 mongodb "product:MongoDB"
34© 2017 Rogue Wave Software, Inc. All Rights Reserved. 34
Security
It's Still the Data, Stupid! (December 2015 by John Matherly)
https://blog.shodan.io/its-still-the-data-stupid/
“At the moment, there are at least 35,000 publicly available, unauthenticated
instances of MongoDB running on the Internet... There's a total of 684.4 TB of
data exposed on the Internet via publicly accessible MongoDB instances that
don't have any form of authentication.”
MongoDB ransacking starts again: Hackers ransom 26,000 unsecured
instances (September 5th, 2017 by Liam Tung)
http://www.zdnet.com/article/mongodb-ransacking-starts-again-hackers-ransom-26000-unsecured-instances/
“Three groups of hackers have wiped around 26,000 MongoDB databases over
the weekend and demanded victims to pay about ~$650 (495 pounds) to have
them restored.”
35© 2017 Rogue Wave Software, Inc. All Rights Reserved. 35
Operating System Tuning: THP
Disable Transparent Huge Pages (THP) for Linux:
https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
Why?
THP pays off when an application accesses memory contiguously.
Databases tend to have sparse rather than contiguous memory
access patterns, so they often perform worse with THP enabled.
Set readahead to 0 Regardless of Storage (e.g. SSD, HDD)
https://docs.mongodb.com/manual/administration/production-notes/#readahead
Why?
“Setting a higher readahead benefits sequential I/O operations1.” MongoDB
accesses the disk in random patterns. Increasing this value can degrade
performance.
36© 2017 Rogue Wave Software, Inc. All Rights Reserved. 36
Operating System Tuning: File System Types
Use XFS and not EXT4 or NFS
https://docs.mongodb.com/manual/administration/production-notes/#kernel-and-file-systems
Why?
XFS is arguably better for concurrent writes. It is best to run mongoperf on your system to check.
“With the WiredTiger storage engine, use of XFS is strongly recommended to avoid performance issues
that may occur when using EXT4 with WiredTiger1.”
“Avoid using NFS drives for your dbPath. Using NFS drives can result in degraded and unstable
performance1.”
XFS vs EXT4 – Comparing MongoDB Performance on AWS EC2
https://scalegrid.io/blog/xfs-vs-ext4-comparing-mongodb-performance-on-aws-ec2/
“In performance terms, XFS is indeed a force multiplier when paired with high speed disks that it can take
real advantage from. For low to mid-end systems, it doesn’t seem to be able to do much to improve your
performance.”
Windows Only: Do not use the FAT file system. Use NTFS instead.
1 https://docs.mongodb.com/manual/administration/production-checklist-operations/#filesystem
37© 2017 Rogue Wave Software, Inc. All Rights Reserved. 37
Operating System Tuning: Disk Space
Disk Space
Have enough disk space for the size of your data, indexes, and log files plus
plenty of room for expansion.
Running db.stats() (proactively) and looking at the following fields
should give an idea:
• avgObjSize: Average size of each document in the database, in bytes.
• dataSize: Size of BSON objects in the database.
• storageSize: Total space allocated for collection extents. Extra space
reserved for collection growth and unallocated deleted space.
38© 2017 Rogue Wave Software, Inc. All Rights Reserved. 38
Operating System Tuning: NUMA
Disable NUMA (Non-Uniform Memory Access)
https://docs.mongodb.com/manual/administration/production-notes/#mongodb-and-numa-hardware
What is NUMA?
It is used to increase processor speed on a multi-core system without increasing load on the processor bus. This is where
you would have 2 memory pools where each core has some degree of proximity to each memory pool on the bus.
Why disable NUMA?
It can cause memory to be paged in and out unnecessarily. Mongo will complain at startup if it detects that NUMA is enabled. Accessing
local memory is faster than remote.
How is NUMA disabled?
On Windows, it can be configured in the BIOS.
On Linux:
echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode
Or:
sudo sysctl -w vm.zone_reclaim_mode=0
Then any Mongo application (e.g. mongod, mongos, mongo) must be started with numactl:
numactl --interleave=all /usr/bin/mongod --quiet -f /etc/mongod.conf run
Note: This may not be necessary when bound to a single NUMA node:
See: https://jira.mongodb.org/browse/SERVER-25984
Diagram: two 8GB RAM pools connected by a bus; one pool is local to CPUs 0, 2, 4, 6 and the other is local to CPUs 1, 3, 5, 7.
39© 2017 Rogue Wave Software, Inc. All Rights Reserved. 39
Operating System Tuning: atime
Disabling Last Accessed Time
Disable last accessed time (atime) in the file system table (/etc/fstab)
entries for volumes containing Mongo database files.
Why disable last accessed time?
This can provide a significant performance improvement unless you have an
application that relies on atime.
How is atime disabled on Linux?
/dev/mapper/datavg-datalv /apps xfs defaults,noatime 0 0
/dev/mapper/appvg-appsloglv /apps/logs xfs defaults,noatime 0 0
40© 2017 Rogue Wave Software, Inc. All Rights Reserved. 40
Tools: Finding the Bottleneck
iostat1: Number of accesses over time to the disk.
Example: iostat -xmt 1
• %util: This is the most useful field for a quick check; it indicates what percent
of the time the device/drive is in use.
• avgrq-sz: Average request size. Smaller numbers for this value reflect more
random IO operations.
vmstat2: Shows how much memory is in use, whether the data fits into memory, and page faults.
mongostat3: Provides a quick overview of the status of a currently
running mongod or mongos instance. Similar to vmstat.
Profiling4: mongod --profile <level:0> --slowms <milliseconds: 100>
1 https://docs.mongodb.com/manual/administration/production-notes/#iostat
2 https://docs.mongodb.com/manual/faq/diagnostics/#how-do-i-read-memory-statistics-in-the-unix-top-command
3 https://docs.mongodb.com/manual/reference/program/mongostat/
4 https://docs.mongodb.com/manual/reference/program/mongod/#bin.mongod
http://edgystuff.tumblr.com/post/81219256714/tips-to-check-and-improve-your-storage-io
41© 2017 Rogue Wave Software, Inc. All Rights Reserved. 41
Tools: Finding the Bottleneck
mongoperf1: Checks disk I/O performance independently of MongoDB.
mongoperf can overstate performance problems in ext-X filesystems
https://jira.mongodb.org/browse/SERVER-13417
“People who use mongoperf to compare XFS and ext-X might get results that overstate the benefits of
XFS…The workaround is to let mongoperf use multiple files. That would also make the mongoperf load
more realistic given mongodb will use many files.”
Set the file size, # of threads, read/write operations in a .conf file:
{ nThreads:1024, fileSizeMB:1000, mmf:false, r:true, w:true, syncDelay:60 }
mongoperf < ./mongoperf.conf
Tips to check and improve your storage IO performance with MongoDB
http://edgystuff.tumblr.com/post/81219256714/tips-to-check-and-improve-your-storage-io
mongotop2: Tracks time spent reading/writing per namespace (database/collection).
MongoDB Monitoring Service (MMS): Graphical user display for monitoring, backup, and deployment.
http://api.mongodb.com/mms/
1 https://docs.mongodb.com/manual/reference/program/mongoperf/
2 https://docs.mongodb.com/manual/reference/program/mongotop/
42© 2017 Rogue Wave Software, Inc. All Rights Reserved. 42
Indexes
What makes a good index?
• The query optimizer chooses the most efficient query plan for the available indexes.
• An index on a unique ID field is very selective. When multiple indexes are
involved, Mongo evaluates the indexes and uses the more highly selective index.
• Will the query planner use the index? If the fields named in the query are part of the
index, then yes.
• The index should be selective to narrow down the results for a given key.
• An index on only a boolean field is usually not selected because its two possible values
(true or false) will not narrow down the selection.
• Mongo keeps statistics on index hits to see if a key match points to a few or many
documents.
• Mongo can select the wrong index, meaning that another index would have performed
better.
– Mongo caches the selection and may remember a non-optimal choice.
– Statistically an index may look good, but another might perform better.
43© 2017 Rogue Wave Software, Inc. All Rights Reserved. 43
MongoDB Indexes
Indexes are implemented as a B-tree data structure1.
Each collection has an index on _id automatically.
MongoDB allows 64 indexes per collection2.
Run explain on your application’s main queries to determine if they are using indexes.
See if a field is indexed:
db.beers.find( { "beer": "Stickee Monkee" } ).explain()
If queryPlanner.winningPlan.stage equals:
• “IDHACK”: Uses special ID index strategy to retrieve the documents for this query.
• “COLLSCAN”: This is a collection scan which means the query had to visit every document
in the collection. For large databases, MongoDB would have to page all of the
documents into memory which is very slow.
• “IXSCAN”: Index scan.
• “FETCH”: Document retrieval (IXSCAN index hit).
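Checking the winning plan by eye gets tedious when many queries are involved, so the stage names can be collected programmatically. The sketch below works on a hand-written sample object shaped like explain() output (real output has many more fields); planStages and usesCollectionScan are hypothetical helpers, and the index name beer_1 is assumed for illustration.

```javascript
// Hand-written sample shaped like the result of .explain().
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "beer_1" },
    },
  },
};

// Collect every stage name in a plan, walking inputStage / inputStages.
function planStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage, ...children.flatMap((child) => planStages(child))];
}

// True when any stage of the winning plan is a full collection scan.
function usesCollectionScan(explainOutput) {
  return planStages(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}

console.log(planStages(sampleExplain.queryPlanner.winningPlan)); // [ 'FETCH', 'IXSCAN' ]
console.log(usesCollectionScan(sampleExplain));                  // false
```

In the mongo shell, the same functions could be applied to the object returned by a real explain() call.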
1 https://docs.mongodb.com/manual/indexes/#create-an-index
2 https://docs.mongodb.com/manual/reference/limits/#Number-of-Indexes-per-Collection
MongoDB Indexes
Get execution statistics:
db.<collection>.find({_id:ObjectId("595411070a797e7aaeff2733")}).explain("executionStats")
Look at the following:
• executionStats.totalDocsExamined (previously named nscannedObjects). If
this number is high relative to the number of results returned, then a selective index was probably not used.
• executionStats.totalKeysExamined: The number of index entries scanned. If it is
zero, then an index was not used in the query.
• executionStats.executionStages.nReturned: # of results the query returned.
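The three fields are most useful when read together. The following sketch summarizes them for a hand-written sample shaped like the executionStats section (values are made up); summarizeExecutionStats is a hypothetical helper and the thresholds for "too many documents examined" are left to the reader.

```javascript
// Hand-written sample; a real executionStats section has more fields.
const sampleStats = {
  nReturned: 1,
  totalKeysExamined: 0,
  totalDocsExamined: 50000,
};

// Summarize whether index keys were scanned and how many documents
// were examined for each result returned.
function summarizeExecutionStats(stats) {
  const docsPerResult =
    stats.nReturned > 0
      ? stats.totalDocsExamined / stats.nReturned
      : stats.totalDocsExamined;
  return {
    usedIndex: stats.totalKeysExamined > 0,
    docsPerResult,
  };
}

console.log(summarizeExecutionStats(sampleStats));
// { usedIndex: false, docsPerResult: 50000 }
```

A docsPerResult value close to 1 suggests the query is well served by an index; a value in the thousands suggests a scan.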
Reference:
Explain results:
https://docs.mongodb.com/manual/reference/explain-results/
Get indexes:
db.<collection>.getIndexes()
Indexes
Create an index:
db.<collection>.createIndex({<field>: <1 for ascending, -1 for descending>})
• Creates the index, and the collection if it does not exist.
• Direction can be 1 for ascending, -1 for descending
“If a write operation modifies an indexed field, MongoDB updates all
indexes that have the modified field as a key1.”
1 https://docs.mongodb.com/manual/faq/indexes/#how-do-write-operations-affect-indexes
2 Types of Scaling
The traditional way to scale an RDBMS is to add more hardware. Eventually,
a price or scale limit is hit, making this approach infeasible.
• Vertical: Adding CPU, faster disks, additional memory on a bare-metal
machine. Simple to implement.
• Horizontal: Pooling resources to distribute the load and data across
multiple machines. Cloud solutions allow us to scale up or down
depending on need. One way to accomplish this is through sharding.
*It is essential to understand your application’s requirements regarding
latency and throughput, the volume and type of data, and the period of
time the data is kept.
Replication and Replica Sets
• Replication
– Protects your data but is not a backup.
– Provides redundancy by synchronizing across nodes.
– Disaster recovery: Automatic failover when your primary node fails.
– Used to scale reads.
– Goal: Have at least one full copy of your data available at all times.
• What is a replica set?
– Replication where a configured group of nodes automatically
synchronize their data and fail over when a node is no longer
available.
– A recommended minimum of 3 nodes
Replica Set: What does it look like?
[Diagram: The application's driver connects to a replica set. Data Center 1 holds the primary and two secondaries; asynchronous replication feeds two more secondaries in Data Center 2 (DR).]
• Typical setup: A primary (writable) and 2 secondaries (read-only) make up a
3-node replica set.
• Near 100% uptime is critical: A loss of 1 node does not take down the entire
replica set.
• Primary and secondaries are copies of each other.
• Writes can only be done on the primary.
• Reads can be performed on the secondaries.
• A priority can be set that favors a node to become primary in case of failover1.
• If there are 2 data centers with 2 nodes in one data center, then use one of
the data centers for DR.
• Not scalable: The primary is the limiting factor (memory, drive space).
1 https://docs.mongodb.com/manual/tutorial/adjust-replica-set-member-priority/
2 https://docs.mongodb.com/manual/reference/replica-configuration/#rsconf.members[n].priority
Ok, that’s great. How do we scale horizontally?
Sharding
• What is sharding?
– A shard contains a subset of the sharded data. Shards are
deployed as a replica set1.
– Distributes the load by partitioning documents into smaller
manageable pieces on less powerful machines so one machine does
not have to store everything.
– Partitioning is transparent to the application.
– Used to scale writes. Most operations are inserts and updates for
large-volume applications.
– Adds overhead and complexity: Moving data off overloaded shards
takes time and resources.
– Balancing and redistribution of the data across the shards is done
automatically.
1 https://docs.mongodb.com/manual/core/sharded-cluster-components/
When to use sharding?
• Geo-locality: Support geographically distributed deployments for an optimal
user experience for customers in many locations.
– Lower network latency.
– Great for mobile applications.
• Scalability: The working set growth is unbounded and exceeds the
available RAM on the largest node/server. When not enough resources
exist on a single machine and there is a lot of I/O, configuring sharding and
spreading the load across several machines may reduce the load on an
individual server.
• Hardware Optimization: Enhancing performance without the cost.
• Low Recovery Times: Mitigate the impact of a server failure.
Sharding
• When not to use sharding?
– New deployments: Prototype first with a large amount of data. Use
a replica set first, and then move to sharding later on if required.
– A prototype (or performance test) allows for a more accurate
estimate on the hardware.
– If you determine sharding is needed within the first 6 to 12 months
after deployment of a project, then plan for a sharded cluster.
– The application tends to access data that is not local to it.
– Infrastructure limitations: Sharding requires several servers.
– The application has one shard.
Should I use sharding in this use
case?
• I own a trucking company that owns 1,000 18-wheelers delivering steel
beams to construction companies. The U.S. Department of Transportation
enacted the Transportation Recall Enhancement, Accountability, and
Documentation Act (TREAD), which required a Tire Pressure Monitoring
System (TPMS) to be installed on each truck to automatically measure tire
pressure for each tire in 1-minute increments using sensors while each
truck is moving. The data is uploaded to the home office where it is
required to be kept for 3 years.
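A back-of-the-envelope volume estimate helps answer the question. The sketch below assumes an upper bound in which every truck moves around the clock and has 18 tires, and it assumes 100 bytes per stored reading; neither the duty cycle nor the document size is given in the use case, so both are placeholders to adjust.

```javascript
// Estimate how many tire-pressure readings accumulate over the retention period.
function estimateReadings({ trucks, tiresPerTruck, readingsPerMinute, minutesPerDay, days }) {
  return trucks * tiresPerTruck * readingsPerMinute * minutesPerDay * days;
}

const readings = estimateReadings({
  trucks: 1000,            // from the use case
  tiresPerTruck: 18,       // assumption: one sensor per wheel of an 18-wheeler
  readingsPerMinute: 1,    // from the use case (1-minute increments)
  minutesPerDay: 24 * 60,  // assumption: trucks move 24 hours a day (upper bound)
  days: 3 * 365            // from the use case (3-year retention)
});

const assumedBytesPerReading = 100; // assumption, not from the use case
const terabytes = (readings * assumedBytesPerReading) / 1e12;

console.log(readings + " readings, roughly " + terabytes.toFixed(1) + " TB before indexes");
```

Under these assumptions the collection is dominated by inserts and grows without bound for three years, which is the kind of workload the "Scalability" criterion describes.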
Sharding Deployments
• A sharded cluster contains:
– Shards store the application data. In a sharded cluster, only the mongos
routers or system administrators should be connecting directly to the
shards.
– mongos query routers cache the cluster metadata and use it to route
operations to the correct shard or shards.
– Config servers persistently store metadata about the cluster, including
which shard has what subset of the data.
• Guidelines
– Each member of a replica set, whether it’s a complete replica or an
arbiter, needs to live on a distinct machine.
– Replica set arbiters are lightweight enough to share a machine with
another process. Arbiters can be placed on a mongos query router.
– Config servers can optionally share a machine with other processes. The only hard
requirement is that all config servers in the config cluster reside on
distinct machines.
Sharding Caveats
“My application performed fine with a replica set, and now my application
performs terribly with a sharded cluster! Why?”
The initial load of data is not split and, as a result, is written to one shard. As data
is written, more chunks are created. A chunk is a group of documents
clustered by values on a field (e.g. list of beers between letters a – d). As
more chunks are created and certain thresholds are met, Mongo moves the
data to other shards/servers to balance out. This process is called a
migration.
What does this look like?
Sharding Migrations
[Diagram: The application's driver connects through query routers (mongos) to Shard 1 through Shard 4. The 64 MB chunks start out on one shard and are migrated until they are spread across all four.]
• A balancer process automatically monitors the number of chunks on each
shard.
• During the first data load, all writes go to one shard/server.
• As data is loaded, chunks are created and split as they grow.
• When one shard holds too many more chunks than the others, the
balancer issues a migration.
• Migrations take time because data is being moved from server to server.
• The goal: Equal number of chunks per shard.
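The balancing idea can be illustrated with a toy simulation. This is only a sketch of the principle, not MongoDB's balancer: it moves one chunk at a time from the fullest shard to the emptiest until the difference drops below a threshold. The function name balance and the threshold value are assumptions chosen for the example.

```javascript
// Toy balancer: chunkCounts[i] is the number of chunks on shard i.
// threshold is an assumed maximum tolerated difference (must be at least 2).
function balance(chunkCounts, threshold) {
  if (threshold < 2) throw new RangeError("threshold must be at least 2");
  const counts = chunkCounts.slice();
  let migrations = 0;
  while (true) {
    const max = Math.max(...counts);
    const min = Math.min(...counts);
    if (max - min < threshold) break;
    const from = counts.indexOf(max);
    const to = counts.indexOf(min);
    counts[from] -= 1; // one chunk leaves the fullest shard...
    counts[to] += 1;   // ...and lands on the emptiest one
    migrations += 1;
  }
  return { counts, migrations };
}

// Initial load: all 12 chunks sit on the first of four shards.
const result = balance([12, 0, 0, 0], 2);
console.log(result.counts, result.migrations);
```

Every migration in the simulation stands for real data crossing the network, which is why an unsplit initial load can make a freshly sharded cluster feel slower than the replica set it replaced.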
Sharding Caveats
• The data can be pre-split to mitigate this issue (using either a command1 or
script2). The chunks will be created and balanced ahead of time, so when data
is written, it is distributed according to your sharding strategy.
• Be proactive! Your application’s performance can degrade. A good rule of
thumb is to plan to add a new shard at least several weeks before the indexes
and working set on your existing shards reach 90% of RAM.
• Fixed-size collections (or capped collections) cannot be sharded.
• Use care with mapReduce and aggregation functions with sharding as they
tend to globally lock the database:
Remove unnecessary global lock during "replace" out action
https://jira.mongodb.org/browse/SERVER-13552
“During map-reduce operation there are unnecessary global lock used and
should be removed.”
1 https://docs.mongodb.com/manual/reference/command/shardCollection/
1 https://docs.mongodb.com/manual/tutorial/migrate-chunks-in-sharded-cluster/
2 https://docs.mongodb.com/manual/tutorial/create-chunks-in-sharded-cluster/
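A pre-split script mostly comes down to choosing split points before any data arrives. The sketch below computes evenly spaced points for a numeric shard key; the function name splitPoints and the key range are assumptions for illustration. In the mongo shell, each point could then be passed to sh.splitAt() with your own namespace and shard key field.

```javascript
// Compute evenly spaced split points for a numeric shard key range.
// Returns (chunks - 1) boundaries, which divide [min, max) into `chunks` pieces.
function splitPoints(min, max, chunks) {
  const points = [];
  const step = (max - min) / chunks;
  for (let i = 1; i < chunks; i++) {
    points.push(Math.round(min + i * step));
  }
  return points;
}

// Assumed example: keys from 0 to 1000, pre-split into 4 chunks.
const points = splitPoints(0, 1000, 4);
console.log(points);

// In the mongo shell (placeholders, not real names):
//   points.forEach(p => sh.splitAt("<database>.<collection>", { <shardKey>: p }));
```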
Deployment Strategy
[Diagram: Three data centers. In each, Zend Server application servers (driver and application) connect through query routers (mongos). Each data center hosts the primary of one shard and secondaries of the other two: Data Center 1 has the primary of shard 1, Data Center 2 the primary of shard 2, and Data Center 3 the primary of shard 3. A 3-node config server replica set has one member in each data center.]
• Query routers can be placed at the application
server or shard primary. There are typically
more mongos servers than application servers
(1-to-1 mongos/mongod).
• Config servers are a replica set.
• Setup provides Redundancy and HA.
• Shards spread across data centers.
Shard Keys
• A shard key is set by the developer/data modeler and describes how a
data set should be partitioned by ranges of key values. It determines
how a collection of documents is spread over partitions.
• Data is segregated into chunks by the shard key. The chunks are
distributed across shards residing across multiple servers.
• Data modelers can struggle with defining a good shard key which can
cause problems with performance later on. You cannot change a shard
key after sharding a collection. Choose the shard key wisely. You
would need to dump and restore the data to repartition the data for a new
shard key. Not trivial.
Customers commonly pick the wrong shard key and get stuck.
Good Shard Keys
Good shard keys exhibit 5 properties:
• Cardinality: For set A = { “East”, “Central”, “West” } the set has 3 elements,
giving it a cardinality of 3. This means the balancer can create at most
3 chunks, which can limit horizontal scaling in the cluster.
However, high cardinality does not mean the data will be distributed evenly
across the cluster.
• Write Distribution: Writes should be distributed as evenly as possible
across the cluster.
• Query isolation: Majority of queries for the same collection result in
targeting one (or a few) particular shard(s).
• Reliability: Do some or all documents get affected after one shard goes
down?
• Index locality: Locality is how the indexes are accessed. Do we have to
page in the entire index each time to query?
Bad Shard Keys
• Data center: This would be a good shard tag, but does not have good
cardinality. If we had 3 data centers, then the data can be split up into 3
chunks. There will be large chunks that cannot be split resulting in one shard
with more data than the others.
• Customer IDs: May not be as well distributed if some customers are larger
than others (cardinality). Depends on use case.
• Timestamp: Bad write distribution because writes could be targeted more to
one chunk than the others. Example: My website activity is greatest between
the hours of 10am and 4pm.
• Hashed timestamp: An MD5 hash of a timestamp provides random
distribution, but would cause the queries to be scattered across the cluster.
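Two of these problems, low cardinality and skew, can be measured on a sample of documents before committing to a shard key. The sketch below is one possible check; the function name shardKeyStats, the field names, and the sample documents are made up, and it assumes a non-empty sample.

```javascript
// For a candidate shard key field, report how many distinct values exist
// (cardinality) and what fraction of documents share the most common value.
function shardKeyStats(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  const largest = Math.max(...counts.values());
  return { cardinality: counts.size, largestShare: largest / docs.length };
}

// Made-up sample: one large customer in the East region dominates.
const sample = [
  { region: "East", customerId: 1 },
  { region: "East", customerId: 1 },
  { region: "East", customerId: 1 },
  { region: "Central", customerId: 2 },
  { region: "West", customerId: 3 },
  { region: "West", customerId: 4 }
];

console.log(shardKeyStats(sample, "region"));
console.log(shardKeyStats(sample, "customerId"));
```

A low cardinality caps the number of chunks, and a large largestShare points to a chunk that cannot be split further, even when the cardinality looks acceptable.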
Ranged Sharding
MongoDB provides 3 types of sharding strategies:
• Range1/User-defined: The shard key's values are segmented
into sequential chunks. Documents with “close” shard key values are likely to
be in the same chunk or shard. For example: A primary key for a collection of
documents is an auto-increment integer. 0 – 10 go to shard 1, 11 – 20 go to
shard 2, 21 – 30 go to shard 3, and so on. Composite keys are supported with
this strategy.
Deploy Sharded Cluster using Ranged Sharding
1 https://docs.mongodb.com/manual/tutorial/deploy-sharded-cluster-ranged-sharding/
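[Diagram: The key range from { x: minKey } to { x: maxKey } is split at { x: 11 }, { x: 21 }, and { x: 31 } across Shard 1 through Shard 4. Shard 1 holds keys 1, 2, 7, and 8 in two chunks; Shard 2 holds 11 and 17; Shard 3 holds 21, 29, and 30 in two chunks; Shard 4 holds 35 and 38.]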
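Routing under ranged sharding amounts to finding which interval a key falls into. The sketch below uses the boundaries from the diagram (11, 21, 31); the function name shardForKey is hypothetical, and real routing is done by mongos using the cluster metadata rather than a hard-coded list.

```javascript
// Range boundaries taken from the diagram: a key below 11 belongs to Shard 1,
// 11 up to (not including) 21 to Shard 2, 21 up to 31 to Shard 3, the rest to Shard 4.
const boundaries = [11, 21, 31];

function shardForKey(x) {
  let index = 0;
  while (index < boundaries.length && x >= boundaries[index]) {
    index++;
  }
  return "Shard " + (index + 1);
}

console.log([8, 17, 29, 38].map(shardForKey));

// An ever-increasing key (auto-increment id, timestamp) keeps landing in the last range.
console.log([100, 101, 102].map(shardForKey));
```

The second call shows why a monotonically increasing key gives poor write distribution with ranged sharding: every new document targets the same shard.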
Hashed Sharding
• Hashed1: A subset of range sharding. An MD5 hash is applied to the key to
ensure data is spread randomly within the MD5 range of values. Composite
keys are not supported.
Deploy Sharded Cluster using Hashed Sharding
1 https://docs.mongodb.com/manual/tutorial/deploy-sharded-cluster-hashed-sharding/
[Diagram: The key 1 passes through the MD5 hash function and becomes c4ca4238a0b923820dcc509a6f75849b. The hashed range from { x: minKey } to { x: maxKey } is split at { x: 34173cb38f07f...}, { x: c0c7c76d30bd3...}, and { x: d3d9446802...} across Shard 1 through Shard 4, and the chunks hold hashed values such as d3d9446802a44259755d38e6d163e820, c20ad4d76fe97759aa27a0c99bff6710, and 8e296a067a37563370ded05f5a3bf3ec.]
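The slide's picture of hashed sharding can be reproduced in a few lines. This is a simplified sketch that follows the diagram literally: it compares hex MD5 strings against the (truncated) boundary prefixes shown there. It runs under Node using the built-in crypto module; the function names are hypothetical, and MongoDB's internal hashed-index representation differs from a hex string comparison.

```javascript
const crypto = require("crypto");

// Hex MD5 digest of a key, as drawn in the diagram.
function md5Hex(value) {
  return crypto.createHash("md5").update(String(value)).digest("hex");
}

// Boundary prefixes exactly as far as the diagram shows them (they are truncated there).
const hashBoundaries = ["34173cb38f07f", "c0c7c76d30bd3", "d3d9446802"];

// Lowercase hex strings sort in numeric order, so plain string comparison works here.
function shardForHash(hash) {
  let index = 0;
  while (index < hashBoundaries.length && hash >= hashBoundaries[index]) {
    index++;
  }
  return "Shard " + (index + 1);
}

function shardForHashedKey(key) {
  return shardForHash(md5Hex(key));
}

// Neighboring keys no longer land next to each other.
for (const key of [1, 2, 3, 4]) {
  console.log(key, md5Hex(key), shardForHashedKey(key));
}
```

Because adjacent keys scatter, writes spread out, but a range query such as "keys 1 to 4" has to visit several shards, which is the trade-off noted for hashed timestamps.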
Tag-Aware Shard Keys
• A subset of shards is tagged and assigned to a sub-range of the shard key. This creates
zones, where each zone has one or more shards. The most relevant data should reside on
shards geographically closest to the application servers.
[Diagram: Replica sets (one primary and two secondaries each) labeled AMER, EMEA, APAC, United States, and EMEA/APAC. Zones group countries (U.S., Canada, Mexico; England, Germany, France; India, China, Japan), and the U.S. tag is applied to several shards.]
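The zone idea can be sketched as a lookup from a document's country to the zone whose shards should hold it. The zone table below is hypothetical and only loosely based on the region and country names in the diagram; in a real cluster the mapping is declared with zone commands such as sh.addShardToZone() and sh.updateZoneKeyRange() rather than in application code.

```javascript
// Hypothetical zone table: zone name -> countries whose documents belong to it.
const zones = {
  AMER: ["U.S.", "Canada", "Mexico"],
  EMEA: ["England", "Germany", "France"],
  APAC: ["India", "China", "Japan"]
};

// Return the zone for a country, or null when no zone claims it
// (such documents could live on any shard).
function zoneForCountry(country) {
  for (const [zone, countries] of Object.entries(zones)) {
    if (countries.includes(country)) return zone;
  }
  return null;
}

console.log(zoneForCountry("Germany"));
console.log(zoneForCountry("Brazil"));
```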
Which type of shard key should I use?
Use Cases
• Scalability: Range or hash
• Geo-locality: Tag-aware
• Hardware Optimization: Tag-aware
• Low Recovery Times: Range or hash
Combining tag-aware with range or hash is acceptable.
A good shard key usually has the following (in this order):
• A random component like a universally unique identifier (UUID).
– Note: Shard key sizes are limited to 512 bytes.
WiredTigerIndex::insert: key too large to index,
failing 1060 { : new Date(1500393090062)…
• An increasing sequence like a timestamp.
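A key of that shape might be built as follows. This is a sketch under assumptions: the field names uid and ts are invented, the random component is 16 random bytes rather than a formal UUID, and the size check uses the JSON length as a rough stand-in for the BSON size (in the mongo shell, Object.bsonsize() gives the real figure). It runs under Node using the built-in crypto module.

```javascript
const crypto = require("crypto");

// Build a compound shard key value: random component first, increasing component second.
function makeShardKey(now = new Date()) {
  return {
    uid: crypto.randomBytes(16).toString("hex"), // random part spreads writes
    ts: now                                      // increasing part keeps related data ordered
  };
}

// Rough size estimate only; BSON encoding differs from JSON.
function approxKeyBytes(key) {
  return Buffer.byteLength(JSON.stringify(key), "utf8");
}

const key = makeShardKey(new Date(1500393090062));
console.log(key, approxKeyBytes(key) + " bytes (approximate)");
```

Keeping the key short leaves a wide margin under the 512-byte limit that triggers the "key too large to index" error above.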
Links
MongoDB Documentation
https://docs.mongodb.com/
MongoDB Limits and Thresholds
https://docs.mongodb.com/manual/reference/limits/
MongoDB Production Checklist
https://docs.mongodb.com/manual/administration/production-checklist-operations/
Who uses MongoDB?
https://www.mongodb.com/who-uses-mongodb
MongoDB IRC Chat
irc://irc.freenode.net/#mongodb
Links
API Documentation for MongoDB Drivers
https://api.mongodb.com/
MongoDB Integration and Tools
https://docs.mongodb.com/ecosystem/tools/
MongoDB Drivers (All)
https://docs.mongodb.com/ecosystem/drivers/
mongo Keyboard Shortcuts
https://docs.mongodb.com/manual/reference/program/mongo/#keyboard-shortcuts
Back Up a Sharded Cluster with Database Dumps
https://docs.mongodb.com/manual/tutorial/backup-sharded-cluster-with-database-dumps/
Links
MongoDB in Action Manning Forum
http://manning-sandbox.com/forum.jspa?forumID=677
MongoDB Videos
https://www.mongodb.com/presentations/
MongoDB University
https://university.mongodb.com/
MongoDB Events
https://www.mongodb.com/events/
Community Support Forum
http://groups.google.com/group/mongodb-user
Links
ServerFault
https://serverfault.com/questions/tagged/mongodb
Stack Overflow
https://stackoverflow.com/questions/tagged/mongodb
Q/A
Questions?
Thank you!
Be inspired to do something BIG!
Slides Removed Due to Time Constraints
Security: Roles
• Built-In Roles
https://docs.mongodb.com/manual/reference/built-in-roles/
– userAdminAnyDatabase: Complete access
– backup: Ability to run mongodump
• User-Defined Roles
Data Archival
Backing up and restoring MongoDB1
• Best for replica set/sharded cluster
– MongoDB Cloud Manager (Enterprise Advanced only)
– Ops Manager: On-premise (Enterprise Advanced only)
• mongodump/mongorestore
– Not the best solution for larger databases.
– Does impact mongod performance while running.
• File system snapshot
– Snapshots: Not specific to MongoDB
• More efficient but volume must support snapshots (Logical Volume Manager on Linux)
• Must have journaling enabled and journal must be on the same logical volume.
Journaling is enabled by default on 64-bit builds (--journal).
• Sharded cluster backup is more difficult: “disable the balancer and capture a snapshot
from every shard as well as a config server at approximately the same moment in time”
– cp or rsync
Remember to test your backup and restore process no matter what strategy you choose.
1 https://docs.mongodb.com/manual/core/backups/
Data Archival
Journaling
• Journaling is enabled by default.
• Prevents data corruption.
• Writes are flushed to the journal every 100 milliseconds.
• Journaling can be disabled to increase performance; however, replication
should be enabled to ensure durability.
Data Archival
mongodump/mongorestore1
• Excludes the local database in its output.
• Requires the backup role if security is enabled.
• mongod must be running since storage engines have different file layouts.
• For sharding clusters, mongos (sharding router) must be running.
• Suggestion: Create a specific operating system user to run the backup and
only give access to the backups to the backup user.
1 https://docs.mongodb.com/manual/tutorial/backup-and-restore-tools/
Data Archival
Examples:
Export and import all databases (excluding local and admin)1:
mongodump
mongorestore <location of dump>
Drop all documents before inserting:
mongorestore --drop <location of dump>
Export a database:
mongodump --db <database>
Export a specific database and collection:
mongodump --db <database> --collection <collection>
Restore a specific database and collection:
mongorestore --drop --db <database> --collection <collection> <location of .bson file of collection>
Export a specific database and collection with security enabled:
mongodump --db <database> --collection <collection> --username <username> --password <password>
1 https://docs.mongodb.com/manual/tutorial/backup-and-restore-tools/
Data Archival
• Protecting the backups
– Backups are stored in BSON/binary format.
– Binary does not mean encrypted.
– bsondump can be used to read backups:
<$MONGO_ROOT>/bin/bsondump
Indexes
Query Selectors
$in and $all can take advantage of indexes. $ne and $nin cannot unless used with another operator.
Cannot use index and uses a collection scan instead:
{timeframe: {$nin: ['morning', 'afternoon']}}
Can use index: {timeframe: 'evening'}
JavaScript Query Operators
This statement cannot use an index and is single-threaded:
db.reviews.find({'$where': "this.helpful_votes > 3"})
Regular Expressions
MongoDB is compiled with PCRE (Perl Compatible Regular Expressions)
A regular expression query cannot use an index unless it is a prefix-style query.
Example of a prefix-style query “starts with”: db.users.find({'last_name': /^Sm/})
Using case-insensitive flag against an index field nullifies the use of an index because indexes are case-sensitive.
Use the new text search capability or use an external search engine instead.
Another option is to store the searched field in all lower case and search against that field, making it an indexed
case-insensitive search.
The modulo operator $mod will not use an index.
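The regular expression rule above can be turned into a quick pre-flight check. The helper below is a deliberately simple heuristic with an invented name: it only looks for a leading ^ anchor and the absence of the case-insensitive flag, so patterns such as /^.*Sm/ would pass even though they cannot narrow an index scan.

```javascript
// Heuristic: a regex is "index friendly" when it is anchored at the start
// (prefix-style) and is not case-insensitive.
function isIndexFriendlyRegex(regex) {
  return regex.source.startsWith("^") && !regex.flags.includes("i");
}

console.log(isIndexFriendlyRegex(/^Sm/));   // prefix-style "starts with"
console.log(isIndexFriendlyRegex(/Sm/));    // unanchored
console.log(isIndexFriendlyRegex(/^sm/i));  // anchored but case-insensitive
```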
Tools
mtools: https://www.mongodb.com/blog/post/introducing-mtools
A set of Python scripts to parse and filter MongoDB log files.
• mloginfo: Log file analysis. This is the most useful tool within mtools.
mloginfo mongod_prod.log --queries
• mlogfilter: Filtering tool used to pipe output to other tools. Helps
narrow down a search.
• mplotqueries: Creates a graph. Has some bugs.
mplotqueries mongod_prod.log
• mlogvis: Web-based version of mplotqueries.
Hardware: Faults and Memory
2 types of page faults:
• Soft fault: Moves memory pages from one list to another (e.g. OS file
cache).
• Hard fault: Mongo accesses the disk to store or retrieve data. This is
what we are trying to prevent.
Operating System Tuning: Disk Space
Check ulimit settings:
https://docs.mongodb.com/manual/reference/ulimit/
Recommended settings:
• -f (file size): unlimited
• -t (cpu time): unlimited
• -v (virtual memory): unlimited
• -n (open files): 64000
• -m (memory size): unlimited
• -u (processes/threads): 64000
Indexes
Indexes are created immediately when declared.
Indexes by default are created in the foreground which is a locking activity
on the collection.
To create an index in the background1:
db.<collection>.createIndex({…}, { background: true })
• Background index builds allow reads and writes while the index is being
built.
• Background index builds take longer and indexes can be larger than if they
were built in the foreground.
1 https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/
Indexes
How do we remove old documents automatically?
Time-To-Live (TTL) Indexes
Create a TTL index on a collection to expire documents after 30 days:
db.<collection>.createIndex({ time: 1 }, { expireAfterSeconds: 60*60*24*30 })
• Only one TTL index is allowed per collection.
• The field the TTL index is declared on must be a date type and cannot be the _id field.
• expireAfterSeconds specifies how old the document can be before it is removed.
Do not be disappointed if a document is not deleted/expired when you expect.
• An internal job runs every 60 seconds and queries for documents whose date is older than that
interval relative to the current date and time.
– This means that documents could live 60 seconds longer than the expiration date.
• If you have a large number of documents that need to expire when you first create a TTL index,
then this can cause a bottleneck because deletion is a write activity. Write activities require I/O
and locks, which take time. It may make sense to delete documents manually in batches and
then implement a TTL index. This has less impact.
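The expiry rule can be simulated to see which documents a sweep would remove. This sketch mimics the behavior described above and is not MongoDB's implementation; the function names isExpired and sweep, the field name time, and the sample documents are assumptions.

```javascript
const THIRTY_DAYS_SECONDS = 60 * 60 * 24 * 30;

// A document expires once its date field is at least expireAfterSeconds old.
// Documents whose field is not a Date never expire.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  if (!(value instanceof Date)) return false;
  return now.getTime() - value.getTime() >= expireAfterSeconds * 1000;
}

// One pass of the periodic job: keep only the documents that have not expired.
function sweep(docs, field, expireAfterSeconds, now) {
  return docs.filter(doc => !isExpired(doc, field, expireAfterSeconds, now));
}

const now = new Date("2017-08-01T00:00:00Z");
const docs = [
  { _id: 1, time: new Date("2017-06-01T00:00:00Z") }, // 61 days old
  { _id: 2, time: new Date("2017-07-15T00:00:00Z") }, // 17 days old
  { _id: 3, time: "2017-01-01" }                      // not a Date type
];

console.log(sweep(docs, "time", THIRTY_DAYS_SECONDS, now).map(d => d._id));
```

Because the real job runs on an interval, a document that expires just after one pass stays visible until the next one.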
Indexes
Forcing an index to be rebuilt:
An index rebuild can be triggered by the following:
db.<collection>.reIndex()
Or by issuing a repair which will cause all indexes to be dropped and recreated:
db.repairDatabase()
Caveat: Global write lock! 1
Compaction:
Compact defragments documents (or releases unused disk space) and rebuilds indexes.
Compact is a blocking operation, so run it only during a maintenance window on a
primary or an offline secondary in a replica set. To compact a collection:
db.runCommand( { compact: '<collection>' } )
Note: Compaction does run differently depending on storage engines.
Production Replica Set Configurations
• 2 Replicas and 1 Arbiter
– Arbiter runs on an application server.
– Both replicas get their own machine.
• Arbiters
– Replica sets vote on a new primary if the old primary goes down.
– Lightweight mongod process that breaks ties if the nodes cannot
agree on a new primary.
– Use an arbiter if there are an equal number of nodes between
data centers.
– Does not replicate data.
Ok, that’s great. How do we scale horizontally?
Indexes
Compound Indexes
• By having more than one key, it increases selectivity. Therefore, multiple queries can
benefit.
• Determine if the index really speeds up a query by looking at explain().
• Minimize index size and count: Having fewer indexes can be a performance gain because
it requires less space and time to build the indexes. Adding an index for every possible query
is not a good idea. This is a balancing act.
• Maximize selectivity.
Limitations2
• Index key value sizes are limited to 1024 bytes. Do not index too many fields. Do a
Object.bsonsize( { … } ) on the key values to help determine the size.
• Index key names are limited to 128 characters.
• Compound indexes can contain up to 31 fields.
• A single collection can have no more than 64 indexes. If you are hitting this limitation,
then you are over-indexing and should reevaluate your data model.
2 https://docs.mongodb.com/manual/reference/limits/#indexes
Sharding Deployments
[Diagram: The application's driver connects through query routers (mongos) to Shard 1 through Shard N. Each shard is a replica set (one primary, two secondaries), with one shard per data center from Data Center 1 through Data Center N.]
Principles
• A node and a server are one and the same.
• A primary (writable) and 2 secondaries (read-only) make up a 3-node replica set.
• Primary and secondaries are copies of each other.
• Shards reside across multiple servers.
• Load and data are distributed across the shards,
facilitating scaling.
Replication and Replica Sets
Two types of replication: Same replication mechanism. A primary node receives writes
while secondary nodes read and apply writes asynchronously.
• Master-slave: Deprecated.
– Uses a manual mechanism to promote a secondary to primary.
– oplog is only stored on the master (more on this later).
– Supports more than 50 nodes.
• Replica Sets:
– If the primary goes down, then a secondary is promoted to primary and receives
writes.
– Transparent recovery.
– More sophisticated deployment topologies.
– Supports up to 50 nodes.
– Recommended for production.
– Rollbacks: A write is not considered committed unless it was written to a majority of
member nodes (more than 50%). If the majority cannot accept the write, then all nodes
become read-only.
– A node replays a journal to determine what writes to apply after a failure.
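The majority rule is simple arithmetic, sketched below with hypothetical function names. It also shows why an even split between two data centers benefits from an arbiter as a tie-breaker.

```javascript
// Smallest number of voting members that forms a strict majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A primary can be elected only while a majority of voting members is reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

console.log(majority(3), majority(4), majority(5));
console.log(canElectPrimary(3, 2)); // 3-node set that lost one node
console.log(canElectPrimary(4, 2)); // 4 nodes split evenly across two data centers
console.log(canElectPrimary(5, 3)); // the same split plus an arbiter on one side
```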
Replication and Replica Sets
• Why not use replication?
– Hardware load issues:
• Working set does not fit in RAM.
• Disk speed.
• More than half the operations are writes.
– Secondary nodes must always have an updated copy of data.
• This can be fixed, but it introduces a latency back to the
application.
• How does replication work?
– oplog: Capped collection in the local database containing records of
write data. Steps to reproduce the write are stored each time the primary
is written to. Each entry has a BSON timestamp to track writes.
– Heartbeat: Monitors health and controls failover by pinging every 2
seconds (default).

More Related Content

Viewers also liked

Ora mysql bothGetting the best of both worlds with Oracle 11g and MySQL Enter...
Ora mysql bothGetting the best of both worlds with Oracle 11g and MySQL Enter...Ora mysql bothGetting the best of both worlds with Oracle 11g and MySQL Enter...
Ora mysql bothGetting the best of both worlds with Oracle 11g and MySQL Enter...
Ivan Zoratti
 
Webinar slides: How to automate and manage MongoDB & Percona Server for MongoDB
Webinar slides: How to automate and manage MongoDB & Percona Server for MongoDBWebinar slides: How to automate and manage MongoDB & Percona Server for MongoDB
Webinar slides: How to automate and manage MongoDB & Percona Server for MongoDB
Severalnines
 

Viewers also liked (18)

Strip your TEXT fields - Exeter Web Feb/2016
Strip your TEXT fields - Exeter Web Feb/2016Strip your TEXT fields - Exeter Web Feb/2016
Strip your TEXT fields - Exeter Web Feb/2016
 
Coding like a girl - DjangoCon
Coding like a girl - DjangoConCoding like a girl - DjangoCon
Coding like a girl - DjangoCon
 
MySQL Cluster Whats New
MySQL Cluster Whats NewMySQL Cluster Whats New
MySQL Cluster Whats New
 
20171104 hk-py con-mysql-documentstore_v1
20171104 hk-py con-mysql-documentstore_v120171104 hk-py con-mysql-documentstore_v1
20171104 hk-py con-mysql-documentstore_v1
 
LAMP: Desenvolvendo além do trivial
LAMP: Desenvolvendo além do trivialLAMP: Desenvolvendo além do trivial
LAMP: Desenvolvendo além do trivial
 
Ora mysql bothGetting the best of both worlds with Oracle 11g and MySQL Enter...
Ora mysql bothGetting the best of both worlds with Oracle 11g and MySQL Enter...Ora mysql bothGetting the best of both worlds with Oracle 11g and MySQL Enter...
Ora mysql bothGetting the best of both worlds with Oracle 11g and MySQL Enter...
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 
MySQL Sharding: Tools and Best Practices for Horizontal Scaling
What you need to know before you deploy your next MongoDB implementation

  • 1. What You Need to Know Before You Deploy Your Next MongoDB Implementation. © 2017 Rogue Wave Software, Inc. All Rights Reserved.
  • 2. Presenter: Bill Crowell, Enterprise Architect, Open Source Support, Rogue Wave Software
  • 3. Who am I? • Enterprise Architect in the Rogue Wave Open Source Software group • 22+ years of experience spanning the EDI, insurance, retail, entertainment, banking, and health care sectors (Fortune 500 companies) • Worked in various software roles related to full-stack development, including: – User interface (JavaServer Faces, JavaScript, Spring MVC, Node.js, Angular, some PHP) – Middleware (MQ/JMS, WebServices/SOA, and REST) – Databases (NoSQL and RDBMS) – Big Data (Apache Hadoop and Spark) – Security (RBAC and SSO) – DevOps (Jenkins CI with Docker and SCM) – Infrastructure – Testing (performance) – Training and mentorship • Worked with proprietary and OSS projects • Contributed to the OSS SSO project Central Authentication Service • Primary focus is helping others apply open source in the enterprise.
  • 4. Why should I be in this session? • Your RDBMS or current database system is not meeting your needs. • You are considering deploying a NoSQL database. • You are thinking about deploying MongoDB. • You are currently developing or are in production with MongoDB. • You have deployed MongoDB and need help! This presentation covers real use cases that are common causes of pain in MongoDB deployments. These are the areas where I see customers commonly struggle with Mongo.
  • 5. What is MongoDB? • Developed by 10gen in 2007 (based in New York). • Released as open source in 2009 under the GNU Affero General Public License (AGPL) and the Apache License (language drivers). • The free Community edition has been downloaded 30 million times 1 • NoSQL database: Uses collections of documents instead of rows in a table. • Written in C++, C, and JavaScript • Dynamic schema design: Provides flexibility, and changes usually have minimal impact on code. Be agile. 1 https://techcrunch.com/2017/09/21/database-provider-mongodb-has-filed-to-go-public/
  • 6. A bit of history first… • August 2013: 10gen changes its name to MongoDB to align more closely with the product name. • June 2016: MongoDB Atlas is released in the cloud, initially on Amazon Web Services. • June 2017: MongoDB Atlas adds Microsoft Azure and Google Cloud Platform. MongoDB Stitch (Beta) is launched, which sits on top of MongoDB and aggregates third-party REST services. • Filed to go public on September 21st, 2017 1 • Last Thursday, MongoDB stock began trading on the NASDAQ. The stock rose 33% on the first day of trading ($34/share) and stands at $30.50 today. • The company is worth an estimated $1.6 billion. 1 https://www.sec.gov/Archives/edgar/data/1441816/000104746917006014/a2233365zs-1.htm
  • 7. Who uses MongoDB? Source: https://www.mongodb.com/who-uses-mongodb About 4,300 paid licenses 1 1 https://www.sec.gov/Archives/edgar/data/1441816/000104746917006396/a2233556zs-1a.htm
  • 8. MongoDB Use Cases High-volume data where structure can change… • Real-time Analytics: Collect data from sensors. “[…] uses it with special screwdriver drills that can measure the torque of the screws as they install them in airplanes. By tracking torque, they can make airplanes safer…” 1 • Product Catalogs: Categories and inventory • Reporting: Website traffic logs • Stock Trading • Click-stream Ad Campaigns • Social Media • Data Analysis: Call records and data mining • Demographics and Biometrics 2 • Content Management: News, comment fields, photos • Metadata and Asset Management: Type-ahead searches • Report Aggregation: Merge data from disparate systems into one record. https://www.mongodb.com/use-cases 1 http://www.businessinsider.com/people-told-the-mongodb-founders-they-were-completely-crazy-2017-10 2 https://techcrunch.com/2013/12/06/inside-indias-aadhar-the-worlds-biggest-biometrics-database/
  • 9. How is Data Stored in MongoDB? • JSON is stored in Mongo in a binary-encoded serialization format called BSON (which is transparent to the developer). • BSON adds extra information to documents, such as length prefixes, to allow for easier traversal. • BSON provides additional data types that are not part of the JSON spec (Date and BinData types) and ordered fields: http://bsonspec.org Example: { "conference": "ZendCon 2017" } is encoded as \x22\x00\x00\x00 (total document size in bytes) \x02 (0x02 = type String) conference\x00 (field name, NUL-terminated) \x0D\x00\x00\x00ZendCon 2017\x00 (field value: length prefix, then the NUL-terminated string) \x00 (0x00 = type EOO, 'end of object')
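The byte layout described on the BSON slide can be checked by hand. The following is a minimal sketch in Node.js, assuming a document with exactly one string field; `encodeSingleStringField` is a hypothetical helper written for illustration, not part of any MongoDB driver, and a real application would rely on the driver's own BSON support.

```javascript
// Sketch only: hand-encode { "<name>": "<value>" } (a single string field) as BSON,
// following the layout published at bsonspec.org.
// encodeSingleStringField is a hypothetical helper, not a driver API.
function encodeSingleStringField(name, value) {
  const key = Buffer.from(name + "\0", "utf8");   // field name as a NUL-terminated cstring
  const str = Buffer.from(value + "\0", "utf8");  // string payload plus trailing NUL
  const strLen = Buffer.alloc(4);
  strLen.writeInt32LE(str.length, 0);             // length prefix counts the trailing NUL
  const body = Buffer.concat([
    Buffer.from([0x02]),                          // element type 0x02 = String
    key,
    strLen,
    str,
    Buffer.from([0x00]),                          // 0x00 = EOO (end of object)
  ]);
  const size = Buffer.alloc(4);
  size.writeInt32LE(body.length + 4, 0);          // total size includes the 4 size bytes
  return Buffer.concat([size, body]);
}

const bson = encodeSingleStringField("conference", "ZendCon 2017");
console.log(bson.length);          // total document size in bytes
console.log(bson.toString("hex")); // raw bytes, for comparison with the layout above
```

The length prefixes are what make traversal cheap: a reader can skip a whole string or subdocument without scanning for its end.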
  • 10. Working with BSON in PHP http://php.net/manual/en/book.bson.php
  • 11. MongoDB Versions Free (open source): https://github.com/mongodb/mongo • Command-line shell, database and config server, query router, basic troubleshooting tools, MongoDB Monitoring Service (MMS) Paid: GUIs and cloud-based management (Atlas) • On-Premise – Professional: Ops Manager (monitoring, query optimization, automated configuration) – Enterprise Advanced: Ops Manager, additional storage engines (encrypted and in-memory), Compass, advanced security (LDAP, RBAC, TLS), auditing • Cloud-based Service: Atlas – Free: 512MB Elastic Block Storage (EBS), shared RAM, 3-node replica set, monitoring and alerts, encryption – Essential: Starts at 8¢/hour, elastic scaling, snapshot backups (first GB free, then $2.50/GB/month), performance panel, enhanced monitoring and alerts, uptime SLA – Professional: Compass, proactive issue detection, schema/database design support, enhanced support, 2-hour support SLA
  • 12. MongoDB Total Cost of Ownership A Total Cost of Ownership Comparison of MongoDB & Oracle 1 http://s3.amazonaws.com/info-mongodb-com/TCO_MongoDB_vs._Oracle.pdf Oracle software maintenance and support costs (as analogous as possible to MongoDB configurations): • Oracle Database Enterprise Edition ($47,500 per core) plus Oracle RAC pricing ($23,000 per core), for a total of $70,500 per core. Discounts range from 0% for small deployments to 80% for large ones. • Demands 50% of one DBA’s time (small) or 1.5 full-time DBAs (large). • “We assume a conservative 50% discount on the list price for the smaller and larger projects. Additionally, we apply a further 50% discount on top of that to account for Oracle's core processor licensing factor. Amounts to $17,625 per core for both projects.” • Oracle maintenance and support is assumed to be 22% of license costs. MongoDB software maintenance and support costs: • Smaller projects: $11,990/server/year • Larger projects: $10,800/server/year (10% discount) • Demands 25% of one DBA’s time (small) and 75% of one DBA’s time (large). Assumes hardware maintenance and support costs of 10% of the hardware purchase price for both MongoDB and Oracle. 1 Thanks to Richard Sherrard (Director of Product Management) for the link!
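The Oracle per-core figure quoted on the TCO slide follows from applying the two 50% discounts in sequence to the combined list price. A short sketch of that arithmetic, using only the numbers on the slide (the function name is hypothetical):

```javascript
// Sketch only: reproduce the per-core Oracle figure from the list prices on the slide.
// effectiveOraclePricePerCore is a hypothetical name used for illustration.
function effectiveOraclePricePerCore(enterprisePerCore, racPerCore) {
  const listPrice = enterprisePerCore + racPerCore; // combined list price per core
  const afterDiscount = listPrice / 2;              // assumed 50% discount on list price
  return afterDiscount / 2;                         // further 50% for the core licensing factor
}

const oraclePerCore = effectiveOraclePricePerCore(47500, 23000);
console.log(oraclePerCore);
```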
  • 13. Development versus Deployment: A Tale of Two Roads The most common issue is not application development but deployment to production: not understanding the application’s requirements, misconfiguring the software, or undersizing the infrastructure. Usually this surfaces at a time when changing the software configuration (and the hardware, if on-premise) is very difficult. Code is (usually) simpler to change.
  • 14. MongoDB Components • MongoDB Driver: Talks to mongos/mongod • mongos: Query router for sharded clusters; routing proxy process. • mongod: Database including primary/secondary nodes • mongod: Config server which stores metadata • mongo: Interactive JavaScript shell (type-ahead) Diagram: An application on Zend Server uses the driver to talk to the query router (mongos). The router reaches two 3-node replica sets, one in Data Center 1 and one in Data Center 2, each with one primary and two secondary mongod nodes, plus a 3-node config server replica set with one config server in each of Data Centers 1, 2, and 3.
  • 15. Why would I use MongoDB? Example: RDBMS One-To-Many Relationship SELECT * FROM USERS INNER JOIN EMAIL_ADDRESS ON USERS.USER_ID = EMAIL_ADDRESS.USER_ID; Much thought must go into the schema up front to avoid changing it later. This is hard to do, especially if the tables are shared with other applications. Users table: user_id = 1, username = wcrowell, firstname = William, lastname = Crowell. Email Address table, row 1: id = 10, user_id = 1, email = william.crowell@abc.com. Email Address table, row 2: id = 11, user_id = 1, email = wcrowell@xyz.com.
  • 16. Why would I use MongoDB? Example: MongoDB One-To-Many Relationship { "_id": 1, "username": "wcrowell", "firstname": "William", "lastname": "Crowell", "email": [ "william.crowell@abc.com", "wcrowell@xyz.com" ] } Document-Based Data Model • Nested fields allow for a richer data model and require fewer joins than tables. • Database changes are easily made: the lack of a fixed schema can make your data model more fluid. • You can collapse a multi-table RDBMS model into a single MongoDB collection using arrays and nested documents. • Many-to-many relationships can be modeled as arrays in MongoDB.
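Collapsing the two-table model into one document, as the slide describes, amounts to grouping the child rows under their parent. The sketch below shows the idea in plain JavaScript; `embedEmails` and the row shapes are assumptions made for illustration, mirroring the Users and Email Address tables in the RDBMS example.

```javascript
// Sketch only: fold one-to-many email rows into an array on each user document.
// embedEmails is a hypothetical helper; the row shapes mirror the RDBMS example tables.
function embedEmails(userRows, emailRows) {
  return userRows.map((user) => ({
    _id: user.user_id,
    username: user.username,
    firstname: user.firstname,
    lastname: user.lastname,
    // Every email row that points at this user becomes an array element.
    email: emailRows
      .filter((row) => row.user_id === user.user_id)
      .map((row) => row.email),
  }));
}

const userRows = [
  { user_id: 1, username: "wcrowell", firstname: "William", lastname: "Crowell" },
];
const emailRows = [
  { id: 10, user_id: 1, email: "william.crowell@abc.com" },
  { id: 11, user_id: 1, email: "wcrowell@xyz.com" },
];

const userDocs = embedEmails(userRows, emailRows);
console.log(JSON.stringify(userDocs[0], null, 2));
```

Reading a user together with all of its addresses then takes a single document fetch instead of a join.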
  • 17. MongoDB Levels of Granularity • A database contains collections of documents. It is the top-level named grouping in the system. • A collection is a group of documents, similar to a table. • A document is similar to a row in a table and is the simplest unit of data. • A chunk is a group of documents clustered by the values of a field (more on this later). Important limits: https://docs.mongodb.com/manual/reference/limits/ • 16MB limit on documents. • Maximum document nesting depth is 100. Example: Database: drinks. Collection: beers. Chunk: all documents with field 'beer' from a - c: {"beer": "Bud"}, {"beer": "Blue Moon"}, {"beer": "Bass"}. Chunk: all documents with field 'beer' from d - g: {"beer": "Corona"}, {"beer": "Dogfish"}, {"beer": "Guinness"}. Collection: wines. Chunk: all documents with field 'wine' from a - c: {"wine": "Blush"}, {"wine": "Chardonnay"}, {"wine": "Champagne"}. Chunk: all documents with field 'wine' from d - g: {"wine": "Dolcetto"}, {"wine": "Eiswein"}, {"wine": "Frascati"}. What manages how documents and collections are stored?
  • 18. Storage Engines What is a storage engine? “A storage engine is the part of a database that is responsible for managing how data is stored, both in memory and on disk. Many databases support multiple storage engines, where different engines perform better for specific workloads. For example, one storage engine might offer better performance for read-heavy workloads, and another might support a higher throughput for write operations.” 1 Pluggable storage engines are a key feature of many open source projects (middleware and databases); they allow users to tailor the software to their application’s needs. 1 https://docs.mongodb.com/manual/faq/storage/#what-is-a-storage-engine
  • 19. Storage Engines and Locking Locking in Mongo has come a long way. A single write operation on a document used to lock the entire Mongo instance, which meant every database was locked for the duration of that one write. That is very inefficient, and for a Big Data application it is a big deal. Database locking directly impacts performance: the finer-grained a lock is, the less contention there is and the better the performance. All databases, whether RDBMS or NoSQL, implement some type of locking to ensure consistency. The locking mechanism can differ between the pluggable storage engines available with Mongo.
  • 20. Storage Engines and Locking Global: The entire Mongo instance is locked until the lock is released. This includes all databases, collections, and documents. These locks are very expensive. Database: Only the database and all collections owned by that database are locked. Database locks are still expensive. Collection: Only the documents in the collection (table) are locked. Better. Document: Individual documents can be locked instead of the entire collection, which improves performance for write-heavy applications. Only the WiredTiger storage engine implements document-level locking. There are operations in Mongo that can cause locking at each level 1. There are 3 (really 4) storage engines, each tailored to different workloads. 1 https://docs.mongodb.com/manual/faq/concurrency/
  • 21. Storage Engines: MMAPv1 and In-Memory MMAPv1 1: Storage engine based on memory-mapped files. Used in pre-3.x Mongo; as of version 3.2 it is no longer the default. Great for high-volume inserts, reads, and updates to existing documents. Writes to disk every 60 seconds (customizable) and uses an on-disk journal to maintain durability. Uses all free memory on the machine for the cache and yields to other processes that need memory. Swaps to disk as needed. • Uses lots of disk space • Implements collection-level locking • Maximum of 32TB (using a 64-byte key) In-Memory 2: Enterprise license only. Does not use any disk; if Mongo is shut down, the data is lost. High performance, suited to real-time analytics. • Default cache size: 50% of (RAM minus 1GB) 1 https://docs.mongodb.com/manual/core/mmapv1/ 2 https://docs.mongodb.com/manual/core/inmemory/
  • 22. Storage Engines: WiredTiger WiredTiger 1 (default): Features: • Introduced in March 2015 with MongoDB 3.0. More CPU-intensive. • WiredTiger uses multi-version concurrency control (MVCC) for write operations. • Locks can be taken at the global/instance, database, collection, or document level. Promises 7-10x better write performance. • Provides on-disk data compression (indexes and documents). Up to 80% less storage (snappy or zlib compression). • Default cache size, version 3.4+: the larger of 50% of (RAM minus 1GB) or 256MB • Default cache size, version 3.2: the larger of 60% of (RAM minus 1GB) or 1GB • Encryption at Rest: Enterprise license only. HIPAA-compliant. AES256-CBC (default). 3.2+ only. 1 https://docs.mongodb.com/manual/core/wiredtiger/
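The default cache sizes listed for WiredTiger can be turned into a quick sizing check. This sketch assumes the slide's figures are read as "the larger of the two values" and only distinguishes version 3.2 from 3.4 and later; `defaultWiredTigerCacheBytes` is a hypothetical helper, not a MongoDB API, and an explicitly configured cache size would override these defaults.

```javascript
// Sketch only: default WiredTiger cache size, reading the slide's figures as
// "the larger of a share of (RAM minus 1GB) or a fixed floor".
// defaultWiredTigerCacheBytes is a hypothetical helper, not a MongoDB API.
const MB = 1024 ** 2;
const GB = 1024 ** 3;

function defaultWiredTigerCacheBytes(ramBytes, version) {
  const usable = ramBytes - GB;
  if (version === "3.2") {
    return Math.max((usable * 6) / 10, GB); // 3.2: 60% of (RAM - 1GB), at least 1GB
  }
  return Math.max(usable / 2, 256 * MB);    // 3.4+: 50% of (RAM - 1GB), at least 256MB
}

for (const ramGB of [1, 4, 16, 64]) {
  const cacheGB = defaultWiredTigerCacheBytes(ramGB * GB, "3.4") / GB;
  console.log(`${ramGB}GB RAM -> ${cacheGB}GB default cache (3.4+)`);
}
```

The remainder of RAM is not wasted: the operating system's filesystem cache and per-connection memory also need room, which is one reason the default stops at roughly half.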
  • 23. MongoDB PHP Drivers PHP MongoDB Driver Homepage https://docs.mongodb.com/ecosystem/drivers/php/ http://php.net/manual/en/set.mongodb.php 1) MongoDB Driver for PHP from the PHP Extension Community Library (PECL) https://pecl.php.net/package/mongodb • Thin (bare-bones) driver with limited functionality • Currently maintained (version 1.3.1 released October 16th, 2017) 2) MongoDB PHP Library https://docs.mongodb.com/php-library/current/ • Wrapper for the lower-level PHP driver above • Recommended fully featured driver • Requires PHP 5.4+, libbson, libmongoc, and OpenSSL • Documentation: https://docs.mongodb.com/php-library/current/
  • 24. MongoDB PHP Drivers MongoDB Compatibility 1 The following compatibility table specifies the recommended version(s) of the MongoDB PHP driver for use with a specific version of MongoDB. 1 https://docs.mongodb.com/ecosystem/drivers/driver-compatibility-reference/#php-driver-compatibility PHP Driver MongoDB 2.4 MongoDB 2.6 MongoDB 3.0 MongoDB 3.2 MongoDB 3.4 PHPLIB 1.1 + mongodb-1.2* Yes Yes Yes Yes Yes PHPLIB 1.0 + mongodb-1.1* Yes Yes Yes Yes mongodb-1.1* Yes Yes Yes Yes mongodb-1.0* Yes Yes Yes mongo-1.6** Yes Yes Yes mongo-1.5** Yes Yes mongo-1.4** Yes Yes mongo-1.3** Yes *New driver **Legacy driver
  • 25. MongoDB PHP Drivers PHP Language Compatibility 1 The following compatibility table specifies the recommended version(s) of the MongoDB PHP driver for use with a specific version of PHP/Zend 2. 1 https://docs.mongodb.com/ecosystem/drivers/driver-compatibility-reference/#reference-compatibility-language-php 2 https://framework.zend.com/blog/2017-06-06-zf-php-7-1.html 2 https://zend18.zendesk.com/hc/en-us/articles/217058968-PHP-Versions-and-APIs 3 Clark Everetts PHP Driver PHP 5.6 Zend 8.0 PHP 5.6 Zend 8.5LTS PHP 7.0.15 Zend 9.0.2 PHP 7.1.3/7 Zend 9.1 HHVM 3.12 HHVM 3.15 mongodb-1.2* Yes Yes Yes Yes Yes Yes mongodb-1.1* Yes Yes Yes Yes Yes mongodb-1.0* Yes Yes Yes mongo-1.6** Yes Yes mongo-1.3-1.5** Yes Yes *New driver **Legacy driver Note: Support for PHP 5.6 is available via Zend Server 8.5, not 8.0 3. HHVM: HipHop Virtual Machine (Facebook)
  • 26. Demo: PHPLIB with MongoDB http://php.net/manual/en/mongodb.tutorial.library.php Install Composer (package manager). Install the library: composer require mongodb/mongodb This creates a bootstrap file for the dependency classes: vendor/autoload.php Entire library: https://docs.mongodb.com/php-library/current/ Libraries used: PHP 7.1.10, Apache 2.4.25, MongoDB PHPLIB Extension 1.2.9 http://localhost/phpinfo.php 2 files: list.php and create.php No schema is defined (e.g. collections/documents).
  • 27. Hardware: Memory The most important resource in a Mongo deployment is RAM, not CPU. Mongo uses much less CPU than an RDBMS. Insufficient RAM is the most common performance issue. RAM holds your indexes and working set, and the working set for an application should fit comfortably in memory. The working set includes: • Data or collections: Containers for like documents stored in extents. Number of pages accessed per second by active users on the system. • Indexes on the collections. Also consider the following: • The period of time the data and indexes need to be retained. • Connection pooling (1MB per active thread). • Fragmentation. • Operations on the data, including sorting and aggregation (map reduce). This does not mean all of the documents and indexes in the database have to fit within RAM. Only the documents and collections your application accesses most of the time are part of the working set.
  • 28. Hardware: Memory For example 1, say you have a year’s worth of data and each month is 1GB of data, totaling 12GB. For every month of data you have 1GB of indexes, totaling another 12GB. If your application is accessing 12 months’ worth of data, then your working set is 12GB of data + 12GB of indexes = 24GB. If you had 8GB of RAM and your application started accessing 6 months’ worth of data (6GB data + 6GB indexes = 12GB), then your working set would exceed the available RAM and performance would degrade as time progresses (as more data is accessed). You want to prevent Mongo from paging documents in your working set in and out. 1 https://stackoverflow.com/questions/6453584/what-does-it-mean-to-fit-working-set-into-ram-for-mongodb
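The working-set example above is simple arithmetic, and writing it down makes it easy to rerun with your own numbers. A minimal sketch, assuming data and index sizes grow linearly per month as in the slide; the function names are hypothetical, and a real estimate should also leave headroom for connections, sorting, and fragmentation.

```javascript
// Sketch only: the working-set arithmetic from the slide, with hypothetical helper names.
function workingSetGB(monthsAccessed, dataPerMonthGB, indexPerMonthGB) {
  return monthsAccessed * (dataPerMonthGB + indexPerMonthGB);
}

// Largest number of whole months whose data plus indexes still fit in RAM.
function maxMonthsInRam(ramGB, dataPerMonthGB, indexPerMonthGB) {
  return Math.floor(ramGB / (dataPerMonthGB + indexPerMonthGB));
}

console.log(workingSetGB(12, 1, 1));      // a full year of data and indexes
console.log(workingSetGB(6, 1, 1) > 8);   // does 6 months exceed 8GB of RAM?
console.log(maxMonthsInRam(8, 1, 1));     // months that fit in 8GB
```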
  • 29. Hardware: Memory How much RAM do I need for my deployment? It depends on: • The application’s typical use cases and access patterns. Every application is different, so this requires understanding your application. • How well the queries are indexed. If queries are doing full collection scans, then more memory will be used. Always allow room to grow. Best practice: Create automated test scripts (e.g. PHPUnit 1) that focus on repeatability, and keep them updated as the application changes. Monitor the application using the approaches mentioned here. Take the time to do this, as it will pay dividends in the long run. Make the case to your management team that this work is needed for the application’s success. 1 https://zendframework.github.io/zend-test/phpunit/
  • 30. Hardware: Paging You do not get a notification when Mongo is paging. So, how do I know if I am paging? Run the following from a mongo prompt: db.serverStatus().extra_info Example output: { "page_faults" : 6137 } For example, a client ran a performance test with 8GB of RAM: { "page_faults" : 353 } After increasing to 16GB of RAM: { "page_faults" : 76 }
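A single page_faults number is hard to interpret on its own, so one option is to take two readings and look at the rate. The sketch below assumes the counter only increases between readings and that you record the time of each reading yourself; the sample object shape (takenAtMs, pageFaults) is hypothetical.

```javascript
// Each sample is assumed to look like { takenAtMs: <epoch ms>, pageFaults: <counter> },
// where pageFaults would be copied from db.serverStatus().extra_info.page_faults.
function pageFaultsPerSecond(earlier, later) {
  const elapsedSeconds = (later.takenAtMs - earlier.takenAtMs) / 1000;
  if (elapsedSeconds <= 0) {
    throw new RangeError("samples must be in chronological order");
  }
  // Difference in the counter divided by the time between the two readings.
  return (later.pageFaults - earlier.pageFaults) / elapsedSeconds;
}

const before = { takenAtMs: 0, pageFaults: 6137 };
const after = { takenAtMs: 60000, pageFaults: 6737 };
console.log(pageFaultsPerSecond(before, after));
```

Comparing the rate before and after a RAM change, under the same test load, gives a fairer picture than comparing raw totals.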
  • 31. Hardware: Disks If Mongo must access disk, then prefer RAID10 or an SSD-based VM over HDD, even though it will cost more. “Solid state drives (SSDs) can outperform spinning hard disks (HDDs) by 100 times or more for random workloads” [1] and are increasingly affordable. Maximizing MongoDB Performance on AWS https://www.mongodb.com/blog/post/maximizing-mongodb-performance-on-aws [1] https://docs.mongodb.com/manual/core/write-performance/#storage-performance
  • 32. Security Disabling SELinux “Problems have been reported when using MongoDB with SELinux enabled. To avoid issues, disable SELinux when possible.” [1] There are ways to work around this without disabling SELinux. Try audit2why on /var/log/audit/audit.log to view the violations, and build custom policies with audit2allow [2]. [1] https://docs.mongodb.com/manual/tutorial/install-mongodb-on-red-hat/ and https://docs.mongodb.com/manual/administration/production-notes/#recommended-configuration [2] https://serverfault.com/questions/770227/selinux-setup-for-mongodb
  • 33. Security Do enable MongoDB security! Tutorial: Enable Authentication https://docs.mongodb.com/manual/tutorial/enable-authentication/ Major security alert as 40,000 MongoDB databases left unsecured (February 2015) https://www.techworm.net/2015/02/major-security-alert-40000-mongodb-databases-left-unsecured.html This was discovered by 3 students in Germany by checking TCP port 27017. You can easily go to Shodan (a search engine for IoT) and search for MongoDB: shodan download --limit -1 mongodb "product:MongoDB"
  • 34. Security It's Still the Data, Stupid! (December 2015 by John Matherly) https://blog.shodan.io/its-still-the-data-stupid/ “At the moment, there are at least 35,000 publicly available, unauthenticated instances of MongoDB running on the Internet... There's a total of 684.4 TB of data exposed on the Internet via publicly accessible MongoDB instances that don't have any form of authentication.” MongoDB ransacking starts again: Hackers ransom 26,000 unsecured instances (September 5th, 2017 by Liam Tung) http://www.zdnet.com/article/mongodb-ransacking-starts-again-hackers-ransom-26000-unsecured-instances/ “Three groups of hackers have wiped around 26,000 MongoDB databases over the weekend and demanded victims to pay about ~$650 (495 pounds) to have them restored.”
  • 35. Operating System Tuning: THP Disable Transparent Huge Pages (THP) for Linux: https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/ Why? THP pays off when memory is accessed contiguously. Databases tend to have sparse rather than contiguous memory access patterns. Set readahead to 0 regardless of storage type (e.g. SSD, HDD) https://docs.mongodb.com/manual/administration/production-notes/#readahead Why? “Setting a higher readahead benefits sequential I/O operations.” MongoDB accesses the disk in random patterns, so increasing this value can degrade performance.
  • 36. Operating System Tuning: File System Types Use XFS and not EXT4 or NFS https://docs.mongodb.com/manual/administration/production-notes/#kernel-and-file-systems Why? XFS is arguably better for concurrent writes. It is best to run mongoperf on your system to check. “With the WiredTiger storage engine, use of XFS is strongly recommended to avoid performance issues that may occur when using EXT4 with WiredTiger.” [1] “Avoid using NFS drives for your dbPath. Using NFS drives can result in degraded and unstable performance.” [1] XFS vs EXT4 – Comparing MongoDB Performance on AWS EC2 https://scalegrid.io/blog/xfs-vs-ext4-comparing-mongodb-performance-on-aws-ec2/ “In performance terms, XFS is indeed a force multiplier when paired with high speed disks that it can take real advantage from. For low to mid-end systems, it doesn’t seem to be able to do much to improve your performance.” Windows only: Do not use the FAT file system. Use NTFS instead. [1] https://docs.mongodb.com/manual/administration/production-checklist-operations/#filesystem
  • 37. Operating System Tuning: Disk Space Have enough disk space for your data, indexes, and log files, plus plenty of room for expansion. Running db.stats() proactively and looking at the following fields should give an idea: • avgObjSize: Average size of each document in bytes. • dataSize: Size of BSON objects in the database. • storageSize: Total space allocated for collection extents, including extra space reserved for collection growth and unallocated deleted space.
  • 38. Operating System Tuning: NUMA Disable NUMA (Non-Uniform Memory Access) https://docs.mongodb.com/manual/administration/production-notes/#mongodb-and-numa-hardware What is NUMA? It is used to increase processor speed on a multi-core system without increasing load on the processor bus. For example, a system may have 2 memory pools where each core has some degree of proximity to each memory pool on the bus. Accessing local memory is faster than accessing remote memory. Why disable NUMA? It can cause memory to be paged in and out unnecessarily. Mongo will complain if it detects that NUMA is enabled. How is NUMA disabled? On Windows, it can be configured in the BIOS. On Linux: echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode Or: sudo sysctl -w vm.zone_reclaim_mode=0 Then any Mongo program (e.g. mongod, mongos, mongo) must be started with numactl: numactl --interleave=all /usr/bin/mongod --quiet -f /etc/mongod.conf run Note: This may not be necessary when bound to a single NUMA node. See: https://jira.mongodb.org/browse/SERVER-25984 [Diagram: two 8GB RAM pools and CPUs 0 through 7 connected by a bus.]
  • 39. Operating System Tuning: atime Disable last accessed time (atime) in the file system table (/etc/fstab) entries for volumes containing Mongo database files. Why disable last accessed time? This can provide a significant performance improvement unless you have an application that relies on atime. How is atime disabled on Linux? Add noatime to the mount options: /dev/mapper/datavg-datalv /apps xfs defaults,noatime 0 0 /dev/mapper/appvg-appsloglv /apps/logs xfs defaults,noatime 0 0
  • 40. Tools: Finding the Bottleneck iostat [1]: Number of accesses over time to the disk. Example: iostat -xmt 1 • %util: The most useful field for a quick check; it indicates what percent of the time the device/drive is in use. • avgrq-sz: Average request size. A smaller value reflects more random I/O operations. vmstat [2]: Shows how much memory is in use, whether the data fits into memory, and page faults. mongostat [3]: Provides a quick overview of the status of a currently running mongod or mongos instance. Similar to vmstat. Profiling [4]: mongod --profile <level: 0> --slowms <milliseconds: 100> [1] https://docs.mongodb.com/manual/administration/production-notes/#iostat [2] https://docs.mongodb.com/manual/faq/diagnostics/#how-do-i-read-memory-statistics-in-the-unix-top-command [3] https://docs.mongodb.com/manual/reference/program/mongostat/ [4] https://docs.mongodb.com/manual/reference/program/mongod/#bin.mongod http://edgystuff.tumblr.com/post/81219256714/tips-to-check-and-improve-your-storage-io
  • 41. Tools: Finding the Bottleneck mongoperf [1]: Checks disk I/O performance independently of MongoDB. mongoperf can overstate performance problems in ext-X filesystems https://jira.mongodb.org/browse/SERVER-13417 “People who use mongoperf to compare XFS and ext-X might get results that overstate the benefits of XFS…The workaround is to let mongoperf use multiple files. That would also make the mongoperf load more realistic given mongodb will use many files.” Set the file size, number of threads, and read/write operations in a .conf file: { nThreads:1024, fileSizeMB:1000, mmf:false, r:true, w:true, syncDelay:60 } Then run: mongoperf < ./mongoperf.conf Tips to check and improve your storage IO performance with MongoDB http://edgystuff.tumblr.com/post/81219256714/tips-to-check-and-improve-your-storage-io mongotop [2]: Tracks time spent reading/writing per namespace (database/collection). MongoDB Monitoring Service (MMS): Graphical display for monitoring, backup, and deployment. http://api.mongodb.com/mms/ [1] https://docs.mongodb.com/manual/reference/program/mongoperf/ [2] https://docs.mongodb.com/manual/reference/program/mongotop/
  • 42. Indexes What makes a good index? • The query optimizer chooses the most efficient query plan for the available indexes. • An index on a unique ID field is very selective. When multiple indexes are involved, Mongo evaluates the indexes and uses the more highly selective index. • Will the query planner use the index? If the fields named in the query are part of the index, then yes. • The index should be selective to narrow down the results for a given key. • An index on only a boolean field is usually not selected because two possible values (true or false) do not narrow down the selection. • Mongo keeps statistics on index hits to see if a key match points to a few or many documents. • Mongo can select the wrong index, meaning that another index would perform better. – Mongo caches the selection and may remember a non-optimal choice. – Statistically an index may look good, but another might perform better.
  • 43. MongoDB Indexes Indexes are implemented as a B-tree data structure [1]. Each collection automatically has an index on _id. MongoDB allows 64 indexes per collection [2]. Run explain on your application's main queries to determine if they are using indexes. See if a field is indexed: db.beers.find( { "beer": "Stickee Monkee" } ).explain() If queryPlanner.winningPlan.stage equals: • “IDHACK”: Uses a special ID index strategy to retrieve the documents for this query. • “COLLSCAN”: This is a collection scan, which means the query had to visit every document in the collection. For large databases, MongoDB would have to page all of the documents into memory, which is very slow. • “IXSCAN”: Index scan. • “FETCH”: Document retrieval (IXSCAN index hit). [1] https://docs.mongodb.com/manual/indexes/#create-an-index [2] https://docs.mongodb.com/manual/reference/limits/#Number-of-Indexes-per-Collection
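Because a FETCH stage usually wraps an IXSCAN stage, it can help to walk the whole winning plan rather than read only the top-level stage. The sketch below works on a plain object shaped like explain() output; the sample plan and the index name beer_1 are made up for illustration, and the helper names are not part of any MongoDB API.

```javascript
// Collect every stage name in a plan by following inputStage / inputStages.
function collectStages(plan) {
  if (!plan) return [];
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  return [plan.stage].concat(...children.map(collectStages));
}

// Summarize the winning plan using the stage names described on the slide.
function describePlan(explainOutput) {
  const stages = collectStages(explainOutput.queryPlanner.winningPlan);
  if (stages.includes("COLLSCAN")) return "collection scan: no index used";
  if (stages.includes("IDHACK")) return "_id lookup";
  if (stages.includes("IXSCAN")) return "index scan";
  return "other: " + stages.join(" > ");
}

// Hypothetical explain() output for a query that hits an index on "beer".
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "beer_1" },
    },
  },
};
console.log(describePlan(sampleExplain));
```

In the mongo shell the same functions could be fed the result of db.beers.find(...).explain() directly.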
  • 44. MongoDB Indexes Get execution statistics: db.<collection>.find({_id:ObjectId("595411070a797e7aaeff2733")}).explain('executionStats') Look at the following: • executionStats.totalDocsExamined (previously named nscannedObjects). If this number is high compared with the number of results returned, then an effective index was probably not used. • executionStats.totalKeysExamined: The number of index entries scanned. If it is zero, then an index was not used in the query. • executionStats.executionStages.nReturned: Number of results the query returned. Reference: Explain results: https://docs.mongodb.com/manual/reference/explain-results/ Get indexes: db.<collection>.getIndexes()
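One way to read these fields together is to compare documents examined with documents returned. The helper below is a sketch over a plain executionStats object; the sample numbers are invented, and what counts as "too many documents per result" is left to the reader.

```javascript
// Summarize executionStats: was any index touched, and how many documents
// were examined for each document returned?
function indexEfficiency(executionStats) {
  const { nReturned, totalKeysExamined, totalDocsExamined } = executionStats;
  return {
    usedIndex: totalKeysExamined > 0,
    // null when nothing was returned, to avoid dividing by zero.
    docsExaminedPerReturned: nReturned > 0 ? totalDocsExamined / nReturned : null,
  };
}

// Hypothetical numbers: 10,000 documents scanned to return 2 results.
console.log(indexEfficiency({ nReturned: 2, totalKeysExamined: 0, totalDocsExamined: 10000 }));
```

A ratio close to 1 suggests the index narrows the query well; a very large ratio is a hint to look at the query plan.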
  • 45. Indexes Create an index: db.<collection>.createIndex({<field>: <1 for ascending, -1 for descending>}) • Creates the index, and the collection if it does not exist. • Direction can be 1 for ascending, -1 for descending. “If a write operation modifies an indexed field, MongoDB updates all indexes that have the modified field as a key.” [1] [1] https://docs.mongodb.com/manual/faq/indexes/#how-do-write-operations-affect-indexes
  • 46. 2 Types of Scaling The traditional way to scale an RDBMS is to add more hardware. Eventually, a price or scale limit is hit, making this approach infeasible. • Vertical: Adding CPU, faster disks, or additional memory on a bare-metal machine. Simple to implement. • Horizontal: Pooling resources to distribute the load and data across multiple machines. Cloud solutions allow us to scale up or down depending on need. One way to accomplish this is through sharding. It is essential to understand your application's requirements regarding latency and throughput, the volume and type of data, and the period of time the data is kept.
  • 47. Replication and Replica Sets • Replication – Protects your data but is not a backup. – Provides redundancy by synchronizing across nodes. – Disaster recovery: Automatic failover when your primary node fails. – Used to scale reads. – Goal: Have at least one full copy of your data available at all times. • What is a replica set? – Replication where a configured group of nodes automatically synchronize their data and fail over when a node is no longer available. – A minimum of 3 nodes is recommended.
  • 48. Replica Set: What does it look like? [Diagram: a driver/application connected to a replica set. Data Center 1 holds a primary and two secondaries; Data Center 2 (DR) holds two secondaries kept current through asynchronous replication.] • Typical setup: A primary (writable) and 2 secondaries (read-only) make up a 3-node replica set. • Near 100% uptime is critical: A loss of 1 node does not take down the entire replica set. • Primary and secondary are copies of each other. • Writes can only be done on the primary. • Reads can be performed on the secondary. • A priority can be set that favors a node to become a primary in case of failover [1][2]. • If there are 2 data centers with 2 nodes in one data center, then use one of the data centers for DR. • Not scalable: The primary is the limiting factor (memory, drive space). [1] https://docs.mongodb.com/manual/tutorial/adjust-replica-set-member-priority/ [2] https://docs.mongodb.com/manual/reference/replica-configuration/#rsconf.members[n].priority Ok, that’s great. How do we scale horizontally?
  • 49. Sharding • What is sharding? – A shard contains a subset of the sharded data. Shards are deployed as replica sets [1]. – Distributes the load by partitioning documents into smaller, manageable pieces on less powerful machines so one machine does not have to store everything. – Partitioning is transparent to the application. – Used to scale writes. Most operations are inserts and updates for large-volume applications. – Adds overhead and complexity: Moving data off overloaded shards takes time and resources. – Balancing and redistribution of the data across the shards is done automatically. [1] https://docs.mongodb.com/manual/core/sharded-cluster-components/
  • 50. When to use sharding? • Geo-locality: Support geographically distributed deployments for an optimal user experience for customers in many locations. – Lower network latency. – Great for mobile applications. • Scalability: The working set growth is unbounded and exceeds the available RAM on the largest node/server. When not enough resources exist on a single machine and there is a lot of I/O, configuring sharding and spreading the load across several machines may reduce the load on an individual server. • Hardware Optimization: Enhancing performance without the cost. • Low Recovery Times: Mitigate the impact of a server failure.
  • 51. Sharding • When not to use sharding? – New deployments: Prototype first with a large amount of data. Use a replica set first, and then move to sharding later on if required. – A prototype (or performance test) allows for a more accurate estimate of the hardware. – If you determine sharding is needed within the first 6 to 12 months after deployment of a project, then plan for a sharded cluster. – The application tends to access data that is not local. – Infrastructure limitations: Sharding requires several servers. – The application would have only one shard.
  • 52. Should I use sharding in this use case? • I own a trucking company that owns 1,000 18-wheelers delivering steel beams to construction companies. The U.S. Department of Transportation enacted the Transportation Recall Enhancement, Accountability, and Documentation Act (TREAD), which required a Tire Pressure Monitoring System (TPMS) to be installed on each truck to automatically measure tire pressure for each tire in 1-minute increments using sensors while each truck is moving. The data is uploaded to the home office where it is required to be kept for 3 years.
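A quick volume estimate helps frame the question. The sketch below assumes 18 tires per truck, trucks moving around the clock (an upper bound), 3 years taken as 1,095 days, and a hypothetical 100 bytes per stored reading; none of these figures come from the slide except the 1,000 trucks, the 1-minute interval, and the 3-year retention.

```javascript
// Upper-bound estimate of TPMS readings and raw data size.
// tiresPerTruck, retentionDays and bytesPerReading are assumptions for illustration.
function tpmsEstimate({ trucks, tiresPerTruck, readingsPerMinute, retentionDays, bytesPerReading }) {
  const readingsPerDay = trucks * tiresPerTruck * readingsPerMinute * 60 * 24;
  const totalReadings = readingsPerDay * retentionDays;
  return { readingsPerDay, totalReadings, totalBytes: totalReadings * bytesPerReading };
}

const estimate = tpmsEstimate({
  trucks: 1000,
  tiresPerTruck: 18,
  readingsPerMinute: 1,
  retentionDays: 1095,
  bytesPerReading: 100,
});
console.log(estimate);
```

Under these assumptions the insert rate is steady and the retained data grows into the terabyte range, which is the kind of write-heavy, ever-growing workload the earlier "when to use sharding" slide describes. Indexes and storage overhead would add to the raw figure.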
  • 53. Sharding Deployments • A sharded cluster contains: – Shards store the application data. In a sharded cluster, only the mongos routers or system administrators should be connecting directly to the shards. – mongos query routers cache the cluster metadata and use it to route operations to the correct shard or shards. – Config servers persistently store metadata about the cluster, including which shard has what subset of the data. • Guidelines – Each member of a replica set, whether it’s a complete replica or an arbiter, needs to live on a distinct machine. – Replica set arbiters are lightweight enough to share a machine with another process. Arbiters can be placed on a mongos query router. – Config servers can optionally share a machine. The only hard requirement is that all config servers in the config cluster reside on distinct machines.
  • 54. Sharding Caveats “My application performed fine with a replica set, and now my application performs terribly with a sharded cluster! Why?” The initial load of data is not split and as a result is written to one shard. As data is written, more chunks are created. A chunk is a group of documents clustered by values on a field (e.g. a list of beers between letters a and d). As more chunks are created and certain thresholds are met, Mongo moves the data to other shards/servers to balance out. This process is called a migration. What does this look like?
  • 55. Sharding Migrations [Diagram: a driver/application sends operations through mongos query routers to Shards 1 to 4; 64MB chunks accumulate on one shard and are then spread across the others.] • A balancer process automatically monitors the number of chunks on each shard. • During the first data load, all writes go to one shard/server. • As data is loaded, chunks are created and split as they grow. • When one shard holds enough more chunks than the others to cross the migration threshold, the balancer issues a migration. • Migrations take time because data is being moved from server to server. • The goal: An equal number of chunks per shard.
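The balancing behavior described on this slide can be imitated with a toy simulation. This is not MongoDB's balancer: the threshold parameter and the one-chunk-at-a-time policy are simplifying assumptions chosen only to show why an initial load that lands on one shard triggers a series of migrations.

```javascript
// Toy balancer: repeatedly move one chunk from the fullest shard to the
// emptiest shard until their chunk counts differ by less than `threshold`.
// `threshold` is a hypothetical parameter, not a MongoDB setting.
function planMigrations(chunkCounts, threshold = 2) {
  if (threshold < 2) throw new RangeError("threshold must be at least 2");
  const counts = { ...chunkCounts };
  const migrations = [];
  for (;;) {
    // Shard names ordered from fewest to most chunks.
    const shards = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    if (shards.length < 2) break;
    const least = shards[0];
    const most = shards[shards.length - 1];
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
  }
  return { migrations, counts };
}

// Initial load: all 10 chunks were written to shard1.
const plan = planMigrations({ shard1: 10, shard2: 0, shard3: 0, shard4: 0 });
console.log(plan.migrations.length, plan.counts);
```

Each entry in migrations stands for data copied between servers, which is why a freshly sharded collection can perform poorly until balancing settles.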
  • 56. Sharding Caveats • The data can be pre-split to mitigate this issue (using either a command [1] or a script [2]). The chunks will be created and balanced ahead of time, so when data is written it will be distributed according to your sharding strategy. • Be proactive! Your application’s performance can degrade. A good rule of thumb is to plan to add a new shard at least several weeks before the indexes and working set on your existing shards reach 90% of RAM. • Fixed-size collections (capped collections) cannot be sharded. • Use care with mapReduce and aggregation functions with sharding as they tend to globally lock the database: Remove unnecessary global lock during "replace" out action https://jira.mongodb.org/browse/SERVER-13552 “During map-reduce operation there are unnecessary global lock used and should be removed.” [1] https://docs.mongodb.com/manual/reference/command/shardCollection/ and https://docs.mongodb.com/manual/tutorial/migrate-chunks-in-sharded-cluster/ [2] https://docs.mongodb.com/manual/tutorial/create-chunks-in-sharded-cluster/
  • 57. Deployment Strategy [Diagram: three data centers. Each has Zend Server applications with a driver and a mongos query router, and one of the three config servers (a 3-node replica set). Data Center 1 holds the primary of shard 1 and secondaries of shards 2 and 3; Data Center 2 holds the primary of shard 2 and secondaries of shards 1 and 3; Data Center 3 holds the primary of shard 3 and secondaries of shards 1 and 2. Each shard primary also has a mongos query router.] • Query routers can be placed at the application server or shard primary. There are typically more mongos servers than application servers (1-to-1 mongos/mongod). • Config servers are a replica set. • The setup provides redundancy and HA. • Shards are spread across data centers.
  • 58. Shard Keys • A shard key is chosen by the developer/data modeler and determines how a collection's documents are partitioned and spread across shards. • Data is segregated into chunks by the shard key. The chunks are distributed across shards residing on multiple servers. • Data modelers can struggle with defining a good shard key, which can cause performance problems later on. You cannot change a shard key after sharding a collection, so choose the shard key wisely. You would need to dump and restore the data to repartition it for a new shard key, which is not trivial. Customers commonly pick the wrong shard key and get stuck.
  • 59. Good Shard Keys Good shard keys exhibit 5 properties: • Cardinality: The set A = { "East", "Central", "West" } has 3 elements, giving it a cardinality of 3. This means the balancer can create only 3 chunks, which can limit horizontal scaling in the cluster. However, high cardinality does not guarantee that the data will be distributed evenly across the cluster. • Write distribution: Writes should be distributed as evenly as possible across the cluster. • Query isolation: The majority of queries for the same collection should target one (or a few) particular shard(s). • Reliability: Do some or all documents get affected after one shard goes down? • Index locality: Locality is how the indexes are accessed. Do we have to page in the entire index each time we query?
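Cardinality and write distribution can be checked against a sample of candidate shard key values before committing to a key. The helper below is a sketch over a plain array; how the sample is collected, and what share is "too skewed", are left as assumptions for the reader.

```javascript
// Profile a sample of candidate shard key values:
// cardinality  = number of distinct values (upper bound on useful chunks),
// largestShare = fraction of documents carrying the most common value.
function shardKeyProfile(values) {
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const largest = Math.max(...counts.values());
  return {
    cardinality: counts.size,
    largestShare: values.length ? largest / values.length : 0,
  };
}

// Hypothetical sample using the slide's region values.
console.log(shardKeyProfile(["East", "East", "East", "Central", "West", "East"]));
```

A low cardinality caps the number of chunks, while a large largestShare warns that one chunk would receive most of the writes even when cardinality looks acceptable.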
  • 60. Bad Shard Keys • Data center: This would be a good shard tag, but it does not have good cardinality. If we had 3 data centers, then the data could be split into only 3 chunks. There will be large chunks that cannot be split, resulting in one shard with more data than the others. • Customer IDs: May not be well distributed if some customers are larger than others (cardinality). Depends on the use case. • Timestamp: Bad write distribution because writes could be targeted more to one chunk than the others. Example: My website activity is greatest between the hours of 10am and 4pm. • Hashed timestamp: An MD5 hash of a timestamp provides random distribution, but would cause the queries to be scattered across the cluster.
  • 61. Ranged Sharding MongoDB provides 3 types of sharding strategies: • Range [1]/User-defined: The shard key's range of values is segmented into sequential chunks. Documents with “close” shard key values are likely to be in the same chunk or shard. For example: A primary key for a collection of documents is an auto-increment integer. 0 – 10 go to shard 1, 11 – 20 go to shard 2, 21 – 30 go to shard 3, and so on. Composite keys are supported with this strategy. [1] Deploy Sharded Cluster using Ranged Sharding https://docs.mongodb.com/manual/tutorial/deploy-sharded-cluster-ranged-sharding/ [Diagram: shard key x from minKey to maxKey split at { x: 11 }, { x: 21 }, and { x: 31 } across Shards 1 to 4; example chunk contents are 1, 2 and 7, 8 on Shard 1; 11, 17 on Shard 2; 21 and 29, 30 on Shard 3; 35, 38 on Shard 4.]
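The range example can be expressed as a lookup over the split points shown in the slide's diagram ({ x: 11 }, { x: 21 }, { x: 31 }). This is only a sketch of the idea; in a real cluster mongos does this routing from the config servers' metadata, and shardForKey is a made-up name.

```javascript
// Lower bounds of shards 2, 3 and 4, taken from the slide's diagram.
const lowerBounds = [11, 21, 31];

// Return the 1-based shard number for a key: each range includes its
// lower bound and excludes the next range's lower bound.
function shardForKey(x, bounds = lowerBounds) {
  let shard = 1;
  for (const bound of bounds) {
    if (x >= bound) shard += 1;
  }
  return shard;
}

for (const key of [1, 8, 11, 17, 21, 30, 35, 38]) {
  console.log(key, "-> shard", shardForKey(key));
}
```

Neighboring keys land on the same shard, which is good for range queries but means an auto-increment key sends every new insert to the last shard.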
  • 62. Hashed Sharding • Hashed [1]: Subset of range sharding. An MD5 hash is applied to the key to ensure data is spread randomly within the MD5 range of values. Composite keys are not supported. [1] Deploy Sharded Cluster using Hashed Sharding https://docs.mongodb.com/manual/tutorial/deploy-sharded-cluster-hashed-sharding/ [Diagram: an MD5 hash function maps key 1 to c4ca4238a0b923820dcc509a6f75849b; the hashed key space from { x: minKey } to { x: maxKey } is split at { x: 34173cb38f07f...}, { x: c0c7c76d30bd3...}, and { x: d3d9446802...} across Shards 1 to 4, with example hashed values d3d9446802a44259755d38e6d163e820, c20ad4d76fe97759aa27a0c99bff6710, and 8e296a067a37563370ded05f5a3bf3ec.]
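To see why hashing scatters neighboring keys, the sketch below computes MD5 digests of a few small integers with Node's built-in crypto module. MongoDB's own hashing of shard keys differs in detail (it operates on the stored BSON value), so these digests only illustrate the scattering effect and are not the values a cluster would store.

```javascript
const crypto = require("crypto");

// MD5 digest of a value's decimal string form, as lowercase hex.
function md5Hex(value) {
  return crypto.createHash("md5").update(String(value)).digest("hex");
}

// Adjacent or nearby keys produce digests that are far apart.
for (const key of [1, 2, 3, 10]) {
  console.log(key, md5Hex(key));
}
```

Because nearby keys no longer sit next to each other, writes spread out, but a query over a range of keys has to visit many shards.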
  • 63. Tag-Aware Shard Keys • A subset of shards is tagged and assigned to a sub-range of the shard key. This creates zones where each zone has one or more shards. The most relevant data should reside on shards geographically closest to the application servers. [Diagram: replica sets (a primary and two secondaries each) for AMER, EMEA, APAC, EMEA/APAC, and the United States; the United States shards carry the U.S. tag and form the U.S. zone. Countries shown: U.S., Canada, Mexico, England, Germany, France, India, China, Japan.]
  • 64. Which type of shard key should I use? Use Cases • Scalability: Range or hash • Geo-locality: Tag-aware • Hardware Optimization: Tag-aware • Low Recovery Times: Range or hash Combining tag-aware with range or hash is acceptable. A good shard key usually has the following (in this order): • A random component like a universally unique identifier (UUID). – Note: Shard key sizes are limited to 512 bytes. Example error: WiredTigerIndex::insert: key too large to index, failing 1060 { : new Date(1500393090062)… • An increasing sequence like a timestamp.
  • 65. Links MongoDB Documentation https://docs.mongodb.com/ MongoDB Limits and Thresholds https://docs.mongodb.com/manual/reference/limits/ MongoDB Production Checklist https://docs.mongodb.com/manual/administration/production-checklist-operations/ Who uses MongoDB? https://www.mongodb.com/who-uses-mongodb MongoDB IRC Chat irc://irc.freenode.net/#mongodb
  • 66. Links API Documentation for MongoDB Drivers https://api.mongodb.com/ MongoDB Integration and Tools https://docs.mongodb.com/ecosystem/tools/ MongoDB Drivers (All) https://docs.mongodb.com/ecosystem/drivers/ mongo Keyboard Shortcuts https://docs.mongodb.com/manual/reference/program/mongo/#keyboard-shortcuts Back Up a Sharded Cluster with Database Dumps https://docs.mongodb.com/manual/tutorial/backup-sharded-cluster-with-database-dumps/
  • 67. Links MongoDB in Action Manning Forum http://manning-sandbox.com/forum.jspa?forumID=677 MongoDB Videos https://www.mongodb.com/presentations/ MongoDB University https://university.mongodb.com/ MongoDB Events https://www.mongodb.com/events/ Community Support Forum http://groups.google.com/group/mongodb-user
  • 68. Links ServerFault https://serverfault.com/questions/tagged/mongodb Stack Overflow https://stackoverflow.com/questions/tagged/mongodb
  • 69. Q/A Questions?
  • 70. Thank you! Be inspired to do something BIG!
  • 71. Slides Removed Due to Time Constraints
  • 72. Security: Roles • Built-In Roles https://docs.mongodb.com/manual/reference/built-in-roles/ – userAdminAnyDatabase: Can create and modify users and roles on every database, which effectively gives complete access. – backup: Ability to run mongodump • User-Defined Roles
  • 73. Data Archival Backing up and restoring MongoDB [1] • Best for replica set/sharded cluster – MongoDB Cloud Manager (Enterprise Advanced only) – Ops Manager: On-premises (Enterprise Advanced only) • mongodump/mongorestore – Not the best solution for larger databases. – Does impact mongod performance while running. • File system snapshot – Snapshots: Not specific to MongoDB • More efficient, but the volume must support snapshots (Logical Volume Manager on Linux) • Must have journaling enabled and the journal must be on the same logical volume. Journaling is enabled by default on 64-bit builds (--journal). • Sharded cluster backup is more difficult: “disable the balancer and capture a snapshot from every shard as well as a config server at approximately the same moment in time” – cp or rsync Remember to test your backup and restore process no matter what strategy you choose. [1] https://docs.mongodb.com/manual/core/backups/
  • 74. Data Archival Journaling • Journaling is enabled by default. • Prevents data corruption. • Writes are flushed to the journal every 100 milliseconds. • Journaling can be disabled to increase performance; however, replication should be enabled to ensure durability.
  • 75. Data Archival mongodump/mongorestore [1] • Excludes the local database in its output. • Requires the backup role if security is enabled. • mongod must be running since storage engines have different file layouts. • For sharded clusters, mongos (sharding router) must be running. • Suggestion: Create a specific operating system user to run the backup and give only that user access to the backups. [1] https://docs.mongodb.com/manual/tutorial/backup-and-restore-tools/
  • 76. Data Archival Examples: Export and import all databases (excluding local) [1]: mongodump mongorestore <location of dump> Drop each collection before restoring it: mongorestore --drop <location of dump> Export a database: mongodump --db <database> Export a specific database and collection: mongodump --db <database> --collection <collection> Restore a specific database and collection: mongorestore --drop --db <database> --collection <collection> <location of .bson file of collection> Export a specific database and collection with security enabled: mongodump --db <database> --collection <collection> --username <username> --password <password> [1] https://docs.mongodb.com/manual/tutorial/backup-and-restore-tools/
  • 77. Data Archival • Protecting the backups – Backups are stored in BSON/binary format. – Binary does not mean encrypted. – bsondump can be used to read backups: <$MONGO_ROOT>/bin/bsondump
  • 78. Indexes Query Selectors $in and $all can take advantage of indexes. $ne and $nin cannot unless used with another operator. Cannot use an index and uses a collection scan instead: {timeframe: {$nin: ['morning', 'afternoon']}} Can use an index: {timeframe: 'evening'} JavaScript Query Operators This statement cannot use an index and is single-threaded: db.reviews.find({'$where': "this.helpful_votes > 3"}) Regular Expressions MongoDB is compiled with PCRE (Perl Compatible Regular Expressions). Unless a prefix-style query is used, a regular expression query cannot use an index. Example of a prefix-style (“starts with”) query: db.users.find({'last_name': /^Sm/}) Using the case-insensitive flag against an indexed field nullifies the use of the index because indexes are case-sensitive. Use the new text search capability or an external search engine instead. Another option is to store the searched field in all lower case and search against that field, making it an indexed case-insensitive search. The modulo operator $mod will not use an index.
  • 79. Tools mtools: https://www.mongodb.com/blog/post/introducing-mtools A set of Python scripts to parse and filter MongoDB log files. • mloginfo: Log file analysis. This is the most useful tool within mtools. mloginfo mongod_prod.log --queries • mlogfilter: Filtering tool used to pipe output to other tools. Helps narrow down a search. • mplotqueries: Creates a graph. Has some bugs. mplotqueries mongod_prod.log • mlogvis: Web-based version of mplotqueries.
  • 80. Hardware: Faults and Memory 2 types of page faults: • Soft fault: Moves memory pages from one list to another (e.g. OS file cache). • Hard fault: Mongo accesses the disk to store or retrieve data. This is what we are trying to prevent.
  • 81. Operating System Tuning: Disk Space Check ulimit settings: https://docs.mongodb.com/manual/reference/ulimit/ Recommended settings: • -f (file size): unlimited • -t (cpu time): unlimited • -v (virtual memory): unlimited • -n (open files): 64000 • -m (memory size): unlimited • -u (processes/threads): 64000
  • 82. Indexes Indexes are created immediately when declared. By default, indexes are built in the foreground, which locks the collection. To create an index in the background1: db.<collection>.ensureIndex({…}, { background: true }) • Background index builds allow reads and writes while the index is being built. • Background index builds take longer, and the resulting indexes can be larger than if they were built in the foreground. 1 https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/
  • 83. Indexes How do we remove old documents automatically? Time-To-Live (TTL) Indexes Create a TTL index on a collection to expire documents after 30 days: db.<collection>.ensureIndex({ time: 1}, {expireAfterSeconds: 60*60*24*30}) • Only one TTL index is allowed per collection. • The field the TTL index is declared on must be a date type and cannot be the _id field. • expireAfterSeconds specifies how old the document can be before it is removed. Do not be disappointed if a document is not deleted/expired exactly when you expect. • An internal job runs every 60 seconds and queries for documents whose date is older than that interval relative to the current date and time. – This means that documents could live 60 seconds longer than the expiration date. • If a large number of documents need to expire when you first create a TTL index, this can cause a bottleneck because deletion is a write activity. Write activities require I/O and locks, which take time. It may make sense to delete documents manually in batches first and then implement the TTL index. This has less impact.
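The expiry timing on this slide comes down to simple date arithmetic, sketched here in plain JavaScript. The 30-day value and the 60-second job interval are taken from the slide; the function names are hypothetical, and the sketch assumes the background job runs on schedule and finishes its deletes promptly, which a loaded server may not do.

```javascript
// Sketch of the TTL arithmetic described on the slide; this is not
// MongoDB's implementation. Function names are hypothetical.
const EXPIRE_AFTER_SECONDS = 60 * 60 * 24 * 30; // 30 days, as on the slide
const TTL_JOB_INTERVAL_SECONDS = 60;             // job period quoted on the slide

// A document becomes eligible for removal once its date field is at
// least expireAfterSeconds older than "now".
function isExpired(docDate, now, expireAfterSeconds) {
  const ageSeconds = (now.getTime() - docDate.getTime()) / 1000;
  return ageSeconds >= expireAfterSeconds;
}

// The document is removed on the first job pass after it becomes
// eligible, so it may survive up to one job interval longer.
function removalWindow(docDate, expireAfterSeconds, jobIntervalSeconds) {
  const eligibleAt = new Date(docDate.getTime() + expireAfterSeconds * 1000);
  const latestPass = new Date(eligibleAt.getTime() + jobIntervalSeconds * 1000);
  return { eligibleAt, latestPass };
}

const created = new Date("2017-01-01T00:00:00Z");
const removal = removalWindow(created, EXPIRE_AFTER_SECONDS, TTL_JOB_INTERVAL_SECONDS);
console.log(removal.eligibleAt.toISOString());
console.log(removal.latestPass.toISOString());
```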
  • 84. Indexes Forcing an index to be rebuilt: An index rebuild can be triggered by the following: db.<collection>.reIndex() Or by issuing a repair, which causes all indexes to be dropped and recreated: db.repairDatabase() Caveat: Global write lock! 1 Compaction: Compact defragments documents (or releases unused disk space) and rebuilds indexes. Compact is a blocking operation, so run it only during a maintenance window on a primary or on an offline secondary in a replica set. To compact a collection: db.runCommand( { compact: '<collection>' } ) Note: Compaction behaves differently depending on the storage engine.
  • 85. Production Replica Set Configurations • 2 Replicas and 1 Arbiter – Arbiter runs on an application server. – Both replicas get their own machine. • Arbiters – Replica sets vote on a new primary if the old primary goes down. – Lightweight mongod process that breaks ties if the nodes cannot agree on a new primary. – Use an arbiter if there are an equal number of nodes between data centers. – Does not replicate data. OK, that's great. How do we scale horizontally?
  • 86. Indexes Compound Indexes • Having more than one key increases selectivity, so multiple queries can benefit. • Determine whether the index really speeds up a query by looking at explain(). • Minimize index size and count: Having fewer indexes can be a performance gain because less space and time are needed to build them. Adding an index for every possible query is not a good idea. This is a balancing act. • Maximize selectivity. Limitations2 • Index key value sizes are limited to 1024 bytes. Do not index too many fields. Run Object.bsonsize( { … } ) on the key values to help determine the size. • Index key names are limited to 128 characters. • Compound indexes can contain up to 31 fields. • A single collection can have no more than 64 indexes. If you are hitting this limitation, then you are over-indexing and should reevaluate your data model. 2 https://docs.mongodb.com/manual/reference/limits/#indexes
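The two count-based limits on this slide can be checked before an index is declared. The following is a minimal plain JavaScript sketch: the limit values are the ones quoted on the slide and may differ between MongoDB versions, and `checkIndexPlan` is a hypothetical helper, not part of any MongoDB API.

```javascript
// Sketch: validate a planned index against the limits quoted on the slide.
// The numbers come from the slide; the helper itself is hypothetical.
const LIMITS = {
  maxCompoundFields: 31,       // fields in one compound index
  maxIndexesPerCollection: 64  // indexes on one collection
};

// existingIndexCount: how many indexes the collection already has.
// keySpec: the key document for the new index, e.g. { a: 1, b: -1 }.
// Returns a list of human-readable problems; empty means within limits.
function checkIndexPlan(existingIndexCount, keySpec) {
  const problems = [];
  const fieldCount = Object.keys(keySpec).length;
  if (fieldCount > LIMITS.maxCompoundFields) {
    problems.push("compound index has " + fieldCount + " fields");
  }
  if (existingIndexCount + 1 > LIMITS.maxIndexesPerCollection) {
    problems.push("collection would exceed " + LIMITS.maxIndexesPerCollection + " indexes");
  }
  return problems;
}

console.log(checkIndexPlan(3, { last_name: 1, first_name: 1 }));
console.log(checkIndexPlan(64, { last_name: 1 }));
```

A check like this only guards the hard limits. Whether an index is worth having is still a question for explain().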
  • 87. Sharding Deployments [Diagram: Application and Driver; Query Routers (mongos); Shard 1 through Shard N, one per Data Center 1 through Data Center N, each with one Primary and two Secondaries.] Principles • A node and a server are one and the same. • A primary (writable) and 2 secondaries (read-only) make up a 3-node replica set. • Primary and secondary are copies of each other. • Shards reside across multiple servers. • Load and data are distributed across the shards, facilitating scaling.
  • 88. Replication and Replica Sets Two types of replication, both using the same replication mechanism: a primary node receives writes while secondary nodes read and apply writes asynchronously. • Master-slave: Deprecated. – Uses a manual mechanism to promote a secondary to primary. – oplog is only stored on the master (more on this later). – Supports more than 50 nodes. • Replica Sets: – If the primary goes down, then a secondary is promoted to primary and receives writes. – Transparent recovery. – More sophisticated deployment topologies. – Supports up to 50 nodes. – Recommended for production. – Rollbacks: A write is not considered promoted unless it was written to a majority of member nodes (more than 50%). If the majority cannot accept the write, then all nodes become read-only. – A node replays a journal to determine what writes to apply after a failure.
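The majority rule on this slide is easy to get wrong for even-sized sets, so here is the arithmetic as a small plain JavaScript sketch. The function names are hypothetical. The sketch assumes every member has one vote, and it shows that adding a fourth member to a three-member set does not let the set survive any more failures.

```javascript
// Sketch: majority arithmetic for a replica set where every member
// has one vote. Function names are hypothetical.

// Smallest number of members that is strictly more than half.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members can be lost while a majority is still reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(n + " members: majority " + majority(n) +
              ", tolerates " + faultTolerance(n) + " down");
}
```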
  • 89. Replication and Replica Sets • Why not use replication? – Hardware load issues: • Working set does not fit in RAM. • Disk speed. • More than half the operations are writes. – Secondary nodes must always have an updated copy of data. • This can be fixed, but it introduces latency for the application. • How does replication work? – oplog: Capped collection in the local database containing records of write data. Steps to reproduce the write are stored each time the primary is written to. Each entry has a BSON timestamp to track writes. – Heartbeat: Monitors health and controls failover by pinging every 2 seconds (default).
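The oplog behavior described on this slide can be illustrated with a toy model in plain JavaScript. This is a simplified sketch under stated assumptions, not how MongoDB implements it: the real oplog is capped by size in bytes rather than entry count and uses BSON timestamps, whereas this model uses a fixed entry count and integer timestamps. The class and method names are hypothetical.

```javascript
// Toy model of a capped operation log. Assumptions: capacity is counted
// in entries and timestamps are plain integers. All names are hypothetical.
class CappedLog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];
  }

  // The primary records each write; the oldest entry falls off when full.
  append(ts, op) {
    this.entries.push({ ts, op });
    if (this.entries.length > this.maxEntries) {
      this.entries.shift();
    }
  }

  // Entries a secondary still has to apply, given the timestamp of the
  // last entry it applied. Returns null when that entry has already
  // rolled off, meaning the secondary is too far behind to catch up.
  entriesAfter(lastAppliedTs) {
    const position = this.entries.findIndex((e) => e.ts === lastAppliedTs);
    if (position === -1) {
      return null;
    }
    return this.entries.slice(position + 1);
  }
}

const oplog = new CappedLog(3);
for (let ts = 1; ts <= 5; ts++) {
  oplog.append(ts, "write #" + ts);
}
const pending = oplog.entriesAfter(4); // secondary slightly behind
const stale = oplog.entriesAfter(1);   // secondary fell off the log
console.log(pending, stale);
```

The model shows why oplog capacity matters: a secondary that stays offline longer than the log's retention window cannot replay its way back into sync.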

Editor's Notes

  1. The currently maintained driver is the mongodb extension available from PECL. This driver can be used stand-alone, although it is very bare-bones. You should consider using the driver with the complementary PHP library, which implements a more full-featured API on top of the bare-bones driver.
  2. mmf: memory-mapped files