Bart Oles - Severalnines AB
Database performance affects organizational performance, and we tend to look for quick fixes when under stress. But how can we better understand our database workload and the factors that may harm it? What limitations in MongoDB could impact cluster performance?
In this talk, we will show you how to identify the factors that limit database performance. We will start with the free MongoDB Cloud monitoring tools. Then we will move on to log files and queries. To achieve optimal use of hardware resources, we will look at kernel optimization and other crucial OS settings. Finally, we will look at how to examine the performance of MongoDB replication.
4. Copyright 2017 Severalnines AB
Free to download
Initial 30 days Enterprise trial
Converts into free Community Edition
Enterprise / paid versions available
6. Agenda
Copyright 2018 Severalnines AB
● Why performance cheat sheet?
● Free monitoring for performance
● Logging database operations
● Capturing queries - database profiler
● Checking operating system parameters
● Working with the Explain Plan
● Measuring replication lag performance
● Live demo
● Other
8. Performance complexity
● Services running on multiple hosts
○ Replication
○ Sharding
○ Clustering
● Multiple Data Centers
○ Cloud and/or On-prem
○ Disaster Recovery
● Load balancing and Single point of contact IP
○ For workload management, HA, query caching...
○ E.g., HAProxy, KeepAlived/VIP, ProxySQL, MaxScale
9. Why we need a database monitoring system
● Data is a key asset of the organisation
● Databases are important as they manage the source of truth
● A database is complex: I/O, transaction engine, query optimizer, caches, locks, versioning, ...
● Very dependent on OS, IO subsystems, network
● Distribution across multiple instances makes it even more complex
● Good database monitoring helps make sense of all that
10. MongoDB
● Similar to most other databases
● Understand the utilization of the hardware
● Capacity planning
● Determine the type of an issue
● I/O related?
● CPU related?
● Network related?
11. Why we need a database monitoring system
● CPU utilization (should I add more nodes to the cluster?)
● Network utilization (am I running out of bandwidth?)
● Ping (how badly does latency affect my MongoDB cluster?)
● Disk throughput and IOPS (am I within my hardware limits?)
● Disk space (do I have to plan for larger disks?)
● Memory utilization (do I suffer from a memory leak?)
12. Why we need a database monitoring system
● Storage engine specific
● MMAPv1
● WiredTiger
● MongoRocks
● Insight in how the engine performs
● Internal congestion
13. Why we need a database monitoring system
● CPU, IO or lock related
● Outcome: similar to Galera
● Lagging behind could cause a full sync
14. Performance monitoring vs metrics
● Similar to most other databases
● Throughput of the cluster
● Relate throughput to cluster performance
● Determine the type of an issue
● Request spikes?
● Write amplification related?
● Queueing?
15. Monitoring vs Trending
● Monitoring system (e.g. Nagios)
● Checks if services are healthy
● Sends pages
● Trending system (e.g. Cacti, Graphite)
● Collects metrics
● Generates graphs
● Availability
● Do more than just opening a connection
● Measure true status of nodes and cluster
● Test read/write
● Open essential databases and collections
● Keep an eye on the replication lag
● Increase oplog size?
● Check the full topology
16. Monitoring vs Trending
● Trending
● Plot trends of key (performance) metrics
● Find problems before they arise
● Pre-emptive problem management
● Trending tools
● Granularity of sampling
● More datapoints = better
● Periodical (daily/weekly) healthchecks
● Insight into all aspects of the database operations
● Post mortem and proactive monitoring
● Capacity planning
19. Logging database operations
Operation Execution Times (READ, WRITES, COMMANDS)
Disk utilization (MAX UTIL % OF ANY DRIVE, AVERAGE UTIL % OF ALL DRIVES)
Memory (RESIDENT, VIRTUAL, MAPPED)
Network - Input / Output (BYTES IN, BYTES OUT)
Network - Num Requests (NUM REQUESTS)
Opcounters (INSERT, QUERY, UPDATE, DELETE, GETMORE, COMMAND)
Opcounters - Replication (INSERT, QUERY, UPDATE, DELETE, GETMORE, COMMAND)
Query Targeting (SCANNED / RETURNED, SCANNED OBJECTS / RETURNED)
Queues (READERS, WRITERS, TOTAL)
System Cpu Usage (USER, NICE, KERNEL, IOWAIT, IRQ, SOFT IRQ, STEAL, GUEST)
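The opcounters above are cumulative since the mongod process started, so a throughput figure needs two samples and the time between them. The following is a minimal sketch of that arithmetic, assuming each sample is shaped like the `localTime` and `opcounters` fields of `db.serverStatus()`; the function name and the sample numbers are hypothetical.

```javascript
// Compute per-second operation rates from two serverStatus()-like snapshots.
// Each snapshot is assumed to carry `localTime` (a Date) and `opcounters`
// (cumulative counts since the mongod process started).
function opcounterRates(earlier, later) {
  const seconds = (later.localTime - earlier.localTime) / 1000;
  if (!(seconds > 0)) {
    throw new Error("snapshots must be taken at increasing times");
  }
  const rates = {};
  for (const op of Object.keys(later.opcounters)) {
    const delta = later.opcounters[op] - (earlier.opcounters[op] || 0);
    // A negative delta means the counters were reset (for example a restart),
    // so no meaningful rate can be derived for this interval.
    rates[op] = delta < 0 ? null : delta / seconds;
  }
  return rates;
}

// Hypothetical snapshots taken 10 seconds apart.
const before = {
  localTime: new Date("2018-07-01T23:09:00Z"),
  opcounters: { insert: 100, query: 500 },
};
const after = {
  localTime: new Date("2018-07-01T23:09:10Z"),
  opcounters: { insert: 150, query: 1500 },
};
console.log(opcounterRates(before, after));
```

In the mongo shell, the two snapshots could be taken by calling `db.serverStatus()` twice with a pause in between.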
20. Why we need a database monitoring system
db.getFreeMonitoringStatus()
{ resource: { cluster : true }, actions: [ "setFreeMonitoring", "checkFreeMonitoringStatus" ] }
db.serverStatus()
21. Why we need a database monitoring system
{
  "state" : "enabled",
  "message" : "To see your monitoring data, navigate to the unique URL below. Anyone you share the URL with will also be able to view this page. You can disable monitoring at any time by running db.disableFreeMonitoring().",
  "url" : "https://cloud.mongodb.com/freemonitoring/cluster/XEARVO6RB2OTXEAHKHLKJ5V6KV3FAM6B",
  "userReminder" : "",
  "ok" : 1
}
23. Logging database operations
db.getLogComponents()
Log messages include a component field, which provides a functional categorization of the messages. For each component, you can set a different log verbosity. The current list of components is:
ACCESS, COMMAND, CONTROL, FTDC, GEO, INDEX, NETWORK, QUERY, REPL, REPL_HB, ROLLBACK, SHARDING, STORAGE, RECOVERY, JOURNAL, WRITE.
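Per-component verbosity can be changed at runtime through the `logComponentVerbosity` server parameter. The sketch below only builds the command document from a flat map of dotted component names to levels; the helper name is hypothetical, and the dotted names used here (`query`, `storage.journal`) are the configuration-style names rather than the upper-case labels printed in the log.

```javascript
// Build a setParameter command document that sets log verbosity per component.
// `levels` maps dotted component names (e.g. "storage.journal") to a level
// from 0 to 5, or -1 to inherit the parent's verbosity.
function verbosityCommand(levels) {
  const tree = {};
  for (const [path, level] of Object.entries(levels)) {
    if (!Number.isInteger(level) || level < -1 || level > 5) {
      throw new Error("verbosity for " + path + " must be an integer from -1 to 5");
    }
    // Walk (and create) the nested objects for each part of the dotted name.
    let node = tree;
    for (const part of path.split(".")) {
      node[part] = node[part] || {};
      node = node[part];
    }
    node.verbosity = level;
  }
  return { setParameter: 1, logComponentVerbosity: tree };
}

const cmd = verbosityCommand({ query: 2, "storage.journal": 1 });
console.log(JSON.stringify(cmd, null, 2));
```

In the mongo shell the resulting document would be passed to `db.adminCommand(cmd)`; remember to lower the levels again afterwards, since verbose logging adds I/O of its own.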
24. Examples
To list the 10 most recent operations:
db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()
To list all operations except commands:
db.system.profile.find( { op: { $ne : 'command' } } ).pretty()
To list all operations for a particular collection (here mydb.test):
db.system.profile.find( { ns : 'mydb.test' } ).pretty()
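Once profiler documents have been fetched, it is often useful to aggregate them rather than read them one by one. The following sketch groups documents by namespace and ranks namespaces by total time spent. It assumes each document carries the `ns` and `millis` fields found in `system.profile`; the function name and sample documents are hypothetical.

```javascript
// Group profiler documents by namespace and rank by total execution time.
function slowestNamespaces(profileDocs, topN) {
  const byNs = new Map();
  for (const doc of profileDocs) {
    const entry =
      byNs.get(doc.ns) || { ns: doc.ns, count: 0, totalMillis: 0, maxMillis: 0 };
    entry.count += 1;
    entry.totalMillis += doc.millis;
    entry.maxMillis = Math.max(entry.maxMillis, doc.millis);
    byNs.set(doc.ns, entry);
  }
  // Highest total time first, truncated to the requested number of rows.
  return [...byNs.values()]
    .sort((a, b) => b.totalMillis - a.totalMillis)
    .slice(0, topN);
}

// Hypothetical profiler documents.
const sampleDocs = [
  { op: "query", ns: "mydb.test", millis: 120 },
  { op: "update", ns: "mydb.test", millis: 30 },
  { op: "query", ns: "mydb.other", millis: 200 },
];
console.log(slowestNamespaces(sampleDocs, 5));
```

In the mongo shell, the input could come from `db.system.profile.find().toArray()`.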
25. MongoDB logging
/var/log/mongodb/mongod.log
You can find the MongoDB configuration file at /etc/mongod.conf.
Here is a sample of the log output:
2018-07-01T23:09:27.101+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to node1:27017
2018-07-01T23:09:27.102+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Failed to connect to node1:27017 - HostUnreachable: Connection refused
2018-07-01T23:09:27.102+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to node1:27017 due to failed operation on a connection
2018-07-01T23:09:27.102+0000 I REPL_HB [replexec-2] Error in heartbeat (requestId: 21589) to node1:27017, response status: HostUnreachable: Connection refused
2018-07-01T23:09:27.102+0000 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to node1:27017
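Each line in the sample follows the pattern timestamp, severity, component, [context], message. A minimal sketch of splitting such a line into fields is shown below; the function name is hypothetical, and the pattern is written against the plain-text format shown here, so it would not apply to the structured JSON logs of later MongoDB releases.

```javascript
// Split a plain-text mongod log line into its fields.
// Returns null when the line does not match the expected layout.
function parseLogLine(line) {
  const match = line.match(/^(\S+)\s+([A-Z]\d?)\s+(\S+)\s+\[([^\]]+)\]\s+(.*)$/);
  if (!match) return null;
  return {
    timestamp: match[1],
    severity: match[2], // e.g. I = informational, W = warning, E = error
    component: match[3],
    context: match[4],
    message: match[5],
  };
}

// A line taken from the sample above.
const heartbeatLine =
  "2018-07-01T23:09:27.102+0000 I REPL_HB [replexec-2] Error in heartbeat " +
  "(requestId: 21589) to node1:27017, response status: HostUnreachable: Connection refused";
console.log(parseLogLine(heartbeatLine));
```

Filtering the parsed records by `component` (for example `REPL_HB`) is a quick way to isolate replication heartbeat failures like the ones in the sample.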
27. MongoDB Oplog
● Similar to MySQL binary logs
● Oplog: a special collection
● Limited size
● Eviction of transactions (FIFO)
● Replication window
● Time between first and last transaction in the oplog
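The replication window defined above is simple arithmetic on two timestamps. The sketch below computes it in hours and checks whether a planned outage would fit inside it; the function names and sample times are hypothetical.

```javascript
// Replication window: time between the first and last entry in the oplog.
function replicationWindowHours(firstEntryTime, lastEntryTime) {
  const ms = lastEntryTime - firstEntryTime;
  if (ms < 0) {
    throw new Error("last entry must not be older than first entry");
  }
  return ms / 3600000; // milliseconds per hour
}

// A secondary that stays offline longer than the window can no longer catch
// up from the oplog and needs a full resync.
function fitsInWindow(plannedDowntimeHours, windowHours) {
  return plannedDowntimeHours < windowHours;
}

// Hypothetical oplog boundaries.
const windowHours = replicationWindowHours(
  new Date("2018-07-01T00:00:00Z"),
  new Date("2018-07-02T12:00:00Z")
);
console.log(windowHours, fitsInWindow(8, windowHours));
```

In the mongo shell, `db.getReplicationInfo()` reports a comparable figure, so the calculation above is mainly useful when working from exported timestamps. Note that the window reflects the write rate during the sampled period and shrinks when writes increase.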
28. MongoDB Connections
● Similar to MySQL when handling connections
● Client drivers may support connection pooling
● Multiple non-blocking queries can use the same connection
● Pool spawns new connections when available connections run low
● Increase of connections
● Locking issues
● Application request bursts
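To spot an increase in connections before it becomes a problem, the connection counters can be turned into a utilization figure. The sketch below assumes an input shaped like the `connections` section of `db.serverStatus()`, with `current` and `available` counts; the function name, the 80% warning threshold, and the sample numbers are hypothetical.

```javascript
// Derive connection utilization from serverStatus().connections-like input.
function connectionUsage(connections, warnFraction) {
  // current + available approximates the configured connection limit.
  const limit = connections.current + connections.available;
  const usedFraction = limit === 0 ? 0 : connections.current / limit;
  return { limit, usedFraction, warn: usedFraction >= warnFraction };
}

// Hypothetical counters.
console.log(connectionUsage({ current: 200, available: 600 }, 0.8));
```

Trending `usedFraction` over time helps separate a slow climb (often a pooling misconfiguration) from a sudden spike (lock contention or an application burst).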
30. Checking operating system parameters - network
net.core.somaxconn (increase the value)
net.ipv4.tcp_max_syn_backlog (increase the value)
net.ipv4.tcp_fin_timeout (reduce the value)
net.ipv4.tcp_keepalive_intvl (reduce the value)
net.ipv4.tcp_keepalive_time (reduce the value)
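The directions above (increase or reduce) can be checked mechanically against the output of `sysctl`. The sketch below parses `key = value` lines and reports parameters that fall on the wrong side of a target. The target numbers and sample values are purely illustrative assumptions, not recommendations from this talk; choose values that suit your own workload.

```javascript
// Parse "key = value" lines as printed by the sysctl command.
function parseSysctl(text) {
  const values = {};
  for (const line of text.split("\n")) {
    const m = line.match(/^\s*([\w.]+)\s*=\s*(\d+)\s*$/);
    if (m) values[m[1]] = Number(m[2]);
  }
  return values;
}

// Report parameters that violate a target. Each target has a direction:
// "atLeast" for values to increase, "atMost" for values to reduce.
function checkSysctl(values, targets) {
  const findings = [];
  for (const [key, rule] of Object.entries(targets)) {
    const actual = values[key];
    if (actual === undefined) continue; // parameter not present in the input
    const ok = rule.direction === "atLeast" ? actual >= rule.value : actual <= rule.value;
    if (!ok) findings.push({ key, actual, direction: rule.direction, target: rule.value });
  }
  return findings;
}

// Hypothetical targets and sample sysctl output.
const illustrativeTargets = {
  "net.core.somaxconn": { direction: "atLeast", value: 4096 },
  "net.ipv4.tcp_fin_timeout": { direction: "atMost", value: 30 },
  "net.ipv4.tcp_keepalive_time": { direction: "atMost", value: 120 },
};
const sampleOutput = [
  "net.core.somaxconn = 128",
  "net.ipv4.tcp_fin_timeout = 30",
  "net.ipv4.tcp_keepalive_time = 7200",
].join("\n");
console.log(checkSysctl(parseSysctl(sampleOutput), illustrativeTargets));
```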
40. Measuring replication lag performance
rs.printSlaveReplicationInfo()
rs.status()
● Replication Metrics
● Throughput of the replication
● Durability of the oplog
● Replication lag
● Comparable to Galera replication
● Quorum based
● At least one secondary needs to acknowledge
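Replication lag can be derived from the member list that `rs.status()` returns by comparing each secondary's last applied operation time with the primary's. The sketch below assumes each member has `name`, `stateStr`, and `optimeDate` fields; the function name and sample members are hypothetical.

```javascript
// Compute per-secondary replication lag from an rs.status()-like document.
// Returns null when no primary is present (for example during an election).
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null;
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Hypothetical replica set status.
const sampleStatus = {
  members: [
    { name: "node1:27017", stateStr: "PRIMARY", optimeDate: new Date("2018-07-01T23:09:27Z") },
    { name: "node2:27017", stateStr: "SECONDARY", optimeDate: new Date("2018-07-01T23:09:25Z") },
    { name: "node3:27017", stateStr: "SECONDARY", optimeDate: new Date("2018-07-01T23:09:27Z") },
  ],
};
console.log(replicationLagSeconds(sampleStatus));
```

On an idle cluster the figure can look larger than it really is, because the primary's optime only advances when writes occur; comparing lag against the replication window gives a better sense of how close a secondary is to needing a full sync.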