As cloud adoption has accelerated over the last decade, DBAs must find new ways to add value to systems and bring more scalability to the database server. This talk was presented at the Open Source India 2018 conference by Kabilesh and Manosh of Mydbops. They share experiences and value additions made for customers during their consulting engagements.
1. Evolution of DBA in the Cloud Era
Presented by
Manosh Malai
Kabilesh PR
www.mydbops.com info@mydbops.com
2. Kabilesh P.R
Co-Founder/DB Consultant @Mydbops
kabi@mydbops.com
Manosh Malai
Senior Devops and DB/Cloud Consultant @Mydbops
mmalai@mydbops.com
@ManoshMalai
About us
3. Mydbops at a Glance
● Mydbops is a database consulting company with core specialization in MySQL and MongoDB
administration and support, scaling some of the largest MySQL farms in India and the US.
● Mydbops was created with the motto of developing a DevOps model for database
administration.
● We help organisations scale MySQL/Mongo and implement advanced technologies in
MySQL/Mongo.
● Founded in 2015, HQ in Bangalore, India, 25 employees.
5. Agenda
● DBA’s top responsibilities before cloud & after DBaaS
● Fascinating features in DBaaS
● Where DBA skills are needed in the cloud
● Choosing the right vendor
● Capacity planning and cost management
● DBA role in the cloud
● Real production use case
● Conclusion
6. DBA’s Top Responsibilities: Before Cloud
● Installing, Upgrades and Patching
● Fine Tuning
● Reacting to DB Issues
● Backup/Recovery
● Ensuring DB Availability
7. DBA’s Top Responsibilities: After DBaaS
Within the purview of the DBaaS provider:
● Installing, Upgrading and Patching
● Monitoring & Alerting
● Some Level of Config Tweaking
● Backup
8. Fascinating Features in DBaaS
Elasticity
● Provisioned IOPS
● Instance resize in minutes (vertical)
Security & Patching
● Host-level security is provided by the vendor
● Host patching is also handled by the vendor
HA
● Multi-Availability Zones
● Cross-Region Replicas
● Scale out globally
Scalability
● Read replicas within a region (horizontal)
● Load balancer
10. Where are DBA Skills Needed in the Cloud?
● Experience in Cost Control ($)
● Query Optimization
● Support in Choosing a Cloud Vendor
● In-depth Knowledge of Capacity Planning
● Database Schema Design and Optimisation
● Expertise in Database Infrastructure Design
11. Providing Cloud Vendor Advice
Many parameters need to be considered before choosing the right cloud vendor.
13. Database Features & Control
1. DB Super User Access
2. SSH Access to Underlying Machine
3. Plugin Extension Support
4. Fully Customizable RAM, Disk Size & Instance Type
5. Replication Technology
6. Custom Read Replica Config
14. Backup and Security
1. Custom Schedule for Backups
2. Selective Restores
3. Continuous Backup / Point-in-Time Recovery
4. Encryption at Rest
5. Access Control with Security Groups
15. Support and Monitoring
1. Access to DB Server Logs
2. Monitoring OS & DB Metrics
3. Slow Query Analysis
4. Configuration Management & Tuning
5. 24x7 Support, 1-Hour Response SLA
16. Capacity Planning and Cost Management
● Capacity planning for the cloud isn’t about getting an exactly right answer.
● Aim to avoid both underestimation and overestimation.
● Check the current footprint of the database, including any advanced features such as standby and clustering.
Do Right-Sizing
● Measuring and Managing ROI
○ You can't manage what you can't measure.
○ Before scaling a resource, answer what, where, when, why and how it
meets the business requirement.
○ Once you have that data, managing ROI is a matter of acting on it.
17. Database Infrastructure Design
● Understand your system limits.
● Estimate your application throughput at peak.
● Distribute the load.
● Do not hard-code endpoints; use LB / DNS.
● Group servers based on business flows; do not mix:
○ Prod
○ Reports
○ Analytics
● Handle connection pooling with care.
● Having decades of data? Think about where it can be handled better.
18. Database Schema
● Choose the right data type for the data.
● Be aware of data-type limitations with TEXT, BLOB and JSON when used.
● Choose the right storage engine and row format.
● Partitioning a table is not only a DB activity.
● Primary key or natural key?
● Get the normalisation right.
Interesting Read
http://www.ovaistariq.net/199/databases-normalization-or-denormalization-which-is-the-better-technique/#.W7yq5kUzYYI
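As a hedged sketch of the data-type advice above (the table and column names are illustrative, not from the talk), compact, exact types shrink both row size and index size:

```sql
-- Hypothetical orders table, assuming MySQL 5.7+ with InnoDB.
CREATE TABLE orders (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate primary key
    status     TINYINT UNSIGNED NOT NULL,                -- 1 byte instead of a VARCHAR status string
    amount     DECIMAL(10,2) NOT NULL,                   -- exact money values, not FLOAT
    country    CHAR(2) NOT NULL,                         -- fixed-width ISO code, not VARCHAR(255)
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
```

Every byte saved per row is multiplied across the table, its indexes and the buffer pool, which is what reduces the disk and memory footprint.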
19. Query optimization
Facts
● Index based on the query and its fingerprint.
● A reduction in rows scanned can help you decide on index additions.
● Composite keys can be beneficial.
● Complex queries can be broken down into stored procedures.
● Don't add indexes on all your columns.
● Adding too many indexes can slow down writes.
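A minimal sketch of a composite key matched to a query fingerprint (schema and names are hypothetical, not from the talk):

```sql
-- One composite index can replace several single-column indexes
-- for queries that filter on its leftmost columns.
ALTER TABLE orders
  ADD INDEX idx_cust_status_created (customer_id, status, created_at);

-- Served by the index above: equality on the two leading columns,
-- then the index order satisfies the ORDER BY without a filesort.
EXPLAIN SELECT id, amount
FROM   orders
WHERE  customer_id = 42
  AND  status = 1
ORDER  BY created_at DESC
LIMIT  10;
```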
20. Query optimization
Subqueries
● The number of rows returned by the subquery decides its usefulness.
● A few can easily be rewritten as joins.
Long-running queries
● Sometimes a query genuinely needs to process most of the data (summary tables can help speed this up).
● Locking can be a cause; identify the cause.
● MySQL's JSON EXPLAIN output is verbose and detailed.
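An illustrative subquery-to-join rewrite, as mentioned above (the schema is hypothetical):

```sql
-- IN-subquery form: older MySQL versions may execute this inefficiently.
SELECT o.id, o.amount
FROM   orders o
WHERE  o.customer_id IN (SELECT c.id FROM customers c WHERE c.country = 'IN');

-- Join form: the optimizer can freely reorder tables and use indexes on both sides.
SELECT o.id, o.amount
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
WHERE  c.country = 'IN';
```

The rewrite is only equivalent when `customers.id` is unique; otherwise the join can return duplicate rows and needs a DISTINCT.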
21. Use Case
Our client X is an e-commerce website and top-selling multi-brand retailer selling beauty and wellness products from all
the leading brands. X is committed to bringing you the best and biggest brands, an incredible shopping
experience and excellent customer service.
23. Problem Statements:
● Slow response from the DB for both reads & writes
● CPU usage on DB boxes spikes to 85%
● Unable to handle peak-time load
● High connection usage
● DB locking at high concurrency
26. Optimized Throughput of Instances Provisioned
● Below is the max throughput for the described instances during a stress test with a sysbench load:
https://www.slideshare.net/KiranVittalapurThimm/benchmarking-aws-instanceformysqldatabaseserverce2

Machine Type | CPU | IO Throughput | Max IOPS | CPU Speed | CPU Capacity | Anticipated Write QPS | Anticipated Read QPS | Approximate Capacity
r4.8xlarge   | 32  | 875           | 37500    | 2.3       | 73.6         | 74210                 | 76953                | 75582
r4.16xlarge  | 64  | 1750          | 75000    | 2.3       | 147          | 82429                 | 99708                | 91069
28. Is the Full Capacity Being Used?
Even with this big, high-capacity server, usage at the max was:
● Mixed load QPS of 50k
● Max write IO usage at 4500 IOPS
● Max Read IO usage at 1500 IOPS
● CPU usage spiked at 85%
● Connections at a Range of 2k
29. Optimizations Done at DB level
● Complete review of the system's CPU and IO usage (audit)
● Query optimization with indexing and rewrites (reduced high CPU and IO usage)
● Schema-level optimisation to use the right data types (reduced disk & memory footprint)
● Transaction logs resized based on the incoming write set (reduced commit latency for writes)
● Parameter group changed with optimised values based on the instance type
Applying all these in a step-by-step fashion, we saw a considerable performance improvement.
30. Complete review of the System
Knowing what your system is up to is the best way to optimise for the workload. We used
PMM (Percona Monitoring and Management).
● Completely Open Source
● Easy to deploy
● Deeper insights on MySQL and Innodb Internals
● Real Time Performance Stats
● QAN (Query Analytics)
● Integration with CloudWatch
● Beautiful Dashboards
31. Resolving Slow Writes
● With RDS, the parameter innodb_log_file_size stays at the default of 128MB regardless of
workload.
● The redo log should hold at least an hour of transactions.
● Our hourly writes exceeded that by a huge margin.
32. Resolving Slow Writes
● Having a small transaction log can slow down your write performance, i.e. commit latency, by making
commits wait for transaction log space to be freed up.
● Resizing the InnoDB transaction logs reduced our write IO contention and CPU usage.
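The "an hour of transactions" rule of thumb above can be checked with a rough sketch like this (assuming MySQL 5.6/5.7, where Innodb_os_log_written reports redo bytes written):

```sql
-- Sample the redo bytes written, wait a fixed interval, sample again,
-- then extrapolate the delta to an hourly write volume.
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
-- ... wait 60 seconds, run it again; hourly redo ≈ delta * 60.

-- Compare against the configured capacity:
-- total redo = innodb_log_file_size * innodb_log_files_in_group.
SHOW GLOBAL VARIABLES LIKE 'innodb_log_file%';
```

On RDS the size is changed through the instance's parameter group rather than my.cnf, and the change requires a restart.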
33. Resolving Slow Reads
● Focus on your top 20% queries
● Use the right tool to collect and visualise queries
34. Resolving slow Reads
● Queries were profiled against a test instance; new indexes and query rewrites were proposed.
● We also identified some duplicate indexes.
● After applying the new indexes and query rewrites, and removing the duplicate indexes, we saw a drastic
improvement in performance.
36. Schema level optimisation to use the right data-type & Duplicate Index Removed
Data types should be chosen based on the characteristics of the data that will be stored.
Choosing the right data type helps reduce your disk and memory footprint, which in turn gives
faster writes and retrieval of data.
Having too many indexes instead of the right indexes can lead to duplicate indexes, which ultimately
reduce the write throughput of your DB, since the index pages also have to be updated.
With our client X we found many duplicate indexes, which were identified using
pt-duplicate-key-checker.
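As a hedged sketch alongside pt-duplicate-key-checker (which is more thorough, also catching prefix duplicates), exact duplicate indexes can be spotted from information_schema:

```sql
-- List indexes in the current schema that cover exactly the same column list.
SELECT table_name, col_list,
       GROUP_CONCAT(index_name) AS duplicate_indexes
FROM (
    SELECT table_name, index_name,
           GROUP_CONCAT(column_name ORDER BY seq_in_index) AS col_list
    FROM   information_schema.statistics
    WHERE  table_schema = DATABASE()
    GROUP  BY table_name, index_name
) AS idx
GROUP  BY table_name, col_list
HAVING COUNT(*) > 1;
```

Dropping one index of each reported pair reclaims disk and speeds up writes without affecting read plans.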
37. Current Utilization and Future Scope

Metric  | Past | Present | Future (Expectation)
1. QPS  | 50K  | 50K     | 100K
2. CPU  | 85%  | 40%     | -
3. IOPS | 6K   | 2.5K    | 8K + Burstable Capacity