Bart Oles - Severalnines AB
Organizations need an appropriate disaster recovery plan to mitigate the impact of downtime. But how much should a business invest? Designing a highly available system comes at a cost, and not all businesses, and indeed not all applications, need five nines of availability.
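To put availability targets in perspective, the sketch below converts a number of nines into a yearly downtime budget. It assumes a 365-day year and counts all downtime against the budget; the function name is illustrative and not taken from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Return the yearly downtime budget (in minutes) for N nines of availability."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Five nines leaves roughly 5.26 minutes of downtime per year, while three nines leaves about 8.76 hours, which is why the required tier should be decided per application.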
We will explain fundamental disaster recovery concepts and walk you through the relevant options from the MySQL & MariaDB ecosystem to meet different tiers of disaster recovery requirements, and demonstrate how to automate an appropriate disaster recovery plan.
4. Copyright 2017 Severalnines AB
Free to download
Initial 30 days Enterprise trial
Converts into free Community Edition
Enterprise / paid versions available
8. Agenda
- How is it implemented?
- What is encrypted:
- Tablespaces?
- General tablespace?
- Double write buffer/parallel double write buffer?
- Temporary tablespaces? (KEY BLOCKS)
- Binlogs?
- Slow/general/error logs?
- MyISAM? MyRocks? X?
- Performance overhead.
- Backups?
- Transportable tablespaces. Transfer key.
- Plugins
- Keyrings in general
- Key rotation?
- General-Purpose Keyring Key-Management Functions
- Keyring_file
- Is it useful? How to make it profitable?
- Keyring Vault
- How does it work?
- How to make a transition from keyring_file
10. What is Disaster Recovery?
● Failures
○ Operational (power, network, IT systems)
○ Natural (hurricane, flood, fire, earthquake)
○ Human-caused (operator error, malicious activity, terrorism)
● Drivers
○ How fast can we get up and running
○ What data have we lost
○ How can we reduce risk
Policies, tools & procedures that ensure your data is secure and protected in case of an outage or serious catastrophe
13. “We Offer 100% Availability, But We Exclude…”
● Planned outages
○ e.g., server or network maintenance
● Failure of network, power or facilities delivered by an upstream provider
● DoS attacks, hacker activity or other malicious events
● Acts of God
○ e.g., weather-related events such as hurricanes or floods
27. Backup with No Hot Site
● Physical vs Logical backup
○ High impact on RTO
● Combine Full & Incremental
○ PITR-compatible to reduce RPO
● Schrödinger’s backup
○ “The condition of any backup is unknown until a restore is attempted”
● Encryption
● Keep a copy of the latest backup at the active site
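Combining full and incremental backups means a point-in-time restore needs a chain: the latest full backup before the target time, followed by the incrementals taken after it. The sketch below picks that chain from a backup catalogue. The `Backup` record and `restore_chain` function are hypothetical names for illustration, not part of any MySQL or MariaDB tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups, target):
    """Return the latest full backup at or before `target`, followed by
    every incremental taken after it and at or before `target`, in order."""
    candidates = sorted(
        (b for b in backups if b.taken_at <= target),
        key=lambda b: b.taken_at,
    )
    fulls = [b for b in candidates if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available before the target time")
    base = fulls[-1]
    incrementals = [
        b for b in candidates
        if b.kind == "incremental" and b.taken_at > base.taken_at
    ]
    return [base] + incrementals
```

The gap between the last backup in the chain and the target time still has to be covered by replaying binary logs, so the chain alone bounds RPO only to the time of the most recent incremental.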
28. Backup Retention
● Local Server
○ Up to 1 week
● Local Datacenter
○ Up to 2 weeks
● Remote Datacenter
○ Up to 4 weeks
○ Plus keep monthly backups & annual backups as required
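The retention tiers above can be expressed as a small policy check. This is a minimal sketch: the location keys, the `should_keep` function, and the flag marking a backup as a monthly or annual archive are assumptions made for illustration.

```python
from datetime import date, timedelta

# Maximum age per storage location, taken from the tiers listed above.
RETENTION = {
    "local_server": timedelta(weeks=1),
    "local_datacenter": timedelta(weeks=2),
    "remote_datacenter": timedelta(weeks=4),
}


def should_keep(location, taken_on, today, is_monthly_or_annual=False):
    """Return True if a backup should be retained at `location`."""
    # Monthly and annual archives at the remote site are exempt from the age limit.
    if location == "remote_datacenter" and is_monthly_or_annual:
        return True
    return today - taken_on <= RETENTION[location]
```

A cleanup job could call `should_keep` for every backup in each location and delete those for which it returns `False`.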
29. Backup with Hot Site
● We can reinstall DBs and apps from scratch and restore data
● Recovery time predictable
● In case of AWS, pre-configured AMIs can be used to quickly provision the application environment
30. Asynchronous Replication to Hot Site
● Low RTO
○ ‘Almost current’ data enables fast failover
● Low RPO
● Add a delayed slave to guard against operator error
● Backup still important
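A delayed slave only helps if the operator error is noticed, and replication stopped, before the bad statement is applied. In MySQL the delay is set with the `MASTER_DELAY` option of `CHANGE MASTER TO`. The check below is a simplified sketch that ignores replication lag beyond the configured delay; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def replica_still_clean(error_at, noticed_at, delay):
    """Return True if replication can still be stopped before the delayed
    replica applies the erroneous statement executed at `error_at`."""
    return noticed_at - error_at < delay


# Example: a one-hour delay and an error noticed after 20 minutes.
error_time = datetime(2017, 6, 1, 12, 0)
print(replica_still_clean(error_time, error_time + timedelta(minutes=20), timedelta(hours=1)))
```

If the error is noticed too late, the delayed slave has already applied it and recovery falls back to backups, which is why backups remain important.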
31. Synchronous Replication to Hot Site
● Highest tier of DR
○ Minimal RPO and RTO
● Data on the primary site and hot sites has the same transactional state
○ Failover instantaneous and automatic
● Failure detection time is the main contributor to RTO
● 3 sites to avoid split brain during a network partition
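The reason for three sites can be shown with a simple majority rule, as used by clusters that rely on quorum voting. This sketch assumes every site carries equal weight; the function name is illustrative.

```python
def has_quorum(reachable_sites: int, total_sites: int) -> bool:
    """Return True if the reachable sites form a strict majority."""
    return reachable_sites > total_sites // 2


# Two sites split 1/1: neither side has a majority, so neither can safely continue.
print(has_quorum(1, 2))
# Three sites split 2/1: the two-site side keeps quorum and stays available.
print(has_quorum(2, 3))
```

With only two sites, a partition leaves both halves without a majority, so the cluster either stops or risks diverging. A third site, even a small arbitrator, breaks the tie.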
36. Failover the New Normal
● Failover used to be a complex procedure
○ Required a lot of staff
○ Required availability of VPs / technology heads
● In modern distributed infrastructure, design for failure
● Considerations
○ How many sites?
○ How to route users to sites?
○ What goes into a failover?