MySQL topology healing at OLA.


This talk elaborates on how to detect failures in and heal your MySQL topology with MySQL Orchestrator. It was delivered at the Mydbops Database Meetup on 27-04-2019 by Anil Yadav, Lead Database Engineer at OLA, and Krishna Ramanathan, Database Administrator III at OLA.


  1. MySQL Topology Healing: The Ola Story. Anil Yadav, Krishna R
  2. Motivation ● Uncertainties in the public cloud ● Business continuity ● Data consistency
  3. What We Had
  4. High Availability Objectives ● How much outage time can you tolerate? ● How reliable is crash detection? Can you tolerate false positives (premature failovers)? ● How reliable is failover? Where can it fail? ● How well does the solution work across data centers, on low- and high-latency networks? ● Can you afford data loss? To what extent?
  5. Possible Solutions
  6. MHA ● Pros ○ Adoption ○ Data healing ● Cons ○ Dormant community ○ Topology awareness ○ Compatibility with MaxScale
  7. MaxScale ● Pros ○ Already resident in our architecture ○ Pluggable ○ Backed by MariaDB ● Cons ○ Latency ○ Topology awareness ○ No community
  8. ProxySQL ● Pros ○ Feature-rich ○ Vibrant community ○ Backed by Percona ● Cons ○ Latency ○ Topology awareness
  9. The Chosen One ● MySQL Orchestrator ○ Pros ■ Adoption ■ Topology awareness ■ Large installations (Booking.com, GitHub) ○ Cons ■ Needs GTID or MaxScale for healing
  10. Building Blocks ● MySQL Orchestrator ● MaxScale binlog servers ● Semi-sync replication ● NVMe storage
  11. Orchestrator in Action ● Pre-failover process ● Healing ● Post-failover process
  12. orchestrator.conf.json (excerpt):

      "FailureDetectionPeriodBlockMinutes": 5,
      "RecoveryPeriodBlockSeconds": 1800,
      "RecoveryIgnoreHostnameFilters": ["slave"],
      "RecoverMasterClusterFilters": ["orch-master"],
      "RecoverIntermediateMasterClusterFilters": ["orch-master"],
      "OnFailureDetectionProcesses": [
        "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}. We do not panic.' >> /usr/local/orchestrator/recovery.log",
        "/eni_modules/orch_sendmail.py 'Master {failedHost} detected for {failureType}'"
      ],
      "PreFailoverProcesses": [
        "echo 'Will recover from {failureType} on {failureCluster}. Failed host: {failedHost}' >> /usr/local/orchestrator/recovery.log",
        "/eni_modules/eni_detach.sh {failedHost} {failureType} >> /usr/local/orchestrator/recovery.log"
      ],
      "PostFailoverProcesses": [
        "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}.' >> /usr/local/orchestrator/recovery.log"
      ],
      "PostUnsuccessfulFailoverProcesses": [],
      "PostMasterFailoverProcesses": [
        "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /usr/local/orchestrator/recovery.log",
        "/eni_modules/eni_attach.sh {failedHost} {successorHost} >> /usr/local/orchestrator/recovery.log"
      ],
      "PostIntermediateMasterFailoverProcesses": [
        "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /usr/local/orchestrator/recovery.log"
      ],
  13. Pre-Failover Process ● read_only is set on the MySQL master. ● The ENI is detached from the master via the AWS CLI, preventing split-brain. ● Remaining client connections are killed.
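The pre-failover ENI detach can be sketched as a small shell helper. The slides reference a hook called eni_detach.sh but do not show its contents, so the function name, the attachment-id argument, and the sample IDs below are illustrative assumptions, not OLA's actual script.

```shell
#!/bin/sh
# Hypothetical sketch of the pre-failover ENI detach step. The real
# eni_detach.sh is not shown in the slides; names and IDs here are assumed.

# Compose the AWS CLI call that detaches the failed master's ENI.
# --force is relevant because a dead instance cannot acknowledge the detach.
build_detach_cmd() {
    attachment_id="$1"
    echo "aws ec2 detach-network-interface --attachment-id $attachment_id --force"
}

# Before detaching, the hook would also set the old master read-only, e.g.:
#   mysql -h "$failed_host" -e 'SET GLOBAL read_only = ON'
# and then kill the remaining client connections.

build_detach_cmd "eni-attach-0123456789abcdef0"
```

With the ENI gone, application traffic can no longer reach the old master even if the instance later recovers, which is what closes the split-brain window.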
  14. Healing ● The binlog server that is furthest ahead is chosen. ● The other binlog servers are regrouped under it, making the topology consistent.
  15. Healing ● The new candidate master is chosen, honoring the "PromotionIgnoreHostnameFilters" setting, e.g. "PromotionIgnoreHostnameFilters": ["slave","lytic","backup"] ● The new master's binlog is flushed, and the binlog servers are repointed under it.
  16. Post-Failover Process ● The ENI is attached to the new master via the AWS CLI. ● Client connections can now be seen on the new master. ● This marks the end of the recovery process.
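The matching post-failover step can be sketched the same way. The slides reference a hook called eni_attach.sh but do not show its contents, so the function name, the --device-index value, and the sample IDs below are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the post-failover ENI attach step. The real
# eni_attach.sh is not shown in the slides; names and IDs here are assumed.

# Compose the AWS CLI call that attaches the floating ENI to the promoted
# master, so clients addressing the ENI's IP now land on the new master.
# --device-index 1 (a secondary interface) is an assumption.
build_attach_cmd() {
    eni_id="$1"
    instance_id="$2"
    echo "aws ec2 attach-network-interface --network-interface-id $eni_id --instance-id $instance_id --device-index 1"
}

build_attach_cmd "eni-0123456789abcdef0" "i-0123456789abcdef0"
```

Because clients follow the ENI rather than a hostname, no application-side reconfiguration is needed: once the attach completes, connections appear on the new master and recovery is done.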
  17. Challenges ● Orchestrator's upstream does not support MaxScale binlog servers, so we had to fall back to the earlier version: https://github.com/outbrain/orchestrator ● A master dead because of an EC2 failure could reach the state "checkAndRecoverUnreachableMasterWithStaleSlaves"; Orchestrator was patched to arrive at "checkAndRecoverDeadMaster" instead. ● Orchestrator's forced takeover was failing, so it was patched to follow the same path as a "DeadMaster". ● The forked branch with these changes: https://github.com/varunarora123/orchestrator
  18. 18. Demo
  19. Questions?
  20. We are expanding our team. Reach out to us at anil.yadav1@olacabs.com / krishna.r@olacabs.com