2. About me
• Name: Lixun Peng
• Location: Hangzhou, China
• Occupation: Staff Database Kernel Engineer @ Alibaba Cloud
• Interests: MySQL Replication & InnoDB
• Experience:
At first, I worked as a DBA
Then I began to modify the MySQL code, in order to make better use of it
Gradually I became a MySQL Kernel Engineer
3. Agenda
• The problems with Async/Semi-Sync
• How to solve these problems
• How to implement Double-Sync
• How to use Double-Sync
• Several cases
5. Problems of Async Replication
• Master doesn’t have to wait for an ACK from slave.
• Slave doesn’t know whether it has dumped the latest binary logs.
• When master crashes, slave can’t tell whether it had caught up with master.
• The major problem is that slave doesn’t know master’s status.
7. Problems of SemiSync
• Master has to wait for an ACK from slave.
• Replication downgrades to async when a timeout happens.
• If the timeout setting is too small, timeouts happen too often.
• If the timeout setting is too big, master blocks for a long time.
• After it recovers from a network failure, slave dumps the binary
logs generated during the timeout asynchronously.
• If master crashes, slave doesn’t know which mode replication was
in (Async or SemiSync).
• In this case, slave still doesn’t know whether it has dumped the
latest binary logs.
• Conclusion: SemiSync doesn’t solve the major problem.
10. Background & Target
• Background
• SA team guarantees server availability: 99.999%
• Net Ops team guarantees network availability: 99.999%
• Assume master and network don’t fail at the same time.
• Target
• Slave knows whether it has caught up with master.
• Slave knows how much of master’s data it doesn’t have.
• Key Point: Clarify Slave's status!
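To see why it is reasonable to assume that master and network do not fail together, the two 99.999% figures can be turned into numbers. This is only a rough sketch: it assumes server and network failures are independent, which the slides do not state, and the function names are made up for illustration.

```python
# Rough arithmetic for the availability figures on this slide.
# Assumption (not from the slides): server and network failures are independent.

MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def joint_failure_probability(avail_a: float, avail_b: float) -> float:
    """Probability that both components are down at the same instant,
    under the independence assumption."""
    return (1.0 - avail_a) * (1.0 - avail_b)


server_availability = 0.99999
network_availability = 0.99999

print(f"{downtime_minutes_per_year(server_availability):.2f} minutes/year per component")
print(f"{joint_failure_probability(server_availability, network_availability):.1e} joint failure probability")
```

Under these assumptions each component is down for roughly five minutes per year, and the chance of both being down at the same moment is many orders of magnitude smaller.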
12. Solve the weak point of SemiSync
• Even after the network recovers from a failure, slave still has to
dump the binary logs generated during the timeout asynchronously.
• If a timeout happens and slave gives up the binary logs generated
during the timeout, what happens afterwards if master only sends the
latest position & logs?
• When network is down, slave always knows the latest position.
• Slave can tell whether its data is the same as master’s.
• How to catch up on the data modifications made while the network was down?
• Async replication can still dump binary logs.
• So we can use Async replication to apply the full logs.
13. Combine the Async and SemiSync
• Async Replication (Async Channel)
• Dumping continuous binary logs from master.
• Applying logs immediately after slave receives them.
• SemiSync Replication (Sync Channel)
• Dumping the latest binary logs and position.
• Not applying logs immediately. Expired logs are purged
automatically.
• Analyzing Consistency
• Comparing logs and position from two channels.
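The division of labour between the two channels can be modelled in a few lines. This is a toy sketch, not AliSQL code: `Event.seq` is a hypothetical stand-in for a GTID sequence number, and the "keep the last N events" purge policy is an assumption, since the slides do not say how expiry is decided.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int       # hypothetical stand-in for a GTID sequence number
    payload: str


class AsyncChannel:
    """Receives continuous events and applies each one immediately."""

    def __init__(self) -> None:
        self.applied_up_to = 0

    def receive(self, event: Event) -> None:
        if event.seq != self.applied_up_to + 1:
            raise ValueError("async channel expects continuous events")
        self.applied_up_to = event.seq  # "apply" right away


class SyncChannel:
    """Keeps only the newest events and does not apply them."""

    def __init__(self, keep_last: int = 3) -> None:
        # Assumed purge policy: the oldest events fall out automatically.
        self.buffer = deque(maxlen=keep_last)

    def receive(self, event: Event) -> None:
        self.buffer.append(event)

    def latest_seq(self) -> int:
        return self.buffer[-1].seq if self.buffer else 0


def is_consistent(async_ch: AsyncChannel, sync_ch: SyncChannel) -> bool:
    """The slave has everything if the async channel reached the sync channel's latest event."""
    return async_ch.applied_up_to >= sync_ch.latest_seq()


events = [Event(seq, f"trx-{seq}") for seq in range(1, 6)]
async_ch, sync_ch = AsyncChannel(), SyncChannel(keep_last=3)
for ev in events:
    sync_ch.receive(ev)      # sync channel has seen everything up to 5
for ev in events[:3]:
    async_ch.receive(ev)     # async channel lags at 3
print(is_consistent(async_ch, sync_ch))
```

The point of the model is that the Sync Channel only has to answer "what is the latest position on master?", while the Async Channel carries the full, continuous log stream.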
17. How to create two channels (1)
• Multi-Source replication enables N channels in one slave.
• Problem: when master receives two dump requests from servers
with the same server-id, it disconnects the previous one.
• Solution: set up a special Server-ID (0xFFFFFF) for Sync Channel.
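The server-id collision can be illustrated with a small model of the master's bookkeeping. The class and method names below are hypothetical and only mimic the behaviour described on the slide (a new dump request replaces an older one with the same server-id); only the value 0xFFFFFF comes from the slide.

```python
SYNC_CHANNEL_SERVER_ID = 0xFFFFFF  # special Server-ID from the slide


class MasterDumpRegistry:
    """Toy model: master keeps one dump connection per server-id."""

    def __init__(self) -> None:
        self.active = {}        # server_id -> connection label
        self.disconnected = []  # connections the master dropped

    def register_dump(self, server_id: int, connection: str) -> None:
        previous = self.active.get(server_id)
        if previous is not None:
            # Same server-id seen again: the earlier connection is dropped.
            self.disconnected.append(previous)
        self.active[server_id] = connection


# Both channels use the slave's own server-id: the first one is dropped.
clash = MasterDumpRegistry()
clash.register_dump(100, "async channel")
clash.register_dump(100, "sync channel")
print(clash.disconnected)

# Sync channel uses the special id: both connections survive.
fixed = MasterDumpRegistry()
fixed.register_dump(100, "async channel")
fixed.register_dump(SYNC_CHANNEL_SERVER_ID, "sync channel")
print(fixed.disconnected, sorted(fixed.active))
```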
18. How to create two channels (2)
• Problem: one slave has both a SemiSync and a non-SemiSync
channel, but the SemiSync settings are global.
• Solution: move the SemiSyncSlave class into Master_info, so that
each channel has its own SemiSync state.
19. Analyzing consistency
• Using the GTID
• Using the Log_file_name and Log_file_pos
• The following pictures illustrate the process.
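When GTIDs are not used, the two channels can be compared by binary log coordinates. The sketch below assumes the usual binary log naming scheme of a base name followed by a numeric suffix (for example `mysql-bin.000012`); the helper names and the sample coordinates are made up for illustration.

```python
from typing import Tuple


def binlog_coordinate(log_file_name: str, log_file_pos: int) -> Tuple[int, int]:
    """Turn ('mysql-bin.000012', 154) into a sortable (12, 154) tuple.

    Parsing the numeric suffix avoids comparing file names as strings.
    """
    index = int(log_file_name.rsplit(".", 1)[1])
    return (index, log_file_pos)


def async_has_caught_up(async_coord: Tuple[str, int], sync_coord: Tuple[str, int]) -> bool:
    """True if the async channel has reached the latest position
    reported on the sync channel."""
    return binlog_coordinate(*async_coord) >= binlog_coordinate(*sync_coord)


# Hypothetical coordinates for the two channels.
async_position = ("mysql-bin.000012", 900)
sync_position = ("mysql-bin.000013", 154)
print(async_has_caught_up(async_position, sync_position))
```

Tuples compare element by element, so the file index is checked first and the position within the file only breaks ties.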
22. CASE 1: Needn’t Fix
• The GTIDs of the Sync and Async Channels are the same.
23. CASE 2: Can’t Fix
• There is a gap between the logs of the Sync and Async Channels.
24. CASE 3: Can Repair
• Combine the two channels’ logs to make the logs continuous.
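The three cases can be summarised as one decision function. This is a simplified sketch: plain integers stand in for GTID sequence numbers, the Async Channel is assumed to hold a continuous prefix of the log, and the names are hypothetical rather than taken from AliSQL.

```python
from enum import Enum
from typing import List


class Verdict(Enum):
    NEED_NOT_FIX = "case 1"
    CANNOT_FIX = "case 2"
    CAN_REPAIR = "case 3"


def classify(async_received_up_to: int, sync_buffered: List[int]) -> Verdict:
    """Compare what the async channel has with what the sync channel still holds."""
    if not sync_buffered:
        raise ValueError("sync channel has no position to compare against")

    latest = max(sync_buffered)
    if async_received_up_to >= latest:
        # Both channels end at the same transaction.
        return Verdict.NEED_NOT_FIX

    # Transactions the async channel lacks and the sync channel cannot supply.
    needed = set(range(async_received_up_to + 1, latest + 1))
    missing = needed - set(sync_buffered)
    return Verdict.CANNOT_FIX if missing else Verdict.CAN_REPAIR


print(classify(5, [4, 5]))        # channels agree
print(classify(3, [7, 8]))        # 4..6 exist in neither channel
print(classify(5, [4, 5, 6, 7]))  # sync channel covers 6..7
```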
25. How to Repair
• Slave waits for the Async Channel to apply all the logs it has
received, then starts the SQL thread of the Sync Channel.
• GTID filters out the events that have already been applied by the
Async Channel.
• A REPAIR SLAVE command is provided to do all of this
automatically.
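The repair steps can be sketched as follows. This only models the idea; it is not the REPAIR SLAVE implementation, whose syntax and internals the slides do not show. Integers stand in for GTIDs, and the function assumes the Async Channel has already finished applying everything it received.

```python
from typing import Dict, List, Set


def repair_slave(executed: Set[int], sync_relay_log: Dict[int, str]) -> List[int]:
    """Replay the sync channel's buffered events on top of the async channel's work.

    executed:        transactions already applied via the async channel
    sync_relay_log:  events buffered (but not applied) by the sync channel
    Returns the transactions newly applied, in order.
    """
    newly_applied = []
    for seq in sorted(sync_relay_log):
        if seq in executed:
            continue  # GTID filter: already applied by the async channel
        executed.add(seq)  # "apply" the event
        newly_applied.append(seq)
    return newly_applied


executed = {1, 2, 3, 4, 5}
sync_relay_log = {4: "trx-4", 5: "trx-5", 6: "trx-6", 7: "trx-7"}
print(repair_slave(executed, sync_relay_log))
```

Because already-executed transactions are skipped, the overlap between the two channels is harmless; only the tail that the Async Channel never received is applied.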
26. FAQs (1)
• Q1: Will Alibaba release this feature?
• A1: Of course! Alibaba will release all the patches.
• Q2: When will Alibaba release the source code?
• A2: Check AliSQL’s roadmap.
• Q3: How can I access AliSQL’s source code?
• A3: https://github.com/alibaba/AliSQL Currently the project is
private. If you want access, please email me your GitHub
account.
27. FAQs (2)
• Q4: What’s the difference between two Semi-Sync slaves and
Double-Sync replication?
• A4: In fact they do the same job, and performance is pretty much
the same too. But Double-Sync replication needs one slave fewer
than the two Semi-Sync slaves architecture. As the number of
MySQL servers grows, this saves a lot of money.