3. What’s a Time Machine
• Rolling back instances, databases, or tables to a snapshot.
• Implemented at the server level, so it supports all storage engines.
• Based on full-image binary logs (ROW format).
• Currently a feature of the mysqlbinlog tool (the --flashback option).
4. Why Time Machine
• Everyone makes mistakes, including DBAs.
• After a user mis-operates on their data, we can of course recover it from the last full backup set plus binary logs.
• But if the database is huge, that costs far too much time! And usually a mis-operation only modifies a little data, yet we have to recover the whole database.
5. How Time Machine Works
• As we know, if binlog_format is ROW (binlog-row-image=FULL in 5.6 and later), all columns’ values are stored in the row event, so we can get the data as it was before the mis-operation.
• Just do the following:
• Change the event type: INSERT -> DELETE, DELETE -> INSERT.
• For an Update event, swap the SET part and the WHERE part.
• Apply those events in reverse order, from the last one back to the first one where the mis-operation happened.
• All the data is recovered by applying the inverse of each mis-operation.
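The inversion steps above can be sketched as follows. This is a minimal illustration that models row events as simple tuples; the event model and function names are assumptions for illustration, not mysqlbinlog's actual internals, which rewrite binary-log events in place.

```python
# Hypothetical model: an event is (type, rows). For UPDATE, each row is a
# (before_image, after_image) pair; for INSERT/DELETE it is the row image.

def invert_event(event):
    """Return the inverse of one ROW-format event."""
    etype, rows = event
    if etype == "INSERT":        # INSERT -> DELETE
        return ("DELETE", rows)
    if etype == "DELETE":        # DELETE -> INSERT
        return ("INSERT", rows)
    if etype == "UPDATE":        # swap the SET (after) and WHERE (before) parts
        return ("UPDATE", [(after, before) for (before, after) in rows])
    raise ValueError(f"unsupported event type: {etype}")

def flashback(events):
    """Apply inverted events from the last one back to the first."""
    return [invert_event(e) for e in reversed(events)]
```

Reversing the order matters: each inverse must undo the most recent change first, so the row images match at every step.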
6. Done List
• Full DML support.
• Review table support.
• Users may want to check which part of the data was flashbacked.
• GTID support (MariaDB).
• We added GTID event support for MariaDB 10.1.
• MySQL 5.6 GTID event support is still in progress.
7. ToDo List
• Adding DDL support.
• For an ADD INDEX/COLUMN or CREATE TABLE query, just drop the index, column, or table when running Flashback.
• For a DROP INDEX/COLUMN or DROP TABLE query, copy or rename the old table into a reserved database. When Flashback runs, we can drop the new table and rename the saved old one back to the original database.
• For TRUNCATE TABLE, just rename the old table into the reserved database and create a new empty table.
• Adding a script for Time Machine.
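The planned DDL rules amount to a small inversion table. A minimal sketch, assuming a reserved database named `flashback_reserved` and a rule table of my own invention (the real to-do item is not implemented yet):

```python
# Hypothetical DDL inversion rules; names are illustrative assumptions.
RESERVED_DB = "flashback_reserved"

# forward DDL -> inverse action taken when Flashback runs
INVERSE_DDL = {
    "ADD INDEX":      "drop the index",
    "ADD COLUMN":     "drop the column",
    "CREATE TABLE":   "drop the table",
    "DROP INDEX":     f"rename the saved table back from {RESERVED_DB}",
    "DROP COLUMN":    f"rename the saved table back from {RESERVED_DB}",
    "DROP TABLE":     f"rename the saved table back from {RESERVED_DB}",
    "TRUNCATE TABLE": f"rename the saved table back from {RESERVED_DB}",
}

def invert_ddl(stmt):
    """Return the inverse action for a supported DDL statement."""
    try:
        return INVERSE_DDL[stmt]
    except KeyError:
        raise ValueError(f"no inversion rule for: {stmt}")
```

The additive statements are cheap to undo by dropping; the destructive ones all rely on the old table having been saved in the reserved database beforehand.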
10. Problem of Async Replication
• The Master doesn’t need to wait for an ACK from the Slave.
• The Slave doesn’t know whether it has dumped the latest binary logs from the Master.
• When the Master crashes, the Slave has no way to check on its own whether its data is the same as the Master’s.
• So the main problem is that the Slave doesn’t know the Master’s status.
12. Problem of SemiSync
• The Master needs to wait for an ACK from the Slave.
• Replication downgrades to Async when a timeout happens.
• If the timeout is too small, timeouts happen frequently.
• If the timeout is too big, the Master is often blocked.
• After the network recovers, the Slave still has to dump the binary logs generated during the timeout; during that time, it remains Async.
• When the Master crashes, the Slave doesn’t know whether the Master was in Async or SemiSync mode.
• So the Slave still doesn’t know whether its data is the same as the Master’s when the Master crashed.
• So SemiSync doesn’t solve the main problem of Async replication.
14. Background & Target
• Background
• SAs guarantee server availability: 99.999%.
• NAs guarantee network availability: 99.999%.
• So we can assume that when the Master crashes, the network is not timing out at that moment.
• Target
• The Slave can determine its own status (whether its data is the same as the Master’s).
• If the data differs from the Master’s, notify the app & dev teams to fix the data, and show the range of lost data.
• Key point: avoid the Slave’s status being unknown!
15. Solve the weak point of SemiSync
• Once SemiSync times out, even after the network recovers, the Slave still needs to dump the binary logs generated during the timeout, in Async mode.
• What if, when SemiSync times out, we give up the binary logs generated during the timeout, and the Master just sends the latest position & logs?
• Then the Slave always knows the latest position on the Master, even across a network outage.
• So the Slave can tell whether its data is the same as the Master’s.
• But if the Slave only dumps the latest data, how does it get the data generated while the network was down?
• Async replication can dump the continuous binary logs.
• So we can use Async replication to do the full log apply.
16. Combine the Async and SemiSync
• Async Replication (Async_Channel)
• Dumps continuous binary logs, guaranteeing that the Slave’s logs are continuous.
• Applies logs immediately after receiving them.
• SemiSync Replication (Sync_Channel)
• Dumps the latest binary logs, guaranteeing that the Slave knows the latest position on the Master.
• Does not apply logs after receiving them; it just saves the logs & position, and outdated logs are purged automatically.
• Analyzing consistency
• Compare the received log positions of the two channels.
18. How to create two channels(1)
• Multi-Source replication can create N channels on one Slave.
• Problem: when the Master receives two dump requests from servers with the same Server-ID, it disconnects the previous one.
• Solution: we give the Sync Channel a special Server-ID (0xFFFFFF).
19. How to create two channels (2)
• Problem: one Slave has both a SemiSync channel and a non-SemiSync channel, but the SemiSync settings are global.
• Solution: we moved the SemiSyncSlave class into Master_info.
20. Analyzing consistency
• Using GTIDs
• Using the Log_file_name and Log_file_pos
• How to judge: see the cases below.
22. CASE 1: Needn’t Fix
• The GTIDs received on the Sync and Async Channels are the same.
23. CASE 2: Can’t Fix
• There is a gap between the logs received on the Sync and Async Channels.
24. CASE 3: Can Repair
• Combining the two channels’ logs yields a continuous sequence.
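The three cases can be sketched as one classification function. This is a minimal sketch that models each channel's received logs as a contiguous range of sequence numbers standing in for GTIDs; the range model and names are assumptions, not the server's actual GTID-set comparison.

```python
# Hypothetical consistency check over the two channels' received log ranges.

def analyze(async_range, sync_range):
    """Classify the channels' ranges, each an inclusive (first, last) pair."""
    a_first, a_last = async_range
    s_first, s_last = sync_range
    if a_last >= s_last:
        return "NEEDNT_FIX"      # CASE 1: Async already has everything Sync saw
    if s_first <= a_last + 1:
        return "CAN_REPAIR"      # CASE 3: combined logs are continuous
    return "CANT_FIX"            # CASE 2: a broken gap exists between channels
```

For example, Async holding logs 1–10 with Sync holding 8–15 overlaps and can be repaired, while Sync holding only 13–15 leaves logs 11–12 unrecoverable.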
25. How to Repair
• We wait until the Async Channel has applied all the logs it received, then start the SQL thread of the Sync Channel.
• GTIDs filter out the events already applied by the Async Channel.
• We provide a REPAIR SLAVE command to do these things automatically.
27. Why we need multi-source
• OLAP
• Most users use MySQL with data sharding.
• Multi-Source can help users combine their data from sharded instances.
• If you are using Master-Slave for backup, Multi-Source can help you back up many instances into one, which is easier to maintain.
29. What changes in the code
• Move Rpl_filter/skip_slave_counters into Master_info.
• Each channel creates a new Master_info.
• Every replication-related function uses the corresponding Master_info.
• We created a Master_info_index class to maintain all the Master_info objects.
30. The Syntax
• CHANGE MASTER ["connection_name"] ...
• FLUSH RELAY LOGS ["connection_name"]
• MASTER_POS_WAIT(....,["connection_name"])
• RESET SLAVE ["connection_name"]
• SHOW RELAYLOG ["connection_name"] EVENTS
• SHOW SLAVE ["connection_name"] STATUS
• SHOW ALL SLAVES STATUS
• START SLAVE ["connection_name"...]
• START ALL SLAVES ...
• STOP SLAVE ["connection_name"] ...
• STOP ALL SLAVES ...
31. The Syntax
• set @@default_master_connection='';
• show status like 'Slave_running';
• set @@default_master_connection='connection';
• show status like 'Slave_running';
34. Why we need TMM
• MySQL’s memory limits only work well at the storage-engine level.
• For example, in InnoDB: innodb_buffer_pool_size.
• In the server layer we can only limit certain features’ memory, such as sort_buffer_size and join_buffer_size.
• But for a big query, most of the memory cost comes from MEM_ROOT, and there is no option to limit it.
• So when the mysqld process uses too much memory, we don’t know which thread is the cause.
• And therefore we don’t know which thread to kill to release the memory.
35. How to solve it
• Add a hack in my_malloc.
• Record the malloc size and which thread requested the memory.
• Calculate the total memory size across all threads.
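The bookkeeping described above can be sketched as follows. This is a minimal sketch of per-thread allocation accounting, assuming every allocation goes through one wrapper the way MySQL's allocations go through my_malloc; the names and the thread-local map are illustrative, not MySQL's implementation.

```python
import threading
from collections import defaultdict

# per-thread running total of recorded allocation sizes
_alloc_by_thread = defaultdict(int)
_lock = threading.Lock()

def my_malloc(size):
    """Record `size` against the calling thread, then 'allocate'."""
    tid = threading.get_ident()
    with _lock:
        _alloc_by_thread[tid] += size
    return bytearray(size)          # stand-in for the real allocation

def total_memory():
    """Total recorded memory across all threads."""
    with _lock:
        return sum(_alloc_by_thread.values())

def biggest_consumer():
    """Thread id holding the most recorded memory (candidate to kill)."""
    with _lock:
        return max(_alloc_by_thread, key=_alloc_by_thread.get)
```

With per-thread totals in hand, the DBA can see which connection thread is responsible for runaway memory and kill just that one, instead of restarting mysqld. A matching hook in the free path would subtract sizes to keep the totals accurate.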