Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
分析mysql acid设计实现
blog- xiaorui.cc
author- rfyiamcool
innodb 基本结构
innodb acid 概念
innodb buffer pool
mvcc
各种延伸
锁, 各种锁
List
innodb的抽象结构
tablespace
leaf node segment
no-leaf segment
rollback segment
…
extent
…
page
…
row row
row row
row row
trx id...
b+tree 数据结构
15| |56| | 77
15 | |30| | 45 … …
22 row
20 row
17 row
16 row
[16, 17, 20, 22]
31 row
35 row
44 row
32 row
[31,...
file system vs disk
disk - - - > 512 bytes
file system - - -> 4k
innodb engine - - - > 16k
size不同如何上下对齐?
partial page write ?
解决办法是 doublewrite buffer + doublewrite
doublewrite 性能影响? 5% - 10%
dirty page
dirty page
double write buffer (2m)
double write | double write
数据
文件
redo log
crash recovery !!!
what is acid
atomicity
consistency
isolation
durability
atomicity
一个事务为一个原子 不可分割
要么成功 要么失败
consistency
经过一系列改动及异常崩溃,最后的结果是我要的.
最终一致性
强一致性
isolation
mvcc
隔离级别
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
durability
一旦提交成功,那么事务涉及的数据都会持久化
即使系统崩溃,修改的数据也不会丢失
innodb buffer pool
page buffered
page buffered
page buffered
page buffered
page buffered
page buffered
old 37%
new 63%
新数据...
update where blog = ‘xiaorui.cc’
data in buffer pool
Load page to buffer pool
update data and mark dirty page !!!
yes !
no...
innodb buffer pool
lru
dirty page
free list
flush list
一些概念总览
redo
undo
checkpoint
purge
if not checkpoint
写操作都是在buffer pool进行,每次提交都会顺序io增加redo日志.
buffer pool里面的dirty page越来越多 .
redo 越来越大 , 导致recovery时间过长 .
if checkpoint
当 buffer pool 内存不够用时, 将dirty page刷新到文件.
当redo文件过大又不可循环写入时 , 同上
缩短crash recovery时间
checkpoint + lsn
待清理 Redo Log
p1
p2
p3
buffer pool
flush list
创建checkpoint
lsn: 1500 作为下一个点
p4
p5
刷新到磁盘 !
last checkpoint ...
一个事务过程
1. begin
2. undo log
3. udate where id = xxx;
4. redo buffer
5. redo fsync; binlog; commit
if crash ?
if crash ?
if...
redo
when redo save ?
when commit !
顺序io
page number + block
when redo reduce ?
Ringbuffer full
flush timer run
undo
随机io
是否存入文件, 取决于buffer pool
flush log rule
innodb_flush_log_at_trx_commit = 1 Log
Buffer
File
System
Buffer
Lb_log
file
innodb_flush_log_at_trx_commit...
数据库并发处理的演变
基于表的读写锁
基于索引的读写锁 (分离锁)
mvcc
mvcc
读读非阻塞
读写非阻塞
写写阻塞
一句话, 并发的效果,串行的结果
!!!
又是概念
read view
隐藏字段
是否可见的条件
当前读
快照读
mvcc流程图
trx_id 10
trx_id 15
| row| trx_id | rollback ptr | …
read view list
trx_id
roll ptr
data
undo data
trx_id
roll ptr...
幻读
record lock
gap lock
next gap lock
业务数据不一致
事务 A 事务B
ts begin
ts select stock where ticket=10
ts begin
ts update stock=stock-1
ts commit
ts select stock where...
互斥锁 vs 乐观锁
互斥锁
select * from train where ticket = 10 for update;
乐观锁
while 1:
update train set stock=stock-1 where ticket ...
crash recovery
从doublewrite 修正缺斤少两的page
重做提交的redo日志
回滚执行一半的 未提交的 事务
“ Q & A”
–fengyun rui
Upcoming SlideShare
Loading in …5
×

分析mysql acid 设计实现

517 views

Published on

分析mysql acid 设计实现

Published in: Services

分析mysql acid 设计实现

  1. 1. 分析mysql acid设计实现 blog- xiaorui.cc author- rfyiamcool
  2. 2. innodb 基本结构 innodb acid 概念 innodb buffer pool mvcc 各种延伸 锁, 各种锁 List
  3. 3. innodb的抽象结构 tablespace leaf node segment no-leaf segment rollback segment … extent … page … row row row row row row trx id rollback ptr field ptr … segment extent page row
  4. 4. b+tree 数据结构 15| |56| | 77 15 | |30| | 45 … … 22 row 20 row 17 row 16 row [16, 17, 20, 22] 31 row 35 row 44 row 32 row [31, 32, 35, 44] page 有序数组
  5. 5. file system vs disk disk - - - > 512 bytes file system - - -> 4k innodb engine - - - > 16k
  6. 6. size不同如何上下对齐? partial page write ? 解决办法是 doublewrite buffer + doublewrite doublewrite 性能影响? 5% - 10%
  7. 7. dirty page dirty page double write buffer (2m) double write | double write 数据 文件 redo log crash recovery !!!
  8. 8. what is acid atomicity consistency isolation durability
  9. 9. atomicity 一个事务为一个原子 不可分割 要么成功 要么失败
  10. 10. consistency 经过一系列改动及异常崩溃,最后的结果是我要的. 最终一致性 强一致性
  11. 11. isolation mvcc 隔离级别 READ UNCOMMITTED READ COMMITTED REPEATABLE READ SERIALIZABLE
  12. 12. durability 一旦提交成功,那么事务涉及的数据都会持久化 即使系统崩溃,修改的数据也不会丢失
  13. 13. innodb buffer pool page buffered page buffered page buffered page buffered page buffered page buffered old 37% new 63% 新数据 新数据 新数据 新数据
  14. 14. update where blog = ‘xiaorui.cc’ data in buffer pool Load page to buffer pool update data and mark dirty page !!! yes ! no !
  15. 15. innodb buffer pool lru dirty page free list flush list
  16. 16. 一些概念总览 redo undo checkpoint purge
  17. 17. if not checkpoint 写操作都是在buffer pool进行,每次提交都会顺序io增加redo日志. buffer pool里面的dirty page越来越多 . redo 越来越大 , 导致recovery时间过长 .
  18. 18. if checkpoint 当 buffer pool 内存不够用时, 将dirty page刷新到文件. 当redo文件过大又不可循环写入时 , 同上 缩短crash recovery时间
  19. 19. checkpoint + lsn 待清理 Redo Log p1 p2 p3 buffer pool flush list 创建checkpoint lsn: 1500 作为下一个点 p4 p5 刷新到磁盘 ! last checkpoint : 1000 顺序: redo log == flush list ?
  20. 20. 一个事务过程 1. begin 2. undo log 3. udate where id = xxx; 4. redo buffer 5. redo fsync; binlog; commit if crash ? if crash ? if crash ? one of three happen crash ?
  21. 21. redo when redo save ? when commit ! 顺序io page number + block when redo reduce ? Ringbuffer full flush timer run
  22. 22. undo 随机io 是否存入文件, 取决于buffer pool
  23. 23. flush log rule innodb_flush_log_at_trx_commit = 1 Log Buffer File System Buffer Lb_log file innodb_flush_log_at_trx_commit = 0 innodb_flush_log_at_trx_commit = 2 write os cache timer flush every 1s
  24. 24. 数据库并发处理的演变 基于表的读写锁 基于索引的读写锁 (分离锁) mvcc
  25. 25. mvcc 读读非阻塞 读写非阻塞 写写阻塞 一句话, 并发的效果,串行的结果 !!!
  26. 26. 又是概念 read view 隐藏字段 是否可见的条件 当前读 快照读
  27. 27. mvcc流程图 trx_id 10 trx_id 15 | row| trx_id | rollback ptr | … read view list trx_id roll ptr data undo data trx_id roll ptr data … undo dataread & write trx_id in read view <= trx_id
  28. 28. 幻读 record lock gap lock next gap lock
  29. 29. 业务数据不一致 事务 A 事务B ts begin ts select stock where ticket=10 ts begin ts update stock=stock-1 ts commit ts select stock where ticket = 10 ts update stock=stock-1 ts commit stock=1 stock=0 stock=-1 最后 -1
  30. 30. 互斥锁 vs 乐观锁 互斥锁 select * from train where ticket = 10 for update; 乐观锁 while 1: update train set stock=stock-1 where ticket = 10 and stock = x;
  31. 31. crash recovery 从doublewrite 修正缺斤少两的page 重做提交的redo日志 回滚执行一半的 未提交的 事务
  32. 32. “ Q & A” –fengyun rui

×