Exploring and Practicing New MySQL Technologies
- 4. Why ICC
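• Why compile MySQL ourselves?
There is no official statically compiled build of the InnoDB Plugin
We can apply third-party patches or modify the source code
We can statically link third-party libraries into the executable (TCMalloc)
• Why compile with ICC?
Native support for the Intel SSE2 instruction set, giving efficient floating-point arithmetic
Built-in Intel math library, which speeds up math functions
Built-in Intel threading library, which improves multithreading stability and efficiency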
- 7. Why XFS
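• Why not EXT3?
Unfriendly to SSD devices, and SSDs are the trend for future data storage
Deleting files is slow, which can hang the database
Poor read/write performance on large files
• Why XFS?
SGI has run it on its large systems for many years (since 1994), so it is stable and reliable
SSD-friendly (delayed allocation)
Low contention and good performance under high concurrency (allocation groups)
Supports striped allocation, so file system allocation aligns exactly with the RAID stripes and maximizes throughput
Friendly to large-file operations (extent-based allocation)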
- 8. Why NOT EXT4?
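• EXT4 is also a very good file system
• Its performance is close to that of XFS, sometimes even better
• It can be upgraded from EXT3 seamlessly
• But ......
• We have no experience operating EXT4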
- 12. Why Percona
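• Advantages of Percona
Dedicated optimizations for SSD devices
An SQL-layer interface for Flashcache
Allows XtraDB to be statically compiled
Supports multiple page sizes
Provides additional monitoring parameters
Proven in production (SOHU)
• Problems with Percona
Includes many third-party patches, so bugs are possible (acceptable)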
- 13. New Features (1)
• File formats
Compressed format: trades CPU for IO
Dynamic format: no prefix of large columns is stored in the row
• IO parameters
IO capacity: innodb_io_capacity
IO threads: innodb_read_io_threads (read-ahead), innodb_write_io_threads (dirty page write-back), innodb_use_purge_thread (UNDO purge)
• Dirty page flushing
innodb_adaptive_checkpoint (XtraDB)
innodb_adaptive_flushing (InnoDB Plugin)
- 14. New Features (2)
• Scalability
Better multiprocessor performance (about 24 cores)
Split buffer pool mutex (buf_pool_mutex, LRU_list_mutex, flush_list_mutex, page_hash_latch, free_list_mutex, zip_free_mutex, zip_hash_mutex)
• Features
Variable page size (innodb_page_size)
Controllable insert buffering and adaptive hash index
Configurable extra rollback segments (innodb_extra_rsegments)
Fast warm-up (innodb_buffer_pool_shm_key, XTRA_LRU_DUMP/XTRA_LRU_RESTORE)
Fast index creation and fast index renaming
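- 15. New Features (3)
• Monitoring
Extended information_schema
– INDEX_STATISTICS
– TABLE_STATISTICS
– USER_STATISTICS
Extended InnoDB statistics
– INNODB_TABLE_STATS
– INNODB_INDEX_STATS
For example
– Find indexes that have never been used
– Get the number of rows accessed through each index
– Get current lock information
– Get per-user connection statistics
– ……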
- 17. Why Handler Socket
• Oprofile of SQL execution
samples % app name symbol name
259130 4.5199 mysqld MYSQLparse(void*)
196841 3.4334 mysqld my_pthread_fastmutex_lock
106439 1.8566 libc-2.5.so _int_malloc
……
63435 1.1065 mysqld JOIN::optimize()
55825 0.9737 vmlinux wakeup_stack_begin
55054 0.9603 mysqld MYSQLlex(void*, void*)
50833 0.8867 libpthread-2.5.so pthread_mutex_trylock
49602 0.8652 ha_innodb_plugin.so.0.0.0 row_search_for_mysql
……
46499 0.8111 libc-2.5.so malloc
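To make this profile easier to read, the sketch below sums the sample share of the SQL front-end symbols listed above (MYSQLparse and MYSQLlex). The struct and function names are hypothetical, and only rows shown on the slide are included, so the elided rows (……) are not counted and the total is a lower bound.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the oprofile report shown on the slide.
struct ProfileRow {
    long samples;
    double percent;
    std::string app;
    std::string symbol;
};

// Rows copied from the slide; the elided rows are not represented here.
inline std::vector<ProfileRow> sql_profile_rows() {
    return {
        {259130, 4.5199, "mysqld", "MYSQLparse(void*)"},
        {196841, 3.4334, "mysqld", "my_pthread_fastmutex_lock"},
        {106439, 1.8566, "libc-2.5.so", "_int_malloc"},
        {63435, 1.1065, "mysqld", "JOIN::optimize()"},
        {55825, 0.9737, "vmlinux", "wakeup_stack_begin"},
        {55054, 0.9603, "mysqld", "MYSQLlex(void*, void*)"},
        {50833, 0.8867, "libpthread-2.5.so", "pthread_mutex_trylock"},
        {49602, 0.8652, "ha_innodb_plugin.so.0.0.0", "row_search_for_mysql"},
        {46499, 0.8111, "libc-2.5.so", "malloc"},
    };
}

// Sum the percentage of every row whose symbol starts with the given prefix.
inline double share_of_prefix(const std::vector<ProfileRow>& rows,
                              const std::string& prefix) {
    double total = 0.0;
    for (const ProfileRow& row : rows) {
        if (row.symbol.compare(0, prefix.size(), prefix) == 0) {
            total += row.percent;
        }
    }
    return total;
}
```

Calling share_of_prefix(sql_profile_rows(), "MYSQL") adds the parser and lexer rows, which is the SQL-layer cost that HandlerSocket avoids.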
- 18. Why Handler Socket
• Oprofile of HandlerSocket execution
samples % app name symbol name
984785 5.9118 bnx2 /bnx2
847486 5.0876 ha_innodb_plugin.so.0.0.0 ut_delay
545303 3.2735 ha_innodb_plugin.so.0.0.0 btr_search_guess_on_hash
317570 1.9064 ha_innodb_plugin.so.0.0.0 row_search_for_mysql
……
206057 1.2370 HandlerSocket.so dena::hstcpsvr_worker::run_one_ep()
183330 1.1006 ha_innodb_plugin.so.0.0.0 mutex_spin_wait
175738 1.0550 HandlerSocket.so dena::dbcontext::cmd_find_internal(dena::dbcallback_i&, dena::prep_stmt const&, ha_rkey_function, dena::cmd_exec_args const&)
……
149611 0.8981 ha_innodb_plugin.so.0.0.0 row_sel_store_mysql_rec
- 19. HS vs MC vs SQL
Hardware environment
CPU: Intel Xeon 5520
Memory: 24 GB
Disks: 10 × 15k SAS, RAID10
MySQL: 5.1.48
- 21. Our Solution(2)
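• RAID controller
Disable read-ahead: read-ahead is ineffective here, so read directly from disk
Disable the disk cache: the RAID controller cache already buffers writes, and the disk cache has no battery backup
Stripe size: default 64K, changed to 1M
• Linux
IO scheduler: /sys/block/sdb/queue/scheduler, default cfq, changed to deadline
Reduce read-ahead: /sys/block/sdb/queue/read_ahead_kb, default 128, changed to 16
Enlarge the queue: /sys/block/sdb/queue/nr_requests, default 128, changed to 512
NUMA policy: numactl --interleave=all or --cpunodebind=0 --localalloc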
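The scheduler file in sysfs lists every available scheduler and marks the active one with square brackets. A minimal sketch for checking whether the change to deadline took effect is shown here; the function names are hypothetical and the device name is passed in by the caller.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Extract the active scheduler from the sysfs text,
// e.g. "noop anticipatory deadline [cfq]" yields "cfq".
// Returns an empty string when no bracketed entry is present.
inline std::string active_scheduler(const std::string& sysfs_text) {
    std::string::size_type open = sysfs_text.find('[');
    if (open == std::string::npos) return "";
    std::string::size_type close = sysfs_text.find(']', open + 1);
    if (close == std::string::npos) return "";
    return sysfs_text.substr(open + 1, close - open - 1);
}

// Read the first line of a file; returns "" if it cannot be opened.
inline std::string read_first_line(const std::string& path) {
    std::ifstream in(path.c_str());
    std::string line;
    if (in) std::getline(in, line);
    return line;
}

// True when the given block device (for example "sdb") uses deadline.
inline bool uses_deadline(const std::string& device) {
    const std::string path = "/sys/block/" + device + "/queue/scheduler";
    return active_scheduler(read_first_line(path)) == "deadline";
}
```

For the device used on this slide the call would be uses_deadline("sdb"); it returns false when the file is missing, so it is safe to run on machines without that device.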
- 22. Our Solution(3)
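• Flashcache
Block size = 4K: aligned with the SSD device page
dirty_thresh_pct = 90: flush once 90% of the blocks in a set are dirty
write_merge = 1: merge writes to improve disk write performance
fast_remove = 1: dirty blocks need not be written to disk when the cache is detached
• Percona
Page settings: innodb_page_size=4096, innodb_fast_checksum=1
Flush policy: innodb_adaptive_checkpoint=3, innodb_flush_neighbor_pages=0
IO capacity: innodb_io_capacity>10000
IO threads: innodb_read_io_threads = 1, innodb_write_io_threads = 16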
- 23. Our Solution(4)
• Monitoring
Fusion-IO (fio-status):
– Logical bytes written: logical writes
– Logical bytes read: logical reads
– Physical bytes written: physical writes
– Physical bytes read: physical reads
Flashcache (dmsetup status):
– read hit percent: read hit rate
– write hit percent: write hit rate
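One possible use of these counters, offered as an assumption since the slide only lists them, is to derive write amplification from the physical and logical write counters and to compute a hit percentage from raw hit and request counts. The function names below are hypothetical, and the inputs are assumed to be parsed from the tool output by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Ratio of physical to logical bytes written.
// Returns 0.0 when no logical writes have been recorded.
inline double write_amplification(std::uint64_t physical_bytes_written,
                                  std::uint64_t logical_bytes_written) {
    if (logical_bytes_written == 0) return 0.0;
    return static_cast<double>(physical_bytes_written) /
           static_cast<double>(logical_bytes_written);
}

// Hit percentage from a hit counter and a total request counter.
// Returns 0.0 when there were no requests.
inline double hit_percent(std::uint64_t hits, std::uint64_t requests) {
    if (requests == 0) return 0.0;
    return 100.0 * static_cast<double>(hits) / static_cast<double>(requests);
}
```

A write amplification well above 1 would suggest the device writes much more than the database requests, which is worth tracking over time alongside the cache hit rates.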
Editor's Notes
- Floating-point micro-benchmark (C++):
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <cmath>
#define MAXN 2000000000
using namespace std;
// Timer based on clock(), which measures CPU time
class Timer {
public:
    // Constructor
    Timer();
    // Destructor
    ~Timer();
    // Start timing
    void begin();
    // Stop timing
    void end();
    // Get the elapsed time in seconds
    float get_time();
private:
    clock_t start, finish;
    float time;
};
Timer::Timer() {
    start = 0;
    finish = 0;
    time = 0;
}
Timer::~Timer() {
    start = 0;
    finish = 0;
}
void Timer::begin() {
    start = clock();
}
void Timer::end() {
    finish = clock();
}
float Timer::get_time() {
    time = (float)(finish - start) / CLOCKS_PER_SEC;
    return time;
}
// Approximate pi with n terms of the Leibniz series
float pi(int n)
{
    float s = 0;
    float a;
    for (int i = 1; i <= n; ++i)
    {
        // 2.0*i keeps the product in double and avoids signed int
        // overflow once i exceeds INT_MAX/2
        a = 1.0 / (2.0 * i - 1);
        if (i % 2 == 0)
            s -= a;
        else
            s += a;
    }
    s *= 4;
    return s;
}
// Sum 1/sqrt(i) for i = 1..n using the library sqrt
float sqrt1(int n) {
    float a = 0;
    for (int i = 1; i <= n; ++i) {
        /*if (i%2 == 0) {
            a += 1/sqrt(i);
        } else {
            a -= 1/sqrt(i);
        }*/
        a += 1 / sqrt((double)i);
    }
    return a;
}
// Fast approximation of 1/sqrt(x) (bit trick plus one Newton step)
float sqrt2(float x) {
    float xhalf = 0.5f * x;
    int i;
    memcpy(&i, &x, sizeof i);        // get bits of the floating-point value
    i = 0x5f375a86 - (i >> 1);       // gives initial guess y0
    memcpy(&x, &i, sizeof x);        // convert bits back to float
    x = x * (1.5f - xhalf * x * x);  // Newton step, repeating increases accuracy
    return x;
}
// Sum the fast approximation of 1/sqrt(i) for i = 1..n
float sqrt3(int n) {
    float a = 0;
    for (int i = 1; i <= n; ++i) {
        /*if (i%2 == 0) {
            a += sqrt2((float)i);
        } else {
            a -= sqrt2((float)i);
        }*/
        a += sqrt2((float)i);
    }
    return a;
}
int main()
{
    Timer timer;
    // Tests labelled (NULL) discard the result; the others print it
    timer.begin();
    float p = pi(MAXN);
    timer.end();
    cout << "1.PI Test(NULL):" << timer.get_time() << endl;
    timer.begin();
    cout << "\n2.PI=" << pi(MAXN) << endl;
    timer.end();
    cout << "2.PI Test:" << timer.get_time() << endl;
    timer.begin();
    sqrt1(MAXN);
    timer.end();
    cout << "\n3.system sqrt Test(NULL):" << timer.get_time() << endl;
    timer.begin();
    cout << "\n4.sqrt1=" << sqrt1(MAXN) << endl;
    timer.end();
    cout << "4.system sqrt Test:" << timer.get_time() << endl;
    timer.begin();
    float s = sqrt3(MAXN);
    timer.end();
    cout << "\n5.newton sqrt Test(NULL):" << timer.get_time() << endl;
    timer.begin();
    cout << "\n6.sqrt2=" << sqrt3(MAXN) << endl;
    timer.end();
    cout << "6.newton sqrt Test:" << timer.get_time() << endl;
    return 0;
}
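For reference, the two numerical methods this benchmark exercises can be written out as follows: pi() sums a truncated Leibniz series, and sqrt2() refines a bit-level initial guess for the reciprocal square root with one Newton step. The symbols k, n, and y_n are notation introduced here and do not appear in the code.

```latex
\[
\pi \approx 4\sum_{k=1}^{n}\frac{(-1)^{k+1}}{2k-1}
\]
\[
y_{n+1} = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^{2}\right),
\qquad y_n \to \frac{1}{\sqrt{x}}
\]
```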
- GCC build parameters:
CC=gcc \
CXX=gcc \
CHOST="x86_64-pc-linux-gnu" \
CFLAGS=" -O3 \
-fomit-frame-pointer \
-pipe \
-march=nocona \
-mfpmath=sse \
-m128bit-long-double \
-mmmx \
-msse \
-msse2 \
-maccumulate-outgoing-args \
-m64 \
-ftree-loop-linear \
-fprefetch-loop-arrays \
-freg-struct-return \
-fgcse-sm \
-fgcse-las \
-frename-registers \
-fforce-addr \
-fivopts \
-ftree-vectorize \
-ftracer \
-frename-registers \
-minline-all-stringops \
-fno-exceptions \
-fno-rtti \
-fbranch-target-load-optimize2 \
-fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free " \
CXXFLAGS="${CFLAGS}" \
LDFLAGS=" -ltcmalloc_minimal -lstdc++ " \
./configure \
--with-server-suffix=-alibaba-edition \
--with-mysqld-user=mysql \
--with-plugins=heap,innodb_plugin,myisam,partition \
--with-charset=utf8 \
--with-collation=utf8_general_ci \
--with-extra-charsets=gbk,utf8,ascii \
--with-big-tables \
--with-fast-mutexes \
--with-zlib-dir=bundled \
--with-readline \
--with-pthread \
--enable-assembler \
--enable-profiling \
--enable-local-infile \
--enable-thread-safe-client \
--with-mysqld-ldflags=-all-static \
--without-embedded-server \
--without-query-cache \
--without-geometry \
--without-debug \
--without-ndb-binlog \
--without-ndb-debug
ICC build parameters:
CC=icc \
CXX=icpc \
LD=xild \
AR=xiar \
CFLAGS="-O3 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -unroll2 -ip -fp-model fast=1 -restrict -fno-exceptions -fno-rtti -no-prec-div -fno-implicit-templates -static-intel -static-libgcc -static -xSSE2 -axSSE2 " \
CXXFLAGS="${CFLAGS}" \
CPPFLAGS=" -I/usr/alibaba/icc/include " \
LDFLAGS=" -L/usr/alibaba/icc/lib/intel64/ -lrt -ltcmalloc_minimal -lstdc++ " \
./configure \
--with-server-suffix=-alibaba-edition \
--with-mysqld-user=mysql \
--with-plugins=heap,innodb_plugin,myisam,partition \
--with-charset=utf8 \
--with-collation=utf8_general_ci \
--with-extra-charsets=gbk,utf8,ascii \
--with-big-tables \
--with-fast-mutexes \
--with-zlib-dir=bundled \
--with-readline \
--with-pthread \
--enable-assembler \
--enable-profiling \
--enable-local-infile \
--enable-thread-safe-client \
--with-mysqld-ldflags=-all-static \
--without-embedded-server \
--without-query-cache \
--without-geometry \
--without-debug \
--without-ndb-binlog \
--without-ndb-debug
- Format parameters:
mkfs.xfs -f -i size=512,attr=2 -l size=128m,lazy-count=1 -d su=64k,sw=5 -L data /dev/sdb1
Mount parameters:
/dev/sdb1 /data xfs defaults,noatime,nodiratime,noikeep,nobarrier,allocsize=8M,attr2,largeio,inode64,swalloc 0 0
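In mkfs.xfs, su is the stripe unit and sw is the number of stripe units per full stripe, so the arithmetic behind su=64k,sw=5 can be sketched as below. The function names are hypothetical; the values in the usage note come from the commands above.

```cpp
#include <cassert>
#include <cstdint>

// Full stripe width in bytes: stripe unit (su) times stripe count (sw).
inline std::uint64_t stripe_width_bytes(std::uint64_t su_bytes,
                                        std::uint64_t sw) {
    return su_bytes * sw;
}

// True when a byte offset or size falls on a stripe-unit boundary.
inline bool is_stripe_unit_aligned(std::uint64_t value_bytes,
                                   std::uint64_t su_bytes) {
    return su_bytes != 0 && value_bytes % su_bytes == 0;
}
```

With su=64k and sw=5, stripe_width_bytes(64 * 1024, 5) gives 327680 bytes (320 KiB), and the allocsize=8M mount option is a whole multiple of the 64 KiB stripe unit.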
- Test scenario: with 1 GB of data, repeatedly run SELECT or GET against the same record