MyRocks introduction and
production deployment at Facebook
Yoshinori Matsunobu
Production Engineer, Facebook
Jun 2017
Who am I
▪ Was a MySQL consultant at MySQL (Sun, Oracle) for 4 years
▪ Joined Facebook in March 2012
▪ MySQL 5.1 -> 5.6 upgrade
▪ Fast master failover without losing data
▪ Partly joined HBase Production Engineering team in 2014.H1
▪ Started a research project to integrate RocksDB and MySQL in 2014.H2, with the MySQL Engineering Team and MariaDB
▪ Started MyRocks production deployment in 2016.H2
Agenda
▪ MySQL at Facebook
▪ Issues in InnoDB
▪ RocksDB and MyRocks overview
▪ Production Deployment
“Main MySQL Database” at Facebook
▪ Storing Social Graph
▪ Massively Sharded
▪ Petabytes scale
▪ Low latency
▪ Automated Operations
▪ Pure Flash Storage (Constrained by space, not by CPU/IOPS)
H/W trends and limitations
▪ SSD/Flash is getting affordable, but MLC Flash is still a bit expensive
▪ HDD: Large enough capacity but very limited IOPS
▪ Reducing read/write IOPS is very important -- Reducing write is harder
▪ SSD/Flash: Great read iops but limited space and write endurance
▪ Reducing space is higher priority
InnoDB Issue (1) -- Write Amplification
[Figure: a page full of rows goes through Read, Modify, Write in order to change a single row]
- A 1-byte modification results in one page write (4~16KB)
- InnoDB “Doublewrite” doubles the write volume
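As a rough illustration of the arithmetic behind this slide, the sketch below computes the bytes that reach storage for a single in-place change. The function name is hypothetical, and the model deliberately ignores redo log and binlog writes, so treat it as a lower bound rather than a measurement.

```python
def innodb_bytes_written(page_size_kb: int, doublewrite: bool = True) -> int:
    """Bytes written to storage for one in-place row change, however small."""
    page_bytes = page_size_kb * 1024
    # With doublewrite enabled, the page is written twice.
    return page_bytes * (2 if doublewrite else 1)


modified_bytes = 1
written = innodb_bytes_written(16)          # 16KB page, doublewrite on
print(written, written / modified_bytes)    # bytes written, amplification factor
```

With a 16KB page and doublewrite on, a 1-byte change costs 32768 bytes of page writes under these assumptions.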
InnoDB Issue (2) -- B+Tree Fragmentation
INSERT INTO message_table (user_id) VALUES (31)
Before the insert:
Leaf Block 1
user_id  RowID
1        10000
2        5
3        15321
…
60       431

After the insert (the leaf block is split in two):
Leaf Block 1
user_id  RowID
1        10000
…
30       333

Leaf Block 2
user_id  RowID
31       345
…
60       431
Empty    Empty
InnoDB Issue (3) -- Compression
[Figure: an uncompressed 16KB page of rows is compressed to 5KB, but uses 8KB of space on storage]
Compressed size => space used on storage
0~4KB => 4KB
4~8KB => 8KB
8~16KB => 16KB
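The rounding rule on this slide can be written down directly. This is a minimal sketch with a hypothetical function name; it only models the three size buckets listed above.

```python
def innodb_stored_kb(compressed_kb: float) -> int:
    """Round a compressed page up to the next allowed size: 4, 8 or 16 KB."""
    for slot_kb in (4, 8, 16):
        if compressed_kb <= slot_kb:
            return slot_kb
    raise ValueError("a page cannot exceed 16 KB")


stored = innodb_stored_kb(5)
padding = (stored - 5) / stored
print(stored, f"{padding:.1%} of the slot is padding")
```

For the 5KB example on the slide, the page occupies an 8KB slot, so more than a third of that slot holds no data.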
RocksDB
▪ http://rocksdb.org/
▪ Forked from LevelDB
▪ Key-Value LSM (Log Structured Merge) persistent store
▪ Embedded
▪ Data stored locally
▪ Optimized for fast storage
▪ LevelDB was created by Google
▪ Facebook forked and developed RocksDB
▪ Used by many backend services at Facebook, and by many large external services
▪ You need to write C++ or Java code to access RocksDB
RocksDB architecture overview
▪ Leveled LSM Structure
▪ MemTable
▪ WAL (Write Ahead Log)
▪ Compaction
▪ Column Family
RocksDB Architecture
[Figure: a Write Request goes to the Active MemTable in memory and to its WAL on persistent storage. A full MemTable is switched to a read-only MemTable, each with its own WAL, and then flushed to files (File1 ... File6), which are merged by Compaction]
Write Path (1) to (4)
[Figure: four slides that repeat the RocksDB architecture diagram (Write Request, Active MemTable, WAL, Switch, Flush, Files, Compaction), one slide per step of the write path]
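The write path can be sketched in a few lines. This is a toy model, not RocksDB code: the class and attribute names are hypothetical, a MemTable is a plain dict, and WAL rotation and deletion after flush are left out.

```python
class TinyLSM:
    """Toy write path: WAL append, MemTable insert, switch, flush."""

    def __init__(self, memtable_limit: int = 3):
        self.memtable_limit = memtable_limit
        self.wal = []        # append-only log, replayed after a crash
        self.active = {}     # active MemTable (mutable)
        self.immutable = []  # switched MemTables waiting to be flushed
        self.files = []      # flushed files, each a key-sorted list

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log first, for durability
        self.active[key] = value        # 2. then update memory
        if len(self.active) >= self.memtable_limit:
            self.immutable.append(self.active)  # 3. switch to a new MemTable
            self.active = {}

    def flush(self):
        # 4. write each switched MemTable out as one sorted file
        while self.immutable:
            table = self.immutable.pop(0)
            self.files.append(sorted(table.items()))


db = TinyLSM()
for user_id in (31, 99999, 10000):
    db.put(user_id, "row")
db.flush()
print(db.files)
```

Note that nothing in `put` reads or rewrites existing files; all file writes happen later, in bulk and in key order.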
Read Path
[Figure: a Read Request checks the Active MemTable and the read-only MemTables in memory, then the files on persistent storage (File1 ... File6); Bloom Filters are shown next to the MemTables]
Read Path (continued)
[Figure: the same diagram, with the note "Index and Bloom Filters cached in Memory" and File 1 and File 3 called out]
How LSM/RocksDB works
INSERT INTO message (user_id) VALUES (31);
INSERT INTO message (user_id) VALUES (99999);
INSERT INTO message (user_id) VALUES (10000);
[Figure, reconstructed from the slide labels]
WAL/MemTable: 10000, 99999, 31
Existing SSTs: 1, 10, 150, …, 50000 and 50001, 55000, 55700, …, 110000
New SST: 31, 10000, 99999
Compaction: writing new SSTs sequentially
After compaction: 1, 10, 31, …, 150, 10000 and 50001, 55000, 55700, …, 99999
N random row modifications => a few sequential reads & writes
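Because both inputs are already sorted, compaction is essentially a merge. The sketch below uses a subset of the keys from the slide; the variable names are illustrative only.

```python
import heapq

existing_sst = [1, 10, 150, 50000]   # already sorted on storage
new_sst = [31, 10000, 99999]         # flushed MemTable, also sorted

# Merging sorted inputs needs only one sequential pass over each of them.
compacted = list(heapq.merge(existing_sst, new_sst))
print(compacted)  # [1, 10, 31, 150, 10000, 50000, 99999]
```

The three inserts arrived in random key order, yet the storage only sees sequential reads of the inputs and a sequential write of the output.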
LSM handles compression better
[Figure: a 16MB SST file made of blocks of rows is compressed to a 5MB SST file]
=> Aligned to OS sector (4KB unit)
=> Negligible OS page alignment overhead
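To see why the alignment overhead is negligible at file granularity, the sketch below rounds a whole file up to the 4KB unit. The file size is a made-up example (5MB plus 100 bytes) and the function name is hypothetical.

```python
def aligned_size(size_bytes: int, unit: int = 4096) -> int:
    """Round a file size up to a multiple of the sector unit."""
    return (size_bytes + unit - 1) // unit * unit


sst_bytes = 5 * 1024 * 1024 + 100          # hypothetical compressed SST size
overhead = aligned_size(sst_bytes) - sst_bytes
print(overhead, f"{overhead / sst_bytes:.3%}")
```

The padding is paid once per multi-megabyte file instead of once per 16KB page, so it stays below 4KB per file.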
Reducing Space/Write Amplification
▪ Append Only
[Figure: rows are appended to WAL/Memstore and later written to sst files]
▪ Prefix Key Encoding
Before:
id1 id2 id3
100 200 1
100 200 2
100 200 3
100 200 4
After:
id1 id2 id3
100 200 1
        2
        3
        4
▪ Zero-Filling metadata
Before:
key value seq id flag
k1 v1 1234561 W
k2 v2 1234562 W
k3 v3 1234563 W
k4 v4 1234564 W
Seq id is 7 bytes in RocksDB. After compression, “0” uses very little space
After:
key value seq id flag
k1 v1 0 W
k2 v2 0 W
k3 v3 0 W
k4 v4 0 W
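Prefix key encoding can be illustrated with a small delta encoder. This is only a sketch: the "100|200|1" byte layout is a made-up stand-in for the real key format, and the function name is hypothetical.

```python
def prefix_encode(keys):
    """Store each key as (shared_prefix_len, suffix) relative to the previous key."""
    encoded, prev = [], b""
    for key in keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


keys = [b"100|200|1", b"100|200|2", b"100|200|3", b"100|200|4"]
encoded = prefix_encode(keys)
raw_bytes = sum(len(k) for k in keys)
suffix_bytes = sum(len(suffix) for _, suffix in encoded)
print(encoded, raw_bytes, suffix_bytes)
```

Only the first key is stored in full; every following key keeps just the bytes that differ from its predecessor.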
LSM Compaction Algorithm -- Level
▪ For each level, data is sorted by key
▪ Read Amplification: 1 ~ number of levels (depending on cache -- L0~L3 are usually cached)
▪ Write Amplification: 1 + 1 + fanout * (number of levels – 2) / 2
▪ Space Amplification: 1.11
▪ 11% is much smaller than B+Tree’s fragmentation
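The two formulas above are easy to evaluate. In this sketch the write amplification formula is taken as written on the slide; the space amplification function is one reading of where 1.11 comes from (every level is `fanout` times smaller than the next, so the total is a geometric series), and the fanout of 10 and the level counts are assumptions.

```python
def level_write_amp(fanout: int, levels: int) -> float:
    # Formula as given on the slide: 1 + 1 + fanout * (levels - 2) / 2
    return 1 + 1 + fanout * (levels - 2) / 2


def level_space_amp(fanout: int, levels: int) -> float:
    # Total size of all levels divided by the size of the last level.
    return sum(1 / fanout**i for i in range(levels))


print(level_write_amp(10, 5))            # 17.0
print(round(level_space_amp(10, 3), 2))  # 1.11
```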
Read Penalty on LSM
SELECT id1, id2, time FROM t WHERE id1=100 AND id2=100 ORDER BY time DESC LIMIT 1000;
Index on (id1, id2, time)
InnoDB (Branch, Leaf):
Range Scan with covering index is done by just reading leaves sequentially, and ordering is guaranteed (very efficient)
RocksDB (MemTable, L0, L1, L2, L3):
Merge is needed to do range scan with ORDER BY (L0-L2 are usually cached, but in total it needs more CPU cycles than InnoDB)
Bloom Filter
[Figure: KeyMayExist(id=31)? is asked while searching MemTable, L0, L1, L2 and L3; two of the answers shown are "false"]
Checking whether a key may exist without reading data, and skipping the read I/O if it definitely does not exist
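A minimal Bloom filter shows the "maybe / definitely not" contract that KeyMayExist relies on. This is a generic textbook sketch, not RocksDB's implementation; the class name, sizes and hash construction are all assumptions.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def key_may_exist(self, key: str) -> bool:
        # False means "definitely absent"; True means "maybe present".
        return all(self.bits[pos] for pos in self._positions(key))


sst_filter = BloomFilter()
for user_id in (1, 10, 150):
    sst_filter.add(str(user_id))

print(sst_filter.key_may_exist("10"))  # True: added keys never report False
```

A `False` answer lets the reader skip the file entirely; a `True` answer can be a false positive, so the file still has to be checked.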
Column Family
Query atomicity across different key spaces.
▪ Column families:
▪ Separate MemTables and SST files
▪ Share transactional logs
[Figure: CF1 and CF2 each have their own Active MemTable, Read Only MemTable(s), persistent files and compaction (CF1 Persistent Files / CF1 Compaction, CF2 Persistent Files / CF2 Compaction); the WAL is shared]
What is MyRocks
▪ MySQL on top of RocksDB (RocksDB storage engine)
▪ Taking both LSM advantages and MySQL features
▪ LSM advantage: Smaller space and lower write amplification
▪ MySQL features: SQL, Replication, Connectors and many tools
▪ Open Source, distributed from MariaDB and Percona as well
[Figure: MySQL Clients connect through SQL/Connector to MySQL (Parser, Optimizer, Replication, etc), which runs on top of the InnoDB and RocksDB storage engines]
http://myrocks.io/
MyRocks features
▪ Clustered Index (same as InnoDB)
▪ Transactions, including consistency between binlog and RocksDB
▪ Faster data loading, deletes and replication
▪ Dynamic Options
▪ TTL
▪ Crash Safety
▪ Online logical and binary backup
Performance (LinkBench)
▪ Space Usage
▪ QPS
▪ Flash writes per query
▪ HDD
▪ http://smalldatum.blogspot.com/2016/01/myrocks-vs-innodb-with-linkbench-over-7.html
[Charts, one per slide: Database Size (Compression); QPS; Flash writes per query; On HDD workloads]
Getting Started
▪ Downloading MySQL with MyRocks support (MariaDB, Percona Server)
▪ Configuring my.cnf
▪ Installing MySQL and initializing data directory
▪ Starting mysqld
▪ Creating and manipulating some tables
▪ Shutting down mysqld
▪ https://github.com/facebook/mysql-5.6/wiki/Getting-Started-with-MyRocks
my.cnf (minimal configuration)
▪ Mixing multiple transactional storage engines within the same instance is generally not recommended
▪ Not transactional across engines, and not tested enough
▪ Add “allow-multiple-engines” in my.cnf if you really want to mix InnoDB and MyRocks
[mysqld]
rocksdb
default-storage-engine=rocksdb
skip-innodb
default-tmp-storage-engine=MyISAM
collation-server=latin1_bin  # or utf8_bin, binary
log-bin
binlog-format=ROW
Creating tables
▪ “ENGINE=RocksDB” is a syntax to create RocksDB tables
▪ Setting “default-storage-engine=RocksDB” in my.cnf is fine too
▪ It is generally recommended to have a PRIMARY KEY, though MyRocks allows tables without PRIMARY KEYs
▪ Tables are automatically compressed, without any configuration in DDL. Compression algorithm can be configured via my.cnf
CREATE TABLE t (
  id INT PRIMARY KEY,
  value1 INT,
  value2 VARCHAR(100),
  INDEX (value1)
) ENGINE=RocksDB COLLATE latin1_bin;
MyRocks in Production at Facebook
MyRocks Goals
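[Chart: Space, IO and CPU usage relative to the machine limit]
InnoDB in main database: Space 90%, with the other two resources at 15% and 20%
MyRocks in main database: Space 45%, with the other two resources at 12% and 18%; these bars are drawn twice, stacked under the same machine limit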
MyRocks goals (more details)
▪ Smaller space usage
▪ 50% compared to compressed InnoDB at FB
▪ Better write amplification
▪ Can use more affordable flash storage
▪ Fast, and small enough CPU usage with general purpose workloads
▪ Data large enough that it doesn’t fit in RAM
▪ Point lookup, range scan, full index/table scan
▪ Insert, update, delete by point/range, and occasional bulk inserts
▪ Same or even smaller CPU usage compared to InnoDB at FB
▪ Make it possible to consolidate 2x more instances per machine
Facebook MySQL related teams
▪ Production Engineering (SRE/PE)
▪ MySQL Production Engineering
▪ Data Performance
▪ Software Engineering (SWE)
▪ RocksDB Engineering
▪ MySQL Engineering
▪ MyRocks
▪ Others (Replication, Client, InnoDB, etc)
Relationship between MySQL PE and SWE
▪ SWE does MySQL server upgrade
▪ Upgrading MySQL revision with several FB patches
▪ Hot fixes
▪ SWE participates in PE oncall
▪ Investigating issues that might have been caused by MySQL server codebase
▪ Core files analysis
▪ Can submit diffs to each other
▪ “Hackaweek” to work on shorter term tasks for a week
MyRocks migration -- Technical Challenges
▪ Initial Migration
▪ Creating MyRocks instances without downtime
▪ Loading into MyRocks tables within reasonable time
▪ Verifying data consistency between InnoDB and MyRocks
▪ Continuous Monitoring
▪ Resource Usage like space, iops, cpu and memory
▪ Query plan outliers
▪ Stalls and crashes
MyRocks migration -- Technical Challenges (2)
▪ When running MyRocks on master
▪ RBR (Row based binary logging)
▪ Removing queries relying on InnoDB Gap Lock
▪ Robust XA support (binlog and RocksDB)
Creating first MyRocks instance without downtime
▪ Picking one of the InnoDB slave instances, then starting logical dump
and restore
▪ Stopping one slave does not affect services
Master (InnoDB)
Slave1 (InnoDB) Slave2 (InnoDB) Slave3 (InnoDB) Slave4 (MyRocks)
Stop & Dump & Load
Faster Data Loading
Normal Write Path in MyRocks/RocksDB
Write Requests -> MemTable / WAL -> (Flush) -> Level 0 SST -> (Compaction) -> Level 1 SST -> …. -> (Compaction) -> Level max SST
Faster Write Path
Write Requests -> Level max SST
“SET SESSION rocksdb_bulk_load=1;”
Original data must be sorted by primary key
Data migration steps
▪ Dst) Create table … ENGINE=ROCKSDB; (creating MyRocks tables with proper column families)
▪ Dst) ALTER TABLE DROP INDEX; (dropping secondary keys)
▪ Src) STOP SLAVE;
▪ mysqldump --host=innodb-host --order-by-primary | mysql --host=myrocks-host --init-command="set sql_log_bin=0; set rocksdb_bulk_load=1"
▪ Dst) ALTER TABLE ADD INDEX; (adding secondary keys)
▪ Src, Dst) START SLAVE;
Data Verification
▪ MyRocks/RocksDB is a relatively new database technology
▪ Might have more bugs than robust InnoDB
▪ Ensuring data consistency helps avoid showing conflicting
results
Verification tests
▪ Index count check between primary key and secondary keys
▪ If any index is broken, it can be detected
▪ SELECT 'PRIMARY', COUNT(*) FROM t FORCE INDEX (PRIMARY)
UNION SELECT 'idx1', COUNT(*) FROM t FORCE INDEX (idx1)
▪ Can’t be used if there is no secondary key
▪ Index stats check
▪ Checking that “rows” shown by SHOW TABLE STATUS is not far from the actual row count
▪ Checksum tests w/ InnoDB
▪ Comparing between InnoDB instance and MyRocks instance
▪ Creating a transaction consistent snapshot at the same GTID position, scan, then compare checksum
▪ Shadow correctness check
▪ Capturing read traffic
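The index count check comes down to comparing each secondary index's row count with the primary key's. The helper below is a hypothetical post-processing step for the output of such a query; it assumes the counts have already been fetched into a dict keyed by index name.

```python
def find_broken_indexes(counts: dict) -> list:
    """Return names of indexes whose row count differs from PRIMARY."""
    expected = counts["PRIMARY"]
    return sorted(name for name, n in counts.items() if n != expected)


# Made-up counts, shaped like the result of the UNION query above.
print(find_broken_indexes({"PRIMARY": 100, "idx1": 100, "idx2": 99}))
```

An empty result only shows that the counts agree; it does not prove that the index contents match, which is why the checksum tests are run as well.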
Shadow traffic tests
▪ We have a shadow test framework
▪ MySQL audit plugin to capture read/write queries from production instances
▪ Replaying them into shadow master instances
▪ Shadow master tests
▪ Client errors
▪ Rewriting queries relying on Gap Lock
▪ gap_lock_raise_error=1, gap_lock_write_log=1
Creating second MyRocks instance without downtime
Master (InnoDB)
Slave1 (InnoDB) Slave2 (InnoDB) Slave3 (MyRocks) Slave4 (MyRocks)
myrocks_hotbackup
(Online binary backup)
Promoting MyRocks as a master
Master (MyRocks)
Slave1 (InnoDB) Slave2 (InnoDB) Slave3 (InnoDB) Slave4 (MyRocks)
Crash Safety
▪ Crash Safety makes operations much easier
▪ Just restarting failed instances can restart replication
▪ No need to rebuild entire instances, as long as data is there
▪ Crash Safe Slave
▪ Crash Safe Master
▪ 2PC (binlog and WAL)
Master crash safety settings
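+------------------------------------------+-------------+---------------------------------+--------------------+---------------------------+
|                                          | sync-binlog | rocksdb-flush-log-at-trx-commit | rocksdb-enable-2pc | rocksdb-wal-recovery-mode |
+------------------------------------------+-------------+---------------------------------+--------------------+---------------------------+
| No data loss on unplanned machine reboot | 1 (default) | 1 (default)                     | 1 (default)        | 1 (default)               |
| No data loss on mysqld crash & recovery  | 0           | 2                               | 1                  | 2                         |
| No data loss if always failover          | 0           | 2                               | 0                  | 2                         |
+------------------------------------------+-------------+---------------------------------+--------------------+---------------------------+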
Notable issues fixed during migration
▪ Lots of “Snapshot Conflict” errors
▪ Because of implementation differences in MyRocks (PostgreSQL Style snapshot isolation)
▪ Setting tx-isolation=READ-COMMITTED solved the problem
▪ Slaves stopped with I/O errors on reads
▪ We switched to make MyRocks abort on I/O errors for both reads and writes
▪ Index statistics bugs
▪ Inaccurate row counts/length on bulk loading -> fixed
▪ Cardinality was not updated on fast index creation -> fixed
▪ Crash safety bugs and some crash bugs in MyRocks/RocksDB
Preventing stalls
▪ Heavy writes cause lots of compactions, which may cause stalls
▪ Typical write stall cases
▪ Online schema change
▪ Massive data migration jobs that write lots of data
▪ Use fast data loading whenever possible
▪ InnoDB to MyRocks migration can utilize this technique
▪ Can be harder for data migration jobs that write into existing tables
When write stalls happen
▪ Estimated number of pending compaction bytes exceeded X bytes
▪ soft|hard_pending_compaction_bytes, default 64GB
▪ Number of L0 files exceeded level0_slowdown|stop_writes_trigger (default 10)
▪ Number of unflushed MemTables exceeded max_write_buffer_number (default 4)
▪ All of these incidents are written to LOG as WARN level
▪ All of these options apply to each column family
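The stall conditions can be expressed as a small classifier. This is a simplified sketch: the option names follow the slide, but the threshold values are illustrative, the `>=` comparisons are an assumption, and treating the MemTable limit as a hard stop is a simplification of the real behavior.

```python
from dataclasses import dataclass


@dataclass
class StallLimits:
    # Illustrative values only, not recommended settings.
    level0_slowdown_writes_trigger: int = 10
    level0_stop_writes_trigger: int = 20
    soft_pending_compaction_bytes: int = 64 * 2**30
    hard_pending_compaction_bytes: int = 256 * 2**30
    max_write_buffer_number: int = 4


def stall_state(l0_files, pending_bytes, unflushed_memtables, limits=None):
    """Classify one column family as 'none', 'soft' or 'hard' stalled."""
    limits = limits or StallLimits()
    if (l0_files >= limits.level0_stop_writes_trigger
            or pending_bytes >= limits.hard_pending_compaction_bytes
            or unflushed_memtables >= limits.max_write_buffer_number):
        return "hard"
    if (l0_files >= limits.level0_slowdown_writes_trigger
            or pending_bytes >= limits.soft_pending_compaction_bytes):
        return "soft"
    return "none"


print(stall_state(l0_files=12, pending_bytes=0, unflushed_memtables=1))
```

Since the options apply per column family, a check like this would be run once for each column family.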
What happens on write stalls
▪ Soft stalls
▪ COMMIT takes longer time than usual
▪ Total estimated written bytes to MemTable is capped to
rocksdb_delayed_write_rate, until slowdown conditions are cleared
▪ Default is 16MB/s (previously 2MB/s)
▪ Hard stalls
▪ All writes are blocked at COMMIT, until stop conditions are cleared
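To get a feel for what a soft stall means in practice, the sketch below estimates the minimum time needed to push a given amount of data through the delayed write rate. The function name and the 160MB workload are made up for illustration.

```python
MB = 1024 * 1024


def min_seconds_under_soft_stall(bytes_to_write: int,
                                 delayed_write_rate: int = 16 * MB) -> float:
    """Lower bound on write time while writes are capped at the delayed rate."""
    return bytes_to_write / delayed_write_rate


print(min_seconds_under_soft_stall(160 * MB))          # 10.0 at 16MB/s
print(min_seconds_under_soft_stall(160 * MB, 2 * MB))  # 80.0 at the older 2MB/s
```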
Mitigating Write Stalls
▪ Speed up compactions
▪ Use faster compression algorithm (LZ4 for higher levels, ZSTD in the bottommost)
▪ Increase rocksdb_max_background_compactions
▪ Reduce total bytes written
▪ But avoid using too strong compression algorithm on upper levels
▪ Use more write efficient compaction algorithm
▪ compaction_pri=kMinOverlappingRatio
▪ Delete files slowly on Flash
▪ Deleting too many large files causes TRIM stalls on Flash
▪ MyRocks has an option to throttle sst file deletion speed
▪ Binlog file deletions should be slowed down as well
Monitoring
▪ MyRocks files
▪ SHOW ENGINE ROCKSDB STATUS
▪ SHOW ENGINE ROCKSDB TRANSACTION STATUS
▪ LOG files
▪ information_schema tables
▪ sst_dump tool
▪ ldb tool
SHOW ENGINE ROCKSDB STATUS
▪ Column Family Statistics, including size, read and write amp per
level
▪ Memory usage
*************************** 7. row ***************************
Type: CF_COMPACTION
Name: default
Status:
** Compaction Stats [default] **
Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 2/0 51.58 0.5 0.0 0.0 0.0 0.3 0.3 0.0 0.0 0.0 40.3 7 10 0.669 0 0
L3 6/0 109.36 0.9 0.7 0.7 0.0 0.6 0.6 0.0 0.9 43.8 40.7 16 3 5.172 7494K 297K
L4 61/0 1247.31 1.0 2.0 0.3 1.7 2.0 0.2 0.0 6.9 49.7 48.5 41 9 4.593 15M 176K
L5 989/0 12592.86 1.0 2.0 0.3 1.8 1.9 0.1 0.0 7.4 8.1 7.4 258 8 32.209 17M 726K
L6 4271/0 127363.51 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Sum 5329/0 141364.62 0.0 4.7 1.2 3.5 4.7 1.2 0.0 17.9 15.0 15.0 321 30 10.707 41M 1200K
SHOW GLOBAL STATUS
mysql> show global status like 'rocksdb%';
+---------------------------------------+-------------+
| Variable_name | Value |
+---------------------------------------+-------------+
| rocksdb_rows_deleted | 216223 |
| rocksdb_rows_inserted | 1318158 |
| rocksdb_rows_read | 7102838 |
| rocksdb_rows_updated | 1997116 |
....
| rocksdb_bloom_filter_prefix_checked | 773124 |
| rocksdb_bloom_filter_prefix_useful | 308445 |
| rocksdb_bloom_filter_useful | 10108448 |
....
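One way to read these counters is to compute how often the prefix Bloom filter check paid off. The sketch below uses the two values from the output above and assumes that "useful" counts the checks that let RocksDB skip a file; verify that interpretation against the RocksDB statistics documentation for your version.

```python
status = {
    "rocksdb_bloom_filter_prefix_checked": 773124,
    "rocksdb_bloom_filter_prefix_useful": 308445,
}

useful_ratio = (status["rocksdb_bloom_filter_prefix_useful"]
                / status["rocksdb_bloom_filter_prefix_checked"])
print(f"{useful_ratio:.1%}")  # 39.9%
```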
Kernel VM allocation stalls
▪ RocksDB has limited support for O_DIRECT
▪ Too many buffered writes may cause “VM allocation stalls” on
older Linux kernels
▪ Some queries may take more than 1s because of this
▪ Using more %sys
▪ Linux Kernel 4.6 is much better for handling buffered writes
▪ For details: https://lwn.net/Articles/704739/
More information
▪ MyRocks home, documentation:
▪ http://myrocks.io
▪ https://github.com/facebook/mysql-5.6/wiki
▪ Source Repo: https://github.com/facebook/mysql-5.6
▪ Bug Reports: https://github.com/facebook/mysql-5.6/issues
▪ Forum: https://groups.google.com/forum/#!forum/myrocks-dev
Summary
▪ We started migrating from InnoDB to MyRocks in our main database
▪ We run both master and slaves in production
▪ Major motivation was to save space
▪ Online data correctness check tool helped to find lots of data integrity bugs and prevented us from deploying inconsistent instances in production
▪ Bulk loading greatly reduced compaction stalls
▪ Transactions and crash safety are supported
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0

More Related Content

What's hot

Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScyllaDB
 
Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기NeoClova
 
Sql server エンジニアに知ってもらいたい!! sql server チューニングアプローチ
Sql server エンジニアに知ってもらいたい!! sql server チューニングアプローチSql server エンジニアに知ってもらいたい!! sql server チューニングアプローチ
Sql server エンジニアに知ってもらいたい!! sql server チューニングアプローチMasayuki Ozawa
 
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニングしばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニングオラクルエンジニア通信
 
NOSQLの基礎知識(講義資料)
NOSQLの基礎知識(講義資料)NOSQLの基礎知識(講義資料)
NOSQLの基礎知識(講義資料)CLOUDIAN KK
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
Innodb Deep Talk #2 でお話したスライド
Innodb Deep Talk #2 でお話したスライドInnodb Deep Talk #2 でお話したスライド
Innodb Deep Talk #2 でお話したスライドYasufumi Kinoshita
 
The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScale
The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScaleThe Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScale
The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScaleColin Charles
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsMydbops
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsAlexander Korotkov
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바NeoClova
 
MariaDB 마이그레이션 - 네오클로바
MariaDB 마이그레이션 - 네오클로바MariaDB 마이그레이션 - 네오클로바
MariaDB 마이그레이션 - 네오클로바NeoClova
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
1.mysql disk io 모니터링 및 분석사례
1.mysql disk io 모니터링 및 분석사례1.mysql disk io 모니터링 및 분석사례
1.mysql disk io 모니터링 및 분석사례I Goo Lee
 
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfOracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfSrirakshaSrinivasan2
 
Liquibase migration for data bases
Liquibase migration for data basesLiquibase migration for data bases
Liquibase migration for data basesRoman Uholnikov
 
シンプルでシステマチックな Oracle Database, Exadata 性能分析
シンプルでシステマチックな Oracle Database, Exadata 性能分析シンプルでシステマチックな Oracle Database, Exadata 性能分析
シンプルでシステマチックな Oracle Database, Exadata 性能分析Yohei Azekatsu
 

What's hot (20)

Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
 
Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기
 
Sql server エンジニアに知ってもらいたい!! sql server チューニングアプローチ
Sql server エンジニアに知ってもらいたい!! sql server チューニングアプローチSql server エンジニアに知ってもらいたい!! sql server チューニングアプローチ
Sql server エンジニアに知ってもらいたい!! sql server チューニングアプローチ
 
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニングしばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
しばちょう先生による特別講義! RMANバックアップの運用と高速化チューニング
 
NOSQLの基礎知識(講義資料)
NOSQLの基礎知識(講義資料)NOSQLの基礎知識(講義資料)
NOSQLの基礎知識(講義資料)
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Innodb Deep Talk #2 でお話したスライド
Innodb Deep Talk #2 でお話したスライドInnodb Deep Talk #2 でお話したスライド
Innodb Deep Talk #2 でお話したスライド
 
The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScale
The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScaleThe Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScale
The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScale
 
PostgreSQL replication
PostgreSQL replicationPostgreSQL replication
PostgreSQL replication
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바
 
MariaDB 마이그레이션 - 네오클로바
MariaDB 마이그레이션 - 네오클로바MariaDB 마이그레이션 - 네오클로바
MariaDB 마이그레이션 - 네오클로바
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
1.mysql disk io 모니터링 및 분석사례
1.mysql disk io 모니터링 및 분석사례1.mysql disk io 모니터링 및 분석사례
1.mysql disk io 모니터링 및 분석사례
 
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfOracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
 
Liquibase migration for data bases
Liquibase migration for data basesLiquibase migration for data bases
Liquibase migration for data bases
 
シンプルでシステマチックな Oracle Database, Exadata 性能分析
シンプルでシステマチックな Oracle Database, Exadata 性能分析シンプルでシステマチックな Oracle Database, Exadata 性能分析
シンプルでシステマチックな Oracle Database, Exadata 性能分析
 

Similar to MyRocks introduction and production deployment

Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMariaDB plc
 
M|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksM|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksMariaDB plc
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank:  Rocking the Database World with RocksDBThe Hive Think Tank:  Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformMaris Elsins
 
NoSQL in Real-time Architectures
NoSQL in Real-time ArchitecturesNoSQL in Real-time Architectures
NoSQL in Real-time ArchitecturesRonen Botzer
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationTim Callaghan
 
Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQLUlf Wendel
 
ZFS for Databases
ZFS for DatabasesZFS for Databases
ZFS for Databasesahl0003
 
Geospatial Big Data - Foss4gNA
Geospatial Big Data - Foss4gNAGeospatial Big Data - Foss4gNA
Geospatial Big Data - Foss4gNAnormanbarker
 
Mysql 2007 Tech At Digg V3
Mysql 2007 Tech At Digg V3Mysql 2007 Tech At Digg V3
Mysql 2007 Tech At Digg V3epee
 
Storage Engine Wars at Parse
Storage Engine Wars at ParseStorage Engine Wars at Parse
Storage Engine Wars at ParseMongoDB
 
Linux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLLinux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLYoshinori Matsunobu
 
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...Ontico
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Javasunnygleason
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at nightMichael Yarichuk
 

Similar to MyRocks introduction and production deployment (20)

Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
M|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocksM|18 How Facebook Migrated to MyRocks
M|18 How Facebook Migrated to MyRocks
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDBThe Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDB
 
The Hive Think Tank: Rocking the Database World with RocksDB
The Hive Think Tank:  Rocking the Database World with RocksDBThe Hive Think Tank:  Rocking the Database World with RocksDB
The Hive Think Tank: Rocking the Database World with RocksDB
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance Platform
 
NoSQL in Real-time Architectures
NoSQL in Real-time ArchitecturesNoSQL in Real-time Architectures
NoSQL in Real-time Architectures
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free Replication
 
NoSQL
NoSQLNoSQL
NoSQL
 
MySQL highav Availability
MySQL highav AvailabilityMySQL highav Availability
MySQL highav Availability
 
Vote NO for MySQL
Vote NO for MySQLVote NO for MySQL
Vote NO for MySQL
 
ZFS for Databases
ZFS for DatabasesZFS for Databases
ZFS for Databases
 
In-memory Databases
In-memory DatabasesIn-memory Databases
In-memory Databases
 
Geospatial Big Data - Foss4gNA
Geospatial Big Data - Foss4gNAGeospatial Big Data - Foss4gNA
Geospatial Big Data - Foss4gNA
 
Mysql 2007 Tech At Digg V3
Mysql 2007 Tech At Digg V3Mysql 2007 Tech At Digg V3
Mysql 2007 Tech At Digg V3
 
Storage Engine Wars at Parse
Storage Engine Wars at ParseStorage Engine Wars at Parse
Storage Engine Wars at Parse
 
Linux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLLinux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQL
 
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
Making the case for write-optimized database algorithms / Mark Callaghan (Fac...
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
Why databases cry at night
Why databases cry at nightWhy databases cry at night
Why databases cry at night
 

More from Yoshinori Matsunobu

Consistency between Engine and Binlog under Reduced Durability
Consistency between Engine and Binlog under Reduced DurabilityConsistency between Engine and Binlog under Reduced Durability
MyRocks introduction and production deployment

  • 1. MyRocks introduction and production deployment at Facebook Yoshinori Matsunobu Production Engineer, Facebook Jun 2017
  • 2. Who am I ▪ Was a MySQL consultant at MySQL (Sun, Oracle) for 4 years ▪ Joined Facebook in March 2012 ▪ MySQL 5.1 -> 5.6 upgrade ▪ Fast master failover without losing data ▪ Partly joined HBase Production Engineering team in 2014.H1 ▪ Started a research project to integrate RocksDB and MySQL from 2014.H2, with MySQL Engineering Team and MariaDB ▪ Started MyRocks production deployment since 2016.H2
  • 3. Agenda ▪ MySQL at Facebook ▪ Issues in InnoDB ▪ RocksDB and MyRocks overview ▪ Production Deployment
  • 4. “Main MySQL Database” at Facebook ▪ Storing Social Graph ▪ Massively Sharded ▪ Petabytes scale ▪ Low latency ▪ Automated Operations ▪ Pure Flash Storage (Constrained by space, not by CPU/IOPS)
  • 5. H/W trends and limitations ▪ SSD/Flash is getting affordable, but MLC Flash is still a bit expensive ▪ HDD: Large enough capacity but very limited IOPS ▪ Reducing read/write IOPS is very important -- Reducing write is harder ▪ SSD/Flash: Great read iops but limited space and write endurance ▪ Reducing space is higher priority
  • 6. InnoDB Issue (1) -- Write Amplification (diagram: a page of rows is read, modified, and written back) - 1 Byte Modification results in one page write (4~16KB) - InnoDB “Doublewrite” doubles write volume
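The arithmetic on this slide can be sketched as follows. This is a rough illustration only: the function name and parameters are hypothetical, and it counts data-page writes alone, ignoring redo log and binlog traffic.

```python
def innodb_write_amplification(modified_bytes: int, page_size: int,
                               doublewrite: bool = True) -> float:
    """Bytes written to storage per byte logically modified.

    Assumption: one whole page is written per modification, and the
    doublewrite buffer writes that page a second time.
    """
    physical_bytes = page_size * (2 if doublewrite else 1)
    return physical_bytes / modified_bytes


# A 1-byte change to a 16KB page with doublewrite enabled.
print(innodb_write_amplification(1, 16 * 1024))  # 32768.0
```

Under these assumptions, a single-byte update costs 32KB of flash writes, which is why write endurance matters on flash.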
  • 7. InnoDB Issue (2) -- B+Tree Fragmentation INSERT INTO message_table (user_id) VALUES (31) ▪ Before: Leaf Block 1 holds (user_id, RowID) = (1, 10000), (2, 5), …, (3, 15321), (60, 431) ▪ After: Leaf Block 1 holds (1, 10000), …, (30, 333); Leaf Block 2 holds (31, 345), (60, 431), …, and the remaining slots are Empty
  • 8. InnoDB Issue (3) -- Compression ▪ An uncompressed 16KB page compressed to 5KB still uses 8KB of space on storage ▪ Compressed size is rounded up: 0~4KB => 4KB, 4~8KB => 8KB, 8~16KB => 16KB
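The rounding rule on this slide can be written as a small function. This is a minimal sketch; the name `stored_size_kb` and the tuple of allowed sizes are illustrative and not taken from InnoDB itself.

```python
ALLOWED_SIZES_KB = (4, 8, 16)  # size classes listed on the slide


def stored_size_kb(compressed_kb: float) -> int:
    """Round a compressed page size up to the next allowed size."""
    for size in ALLOWED_SIZES_KB:
        if compressed_kb <= size:
            return size
    raise ValueError("page larger than 16KB")


compressed = 5
stored = stored_size_kb(compressed)
print(stored, stored - compressed)  # 8 3
```

A page that compresses to 5KB therefore wastes 3KB, more than a third of its on-disk footprint.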
  • 9. RocksDB ▪ http://rocksdb.org/ ▪ Forked from LevelDB ▪ Key-Value LSM (Log Structured Merge) persistent store ▪ Embedded ▪ Data stored locally ▪ Optimized for fast storage ▪ LevelDB was created by Google ▪ Facebook forked and developed RocksDB ▪ Used at many backend services at Facebook, and many external large services ▪ Needs to write C++ or Java code to access RocksDB
  • 10. RocksDB architecture overview ▪ Leveled LSM Structure ▪ MemTable ▪ WAL (Write Ahead Log) ▪ Compaction ▪ Column Family
  • 11. RocksDB Architecture (diagram: a Write Request goes to the Active MemTable in memory and to the WAL on persistent storage; MemTables are switched, then flushed to files, and files are merged by Compaction)
  • 12-15. Write Path (1)-(4) (same diagram as slide 11)
  • 16. Read Path (diagram: a Read Request checks the Active MemTable and MemTables in memory, then the files on persistent storage, using Bloom Filters)
  • 17. Read Path (same diagram) ▪ Index and Bloom Filters cached In Memory
  • 18. How LSM/RocksDB works INSERT INTO message (user_id) VALUES (31); INSERT INTO message (user_id) VALUES (99999); INSERT INTO message (user_id) VALUES (10000); ▪ WAL/MemTable holds 10000, 99999, 31 ▪ New SST is written in sorted order: 31, 10000, 99999 ▪ Existing SSTs hold 1, 10, 150, …; 50000, 50001, 55000, 55700, …; 110000, … ▪ Compaction merges them, writing new SSTs sequentially: 1, 10, 31, 150, 10000, …; 50001, 55000, 55700, 99999, … ▪ N random row modifications => A few sequential reads & writes
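The flush-then-compact flow on this slide can be sketched as follows. This is a simplified in-memory model; `flush` and `compact` are hypothetical names, and real RocksDB works on files with levels, tombstones, and sequence numbers.

```python
import heapq


def flush(memtable: dict) -> list:
    """Turn an unordered MemTable into a sorted run (a new SST)."""
    return sorted(memtable.items())


def compact(*runs: list) -> list:
    """Merge sorted runs into one sorted run.

    Runs are passed newest first; when a key appears in several runs,
    the value from the newest run wins.
    """
    tagged = (
        [(key, age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    merged = []
    for key, _age, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged


memtable = {10000: "c", 99999: "b", 31: "a"}  # insertion order is random
new_sst = flush(memtable)
existing_sst = [(1, "x"), (10, "y"), (150, "z")]
print([key for key, _ in compact(new_sst, existing_sst)])
# [1, 10, 31, 150, 10000, 99999]
```

Both inputs are read once in key order and the output is written once in key order, which is the "few sequential reads & writes" the slide refers to.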
  • 19. LSM handles compression better ▪ A 16MB SST File (blocks of rows) is compressed to a 5MB SST File => Aligned to OS sector (4KB unit) => Negligible OS page alignment overhead
  • 20. Reducing Space/Write Amplification ▪ Append Only: rows go from the WAL/Memstore to SST files ▪ Prefix Key Encoding: keys (id1, id2, id3) = (100, 200, 1), (100, 200, 2), (100, 200, 3), (100, 200, 4) are stored as 100 200 1, then 2, 3, 4 ▪ Zero-Filling metadata: entries are (key, value, seq id, flag), e.g. (k1, v1, 1234561, W), (k2, v2, 1234562, W), (k3, v3, 1234563, W), (k4, v4, 1234564, W). Seq id is 7 bytes in RocksDB. With seq id set to 0, i.e. (k1, v1, 0, W) … (k4, v4, 0, W), “0” uses very little space after compression
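Prefix key encoding can be illustrated with a short sketch. It works on tuples of column values for readability; RocksDB's actual block format operates on byte strings, so treat the function names and representation here as assumptions.

```python
def prefix_encode(keys: list) -> list:
    """Store each key as (shared prefix length, remaining suffix)."""
    encoded = []
    prev = ()
    for key in keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded: list) -> list:
    """Rebuild full keys from (shared prefix length, suffix) pairs."""
    keys = []
    prev = ()
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [(100, 200, 1), (100, 200, 2), (100, 200, 3), (100, 200, 4)]
print(prefix_encode(keys))
# [(0, (100, 200, 1)), (2, (2,)), (2, (3,)), (2, (4,))]
```

Only the first key carries the full (100, 200) prefix; the rest store just the differing column.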
  • 21. LSM Compaction Algorithm -- Level ▪ For each level, data is sorted by key ▪ Read Amplification: 1 ~ number of levels (depending on cache -- L0~L3 are usually cached) ▪ Write Amplification: 1 + 1 + fanout * (number of levels – 2) / 2 ▪ Space Amplification: 1.11 ▪ 11% is much smaller than B+Tree’s fragmentation
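The formulas on this slide can be evaluated with a few lines of code. The write amplification expression is copied from the slide; the space amplification function assumes each level is `fanout` times smaller than the next, which is one common way to arrive at 1.11 but is not stated on the slide. The fanout of 10 and the level counts are example values.

```python
def write_amplification(fanout: int, levels: int) -> float:
    """Slide formula: 1 + 1 + fanout * (number of levels - 2) / 2."""
    return 1 + 1 + fanout * (levels - 2) / 2


def space_amplification(fanout: int, levels: int) -> float:
    """Assumed model: total size relative to the largest level when
    every level is `fanout` times smaller than the one below it."""
    return sum(fanout ** -i for i in range(levels))


print(write_amplification(fanout=10, levels=5))             # 17.0
print(round(space_amplification(fanout=10, levels=3), 2))   # 1.11
```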
  • 22. Read Penalty on LSM SELECT id1, id2, time FROM t WHERE id1=100 AND id2=100 ORDER BY time DESC LIMIT 1000; Index on (id1, id2, time) ▪ InnoDB (Branch, Leaf): Range Scan with covering index is done by just reading leaves sequentially, and ordering is guaranteed (very efficient) ▪ RocksDB (MemTable, L0, L1, L2, L3): Merge is needed to do range scan with ORDER BY (L0-L2 are usually cached, but in total it needs more CPU cycles than InnoDB)
  • 23. Bloom Filter (diagram: MemTable, L0, L1, L2, L3; KeyMayExist(id=31)? returns false for levels that cannot contain the key) ▪ Checks whether a key may exist without reading data, and skips read I/O if the key definitely does not exist
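A Bloom filter of the kind described here can be sketched in a few lines. This is a generic textbook version, not RocksDB's implementation; the class, the sizes, and the use of SHA-256 are all illustrative choices.

```python
import hashlib


class BloomFilter:
    """Answers "definitely absent" or "may exist" for a key."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions from seeded hashes of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def key_may_exist(self, key) -> bool:
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


sst_filter = BloomFilter()
sst_filter.add(31)
print(sst_filter.key_may_exist(31))  # True
```

A `False` answer lets the reader skip the file entirely, which is where the I/O saving comes from; a `True` answer still requires reading the file to confirm.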
  • 24. Column Family Query atomicity across different key spaces. ▪ Column families: ▪ Separate MemTables and SST files ▪ Share transactional logs (diagram: one shared WAL; CF1 and CF2 each have their own Active MemTable, Read Only MemTable(s), Persistent Files, and Compaction)
  • 25. What is MyRocks ▪ MySQL on top of RocksDB (RocksDB storage engine) ▪ Taking both LSM advantages and MySQL features ▪ LSM advantage: Smaller space and lower write amplification ▪ MySQL features: SQL, Replication, Connectors and many tools ▪ Open Source, distributed from MariaDB and Percona as well ▪ http://myrocks.io/ (diagram: MySQL Clients connect via SQL/Connector to MySQL (Parser, Optimizer, Replication, etc.), with InnoDB and RocksDB as storage engines)
  • 26. MyRocks features ▪ Clustered Index (same as InnoDB) ▪ Transactions, including consistency between binlog and RocksDB ▪ Faster data loading, deletes and replication ▪ Dynamic Options ▪ TTL ▪ Crash Safety ▪ Online logical and binary backup
  • 27. Performance (LinkBench) ▪ Space Usage ▪ QPS ▪ Flash writes per query ▪ HDD ▪ http://smalldatum.blogspot.com/2016/01/myrocks-vs-innodb-with-linkbench-over-7.html
  • 29. QPS
  • 32. Getting Started ▪ Downloading MySQL with MyRocks support (MariaDB, Percona Server) ▪ Configuring my.cnf ▪ Installing MySQL and initializing data directory ▪ Starting mysqld ▪ Creating and manipulating some tables ▪ Shutting down mysqld ▪ https://github.com/facebook/mysql-5.6/wiki/Getting-Started-with-MyRocks
  • 33. my.cnf (minimal configuration) ▪ It’s generally not recommended mixing multiple transactional storage engines within the same instance ▪ Not transactional across engines, Not tested enough ▪ Add “allow-multiple-engines” in my.cnf, if you really want to mix InnoDB and MyRocks
      [mysqld]
      rocksdb
      default-storage-engine=rocksdb
      skip-innodb
      default-tmp-storage-engine=MyISAM
      collation-server=latin1_bin (or utf8_bin, binary)
      log-bin
      binlog-format=ROW
  • 34. Creating tables ▪ “ENGINE=RocksDB” is a syntax to create RocksDB tables ▪ Setting “default-storage-engine=RocksDB” in my.cnf is fine too ▪ It is generally recommended to have a PRIMARY KEY, though MyRocks allows tables without PRIMARY KEYs ▪ Tables are automatically compressed, without any configuration in DDL. Compression algorithm can be configured via my.cnf
      CREATE TABLE t (
        id INT PRIMARY KEY,
        value1 INT,
        value2 VARCHAR(100),
        INDEX (value1)
      ) ENGINE=RocksDB COLLATE latin1_bin;
  • 35. MyRocks in Production at Facebook
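  • 36. MyRocks Goals (bar charts of Space, IO, and CPU usage against the machine limit) ▪ InnoDB in main database: Space at 90%; the other two resources at 15% and 20% ▪ MyRocks in main database: Space at 45%; the other two resources at 12% and 18%, leaving room for a second instance (another 45%, 12%, and 18%) within the machine limit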
  • 37. MyRocks goals (more details) ▪ Smaller space usage ▪ 50% compared to compressed InnoDB at FB ▪ Better write amplification ▪ Can use more affordable flash storage ▪ Fast, and small enough CPU usage with general purpose workloads ▪ Data large enough that it doesn’t fit in RAM ▪ Point lookup, range scan, full index/table scan ▪ Insert, update, delete by point/range, and occasional bulk inserts ▪ Same or even smaller CPU usage compared to InnoDB at FB ▪ Make it possible to consolidate 2x more instances per machine
  • 38. Facebook MySQL related teams ▪ Production Engineering (SRE/PE) ▪ MySQL Production Engineering ▪ Data Performance ▪ Software Engineering (SWE) ▪ RocksDB Engineering ▪ MySQL Engineering ▪ MyRocks ▪ Others (Replication, Client, InnoDB, etc)
  • 39. Relationship between MySQL PE and SWE ▪ SWE does MySQL server upgrade ▪ Upgrading MySQL revision with several FB patches ▪ Hot fixes ▪ SWE participates in PE oncall ▪ Investigating issues that might have been caused by MySQL server codebase ▪ Core files analysis ▪ Can submit diffs to each other ▪ “Hackaweek” to work on shorter term tasks for a week
  • 40. MyRocks migration -- Technical Challenges ▪ Initial Migration ▪ Creating MyRocks instances without downtime ▪ Loading into MyRocks tables within reasonable time ▪ Verifying data consistency between InnoDB and MyRocks ▪ Continuous Monitoring ▪ Resource Usage like space, iops, cpu and memory ▪ Query plan outliers ▪ Stalls and crashes
  • 41. MyRocks migration -- Technical Challenges (2) ▪ When running MyRocks on master ▪ RBR (Row based binary logging) ▪ Removing queries relying on InnoDB Gap Lock ▪ Robust XA support (binlog and RocksDB)
  • 42. Creating first MyRocks instance without downtime ▪ Picking one of the InnoDB slave instances, then starting logical dump and restore ▪ Stopping one slave does not affect services Master (InnoDB) Slave1 (InnoDB) Slave2 (InnoDB) Slave3 (InnoDB) Slave4 (MyRocks) Stop & Dump & Load
  • 43. Faster Data Loading ▪ Normal Write Path in MyRocks/RocksDB: Write Requests -> MemTable/WAL -> (Flush) Level 0 SST -> (Compaction) Level 1 SST -> …. -> (Compaction) Level max SST ▪ Faster Write Path: Write Requests -> Level max SST, enabled by “SET SESSION rocksdb_bulk_load=1;” ▪ Original data must be sorted by primary key
  • 44. Data migration steps ▪ Dst) Create table … ENGINE=ROCKSDB; (creating MyRocks tables with proper column families) ▪ Dst) ALTER TABLE DROP INDEX; (dropping secondary keys) ▪ Src) STOP SLAVE; ▪ mysqldump --host=innodb-host --order-by-primary | mysql --host=myrocks-host --init-command="set sql_log_bin=0; set rocksdb_bulk_load=1" ▪ Dst) ALTER TABLE ADD INDEX; (adding secondary keys) ▪ Src, Dst) START SLAVE;
  • 45. Data Verification ▪ MyRocks/RocksDB is relatively new database technology ▪ Might have more bugs than robust InnoDB ▪ Ensuring data consistency helps avoid showing conflicting results
  • 46. Verification tests ▪ Index count check between primary key and secondary keys ▪ If any index is broken, it can be detected ▪ SELECT 'PRIMARY', COUNT(*) FROM t FORCE INDEX (PRIMARY) UNION SELECT 'idx1', COUNT(*) FROM t FORCE INDEX (idx1) ▪ Can’t be used if there is no secondary key ▪ Index stats check ▪ Checking that “rows” shown by SHOW TABLE STATUS is not far from the actual row count ▪ Checksum tests w/ InnoDB ▪ Comparing between InnoDB instance and MyRocks instance ▪ Creating a transaction consistent snapshot at the same GTID position, scan, then compare checksum ▪ Shadow correctness check ▪ Capturing read traffic
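The index count check generalizes to any number of secondary keys. The sketch below only builds the SQL text and compares counts that a caller has already fetched; the helper names are hypothetical, identifiers are not escaped, and it is not the tool used at Facebook.

```python
def index_count_check_sql(table: str, secondary_indexes: list) -> str:
    """Build one UNION query that counts rows through every index."""
    parts = [f"SELECT 'PRIMARY', COUNT(*) FROM {table} FORCE INDEX (PRIMARY)"]
    parts += [
        f"SELECT '{idx}', COUNT(*) FROM {table} FORCE INDEX ({idx})"
        for idx in secondary_indexes
    ]
    return " UNION ".join(parts)


def mismatched_indexes(counts: dict) -> list:
    """Return the indexes whose row count differs from the primary key's."""
    expected = counts["PRIMARY"]
    return sorted(name for name, n in counts.items() if n != expected)


print(index_count_check_sql("t", ["idx1"]))
print(mismatched_indexes({"PRIMARY": 1000, "idx1": 1000, "idx2": 998}))
# ['idx2']
```

Matching counts do not prove the indexes are correct, but a mismatch is a cheap and reliable signal that one of them is broken.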
  • 47. Shadow traffic tests ▪ We have a shadow test framework ▪ MySQL audit plugin to capture read/write queries from production instances ▪ Replaying them into shadow master instances ▪ Shadow master tests ▪ Client errors ▪ Rewriting queries relying on Gap Lock ▪ gap_lock_raise_error=1, gap_lock_write_log=1
  • 48. Creating second MyRocks instance without downtime Master (InnoDB) Slave1 (InnoDB) Slave2 (InnoDB) Slave3 (MyRocks) Slave4 (MyRocks) myrocks_hotbackup (Online binary backup)
  • 49. Promoting MyRocks as a master Master (MyRocks) Slave1 (InnoDB) Slave2 (InnoDB) Slave3 (InnoDB) Slave4 (MyRocks)
  • 50. Crash Safety ▪ Crash Safety makes operations much easier ▪ Just restarting failed instances can restart replication ▪ No need to rebuild entire instances, as long as data is there ▪ Crash Safe Slave ▪ Crash Safe Master ▪ 2PC (binlog and WAL)
  • 51. Master crash safety settings (values listed as sync-binlog / rocksdb-flush-log-at-trx-commit / rocksdb-enable-2pc / rocksdb-wal-recovery-mode)
      ▪ No data loss on unplanned machine reboot: 1 (default) / 1 (default) / 1 (default) / 1 (default)
      ▪ No data loss on mysqld crash & recovery: 0 / 2 / 1 / 2
      ▪ No data loss if always failover: 0 / 2 / 0 / 2
  • 52. Notable issues fixed during migration ▪ Lots of “Snapshot Conflict” errors ▪ Because of implementation differences in MyRocks (PostgreSQL Style snapshot isolation) ▪ Setting tx-isolation=READ-COMMITTED solved the problem ▪ Slaves stopped with I/O errors on reads ▪ We switched to make MyRocks abort on I/O errors for both reads and writes ▪ Index statistics bugs ▪ Inaccurate row counts/length on bulk loading -> fixed ▪ Cardinality was not updated on fast index creation -> fixed ▪ Crash safety bugs and some crash bugs in MyRocks/RocksDB
  • 53. Preventing stalls ▪ Heavy writes cause lots of compactions, which may cause stalls ▪ Typical write stall cases ▪ Online schema change ▪ Massive data migration jobs that write lots of data ▪ Use fast data loading whenever possible ▪ InnoDB to MyRocks migration can utilize this technique ▪ Can be harder for data migration jobs that write into existing tables
  • 54. When write stalls happen ▪ Estimated number of pending compaction bytes exceeded X bytes ▪ soft|hard_pending_compaction_bytes, default 64GB ▪ Number of L0 files exceeded level0_slowdown|stop_writes_trigger (default 10) ▪ Number of unflushed MemTables exceeded max_write_buffer_number (default 4) ▪ All of these incidents are written to LOG as WARN level ▪ All of these options apply to each column family
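The three stall triggers can be combined into a small classifier. This is a rough model for reasoning about the conditions, not RocksDB's actual logic: the field names follow the options on the slide, the `>=` boundaries are an assumption, and the threshold values in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class StallLimits:
    soft_pending_compaction_bytes: int
    hard_pending_compaction_bytes: int
    level0_slowdown_writes_trigger: int
    level0_stop_writes_trigger: int
    max_write_buffer_number: int


def stall_state(limits: StallLimits, pending_compaction_bytes: int,
                l0_files: int, unflushed_memtables: int) -> str:
    """Classify a column family as "stop", "slowdown", or "none"."""
    if (pending_compaction_bytes >= limits.hard_pending_compaction_bytes
            or l0_files >= limits.level0_stop_writes_trigger
            or unflushed_memtables >= limits.max_write_buffer_number):
        return "stop"
    if (pending_compaction_bytes >= limits.soft_pending_compaction_bytes
            or l0_files >= limits.level0_slowdown_writes_trigger):
        return "slowdown"
    return "none"


GB = 1024 ** 3
example_limits = StallLimits(64 * GB, 256 * GB, 10, 20, 4)  # hypothetical values
print(stall_state(example_limits, pending_compaction_bytes=70 * GB,
                  l0_files=3, unflushed_memtables=1))  # slowdown
```

Because the options apply per column family, a check like this would have to be run for each column family separately.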
  • 55. What happens on write stalls ▪ Soft stalls ▪ COMMIT takes longer time than usual ▪ Total estimated written bytes to MemTable is capped to rocksdb_delayed_write_rate, until slowdown conditions are cleared ▪ Default is 16MB/s (previously 2MB/s) ▪ Hard stalls ▪ All writes are blocked at COMMIT, until stop conditions are cleared
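The effect of the soft-stall cap can be estimated with simple arithmetic. The sketch assumes writes proceed at exactly the capped rate for the whole stall, which is a simplification; the function name and the 160MB workload are hypothetical.

```python
MB = 1024 * 1024


def seconds_under_soft_stall(bytes_to_write: int,
                             delayed_write_rate: int = 16 * MB) -> float:
    """Time to write a given volume when capped at delayed_write_rate."""
    return bytes_to_write / delayed_write_rate


print(seconds_under_soft_stall(160 * MB))                          # 10.0
print(seconds_under_soft_stall(160 * MB, delayed_write_rate=2 * MB))  # 80.0
```

The second call uses the previous 2MB/s default mentioned on the slide, showing why commits took far longer under the old setting.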
  • 56. Mitigating Write Stalls ▪ Speed up compactions ▪ Use faster compression algorithm (LZ4 for higher levels, ZSTD in the bottommost) ▪ Increase rocksdb_max_background_compactions ▪ Reduce total bytes written ▪ But avoid using too strong compression algorithm on upper levels ▪ Use more write efficient compaction algorithm ▪ compaction_pri=kMinOverlappingRatio ▪ Delete files slowly on Flash ▪ Deleting too many large files causes TRIM stalls on Flash ▪ MyRocks has an option to throttle sst file deletion speed ▪ Binlog file deletions should be slowed down as well
  • 57. Monitoring ▪ MyRocks files ▪ SHOW ENGINE ROCKSDB STATUS ▪ SHOW ENGINE ROCKSDB TRANSACTION STATUS ▪ LOG files ▪ information_schema tables ▪ sst_dump tool ▪ ldb tool
  • 58. SHOW ENGINE ROCKSDB STATUS ▪ Column Family Statistics, including size, read and write amp per level ▪ Memory usage
      *************************** 7. row ***************************
        Type: CF_COMPACTION
        Name: default
      Status:
      ** Compaction Stats [default] **
      Level  Files  Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
      ----------------------------------------------------------------------------------------------------------------------------------------------------
      L0       2/0     51.58   0.5      0.0    0.0      0.0       0.3      0.3       0.0   0.0      0.0     40.3         7        10    0.669     0       0
      L3       6/0    109.36   0.9      0.7    0.7      0.0       0.6      0.6       0.0   0.9     43.8     40.7        16         3    5.172 7494K    297K
      L4      61/0   1247.31   1.0      2.0    0.3      1.7       2.0      0.2       0.0   6.9     49.7     48.5        41         9    4.593   15M    176K
      L5     989/0  12592.86   1.0      2.0    0.3      1.8       1.9      0.1       0.0   7.4      8.1      7.4       258         8   32.209   17M    726K
      L6    4271/0 127363.51   0.0      0.0    0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000     0       0
      Sum   5329/0 141364.62   0.0      4.7    1.2      3.5       4.7      1.2       0.0  17.9     15.0     15.0       321        30   10.707   41M   1200K
  • 59. SHOW GLOBAL STATUS
      mysql> show global status like 'rocksdb%';
      +-------------------------------------+----------+
      | Variable_name                       | Value    |
      +-------------------------------------+----------+
      | rocksdb_rows_deleted                | 216223   |
      | rocksdb_rows_inserted               | 1318158  |
      | rocksdb_rows_read                   | 7102838  |
      | rocksdb_rows_updated                | 1997116  |
      ....
      | rocksdb_bloom_filter_prefix_checked | 773124   |
      | rocksdb_bloom_filter_prefix_useful  | 308445   |
      | rocksdb_bloom_filter_useful         | 10108448 |
      ....
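Counters like these are easier to read as ratios. The sketch below uses the two prefix Bloom filter values from the output above and assumes that "useful" counts the checks in which the filter ruled a file out; the helper name is hypothetical.

```python
def useful_ratio(useful: int, checked: int) -> float:
    """Fraction of filter checks that avoided a read (0.0 if none ran)."""
    return useful / checked if checked else 0.0


status = {
    "rocksdb_bloom_filter_prefix_checked": 773124,
    "rocksdb_bloom_filter_prefix_useful": 308445,
}
ratio = useful_ratio(status["rocksdb_bloom_filter_prefix_useful"],
                     status["rocksdb_bloom_filter_prefix_checked"])
print(f"{ratio:.1%}")
```

Under that assumption, roughly 40% of prefix filter checks in this sample let RocksDB skip a read.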
  • 60. Kernel VM allocation stalls ▪ RocksDB has limited support for O_DIRECT ▪ Too many buffered writes may cause “VM allocation stalls” on older Linux kernels ▪ Some queries may take more than 1s because of this ▪ Using more %sys ▪ Linux Kernel 4.6 is much better for handling buffered writes ▪ For details: https://lwn.net/Articles/704739/
  • 61. More information ▪ MyRocks home, documentation: ▪ http://myrocks.io ▪ https://github.com/facebook/mysql-5.6/wiki ▪ Source Repo: https://github.com/facebook/mysql-5.6 ▪ Bug Reports: https://github.com/facebook/mysql-5.6/issues ▪ Forum: https://groups.google.com/forum/#!forum/myrocks-dev
  • 62. Summary ▪ We started migrating from InnoDB to MyRocks in our main database ▪ We run both master and slaves in production ▪ Major motivation was to save space ▪ Online data correctness check tool helped to find lots of data integrity bugs and prevented us from deploying inconsistent instances in production ▪ Bulk loading greatly reduced compaction stalls ▪ Transactions and crash safety are supported
  • 63. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0