SlideShare a Scribd company logo
1 of 19
Download to read offline
Linux IO internals
for database administrators
Ilya Kosmodemiansky
ik@dataegret.com
1 Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kernel
developers (for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
2 The main IO problem for databases
• How to maximize page throughput between memory and
disks
• Things involved:
Disks
Memory
CPU
IO Schedulers
Filesystems
Database itself
• IO problems for databases are not always only about disks
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
3 A typical database
DRAM
Disks
Shared memory
Database
Linux
Page cache
User space
Kernel space
WAL buffer
WAL Datafile
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
4 Key things about such workload
• Shared memory segment can be very large
• Keeping in-memory pages synchronized with disk generates
huge IO
• WAL should be written fast and safe
• One and every layer of OS IO stack involved
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
5 Memory allocation and mapping
CPU L1
MMU TLB
Memory
L2 L3
page
table
Virtual addressing
Translation
Physical addressing
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
6 DBAs takeaways:
• Database with huge shared memory segment benefits from
huge pages
• But not from transparent huge pages
Databases operate large continuous shared memory segments
THP defragmentation can lead to severe performance
degradation in such cases
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
7 Freeing memory
Page cache
Swap
Free list Free list Free list
alloc()free()
page-out
vm.min_free_kbytes
reclaim
vm.swapiness
OOM-killer
vm.panic_on_oom
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
8 Page-out
• Page-out happens if:
someone calls fsync
30 sec timeout exceeded (vm.dirty_expire_centisecs)
Too many dirty pages (vm.dirty_background_ratio and
vm.dirty_ratio)
• It is reasonable to tune vm.dirty_* on rotating disks with
RAID controller, but helps a little on server class SSDs
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
9 DBAs takeaways:
• vm.overcommit_memory
0 - heuristic overcommit, reduces swap usage
1 - always overcommit
2 - do not overcommit (vm.overcommit_ratio = 50 by
default)
• vm.min_free_kbytes - reasonably high (can be easy 1000000
on a server with enough memory)
• vm.swappiness = 1
0 - swap disabled
60 - default
100 - swap preffered instead of other reaping mechanisms
• Your database would not like OOM-killer
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
10 OOM-killer: plague and cholera
• vm.panic_on_oom effectively disables OOM-killer, but that is
probably not the result you desire
• Or for a certain process: echo -17 > /proc/12465/oom_adj
but again
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
11 Filesystems: write barriers
Journal
Kernel buffer
Journalated fylesystem
Data
Journal entry
Data
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
12 DBAs takeaways:
• ext4 or xfs
• Disable write barrier (only if SSD/controller cache is protected
by capacitor/battery)
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
13 IO stack
Database memory
Page cacheVFS
EXT4
Block device interface
Disks
Direct IO
BIO Layer
Block IO
Request Layer Elevator/IO Scheduler
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
14 Elevators: before 2.6 kernel
• Linus Elevator - the only one in times of 2.4
• merging and sorting request queues
• Had lots of problems
• The Linux Scheduler: a Decade of Wasted Cores – Lozi et al.
2016
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
15 Elevators: between 2.6 and early 3.*
• CFQ - universal, default one
• deadline - rotating disks
• noop or none - then disks throughput is so high, that it can
not benefit from keen scheduling
PCIe SSDs
SAN disk arrays
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
16 Elevators: 3.13 and newer
• Effectiveness of noop clearly shows ineffectiveness of others,
or ineffectiveness of smart sorting as an approach
• blk-mq scheduler was merged into 3.13 kernel
• Much better deals with parallelism of modern SSD - basically
separate IO queue for each CPU
• The best option for good SSDs right now
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
17 Thanks
to my collegues Alexey Lesovsky and Max Boguk for a lot of
research on this topic
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
18 Questions?
ik@dataegret.com
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com

More Related Content

What's hot

Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryChris Nauroth
 
OSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesNETWAYS
 
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthLeveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthPerforce
 
Monitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildMonitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildTim Vaillancourt
 
Best Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisBest Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisJignesh Shah
 
SOUG_Deployment__Automation_DB
SOUG_Deployment__Automation_DBSOUG_Deployment__Automation_DB
SOUG_Deployment__Automation_DBUniFabric
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014marvin herrera
 
Gluster the ugly parts with Jeff Darcy
Gluster  the ugly parts with Jeff DarcyGluster  the ugly parts with Jeff Darcy
Gluster the ugly parts with Jeff DarcyGluster.org
 
Tuning DB2 in a Solaris Environment
Tuning DB2 in a Solaris EnvironmentTuning DB2 in a Solaris Environment
Tuning DB2 in a Solaris EnvironmentJignesh Shah
 
Postgres in Amazon RDS
Postgres in Amazon RDSPostgres in Amazon RDS
Postgres in Amazon RDSDenish Patel
 
MongoDB and server performance
MongoDB and server performanceMongoDB and server performance
MongoDB and server performanceAlon Horev
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldJignesh Shah
 
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...Tim Vaillancourt
 
Postgres on OpenStack
Postgres on OpenStackPostgres on OpenStack
Postgres on OpenStackEDB
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)MongoDB
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Denish Patel
 
Deploying ssd in the data center 2014
Deploying ssd in the data center 2014Deploying ssd in the data center 2014
Deploying ssd in the data center 2014Howard Marks
 
Accomplishing redundancy on Lustre based PFS with DRBD
Accomplishing redundancy on Lustre based PFS with DRBDAccomplishing redundancy on Lustre based PFS with DRBD
Accomplishing redundancy on Lustre based PFS with DRBDTyrone Systems
 
Solr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun DuynsteeSolr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun Duynsteelucenerevolution
 

What's hot (20)

Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and Memory
 
OSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin Charles
 
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthLeveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
 
Monitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildMonitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the Wild
 
Best Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisBest Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on Solaris
 
SOUG_Deployment__Automation_DB
SOUG_Deployment__Automation_DBSOUG_Deployment__Automation_DB
SOUG_Deployment__Automation_DB
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
 
Gluster the ugly parts with Jeff Darcy
Gluster  the ugly parts with Jeff DarcyGluster  the ugly parts with Jeff Darcy
Gluster the ugly parts with Jeff Darcy
 
Tuning DB2 in a Solaris Environment
Tuning DB2 in a Solaris EnvironmentTuning DB2 in a Solaris Environment
Tuning DB2 in a Solaris Environment
 
Postgres in Amazon RDS
Postgres in Amazon RDSPostgres in Amazon RDS
Postgres in Amazon RDS
 
MongoDB and server performance
MongoDB and server performanceMongoDB and server performance
MongoDB and server performance
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized World
 
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
 
Postgres on OpenStack
Postgres on OpenStackPostgres on OpenStack
Postgres on OpenStack
 
PostgreSQL on Solaris
PostgreSQL on SolarisPostgreSQL on Solaris
PostgreSQL on Solaris
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
 
Deploying ssd in the data center 2014
Deploying ssd in the data center 2014Deploying ssd in the data center 2014
Deploying ssd in the data center 2014
 
Accomplishing redundancy on Lustre based PFS with DRBD
Accomplishing redundancy on Lustre based PFS with DRBDAccomplishing redundancy on Lustre based PFS with DRBD
Accomplishing redundancy on Lustre based PFS with DRBD
 
Solr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun DuynsteeSolr on Windows: Does it Work? Does it Scale? - Teun Duynstee
Solr on Windows: Does it Work? Does it Scale? - Teun Duynstee
 

Similar to Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016PostgreSQL-Consulting
 
The Ultimate IBM and Lotus on Linux Workshop for Windows Admins
The Ultimate IBM and Lotus on Linux Workshop for Windows AdminsThe Ultimate IBM and Lotus on Linux Workshop for Windows Admins
The Ultimate IBM and Lotus on Linux Workshop for Windows AdminsBill Malchisky Jr.
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance TuningFromDual GmbH
 
The 5 Minute MySQL DBA
The 5 Minute MySQL DBAThe 5 Minute MySQL DBA
The 5 Minute MySQL DBAIrawan Soetomo
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Perforce
 
Handling Massive Writes
Handling Massive WritesHandling Massive Writes
Handling Massive WritesLiran Zelkha
 
Linux操作系统01 简介
Linux操作系统01 简介Linux操作系统01 简介
Linux操作系统01 简介lclsg123
 
Docker and the Linux Kernel
Docker and the Linux KernelDocker and the Linux Kernel
Docker and the Linux KernelDocker, Inc.
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Bridging the Developer and the Datacenter
Bridging the Developer and the DatacenterBridging the Developer and the Datacenter
Bridging the Developer and the Datacenterlurs83
 
How to Build a Compute Cluster
How to Build a Compute ClusterHow to Build a Compute Cluster
How to Build a Compute ClusterRamsay Key
 
High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2ScribbleLive
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]shuwutong
 
Linux Administrator - The Linux Course on Eduonix
Linux Administrator - The Linux Course on EduonixLinux Administrator - The Linux Course on Eduonix
Linux Administrator - The Linux Course on EduonixPaddy Lock
 
Systems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudSystems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudBrendan Gregg
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systemselliando dias
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structuresconfluent
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the LogBen Stopford
 

Similar to Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version) (20)

Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016
 
Running MySQL on Linux
Running MySQL on LinuxRunning MySQL on Linux
Running MySQL on Linux
 
The Ultimate IBM and Lotus on Linux Workshop for Windows Admins
The Ultimate IBM and Lotus on Linux Workshop for Windows AdminsThe Ultimate IBM and Lotus on Linux Workshop for Windows Admins
The Ultimate IBM and Lotus on Linux Workshop for Windows Admins
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance Tuning
 
The 5 Minute MySQL DBA
The 5 Minute MySQL DBAThe 5 Minute MySQL DBA
The 5 Minute MySQL DBA
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale
 
Handling Massive Writes
Handling Massive WritesHandling Massive Writes
Handling Massive Writes
 
Linux操作系统01 简介
Linux操作系统01 简介Linux操作系统01 简介
Linux操作系统01 简介
 
Docker and the Linux Kernel
Docker and the Linux KernelDocker and the Linux Kernel
Docker and the Linux Kernel
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Scalability
ScalabilityScalability
Scalability
 
Bridging the Developer and the Datacenter
Bridging the Developer and the DatacenterBridging the Developer and the Datacenter
Bridging the Developer and the Datacenter
 
How to Build a Compute Cluster
How to Build a Compute ClusterHow to Build a Compute Cluster
How to Build a Compute Cluster
 
High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
 
Linux Administrator - The Linux Course on Eduonix
Linux Administrator - The Linux Course on EduonixLinux Administrator - The Linux Course on Eduonix
Linux Administrator - The Linux Course on Eduonix
 
Systems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the CloudSystems Performance: Enterprise and the Cloud
Systems Performance: Enterprise and the Cloud
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the Log
 

More from PostgreSQL-Consulting

Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...PostgreSQL-Consulting
 
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL-Consulting
 
10 things, an Oracle DBA should care about when moving to PostgreSQL
10 things, an Oracle DBA should care about when moving to PostgreSQL10 things, an Oracle DBA should care about when moving to PostgreSQL
10 things, an Oracle DBA should care about when moving to PostgreSQLPostgreSQL-Consulting
 
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaAutovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaPostgreSQL-Consulting
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL-Consulting
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с дискомPostgreSQL-Consulting
 
Максим Богук. Postgres-XC
Максим Богук. Postgres-XCМаксим Богук. Postgres-XC
Максим Богук. Postgres-XCPostgreSQL-Consulting
 
Илья Космодемьянский. Использование очередей асинхронных сообщений с PostgreSQL
Илья Космодемьянский. Использование очередей асинхронных сообщений с PostgreSQLИлья Космодемьянский. Использование очередей асинхронных сообщений с PostgreSQL
Илья Космодемьянский. Использование очередей асинхронных сообщений с PostgreSQLPostgreSQL-Consulting
 

More from PostgreSQL-Consulting (13)

Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
 
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
 
10 things, an Oracle DBA should care about when moving to PostgreSQL
10 things, an Oracle DBA should care about when moving to PostgreSQL10 things, an Oracle DBA should care about when moving to PostgreSQL
10 things, an Oracle DBA should care about when moving to PostgreSQL
 
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaAutovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQ
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
Pgconfru 2015 kosmodemiansky
Pgconfru 2015 kosmodemianskyPgconfru 2015 kosmodemiansky
Pgconfru 2015 kosmodemiansky
 
Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с диском
 
Kosmodemiansky wr 2013
Kosmodemiansky wr 2013Kosmodemiansky wr 2013
Kosmodemiansky wr 2013
 
Максим Богук. Postgres-XC
Максим Богук. Postgres-XCМаксим Богук. Postgres-XC
Максим Богук. Postgres-XC
 
Иван Фролков. Tricky SQL
Иван Фролков. Tricky SQLИван Фролков. Tricky SQL
Иван Фролков. Tricky SQL
 
Илья Космодемьянский. Использование очередей асинхронных сообщений с PostgreSQL
Илья Космодемьянский. Использование очередей асинхронных сообщений с PostgreSQLИлья Космодемьянский. Использование очередей асинхронных сообщений с PostgreSQL
Илья Космодемьянский. Использование очередей асинхронных сообщений с PostgreSQL
 

Recently uploaded

Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfisabel213075
 
Industrial Applications of Centrifugal Compressors
Industrial Applications of Centrifugal CompressorsIndustrial Applications of Centrifugal Compressors
Industrial Applications of Centrifugal CompressorsAlirezaBagherian3
 
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdfDEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdfAkritiPradhan2
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Levelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodLevelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodManicka Mamallan Andavar
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfManish Kumar
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Communityprachaibot
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionSneha Padhiar
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming languageSmritiSharma901052
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 

Recently uploaded (20)

Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdf
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
Industrial Applications of Centrifugal Compressors
Industrial Applications of Centrifugal CompressorsIndustrial Applications of Centrifugal Compressors
Industrial Applications of Centrifugal Compressors
 
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdfDEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Levelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodLevelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument method
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptx
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Community
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based question
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming language
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 

Linux IO internals for database administrators (SCaLE 2017 and PGDay Nordic 2017 version)

  • 1. Linux IO internals for database administrators Ilya Kosmodemiansky ik@dataegret.com
  • 2. 1 Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kernel developers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 3. 2 The main IO problem for databases • How to maximize page throughput between memory and disks • Things involved: Disks Memory CPU IO Schedulers Filesystems Database itself • IO problems for databases are not always only about disks dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 4. 3 A typical database DRAM Disks Shared memory Database Linux Page cache User space Kernel space WAL buffer WAL Datafile dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 5. 4 Key things about such workload • Shared memory segment can be very large • Keeping in-memory pages synchronized with disk generates huge IO • WAL should be written fast and safe • One and every layer of OS IO stack involved dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 6. 5 Memory allocation and mapping CPU L1 MMU TLB Memory L2 L3 page table Virtual addressing Translation Physical addressing dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 7. 6 DBAs takeaways: • Database with huge shared memory segment benefits from huge pages • But not from transparent huge pages Databases operate large continuous shared memory segments THP defragmentation can lead to severe performance degradation in such cases dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 8. 7 Freeing memory Page cache Swap Free list Free list Free list alloc()free() page-out vm.min_free_kbytes reclaim vm.swapiness OOM-killer vm.panic_on_oom dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 9. 8 Page-out • Page-out happens if: someone calls fsync 30 sec timeout exceeded (vm.dirty_expire_centisecs) Too many dirty pages (vm.dirty_background_ratio and vm.dirty_ratio) • It is reasonable to tune vm.dirty_* on rotating disks with RAID controller, but helps a little on server class SSDs dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 10. 9 DBAs takeaways: • vm.overcommit_memory 0 - heuristic overcommit, reduces swap usage 1 - always overcommit 2 - do not overcommit (vm.overcommit_ratio = 50 by default) • vm.min_free_kbytes - reasonably high (can be easy 1000000 on a server with enough memory) • vm.swappiness = 1 0 - swap disabled 60 - default 100 - swap preffered instead of other reaping mechanisms • Your database would not like OOM-killer dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 11. 10 OOM-killer: plague and cholera • vm.panic_on_oom effectively disables OOM-killer, but that is probably not the result you desire • Or for a certain process: echo -17 > /proc/12465/oom_adj but again dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 12. 11 Filesystems: write barriers Journal Kernel buffer Journalated fylesystem Data Journal entry Data dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 13. 12 DBAs takeaways: • ext4 or xfs • Disable write barrier (only if SSD/controller cache is protected by capacitor/battery) dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 14. 13 IO stack Database memory Page cacheVFS EXT4 Block device interface Disks Direct IO BIO Layer Block IO Request Layer Elevator/IO Scheduler dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 15. 14 Elevators: before 2.6 kernel • Linus Elevator - the only one in times of 2.4 • merging and sorting request queues • Had lots of problems • The Linux Scheduler: a Decade of Wasted Cores – Lozi et al. 2016 dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 16. 15 Elevators: between 2.6 and early 3.* • CFQ - universal, default one • deadline - rotating disks • noop or none - then disks throughput is so high, that it can not benefit from keen scheduling PCIe SSDs SAN disk arrays dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 17. 16 Elevators: 3.13 and newer • Effectiveness of noop clearly shows ineffectiveness of others, or ineffectiveness of smart sorting as an approach • blk-mq scheduler was merged into 3.13 kernel • Much better deals with parallelism of modern SSD - basically separate IO queue for each CPU • The best option for good SSDs right now dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 18. 17 Thanks to my collegues Alexey Lesovsky and Max Boguk for a lot of research on this topic dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 19. 18 Questions? ik@dataegret.com dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com