SlideShare a Scribd company logo
1 of 36
Download to read offline
Copyright©2018 NTT Corp. All Rights Reserved.
Bloat and Fragmentation in PostgreSQL
NTT Open Source Center
Masahiko Sawada
PGConf.ASIA 2018
2Copyright©2018 NTT Corp. All Rights Reserved.
• Heap and Index
• Fragmentation and Bloat
• VACUUM 、 HOT(Heap Only Tuple)
• Clustered Tables
• Future
Index
3Copyright©2018 NTT Corp. All Rights Reserved.
• Almost all objects such as tables and indexes that
PostgreSQL manages consist of pages
• Page size is 8kB by default
• Changing by the configure script
• Block (on disk) = Page (on memory)
Pages
4Copyright©2018 NTT Corp. All Rights Reserved.
Page Layouts (Heaps and Indexes)
Page Header
ItemID ItemIDItemIDItemID
Special Area
TupleTuple
TupleTuple
5Copyright©2018 NTT Corp. All Rights Reserved.
• Combination of block number and offset number
• TID = (123, 86)
• Unique with in a table
TID - Tuple ID
=# SELECT ctid, c FROM test ORDER BY c LIMIT 10;
ctid | c
------------------+----
(88495,130) | 1
(0,1) | 1
(88495,131) | 2
(44247,179) | 2
(0,2) | 2
(88495,132) | 3
(44247,180) | 3
(0,3) | 3
(88495,133) | 4
(44247,181) | 4
(10 rows)
6Copyright©2018 NTT Corp. All Rights Reserved.
• MVCC = Multi Version Concurrency Control
• Concurrency control using multiple versions of row
• Reads don’t block writes
• PostgreSQL’s approach:
• INSERT and UPDATE create the new version of the row
• UPDATE and DELETE don’t immediately remove the old
version of the row
• Tuple visibility is determined by a ‘snapshot’
• Other DBMS might use UNDO log instead
• Old versions of rows have to be removed eventually
• VACUUM
MVCC
7Copyright©2018 NTT Corp. All Rights Reserved.
• Heap Table≒
• User created table, materialized view as well as system catalogs use
heap
• Optimized sequential access and accessed by index lookup
• Heaps have one free space map and one visibility map
Heap
8Copyright©2018 NTT Corp. All Rights Reserved.
• Tracks visibility information per blocks
• 1234_vm
• 2 bits / block
• all-visible bit : Used for Index Only Scans and vacuums
• all-frozen bit : Used for aggressive vacuums
• Isn’t relevant with bloating
Visibility Map
9Copyright©2018 NTT Corp. All Rights Reserved.
• Manage available space in tables or indexes
• 2-level binary tree
• Bottom level store the free space on each page
• Upper levels aggregate information from the lower levels
• Record free space at a granularity of 1/256th of a page
• 1234_fsm
Free Space Map
10Copyright©2018 NTT Corp. All Rights Reserved.
• Fast lookup a row in the table
• Various types of index
• B-tree
• Hash
• GIN
• Gist
• SP-Gist
• BRIN
• Bloom
• Indexes eventually points to somewhere in heap
Index
11Copyright©2018 NTT Corp. All Rights Reserved.
• One of the most popular type of index
• PostgreSQL has B+Tree
• One B+tree node is one page
• Leaf nodes have TIDs pointing to the heap
 Need new version index tuple when a new version is
created
B-tree
12Copyright©2018 NTT Corp. All Rights Reserved.
FRAGMENTATION AND BLOAT
13Copyright©2018 NTT Corp. All Rights Reserved.
• Fragmentation
• As pages split to make room to added to a page, there might be
excessive free space left on the pages
• Bloat
• Tables or indexes gets bigger than its actual size
• Less utilization efficiency of each pages
Fragmentation and Bloat
14Copyright©2018 NTT Corp. All Rights Reserved.
• Garbage collection
• Doesn’t block DELETE, INSERT and UPDATE (of course and
SELECT)
• VACUUM command
• VACUUM FULL is quite different feature
• Batch operation
Vacuum
15Copyright©2018 NTT Corp. All Rights Reserved.
1. Scan heap and collect dead tuples
1. Scanning page can skip using its visibility map
2. Recover dead space in all indexes
3. Recover dead space in heap
4. Loop 1 to 3 until the end of table
5. Truncate the tail of table if possible
Vacuum Processing
16Copyright©2018 NTT Corp. All Rights Reserved.
• Always start at the beginning of table
• Always visit all indexes (at multiple times)
Batch Operation
17Copyright©2018 NTT Corp. All Rights Reserved.
• Threshold-based, automatically vacuum execution
• Could be cancelled by a concurrent conflicting operation
• e.g. ALTER TABLE 、 TRUNCATE
• Vacuum delay is enabled by default
Auto Vacuum
18Copyright©2018 NTT Corp. All Rights Reserved.
• Cost-based vacuum delay
• Sleep vacuum_cost_delay(10 msec) time whenever the
vacuum cost goes above vacuum_cost_limit(200)
• Cost configurations
• vacuum_cost_page_hit (1)
• vacuum_cost_page_miss (10)
• vacuum_cost_page_dirty (20)
• By default, vacuum processes 16GB/h at maximum in
case where every pages don’t hit
Vacuum Delay
19Copyright©2018 NTT Corp. All Rights Reserved.
• Avoid creating new version of index entries
• Depending on updated column being updated
• Chain heap tuples
• Heap tuple chain is pruned opportunistically
HOT(Heap Only Tuple) Update
20Copyright©2018 NTT Corp. All Rights Reserved.
HOT Update & HOT pruning
Page Header
(0,1) (0,3)(0,2)
TupleTuple
Tuple
(0,1) (0,2) (0,3)
Index
Heap
21Copyright©2018 NTT Corp. All Rights Reserved.
HOT Update & HOT pruning
Page Header
(0,1) (0,3)(0,2)
TupleTuple
Tuple
(0,1) (0,2) (0,3)
Index
Heap
1. Update TID = (0,2) → (0,4)
(0,4)
Tuple
22Copyright©2018 NTT Corp. All Rights Reserved.
HOT Update & HOT pruning
Page Header
(0,1) (0,3)(0,2)
TupleTuple
Tuple
(0,1) (0,2) (0,3)
Index
Heap
2. Update TID = (0,4) → (0,5)
(0,4)
Tuple
(0,5)
Tuple
23Copyright©2018 NTT Corp. All Rights Reserved.
HOT Update & HOT pruning
Page Header
(0,1) (0,3)REDIRECTED
TupleTuple
(0,1) (0,2) (0,3)
Index
Heap
3. VACUUM
UNUSED (0,5)
Tuple
24Copyright©2018 NTT Corp. All Rights Reserved.
• Two conditions to use HOT updating
 Enough free space in the same page
 No Indexed column is not updated
• Perform HOT-pruning when the page utilization goes
above 90%
• IMO, the most important feature to prevent the table
bloat
Using HOT Update
25Copyright©2018 NTT Corp. All Rights Reserved.
• Fragmentation
• Free space map is not up-to-date
• Shortage of vacuum
• Concurrent long-transaction
Causes of Bloat
26Copyright©2018 NTT Corp. All Rights Reserved.
• Garbage collection speed < Generating garbage speed
• Solution : making vacuums run more faster
• Decrease vacuum delays
• Increase maintenance_work_mem
• Ideally perform the index vacuum only once
Shortage of Vacuum
27Copyright©2018 NTT Corp. All Rights Reserved.
• “Due to long-running transaction dead spaces are not
reclaimed” by a user
• That’s correct in a sense but this is actually inaccurate:
 Vacuum doesn’t remove row if there is a concurrent
snapshot that can see it, even for a one snapshot!
 This also means these row is NOT dead yet
• A row is dead when no one can see it
Vacuum and Snapshot
28Copyright©2018 NTT Corp. All Rights Reserved.
CLUSTERD TABLES
29Copyright©2018 NTT Corp. All Rights Reserved.
• An order of value on a column matches physical block
order
• pg_stats.correlation
• CLUSTER command
Clustered Table
30Copyright©2018 NTT Corp. All Rights Reserved.
=# SELECT * FROM tt LIMIT 10;
a | b | c
-----+--------+--------
1 | 9988 | 100000
2 | 176 | 99999
3 | 1066 | 99998
4 | 1980 | 99997
5 | 2966 | 99996
6 | 5732 | 99995
7 | 1751 | 99994
8 | 3813 | 99993
9 | 3031 | 99992
10 | 5332 | 99991
(10 rows)
=# SELECT attname, correlation FROM pg_stats WHERE tablename = 'tt';
attname | correlation
-------------+-------------
a | 1
b | 0.00493127
c | -1
(3 rows)
Correlations
31Copyright©2018 NTT Corp. All Rights Reserved.
Performance Impact
=# EXPLAIN (buffers on, analyze on) SELECT * FROM tt WHERE a BETWEEN 200 AND 1000 LIMIT 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..38.07 rows=889 width=12) (actual time=0.063..1.153 rows=801 loops=1)
Buffers: shared hit=9
-> Index Scan using tt_a on tt (cost=0.29..38.07 rows=889 width=12) (actual time=0.060..0.942
rows=801 loops=1)
Index Cond: ((a >= 200) AND (a <= 1000))
Buffers: shared hit=9
Planning Time: 0.388 ms
Execution Time: 1.380 ms
(7 rows)
=# EXPLAIN (buffers on, analyze on) SELECT * FROM tt WHERE b BETWEEN 200 AND 1000 LIMIT 1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..306.76 rows=1000 width=12) (actual time=0.052..3.176 rows=1000 loops=1)
Buffers: shared hit=992
-> Index Scan using tt_b on tt (cost=0.29..2435.18 rows=7945 width=12) (actual time=0.048..2.797
rows=1000 loops=1)
Index Cond: ((b >= 200) AND (b <= 1000))
Buffers: shared hit=992
Planning Time: 0.749 ms
Execution Time: 3.493 ms
(7 rows)
32Copyright©2018 NTT Corp. All Rights Reserved.
FUTURE
33Copyright©2018 NTT Corp. All Rights Reserved.
• More faster
• Parallelism
• Eager vacuum (retail index deletion)
• Non-batch operation
The Future of Vacuum
34Copyright©2018 NTT Corp. All Rights Reserved.
• Pluggable Storage Engine
• zheap - an UNDO log based new storage engine
 Prevent bloating by UPDATE
Good bye Vacuum ...?
35Copyright©2018 NTT Corp. All Rights Reserved.
• PostgreSQL creates multiple versions of rows within the
same table
• Has to eventually get rid of the unnecessary row versions
• Vacuum and auto vacuum
• Batched garbage collection
• HOT pruning
• Opportunistic garbage collection
• Clustered Table
• Future
• Parallelism and eager vacuum
• zheap
Conclusion
36Copyright©2018 NTT Corp. All Rights Reserved.
THANK YOU!

More Related Content

What's hot

What's hot (20)

Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Tuning Autovacuum in Postgresql
Tuning Autovacuum in PostgresqlTuning Autovacuum in Postgresql
Tuning Autovacuum in Postgresql
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
Webinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with BarmanWebinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with Barman
 
PostgreSQL and RAM usage
PostgreSQL and RAM usagePostgreSQL and RAM usage
PostgreSQL and RAM usage
 
Innodb Deep Talk #2 でお話したスライド
Innodb Deep Talk #2 でお話したスライドInnodb Deep Talk #2 でお話したスライド
Innodb Deep Talk #2 でお話したスライド
 
Patroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyPatroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easy
 
PostgreSQL replication
PostgreSQL replicationPostgreSQL replication
PostgreSQL replication
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScale
 
Developing Active-Active Geo-Distributed Apps with Redis
Developing Active-Active Geo-Distributed Apps with RedisDeveloping Active-Active Geo-Distributed Apps with Redis
Developing Active-Active Geo-Distributed Apps with Redis
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
MySQL_SQL_Tunning_v0.1.3.docx
MySQL_SQL_Tunning_v0.1.3.docxMySQL_SQL_Tunning_v0.1.3.docx
MySQL_SQL_Tunning_v0.1.3.docx
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12
 
SQLServer Database Structures
SQLServer Database Structures SQLServer Database Structures
SQLServer Database Structures
 
PostgreSQL9.3 Switchover/Switchback
PostgreSQL9.3 Switchover/SwitchbackPostgreSQL9.3 Switchover/Switchback
PostgreSQL9.3 Switchover/Switchback
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
 

Similar to Bloat and Fragmentation in PostgreSQL

Similar to Bloat and Fragmentation in PostgreSQL (20)

Vacuum more efficient than ever
Vacuum more efficient than everVacuum more efficient than ever
Vacuum more efficient than ever
 
What’s new in 9.6, by PostgreSQL contributor
What’s new in 9.6, by PostgreSQL contributorWhat’s new in 9.6, by PostgreSQL contributor
What’s new in 9.6, by PostgreSQL contributor
 
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB | InfluxDays...
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB  | InfluxDays...Sam Dillard [InfluxData] | Performance Optimization in InfluxDB  | InfluxDays...
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB | InfluxDays...
 
Hypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQLHypothetical Partitioning for PostgreSQL
Hypothetical Partitioning for PostgreSQL
 
Les 18 space
Les 18 spaceLes 18 space
Les 18 space
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Timesten Architecture
Timesten ArchitectureTimesten Architecture
Timesten Architecture
 
Optimising code using Span<T>
Optimising code using Span<T>Optimising code using Span<T>
Optimising code using Span<T>
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
 
Architecture & Pitfalls of Logical Replication
Architecture & Pitfalls of Logical ReplicationArchitecture & Pitfalls of Logical Replication
Architecture & Pitfalls of Logical Replication
 
MongoDB World 2019: From Traditional Oil and Gas to Sustainable Energy OR Fro...
MongoDB World 2019: From Traditional Oil and Gas to Sustainable Energy OR Fro...MongoDB World 2019: From Traditional Oil and Gas to Sustainable Energy OR Fro...
MongoDB World 2019: From Traditional Oil and Gas to Sustainable Energy OR Fro...
 
PostgreSQL 10: What to Look For
PostgreSQL 10: What to Look ForPostgreSQL 10: What to Look For
PostgreSQL 10: What to Look For
 
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopTez Shuffle Handler: Shuffling at Scale with Apache Hadoop
Tez Shuffle Handler: Shuffling at Scale with Apache Hadoop
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
Real time Linux
Real time LinuxReal time Linux
Real time Linux
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 

More from Masahiko Sawada

Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Masahiko Sawada
 

More from Masahiko Sawada (20)

PostgreSQL 15の新機能を徹底解説
PostgreSQL 15の新機能を徹底解説PostgreSQL 15の新機能を徹底解説
PostgreSQL 15の新機能を徹底解説
 
行ロックと「LOG: process 12345 still waiting for ShareLock on transaction 710 afte...
行ロックと「LOG:  process 12345 still waiting for ShareLock on transaction 710 afte...行ロックと「LOG:  process 12345 still waiting for ShareLock on transaction 710 afte...
行ロックと「LOG: process 12345 still waiting for ShareLock on transaction 710 afte...
 
PostgreSQL 15 開発最新情報
PostgreSQL 15 開発最新情報PostgreSQL 15 開発最新情報
PostgreSQL 15 開発最新情報
 
Vacuum徹底解説
Vacuum徹底解説Vacuum徹底解説
Vacuum徹底解説
 
Transparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQLTransparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQL
 
PostgreSQL 12の話
PostgreSQL 12の話PostgreSQL 12の話
PostgreSQL 12の話
 
OSS活動のやりがいとそれから得たもの - PostgreSQLコミュニティにて -
OSS活動のやりがいとそれから得たもの - PostgreSQLコミュニティにて -OSS活動のやりがいとそれから得たもの - PostgreSQLコミュニティにて -
OSS活動のやりがいとそれから得たもの - PostgreSQLコミュニティにて -
 
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
 
Database Encryption and Key Management for PostgreSQL - Principles and Consid...
Database Encryption and Key Management for PostgreSQL - Principles and Consid...Database Encryption and Key Management for PostgreSQL - Principles and Consid...
Database Encryption and Key Management for PostgreSQL - Principles and Consid...
 
今秋リリース予定のPostgreSQL11を徹底解説
今秋リリース予定のPostgreSQL11を徹底解説今秋リリース予定のPostgreSQL11を徹底解説
今秋リリース予定のPostgreSQL11を徹底解説
 
Vacuumとzheap
VacuumとzheapVacuumとzheap
Vacuumとzheap
 
アーキテクチャから理解するPostgreSQLのレプリケーション
アーキテクチャから理解するPostgreSQLのレプリケーションアーキテクチャから理解するPostgreSQLのレプリケーション
アーキテクチャから理解するPostgreSQLのレプリケーション
 
Parallel Vacuum
Parallel VacuumParallel Vacuum
Parallel Vacuum
 
PostgreSQLでスケールアウト
PostgreSQLでスケールアウトPostgreSQLでスケールアウト
PostgreSQLでスケールアウト
 
OSS 開発ってどうやっているの? ~ PostgreSQL の現場から~
OSS 開発ってどうやっているの? ~ PostgreSQL の現場から~OSS 開発ってどうやっているの? ~ PostgreSQL の現場から~
OSS 開発ってどうやっているの? ~ PostgreSQL の現場から~
 
PostgreSQL10徹底解説
PostgreSQL10徹底解説PostgreSQL10徹底解説
PostgreSQL10徹底解説
 
FDW-based Sharding Update and Future
FDW-based Sharding Update and FutureFDW-based Sharding Update and Future
FDW-based Sharding Update and Future
 
PostgreSQL 9.6 新機能紹介
PostgreSQL 9.6 新機能紹介PostgreSQL 9.6 新機能紹介
PostgreSQL 9.6 新機能紹介
 
pg_bigmと類似度検索
pg_bigmと類似度検索pg_bigmと類似度検索
pg_bigmと類似度検索
 
pg_bigmを触り始めた人に伝えたいこと
pg_bigmを触り始めた人に伝えたいことpg_bigmを触り始めた人に伝えたいこと
pg_bigmを触り始めた人に伝えたいこと
 

Recently uploaded

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Recently uploaded (20)

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

Bloat and Fragmentation in PostgreSQL

  • 1. Copyright©2018 NTT Corp. All Rights Reserved. Bloat and Fragmentation in PostgreSQL NTT Open Source Center Masahiko Sawada PGConf.ASIA 2018
  • 2. 2Copyright©2018 NTT Corp. All Rights Reserved. • Heap and Index • Fragmentation and Bloat • VACUUM 、 HOT(Heap Only Tuple) • Clustered Tables • Future Index
  • 3. 3Copyright©2018 NTT Corp. All Rights Reserved. • Almost all objects such as tables and indexes that PostgreSQL manages consist of pages • Page size is 8kB by default • Changing by the configure script • Block (on disk) = Page (on memory) Pages
  • 4. 4Copyright©2018 NTT Corp. All Rights Reserved. Page Layouts (Heaps and Indexes) Page Header ItemID ItemIDItemIDItemID Special Area TupleTuple TupleTuple
  • 5. 5Copyright©2018 NTT Corp. All Rights Reserved. • Combination of block number and offset number • TID = (123, 86) • Unique with in a table TID - Tuple ID =# SELECT ctid, c FROM test ORDER BY c LIMIT 10; ctid | c ------------------+---- (88495,130) | 1 (0,1) | 1 (88495,131) | 2 (44247,179) | 2 (0,2) | 2 (88495,132) | 3 (44247,180) | 3 (0,3) | 3 (88495,133) | 4 (44247,181) | 4 (10 rows)
  • 6. 6Copyright©2018 NTT Corp. All Rights Reserved. • MVCC = Multi Version Concurrency Control • Concurrency control using multiple versions of row • Reads don’t block writes • PostgreSQL’s approach: • INSERT and UPDATE create the new version of the row • UPDATE and DELETE don’t immediately remove the old version of the row • Tuple visibility is determined by a ‘snapshot’ • Other DBMS might use UNDO log instead • Old versions of rows have to be removed eventually • VACUUM MVCC
  • 7. 7Copyright©2018 NTT Corp. All Rights Reserved. • Heap Table≒ • User created table, materialized view as well as system catalogs use heap • Optimized sequential access and accessed by index lookup • Heaps have one free space map and one visibility map Heap
  • 8. 8Copyright©2018 NTT Corp. All Rights Reserved. • Tracks visibility information per blocks • 1234_vm • 2 bits / block • all-visible bit : Used for Index Only Scans and vacuums • all-frozen bit : Used for aggressive vacuums • Isn’t relevant with bloating Visibility Map
  • 9. 9Copyright©2018 NTT Corp. All Rights Reserved. • Manage available space in tables or indexes • 2-level binary tree • Bottom level store the free space on each page • Upper levels aggregate information from the lower levels • Record free space at a granularity of 1/256th of a page • 1234_fsm Free Space Map
  • 10. 10Copyright©2018 NTT Corp. All Rights Reserved. • Fast lookup a row in the table • Various types of index • B-tree • Hash • GIN • Gist • SP-Gist • BRIN • Bloom • Indexes eventually points to somewhere in heap Index
  • 11. 11Copyright©2018 NTT Corp. All Rights Reserved. • One of the most popular type of index • PostgreSQL has B+Tree • One B+tree node is one page • Leaf nodes have TIDs pointing to the heap  Need new version index tuple when a new version is created B-tree
  • 12. 12Copyright©2018 NTT Corp. All Rights Reserved. FRAGMENTATION AND BLOAT
  • 13. 13Copyright©2018 NTT Corp. All Rights Reserved. • Fragmentation • As pages split to make room to added to a page, there might be excessive free space left on the pages • Bloat • Tables or indexes gets bigger than its actual size • Less utilization efficiency of each pages Fragmentation and Bloat
  • 14. 14Copyright©2018 NTT Corp. All Rights Reserved. • Garbage collection • Doesn’t block DELETE, INSERT and UPDATE (of course and SELECT) • VACUUM command • VACUUM FULL is quite different feature • Batch operation Vacuum
  • 15. 15Copyright©2018 NTT Corp. All Rights Reserved. 1. Scan heap and collect dead tuples 1. Scanning page can skip using its visibility map 2. Recover dead space in all indexes 3. Recover dead space in heap 4. Loop 1 to 3 until the end of table 5. Truncate the tail of table if possible Vacuum Processing
  • 16. 16Copyright©2018 NTT Corp. All Rights Reserved. • Always start at the beginning of table • Always visit all indexes (at multiple times) Batch Operation
  • 17. 17Copyright©2018 NTT Corp. All Rights Reserved. • Threshold-based, automatically vacuum execution • Could be cancelled by a concurrent conflicting operation • e.g. ALTER TABLE 、 TRUNCATE • Vacuum delay is enabled by default Auto Vacuum
  • 18. 18Copyright©2018 NTT Corp. All Rights Reserved. • Cost-based vacuum delay • Sleep vacuum_cost_delay(10 msec) time whenever the vacuum cost goes above vacuum_cost_limit(200) • Cost configurations • vacuum_cost_page_hit (1) • vacuum_cost_page_miss (10) • vacuum_cost_page_dirty (20) • By default, vacuum processes 16GB/h at maximum in case where every pages don’t hit Vacuum Delay
  • 19. 19Copyright©2018 NTT Corp. All Rights Reserved. • Avoid creating new version of index entries • Depending on updated column being updated • Chain heap tuples • Heap tuple chain is pruned opportunistically HOT(Heap Only Tuple) Update
  • 20. 20Copyright©2018 NTT Corp. All Rights Reserved. HOT Update & HOT pruning Page Header (0,1) (0,3)(0,2) TupleTuple Tuple (0,1) (0,2) (0,3) Index Heap
  • 21. 21Copyright©2018 NTT Corp. All Rights Reserved. HOT Update & HOT pruning Page Header (0,1) (0,3)(0,2) TupleTuple Tuple (0,1) (0,2) (0,3) Index Heap 1. Update TID = (0,2) → (0,4) (0,4) Tuple
  • 22. 22Copyright©2018 NTT Corp. All Rights Reserved. HOT Update & HOT pruning Page Header (0,1) (0,3)(0,2) TupleTuple Tuple (0,1) (0,2) (0,3) Index Heap 2. Update TID = (0,4) → (0,5) (0,4) Tuple (0,5) Tuple
  • 23. 23Copyright©2018 NTT Corp. All Rights Reserved. HOT Update & HOT pruning Page Header (0,1) (0,3)REDIRECTED TupleTuple (0,1) (0,2) (0,3) Index Heap 3. VACUUM UNUSED (0,5) Tuple
  • 24. 24Copyright©2018 NTT Corp. All Rights Reserved. • Two conditions to use HOT updating  Enough free space in the same page  No Indexed column is not updated • Perform HOT-pruning when the page utilization goes above 90% • IMO, the most important feature to prevent the table bloat Using HOT Update
  • 25. 25Copyright©2018 NTT Corp. All Rights Reserved. • Fragmentation • Free space map is not up-to-date • Shortage of vacuum • Concurrent long-transaction Causes of Bloat
  • 26. 26Copyright©2018 NTT Corp. All Rights Reserved. • Garbage collection speed < Generating garbage speed • Solution : making vacuums run more faster • Decrease vacuum delays • Increase maintenance_work_mem • Ideally perform the index vacuum only once Shortage of Vacuum
  • 27. 27Copyright©2018 NTT Corp. All Rights Reserved. • “Due to long-running transaction dead spaces are not reclaimed” by a user • That’s correct in a sense but this is actually inaccurate:  Vacuum doesn’t remove row if there is a concurrent snapshot that can see it, even for a one snapshot!  This also means these row is NOT dead yet • A row is dead when no one can see it Vacuum and Snapshot
  • 28. 28Copyright©2018 NTT Corp. All Rights Reserved. CLUSTERD TABLES
  • 29. 29Copyright©2018 NTT Corp. All Rights Reserved. • An order of value on a column matches physical block order • pg_stats.correlation • CLUSTER command Clustered Table
  • 30. 30Copyright©2018 NTT Corp. All Rights Reserved. =# SELECT * FROM tt LIMIT 10; a | b | c -----+--------+-------- 1 | 9988 | 100000 2 | 176 | 99999 3 | 1066 | 99998 4 | 1980 | 99997 5 | 2966 | 99996 6 | 5732 | 99995 7 | 1751 | 99994 8 | 3813 | 99993 9 | 3031 | 99992 10 | 5332 | 99991 (10 rows) =# SELECT attname, correlation FROM pg_stats WHERE tablename = 'tt'; attname | correlation -------------+------------- a | 1 b | 0.00493127 c | -1 (3 rows) Correlations
  • 31. 31Copyright©2018 NTT Corp. All Rights Reserved. Performance Impact =# EXPLAIN (buffers on, analyze on) SELECT * FROM tt WHERE a BETWEEN 200 AND 1000 LIMIT 1000; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Limit (cost=0.29..38.07 rows=889 width=12) (actual time=0.063..1.153 rows=801 loops=1) Buffers: shared hit=9 -> Index Scan using tt_a on tt (cost=0.29..38.07 rows=889 width=12) (actual time=0.060..0.942 rows=801 loops=1) Index Cond: ((a >= 200) AND (a <= 1000)) Buffers: shared hit=9 Planning Time: 0.388 ms Execution Time: 1.380 ms (7 rows) =# EXPLAIN (buffers on, analyze on) SELECT * FROM tt WHERE b BETWEEN 200 AND 1000 LIMIT 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.29..306.76 rows=1000 width=12) (actual time=0.052..3.176 rows=1000 loops=1) Buffers: shared hit=992 -> Index Scan using tt_b on tt (cost=0.29..2435.18 rows=7945 width=12) (actual time=0.048..2.797 rows=1000 loops=1) Index Cond: ((b >= 200) AND (b <= 1000)) Buffers: shared hit=992 Planning Time: 0.749 ms Execution Time: 3.493 ms (7 rows)
  • 32. 32Copyright©2018 NTT Corp. All Rights Reserved. FUTURE
  • 33. 33Copyright©2018 NTT Corp. All Rights Reserved. • More faster • Parallelism • Eager vacuum (retail index deletion) • Non-batch operation The Future of Vacuum
  • 34. 34Copyright©2018 NTT Corp. All Rights Reserved. • Pluggable Storage Engine • zheap - an UNDO log based new storage engine  Prevent bloating by UPDATE Good bye Vacuum ...?
  • 35. 35Copyright©2018 NTT Corp. All Rights Reserved. • PostgreSQL creates multiple versions of rows within the same table • Has to eventually get rid of the unnecessary row versions • Vacuum and auto vacuum • Batched garbage collection • HOT pruning • Opportunistic garbage collection • Clustered Table • Future • Parallelism and eager vacuum • zheap Conclusion
  • 36. 36Copyright©2018 NTT Corp. All Rights Reserved. THANK YOU!