SlideShare a Scribd company logo
1 of 34
Download to read offline
Data corruption
Where does it come from and what
can you do about it?
Tomas Vondra
tomas@2ndquadrant.com / tomas@pgaddict.com
data_checksums = on
Well, not quite ...
The larger and older a database is, the
more likely it's corrupted in some way.
Agenda
● What is data corruption?
● Sources of data corruption
● What to do about it
Data corruption
● many possible causes
● hardware
○ disks / memory
○ environment (cosmic rays, temperature, …)
● software
○ bugs in OS (kernel, fs, libc, ...)
○ bugs in database systems / applications
● administrator mistake
○ remove pg_xlog ...
naturally one-off events
cosmic rays = no clue
storage
storage corruption
● this used to be pretty common
○ crappy disks, RAID controllers
○ insufficient power-loss protection
● it got better over time
○ better disks, better RAID controllers
○ when it failed, it failed totally
● now it's getting worse, again :-(
○ crappy disks connected over network
(SAN, EBS, NFS, …)
○ layers of virtualization everywhere
software issues
Operating System
● PostgreSQL is very trusting
○ relies on a lot of stuff
○ assumes it's perfect
● nothing is perfect
○ kernel bugs
○ filesystem bugs
○ glibc bugs (collation updates, …)
○ ...
Collations
● rules for language-specific text sort
○ defined in glibc
○ change rarely, poor versioning
● indexes require text ordering to be stable
● what if you build index and then
○ upgrade to diffferent collations
○ replica has different collations
● ICU solves this
○ reliable versioning
#fsyncgate
● CHECKPOINT
○ flush data from shared_buffers to disc
○ discard old part of WAL
● in case of fsync error, retry
● assumes two things
○ reliable kernel error reporting
○ dirty data are kept in page cache
#fsyncgate
● fact #1: error reporting unreliable
○ behavior depends on kernel version
○ errors may be consumed by someone else
○ fixed in new kernel (>= 4.13)
● fact #2: dirty data are discarded after error
○ retry never really retries the write
○ replaced with elog(PANIC) forcing recovery
● a bunch of bugs in the error handling ;-)
○ error branches are the least tested code
NFS == cosmic rays
database systems
● broken indexes
○ violated constraints
○ index scan misses data, seq scan finds it
● bogus data
○ insufficient UTF-8 validation
○ corrupted varlena headers
● data loss
○ multixact bug
○ inappropriate removal of data
○ data made visible/invisible inappropriately
database bugs
ERROR: invalid memory alloc request size 1073741824
pilot error
DBA mistakes ....
● drop incorrect table
● free space by removing "logs" from pg_xlog
● let's compress everything in DATADIR
● take backups incorrectly
What can you do about it?
Preventing data corruption
● use good hardware
○ good hardware is cheaper than outages
○ no, desktop machines are not good choice
○ ECC RAM is a must
● update regularly
○ minor releases exist for a reason
● test it
○ Are the numbers way too good?
○ What happens in case of power-loss?
○ Does the virtualization honor fsyncs etc.?
Preventing data corruption
● make sure you have backups
○ take them and test them
○ consider higher retention periods
● consider doing extra checks on backups
○ pg_verify_checksums
○ application tests
● "prophylactic" pg_dump is a good idea
○ pg_dump > /dev/null
○ especially when using pg_basebackup
fixing data corruption
So you want to fix in-place ...
There's no universal recipe :-(
Recipe
1) Try restoring from a backup.
○ The backup may be corrupted too.
○ Maybe it'd take too long. (Well, …)
○ ...
2) If you need to fix a corrupted cluster …
○ Always make sure you have a copy.
○ Take detailed notes about each step.
○ Proceed methodically.
Recipe
3) Asses the extent of data corruption
○ Is it just a single object? What object?
○ How many pages/rows/...?
○ ...
4) Ad-hoc recovery steps
○ Rebuild corrupted indexes.
○ Extract as much data as possible. Select "around", use
loop with exception block, …
○ pageinspect is your friend
○ Zero corrupted pages using "dd" etc.
http://thebuild.com/presentations/corruption-war-stories-fosdem-2017.pdf
So, what about data checksums?
data checksums
checksum(page number, page contents)
● available since PostgreSQL 9.3
○ … so all supported versions have them
○ disabled by default
● protects against (some) storage system issues
○ changes to existing pages / torn pages
● has some overhead
○ a couple of %, depends on HW / workload
● correctness vs. availability trade-off
data checksums are not perfect
● can't detect various types of corruption
○ pages written to different files
○ "forgotten" writes
○ truncated files
○ PostgreSQL bugs (before the write)
○ table vs. index mismatch
● may detect (some) memory issues
How do you verify checksums?
● you have to read all the data
● regular SQL queries
○ only active set (but not "hot" data)
● pg_dump
○ no checks for indexes
● pg_basebackup
○ since PostgreSQL 11
● pg_verify_checksums
○ since PostgreSQL 11, offline only
Questions?

More Related Content

Similar to Data corruption

Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
DataStax Academy
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
Tamas K Lengyel
 

Similar to Data corruption (20)

Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
 
If the data cannot come to the algorithm...
If the data cannot come to the algorithm...If the data cannot come to the algorithm...
If the data cannot come to the algorithm...
 
Pen Testing Development
Pen Testing DevelopmentPen Testing Development
Pen Testing Development
 
MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)
 
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
Mongodb meetup
Mongodb meetupMongodb meetup
Mongodb meetup
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
 
Caching in
Caching inCaching in
Caching in
 
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
 
Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodb
 
Cloud accounting software uk
Cloud accounting software ukCloud accounting software uk
Cloud accounting software uk
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 
Understand and optimize Linux I/O
Understand and optimize Linux I/OUnderstand and optimize Linux I/O
Understand and optimize Linux I/O
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Piano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingPiano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processing
 
Case study of BtrFS: A fault tolerant File system
Case study of BtrFS: A fault tolerant File systemCase study of BtrFS: A fault tolerant File system
Case study of BtrFS: A fault tolerant File system
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
 

More from Tomas Vondra

More from Tomas Vondra (19)

CREATE STATISTICS - What is it for? (PostgresLondon)
CREATE STATISTICS - What is it for? (PostgresLondon)CREATE STATISTICS - What is it for? (PostgresLondon)
CREATE STATISTICS - What is it for? (PostgresLondon)
 
CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?
 
DB vs. encryption
DB vs. encryptionDB vs. encryption
DB vs. encryption
 
PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAltPostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Performance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyondPerformance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyond
 
Postgresql na EXT3/4, XFS, BTRFS a ZFS
Postgresql na EXT3/4, XFS, BTRFS a ZFSPostgresql na EXT3/4, XFS, BTRFS a ZFS
Postgresql na EXT3/4, XFS, BTRFS a ZFS
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Novinky v PostgreSQL 9.4 a JSONB
Novinky v PostgreSQL 9.4 a JSONBNovinky v PostgreSQL 9.4 a JSONB
Novinky v PostgreSQL 9.4 a JSONB
 
PostgreSQL performance archaeology
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeology
 
Výkonnostní archeologie
Výkonnostní archeologieVýkonnostní archeologie
Výkonnostní archeologie
 
Český fulltext a sdílené slovníky
Český fulltext a sdílené slovníkyČeský fulltext a sdílené slovníky
Český fulltext a sdílené slovníky
 
SSD vs HDD / WAL, indexes and fsync
SSD vs HDD / WAL, indexes and fsyncSSD vs HDD / WAL, indexes and fsync
SSD vs HDD / WAL, indexes and fsync
 
Checkpoint (CSPUG 22.11.2011)
Checkpoint (CSPUG 22.11.2011)Checkpoint (CSPUG 22.11.2011)
Checkpoint (CSPUG 22.11.2011)
 
Čtení explain planu (CSPUG 21.6.2011)
Čtení explain planu (CSPUG 21.6.2011)Čtení explain planu (CSPUG 21.6.2011)
Čtení explain planu (CSPUG 21.6.2011)
 
Replikace (CSPUG 19.4.2011)
Replikace (CSPUG 19.4.2011)Replikace (CSPUG 19.4.2011)
Replikace (CSPUG 19.4.2011)
 
PostgreSQL / Performance monitoring
PostgreSQL / Performance monitoringPostgreSQL / Performance monitoring
PostgreSQL / Performance monitoring
 

Recently uploaded

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
anilsa9823
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 

Data corruption

  • 1. Data corruption Where does it come from and what can you do about it? Tomas Vondra tomas@2ndquadrant.com / tomas@pgaddict.com
  • 4. The larger and older a database is, the more likely it's corrupted in some way.
  • 5. Agenda ● What is data corruption? ● Sources of data corruption ● What to do about it
  • 6. Data corruption ● many possible causes ● hardware ○ disks / memory ○ environment (cosmic rays, temperature, …) ● software ○ bugs in OS (kernel, fs, libc, ...) ○ bugs in database systems / applications ● administrator mistake ○ remove pg_xlog ...
  • 8. cosmic rays = no clue
  • 10. storage corruption ● this used to be pretty common ○ crappy disks, RAID controllers ○ insufficient power-loss protection ● it got better over time ○ better disks, better RAID controllers ○ when it failed, it failed totally ● now it's getting worse, again :-( ○ crappy disks connected over network (SAN, EBS, NFS, …) ○ layers of virtualization everywhere
  • 12. Operating System ● PostgreSQL is very trusting ○ relies on a lot of stuff ○ assumes it's perfect ● nothing is perfect ○ kernel bugs ○ filesystem bugs ○ glibc bugs (collation updates, …) ○ ...
  • 13. Collations ● rules for language-specific text sort ○ defined in glibc ○ change rarely, poor versioning ● indexes require text ordering to be stable ● what if you build index and then ○ upgrade to diffferent collations ○ replica has different collations ● ICU solves this ○ reliable versioning
  • 14. #fsyncgate ● CHECKPOINT ○ flush data from shared_buffers to disc ○ discard old part of WAL ● in case of fsync error, retry ● assumes two things ○ reliable kernel error reporting ○ dirty data are kept in page cache
  • 15. #fsyncgate ● fact #1: error reporting unreliable ○ behavior depends on kernel version ○ errors may be consumed by someone else ○ fixed in new kernel (>= 4.13) ● fact #2: dirty data are discarded after error ○ retry never really retries the write ○ replaced with elog(PANIC) forcing recovery ● a bunch of bugs in the error handling ;-) ○ error branches are the least tested code
  • 17. database systems ● broken indexes ○ violated constraints ○ index scan misses data, seq scan finds it ● bogus data ○ insufficient UTF-8 validation ○ corrupted varlena headers ● data loss ○ multixact bug ○ inappropriate removal of data ○ data made visible/invisible inappropriately
  • 18. database bugs ERROR: invalid memory alloc request size 1073741824
  • 20. DBA mistakes .... ● drop incorrect table ● free space by removing "logs" from pg_xlog ● let's compress everything in DATADIR ● take backups incorrectly
  • 21. What can you do about it?
  • 22. Preventing data corruption ● use good hardware ○ good hardware is cheaper than outages ○ no, desktop machines are not good choice ○ ECC RAM is a must ● update regularly ○ minor releases exist for a reason ● test it ○ Are the numbers way too good? ○ What happens in case of power-loss? ○ Does the virtualization honor fsyncs etc.?
  • 23. Preventing data corruption ● make sure you have backups ○ take them and test them ○ consider higher retention periods ● consider doing extra checks on backups ○ pg_verify_checksums ○ application tests ● "prophylactic" pg_dump is a good idea ○ pg_dump > /dev/null ○ especially when using pg_basebackup
  • 25. So you want to fix in-place ...
  • 26. There's no universal recipe :-(
  • 27. Recipe 1) Try restoring from a backup. ○ The backup may be corrupted too. ○ Maybe it'd take too long. (Well, …) ○ ... 2) If you need to fix a corrupted cluster … ○ Always make sure you have a copy. ○ Take detailed notes about each step. ○ Proceed methodically.
  • 28. Recipe 3) Asses the extent of data corruption ○ Is it just a single object? What object? ○ How many pages/rows/...? ○ ... 4) Ad-hoc recovery steps ○ Rebuild corrupted indexes. ○ Extract as much data as possible. Select "around", use loop with exception block, … ○ pageinspect is your friend ○ Zero corrupted pages using "dd" etc.
  • 30. So, what about data checksums?
  • 31. data checksums checksum(page number, page contents) ● available since PostgreSQL 9.3 ○ … so all supported versions have them ○ disabled by default ● protects against (some) storage system issues ○ changes to existing pages / torn pages ● has some overhead ○ a couple of %, depends on HW / workload ● correctness vs. availability trade-off
  • 32. data checksums are not perfect ● can't detect various types of corruption ○ pages written to different files ○ "forgotten" writes ○ truncated files ○ PostgreSQL bugs (before the write) ○ table vs. index mismatch ● may detect (some) memory issues
  • 33. How do you verify checksums? ● you have to read all the data ● regular SQL queries ○ only active set (but not "hot" data) ● pg_dump ○ no checks for indexes ● pg_basebackup ○ since PostgreSQL 11 ● pg_verify_checksums ○ since PostgreSQL 11, offline only