SlideShare a Scribd company logo
1 of 43
Download to read offline
Managing Terabytes
                          Problems and solutions with
                       operating large Postgres installations
                                Selena Deckelmann
                                   Prime Radiant
                                  @selenamarie
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         1
               0n
                1c1e
About me.



                          2
                            1c1e
                         re0n
                      Uef2
                    .En
                  Cfo
               Ceon
             ogm
            SP
The Environment

                       • 1.6 TB, 1 cluster,Version 8.2
                       • 1.1 TB, 1 cluster,Version 8.3
                       • 8.4/9.0 Dev systems
                       • Working toward 9.0 into prod (May 2011)
                       • pgpool, Redis, RabbitMQ, NFS
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                           3
               0n
                1c1e
Some stats

                       • daily peak: ~3000 commits per second
                       • average writes: 4 MBps
                       • average reads: 8 MBps
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                           4
               0n
                1c1e
What’s good

                       • Most queries are fast!
                       • Benchmarks say we’re pushing the limits of
                         the hardware
                       • Developers love working with Postgres
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re0n
                1c1e
And lots more. But...
                                        1c1e
                                     re0n
                                  Uef2
                                .En
                              Cfo
                           Ceon
                         ogm
                        SP
1c1e
             re0n
          Uef2
        .En
      Cfo
   Ceon
 ogm
SP
The Problems

                  1. System resource exhaustion
                  2. Everything is slow: Huge catalogs, Backups
                  3. Handling VACUUM problems: Bloat,
                     Transaction wraparound
                  4. Upgrades: Minor, Major
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re0n
                1c1e
System Resource Exhaustion
                                             1c1e
                                          re0n
                                       Uef2
                                     .En
                                   Cfo
                                Ceon
                              ogm
                             SP
Running out of inodes

                       Problem: UFS on Solaris
                       “The only way to add more inodes to a UFS
                       filesystem is: 1. destroy the filesystem and create a
                       new filesystem with a higher inode density 2. enlarge
                       the filesystem - growfs man page”
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                              10
               0n
                1c1e
Running out of inodes

                       Solution 0: Delete files.
                       Solution 1: Sharding/bigger filesystem
                       Solution 2: xfs
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         11
               0n
                1c1e
Running out of
                           file descriptors
                       Problem: Too many open files
                       by the database.
                       selena@lulu:~ #508 18:43 :)
                       sudo lsof -p 19121 | wc
                           40     355        4151

                       Solution: You need a connection
                       pooler.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                        12
               0n
                1c1e
Running out of
                           file descriptors
                       Solution: You need a connection
                       pooler.
                       Recommended:
                       pgbouncer (threaded, online upgrade)
                       pgpool-II (failover)
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                       13
               0n
                1c1e
Everything is slow.
                                      1c1e
                                   re0n
                                Uef2
                              .En
                            Cfo
                         Ceon
                       ogm
                      SP
Huge Catalogs


                       409,994 tables
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                        15
               0n
                1c1e
Maintenance problem
                       Minor mistake in parent table definitions:

                       not null default
                       nextval('important_sequence'::text)

                       vs

                       not null default
                       nextval('important_sequence'::regclass)
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                           16
               0n
                1c1e
Huge Catalogs
                       Problem: Slow scans of catalog data


                       Solution:
                       Upgrade to Postgres 8.4 or higher


                       But really: Avoid making a cluster with >400k
                       tables.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                           17
               0n
                1c1e
Stats collection

                       9,019,868 total data points for table stats
                       4,550,770 total data points for index stats
                       Problem: This is slow to write.
                       (128 MB written every second or so)
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                            18
               0n
                1c1e
Stats collection

                       9,019,868 total data points for table stats
                       4,550,770 total data points for index stats
                       Soution: Move stats file to RAM.
                       stats_temp_directory    (8.4 or higher)
                       There’s a trivial patch for earlier versions.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                             19
               0n
                1c1e
Stats collection

                       9,019,868 total data points for table stats
                       4,550,770 total data points for index stats
                       Problem: This is slow to read.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                            20
               0n
                1c1e
Stats collection
                       9,019,868 total data points for table stats
                       4,550,770 total data points for index stats
                       Solution:
                       Supposedly, this is better in 8.4 and higher.
                       (fewer writes per minute)
                       Still probably not fast.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                             21
               0n
                1c1e
Backups


                       pg_dump takes longer and longer...
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                       22
               0n
                1c1e
 
                           backup          |   duration
                        -------------------+--------------------
                            2009­11­22     |  02:44:36.821475
                            2009­11­23     |  02:46:20.003507
                            2009­11­24     |  02:47:06.260705
                            2009­12­06     |  07:13:04.174964
                            2009­12­13     |  05:00:01.082676
                            2009­12­20     |  06:24:49.433043
                            2009­12­27     |  05:35:20.551477
                            2010­01­03     |  07:36:49.651492
                            2010­01­10     |  05:55:02.396163
                            2010­01­17     |  07:32:33.277559
                            2010­01­24     |  06:22:46.522319
                            2010­01­31     |  10:48:13.060888
                            2010­02­07     |  21:21:47.77618
                            2010­02­14     |  14:32:04.638267
                            2010­02­21     |  11:34:42.353244
                            2010­02­28     |  11:13:02.102345
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                        23
               0n
                1c1e
Backups
                       Problem: pg_dump is too slow.
                       Solutions:
                       • patching pg_dump for SELECT ... LIMIT
                       • crank down shared_buffers
                       • Stop using pg_dump for backups
                       • 64-bit might help
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         24
               0n
                1c1e
How not to migrate
                        to a 64-bit system
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                25
               0n
                1c1e
Title Text



                       Install 32-bit Postgres and libraries on a 64-bit system.
                       Install 64-bit Postgres/libs of the same version.
                       Copy “hot backup” from 32-bit sys over to 64-bit sys.
                       Run pg_dump from 64-bit version on 32-bit Postgres.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                                  26
               0n
                1c1e
A single warm standby
                          is not a backup.


                       But lots of people use them that way!
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                          27
               0n
                1c1e
Ship WAL from Solaris x86 -> Linux
                                  It did work!
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                    28
               0n
                1c1e
Handling VACUUM problems
                                           1c1e
                                        re0n
                                     Uef2
                                   .En
                                 Cfo
                              Ceon
                            ogm
                           SP
Bloat

                       Problem: Lots of dead tuples in tables.

                       • Frequent UPDATEs to long tables of log
                          data
                       • Frequent DELETEs without a VACUUM
                       • A terabyte of dead tuples
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                           30
               0n
                1c1e
Fixing bloat

                       Solution: Write custom scripts to clean

                       • VACUUM for small things
                       • CLUSTER for everything else
                       • Considered TRUNCATE
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         31
               0n
                1c1e
Catalog Bloat

                       Application allowed users to initiate ALTER
                       TABLE.

                       Regular VACUUM couldn’t fix it.
                       VACUUM FULL   of the catalog takes 2+ hours.
                       Use of NOTIFY/LISTEN can also cause bloat.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                             32
               0n
                1c1e
Transaction
                       wraparound avoidance
                       Problem: autovacuum set off too
                       frequently
                       Watch age(datfrozenxid)
                       Solution:
                       Increase autovacuum_freeze_max_age
                       (default is 200 million, we increase to one
                       billion)
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                            33
               0n
                1c1e
Upgrades
                           1c1e
                        re0n
                     Uef2
                   .En
                 Cfo
              Ceon
            ogm
           SP
Minor upgrades

                       Problem: Restarting Postgres causes bad
                       application performance.
                       • Require a start/stop of database
                         • Unexpected CHECKPOINT
                         • Cold cache
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                          35
               0n
                1c1e
Minor upgrades

                       Solutions:
                         • Plan for a CHECKPOINT before
                            shutdown
                         • Warm the cache (Queries that
                            exercise indexes, maybe table scans)
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                          36
               0n
                1c1e
Major Version upgrades

                       Problem: Major upgrades are a PITA.
                         • <8.2 - no pg_upgrade :(
                         • Time your restores.
                         • Document your SLAs.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         37
               0n
                1c1e
Major Version upgrades

                       Solutions: :(
                         • >=8.3 - pg_upgrade
                         • Time your restores.
                         • Document your SLAs.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                       38
               0n
                1c1e
Major Version upgrades

                       Solutions: :(
                         • Write tools to migrate data
                         • Shard
                         • Trigger-based replication
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         39
               0n
                1c1e
The Problems

                  1. System resource exhaustion
                  2. Everything is slow: Huge catalogs, Backups
                  3. Handling VACUUM problems: Bloat,
                     Transaction wraparound
                  4. Upgrades: Minor, Major
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re0n
                1c1e
The Solutions

                  1. System resource exhaustion
                     Choose a better filesystem, Pooling
                  2. Everything is slow: Huge catalogs, Backups
                     Don’t do that, Monitor & Binary
                     backups
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re0n
                1c1e
The Solutions

                  3. Handling VACUUM problems: Bloat,
                     Transaction wraparound
                     Developer education, Monitoring,
                     Cleanup, *_max_freeze_age
                  4. Upgrades: Minor, Major
                     Plan, Plan, Plan
                       (CHECKPOINT, warm cache, pg_upgrade)
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re0n
                1c1e
Thanks!
                           Managing Terabytes
                          Problems and solutions with
                       operating large Postgres installations
                                Selena Deckelmann
                                   Prime Radiant
                                  @selenamarie
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         43
               0n
                1c1e

More Related Content

Viewers also liked

Web Standards Seminar 2006
Web Standards Seminar 2006Web Standards Seminar 2006
Web Standards Seminar 2006Taeyoung Yoon
 
A Critical Analysis Of British Mosques As An
A Critical Analysis Of British Mosques As AnA Critical Analysis Of British Mosques As An
A Critical Analysis Of British Mosques As Anguest0fb60e
 
Kyoto, Japan 京都
 Kyoto, Japan 京都 Kyoto, Japan 京都
Kyoto, Japan 京都nonnon
 
Sida vs Denguecmas
Sida vs DenguecmasSida vs Denguecmas
Sida vs Denguecmaslambert
 
Tehnoloogia Rakendamine
Tehnoloogia RakendamineTehnoloogia Rakendamine
Tehnoloogia Rakendaminekiq
 
De Milieuproblematiek
De MilieuproblematiekDe Milieuproblematiek
De Milieuproblematiekguestb2e54a
 
Использование инструментальных средств для выделения коллокаций в лексикограф...
Использование инструментальных средств для выделения коллокаций влексикограф...Использование инструментальных средств для выделения коллокаций влексикограф...
Использование инструментальных средств для выделения коллокаций в лексикограф...Lidia Pivovarova
 
Presentation1
Presentation1Presentation1
Presentation1Borreke
 
Text Pattern Formation For Information Extraction
Text Pattern Formation For Information ExtractionText Pattern Formation For Information Extraction
Text Pattern Formation For Information ExtractionLidia Pivovarova
 

Viewers also liked (14)

Web Standards Seminar 2006
Web Standards Seminar 2006Web Standards Seminar 2006
Web Standards Seminar 2006
 
Stolyarov
StolyarovStolyarov
Stolyarov
 
Matkalla metaverseen?
Matkalla metaverseen?Matkalla metaverseen?
Matkalla metaverseen?
 
A Critical Analysis Of British Mosques As An
A Critical Analysis Of British Mosques As AnA Critical Analysis Of British Mosques As An
A Critical Analysis Of British Mosques As An
 
Exercici 3
Exercici 3Exercici 3
Exercici 3
 
Kyoto, Japan 京都
 Kyoto, Japan 京都 Kyoto, Japan 京都
Kyoto, Japan 京都
 
Sida vs Denguecmas
Sida vs DenguecmasSida vs Denguecmas
Sida vs Denguecmas
 
Tehnoloogia Rakendamine
Tehnoloogia RakendamineTehnoloogia Rakendamine
Tehnoloogia Rakendamine
 
P4
P4P4
P4
 
De Milieuproblematiek
De MilieuproblematiekDe Milieuproblematiek
De Milieuproblematiek
 
Использование инструментальных средств для выделения коллокаций в лексикограф...
Использование инструментальных средств для выделения коллокаций влексикограф...Использование инструментальных средств для выделения коллокаций влексикограф...
Использование инструментальных средств для выделения коллокаций в лексикограф...
 
Presentation1
Presentation1Presentation1
Presentation1
 
Noches Griegas
Noches GriegasNoches Griegas
Noches Griegas
 
Text Pattern Formation For Information Extraction
Text Pattern Formation For Information ExtractionText Pattern Formation For Information Extraction
Text Pattern Formation For Information Extraction
 

More from Selena Deckelmann

While we're here, let's fix computer science education
While we're here, let's fix computer science educationWhile we're here, let's fix computer science education
While we're here, let's fix computer science educationSelena Deckelmann
 
Mistakes were made - LCA 2012
Mistakes were made - LCA 2012Mistakes were made - LCA 2012
Mistakes were made - LCA 2012Selena Deckelmann
 
Postgres needs an aircraft carrier
Postgres needs an aircraft carrierPostgres needs an aircraft carrier
Postgres needs an aircraft carrierSelena Deckelmann
 
Harder, better, faster, stronger: PostgreSQL 9.1
Harder, better, faster, stronger: PostgreSQL 9.1Harder, better, faster, stronger: PostgreSQL 9.1
Harder, better, faster, stronger: PostgreSQL 9.1Selena Deckelmann
 
Letters from the open source trenches - Postgres community
Letters from the open source trenches - Postgres communityLetters from the open source trenches - Postgres community
Letters from the open source trenches - Postgres communitySelena Deckelmann
 
Own it: working with a changing open source community
Own it: working with a changing open source communityOwn it: working with a changing open source community
Own it: working with a changing open source communitySelena Deckelmann
 
Managing terabytes: When Postgres gets big
Managing terabytes: When Postgres gets bigManaging terabytes: When Postgres gets big
Managing terabytes: When Postgres gets bigSelena Deckelmann
 
Managing terabytes: When PostgreSQL gets big
Managing terabytes: When PostgreSQL gets bigManaging terabytes: When PostgreSQL gets big
Managing terabytes: When PostgreSQL gets bigSelena Deckelmann
 
How a bunch of normal people Used Technology To Repair a Rigged Election
How a bunch of normal people Used Technology To Repair a Rigged ElectionHow a bunch of normal people Used Technology To Repair a Rigged Election
How a bunch of normal people Used Technology To Repair a Rigged ElectionSelena Deckelmann
 
Open Source Bridge Opening Day
Open Source Bridge Opening DayOpen Source Bridge Opening Day
Open Source Bridge Opening DaySelena Deckelmann
 

More from Selena Deckelmann (20)

While we're here, let's fix computer science education
While we're here, let's fix computer science educationWhile we're here, let's fix computer science education
While we're here, let's fix computer science education
 
Algorithms are Recipes
Algorithms are RecipesAlgorithms are Recipes
Algorithms are Recipes
 
Hire the right way
Hire the right wayHire the right way
Hire the right way
 
Mistakes were made - LCA 2012
Mistakes were made - LCA 2012Mistakes were made - LCA 2012
Mistakes were made - LCA 2012
 
Pg92 HA, LCA 2012, Ballarat
Pg92 HA, LCA 2012, BallaratPg92 HA, LCA 2012, Ballarat
Pg92 HA, LCA 2012, Ballarat
 
Mistakes were made
Mistakes were madeMistakes were made
Mistakes were made
 
Postgres needs an aircraft carrier
Postgres needs an aircraft carrierPostgres needs an aircraft carrier
Postgres needs an aircraft carrier
 
Mistakes were made
Mistakes were madeMistakes were made
Mistakes were made
 
Harder, better, faster, stronger: PostgreSQL 9.1
Harder, better, faster, stronger: PostgreSQL 9.1Harder, better, faster, stronger: PostgreSQL 9.1
Harder, better, faster, stronger: PostgreSQL 9.1
 
How to ask for money
How to ask for moneyHow to ask for money
How to ask for money
 
Letters from the open source trenches - Postgres community
Letters from the open source trenches - Postgres communityLetters from the open source trenches - Postgres community
Letters from the open source trenches - Postgres community
 
Own it: working with a changing open source community
Own it: working with a changing open source communityOwn it: working with a changing open source community
Own it: working with a changing open source community
 
Managing terabytes: When Postgres gets big
Managing terabytes: When Postgres gets bigManaging terabytes: When Postgres gets big
Managing terabytes: When Postgres gets big
 
Managing terabytes: When PostgreSQL gets big
Managing terabytes: When PostgreSQL gets bigManaging terabytes: When PostgreSQL gets big
Managing terabytes: When PostgreSQL gets big
 
Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
 
Making Software Communities
Making Software CommunitiesMaking Software Communities
Making Software Communities
 
Illustrated buffer cache
Illustrated buffer cacheIllustrated buffer cache
Illustrated buffer cache
 
Bucardo
BucardoBucardo
Bucardo
 
How a bunch of normal people Used Technology To Repair a Rigged Election
How a bunch of normal people Used Technology To Repair a Rigged ElectionHow a bunch of normal people Used Technology To Repair a Rigged Election
How a bunch of normal people Used Technology To Repair a Rigged Election
 
Open Source Bridge Opening Day
Open Source Bridge Opening DayOpen Source Bridge Opening Day
Open Source Bridge Opening Day
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Managing terabytes

  • 1. Managing Terabytes Problems and solutions with operating large Postgres installations Selena Deckelmann Prime Radiant @selenamarie SP ogm Ceon Cfo .En Uef2 re 1 0n 1c1e
  • 2. About me. 2 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 3. The Environment • 1.6 TB, 1 cluster,Version 8.2 • 1.1 TB, 1 cluster,Version 8.3 • 8.4/9.0 Dev systems • Working toward 9.0 into prod (May 2011) • pgpool, Redis, RabbitMQ, NFS SP ogm Ceon Cfo .En Uef2 re 3 0n 1c1e
  • 4. Some stats • daily peak: ~3000 commits per second • average writes: 4 MBps • average reads: 8 MBps SP ogm Ceon Cfo .En Uef2 re 4 0n 1c1e
  • 5. What’s good • Most queries are fast! • Benchmarks say we’re pushing the limits of the hardware • Developers love working with Postgres SP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 6. And lots more. But... 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 7. 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 8. The Problems 1. System resource exhaustion 2. Everything is slow: Huge catalogs, Backups 3. Handling VACUUM problems: Bloat, Transaction wraparound 4. Upgrades: Minor, Major SP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 9. System Resource Exhaustion 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 10. Running out of inodes Problem: UFS on Solaris “The only way to add more inodes to a UFS filesystem is: 1. destroy the filesystem and create a new filesystem with a higher inode density 2. enlarge the filesystem - growfs man page” SP ogm Ceon Cfo .En Uef2 re 10 0n 1c1e
  • 11. Running out of inodes Solution 0: Delete files. Solution 1: Sharding/bigger filesystem Solution 2: xfs SP ogm Ceon Cfo .En Uef2 re 11 0n 1c1e
  • 12. Running out of file descriptors Problem: Too many open files by the database. selena@lulu:~ #508 18:43 :) sudo lsof -p 19121 | wc 40 355 4151 Solution: You need a connection pooler. SP ogm Ceon Cfo .En Uef2 re 12 0n 1c1e
  • 13. Running out of file descriptors Solution: You need a connection pooler. Recommended: pgbouncer (threaded, online upgrade) pgpool-II (failover) SP ogm Ceon Cfo .En Uef2 re 13 0n 1c1e
  • 14. Everything is slow. 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 15. Huge Catalogs 409,994 tables SP ogm Ceon Cfo .En Uef2 re 15 0n 1c1e
  • 16. Maintenance problem Minor mistake in parent table definitions: not null default nextval('important_sequence'::text) vs not null default nextval('important_sequence'::regclass) SP ogm Ceon Cfo .En Uef2 re 16 0n 1c1e
  • 17. Huge Catalogs Problem: Slow scans of catalog data Solution: Upgrade to Postgres 8.4 or higher But really: Avoid making a cluster with >400k tables. SP ogm Ceon Cfo .En Uef2 re 17 0n 1c1e
  • 18. Stats collection 9,019,868 total data points for table stats 4,550,770 total data points for index stats Problem: This is slow to write. (128 MB written every second or so) SP ogm Ceon Cfo .En Uef2 re 18 0n 1c1e
  • 19. Stats collection 9,019,868 total data points for table stats 4,550,770 total data points for index stats Soution: Move stats file to RAM. stats_temp_directory (8.4 or higher) There’s a trivial patch for earlier versions. SP ogm Ceon Cfo .En Uef2 re 19 0n 1c1e
  • 20. Stats collection 9,019,868 total data points for table stats 4,550,770 total data points for index stats Problem: This is slow to read. SP ogm Ceon Cfo .En Uef2 re 20 0n 1c1e
  • 21. Stats collection 9,019,868 total data points for table stats 4,550,770 total data points for index stats Solution: Supposedly, this is better in 8.4 and higher. (fewer writes per minute) Still probably not fast. SP ogm Ceon Cfo .En Uef2 re 21 0n 1c1e
  • 22. Backups pg_dump takes longer and longer... SP ogm Ceon Cfo .En Uef2 re 22 0n 1c1e
  • 23.    backup     |   duration -------------------+--------------------  2009­11­22  |  02:44:36.821475   2009­11­23  |  02:46:20.003507  2009­11­24  |  02:47:06.260705  2009­12­06  |  07:13:04.174964  2009­12­13  |  05:00:01.082676  2009­12­20  |  06:24:49.433043  2009­12­27  |  05:35:20.551477  2010­01­03  |  07:36:49.651492  2010­01­10  |  05:55:02.396163  2010­01­17  |  07:32:33.277559  2010­01­24  |  06:22:46.522319  2010­01­31  |  10:48:13.060888  2010­02­07  |  21:21:47.77618  2010­02­14  |  14:32:04.638267  2010­02­21  |  11:34:42.353244  2010­02­28  |  11:13:02.102345 SP ogm Ceon Cfo .En Uef2 re 23 0n 1c1e
  • 24. Backups Problem: pg_dump is too slow. Solutions: • patching pg_dump for SELECT ... LIMIT • crank down shared_buffers • Stop using pg_dump for backups • 64-bit might help SP ogm Ceon Cfo .En Uef2 re 24 0n 1c1e
  • 25. How not to migrate to a 64-bit system SP ogm Ceon Cfo .En Uef2 re 25 0n 1c1e
  • 26. Title Text Install 32-bit Postgres and libraries on a 64-bit system. Install 64-bit Postgres/libs of the same version. Copy “hot backup” from 32-bit sys over to 64-bit sys. Run pg_dump from 64-bit version on 32-bit Postgres. SP ogm Ceon Cfo .En Uef2 re 26 0n 1c1e
  • 27. A single warm standby is not a backup. But lots of people use them that way! SP ogm Ceon Cfo .En Uef2 re 27 0n 1c1e
  • 28. Ship WAL from Solaris x86 -> Linux It did work! SP ogm Ceon Cfo .En Uef2 re 28 0n 1c1e
  • 29. Handling VACUUM problems 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 30. Bloat Problem: Lots of dead tuples in tables. • Frequent UPDATEs to long tables of log data • Frequent DELETEs without a VACUUM • A terabyte of dead tuples SP ogm Ceon Cfo .En Uef2 re 30 0n 1c1e
  • 31. Fixing bloat Solution: Write custom scripts to clean • VACUUM for small things • CLUSTER for everything else • Considered TRUNCATE SP ogm Ceon Cfo .En Uef2 re 31 0n 1c1e
  • 32. Catalog Bloat Application allowed users to initiate ALTER TABLE. Regular VACUUM couldn’t fix it. VACUUM FULL of the catalog takes 2+ hours. Use of NOTIFY/LISTEN can also cause bloat. SP ogm Ceon Cfo .En Uef2 re 32 0n 1c1e
  • 33. Transaction wraparound avoidance Problem: autovacuum set off too frequently Watch age(datfrozenxid) Solution: Increase autovacuum_freeze_max_age (default is 200 million, we increase to one billion) SP ogm Ceon Cfo .En Uef2 re 33 0n 1c1e
  • 34. Upgrades 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 35. Minor upgrades Problem: Restarting Postgres causes bad application performance. • Require a start/stop of database • Unexpected CHECKPOINT • Cold cache SP ogm Ceon Cfo .En Uef2 re 35 0n 1c1e
  • 36. Minor upgrades Solutions: • Plan for a CHECKPOINT before shutdown • Warm the cache (Queries that exercise indexes, maybe table scans) SP ogm Ceon Cfo .En Uef2 re 36 0n 1c1e
  • 37. Major Version upgrades Problem: Major upgrades are a PITA. • <8.2 - no pg_upgrade :( • Time your restores. • Document your SLAs. SP ogm Ceon Cfo .En Uef2 re 37 0n 1c1e
  • 38. Major Version upgrades Solutions: :( • >=8.3 - pg_upgrade • Time your restores. • Document your SLAs. SP ogm Ceon Cfo .En Uef2 re 38 0n 1c1e
  • 39. Major Version upgrades Solutions: :( • Write tools to migrate data • Shard • Trigger-based replication SP ogm Ceon Cfo .En Uef2 re 39 0n 1c1e
  • 40. The Problems 1. System resource exhaustion 2. Everything is slow: Huge catalogs, Backups 3. Handling VACUUM problems: Bloat, Transaction wraparound 4. Upgrades: Minor, Major SP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 41. The Solutions 1. System resource exhaustion Choose a better filesystem, Pooling 2. Everything is slow: Huge catalogs, Backups Don’t do that, Monitor & Binary backups SP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 42. The Solutions 3. Handling VACUUM problems: Bloat, Transaction wraparound Developer education, Monitoring, Cleanup, *_max_freeze_age 4. Upgrades: Minor, Major Plan, Plan, Plan (CHECKPOINT, warm cache, pg_upgrade) SP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 43. Thanks! Managing Terabytes Problems and solutions with operating large Postgres installations Selena Deckelmann Prime Radiant @selenamarie SP ogm Ceon Cfo .En Uef2 re 43 0n 1c1e