Tale from Trenches
How autovacuum, streaming replication, and a batch
query took down availability and performance
- Sameer Kumar, Solution Architect, Ashnik
About Me!
• A Random guy who started his career as an Oracle and DB2 DBA
(and yeah a bit of SQL Server too)
• Then moved to ‘Ashnik’ and started working with PostgreSQL
• And then he fell in love with Open Source!
• Twitter - @sameerkasi200x
• Apart from PostgreSQL, I also do
• NoSQL databases (shhhh!)
• Docker
• Ansible
• Chef
• Apart from technology I love cycling and photography
Disclaimer!
• All the images used in this PPT have been used as per the
associated attribution and copyright instructions
• I take sole responsibility for the content used in this presentation
• If you like my talk please tweet
• #PGCONFAPAC
• @PGCONFAPAC
Why I Love PostgreSQL?
• “Most Advanced Open Source Database”
• A vibrant and active community
• Fully ACID compliant
• Multi Version Concurrency Control
• NoSQL capability
• Developer Friendly
• Built to be extended ‘easily’
What am I not going to talk about?
• My employer - Ashnik
• I won’t tell you that Ashnik is an enterprise open source solution provider
• My colleagues have great expertise and experience in PostgreSQL
• I won’t talk about the Postgres deployments Ashnik has done in the BFSI sector
• Why should you migrate out of Oracle and SQL Server?
• How to ensure high availability with PostgreSQL?
• How to scale PostgreSQL?
• How to use extensions in PostgreSQL and extend its features?
• How to monitor PostgreSQL setup?
• How to go about sharding and scaling PostgreSQL?
I will be telling you a
story!
And everyone lived happily ever after!
Once upon a time a large BFSI company
migrated to PostgreSQL
Well no!
Like any other animal, elephants need
a caretaker and tending
Let’s begin with the
story!
Configuration:
• 4-core CPU and 32 GB RAM
• PostgreSQL 9.4.3 installation
• HA setup with pgpool
Day 0 (GoLive): Architecture
Issue:
• High CPU usage
• 500+ concurrent sessions on server – all facing slow
response time
Day 1: Issue on production server
• Perform a controlled failover to
standby server
• Capacity upgrade on old
production server
Immediate action taken aka Firefighting
• Errors on standby server
ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
• This is not your usual conflict caused
by row clean-up
Day 2: Issue on standby server
pg_stat_database_conflicts
• Monitor long running queries
• Monitor High CPU queries
• top + pg_stat_statements
• Time taken by the batch process has been increasing sharply
• From 15 minutes to 2 hours now
Monitoring the production server
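The monitoring steps above can be sketched in SQL. This is a minimal sketch, not the queries used in the talk: the 5-minute threshold is an assumption, and the second query assumes the pg_stat_statements extension is installed and preloaded. The column is named total_time on 9.4 (the version in this story); on PostgreSQL 13 and later it is total_exec_time.

```sql
-- Sessions whose current statement has been running for more than
-- 5 minutes (the threshold is an illustrative assumption).
SELECT pid,
       usename,
       datname,
       state,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;

-- Statements with the highest cumulative execution time (milliseconds).
-- Frequent cheap statements can rank above rare expensive ones here.
SELECT calls,
       total_time,
       total_time / calls AS mean_ms,
       query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```

Sorting by cumulative time rather than per-call time is a deliberate choice: it surfaces frequently executed statements as well as long-running ones.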
Week 2: Identify the bottleneck queries
• Identify costly queries
• Make select queries run faster – hoping it will reduce
chances of conflicts
• Tune the queries – to reduce CPU usage
• Identify queries causing locks
• Tune queries used in batch
• Add indexes
Week 2: Issue recurrence
• Conflict occurred past midnight, during a low-utilization period
• Surprising that the issue recurred after the tuning
• Understand the nature of application
• Logic inside the batch job
• Capture queries executing at the time of issue
• Set log_autovacuum_min_duration parameter
Week 2: Further diagnosis
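Setting log_autovacuum_min_duration, the last diagnostic step above, could look like the sketch below. The value 0 (log every autovacuum run) is an assumption; the talk does not say which value was used. ALTER SYSTEM is available from PostgreSQL 9.4 onwards; editing postgresql.conf and reloading has the same effect.

```sql
-- Log every autovacuum action with its duration and page/tuple counts.
-- 0 logs all runs; -1 disables this logging.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;

-- The parameter only needs a configuration reload, not a restart.
SELECT pg_reload_conf();

-- Confirm the active value.
SHOW log_autovacuum_min_duration;
```

The resulting log lines make it possible to line up autovacuum runs on the master with the timestamps of cancelled queries on the standby.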
Week 2 – Issue re-creation
Let’s find the culprit!
postgres=# show autovacuum;
autovacuum
------------
on
(1 row)
Issue re-creation – on Master
postgres=# show hot_standby_feedback;
 hot_standby_feedback
----------------------
 on
(1 row)
postgres=# show max_standby_streaming_delay;
 max_standby_streaming_delay
-----------------------------
 30s
(1 row)
Issue re-creation – on Standby
Issue re-creation – no conflicts
postgres=# select * from
pg_stat_database_conflicts
where datname='training_db';
-[ RECORD 1 ]-----+------------
datid | 16400
datname | training_db
confl_tablespace | 0
confl_lock | 0
confl_snapshot | 0
confl_bufferpin | 0
confl_deadlock | 0
Issue re-creation – table stats
 relpages | reltuples |     relname
----------+-----------+------------------
        0 |         0 | pgbench_history
        1 |        10 | pgbench_tellers
        1 |         1 | pgbench_branches
     1640 |    100000 | pgbench_accounts
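Output of this shape can be obtained from the pg_class catalog. The query below is a sketch; the exact query behind the slide is not shown in the source, and the name filter is an assumption.

```sql
-- Page and row-count estimates for the pgbench tables.
-- relpages and reltuples are estimates refreshed by VACUUM and ANALYZE,
-- not live counts.
SELECT relpages, reltuples, relname
FROM pg_class
WHERE relkind = 'r'
  AND relname LIKE 'pgbench%'
ORDER BY relpages;
```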
Issue re-creation – Table maintenance stats
     relname      | n_dead_tup | last_autovacuum |          last_vacuum
------------------+------------+-----------------+-------------------------------
 pgbench_tellers  |          0 |                 | 2016-06-24 03:59:23.624436+08
 pgbench_history  |          0 |                 | 2016-06-24 03:59:24.084768+08
 pgbench_branches |          0 |                 | 2016-06-24 03:59:23.536748+08
 pgbench_accounts |          0 |                 | 2016-06-24 03:59:23.68088+08
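The maintenance statistics above match columns of the pg_stat_user_tables view. A sketch of a query that would return them follows; the name filter is an assumption.

```sql
-- Dead-tuple count and last (auto)vacuum time per pgbench table.
-- An empty last_autovacuum means autovacuum has not processed the
-- table since the statistics were last reset.
SELECT relname, n_dead_tup, last_autovacuum, last_vacuum
FROM pg_stat_user_tables
WHERE relname LIKE 'pgbench%';
```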
Issue re-creation – a custom script for pgbench
begin transaction;
select count(*) from pgbench_accounts;
-- simulate delay
select pg_sleep(1);
select * from pgbench_branches;
select * from pgbench_history ;
select * from pgbench_tellers;
end;
Issue re-creation – Lock monitoring
select
act.pid as connection_pid,
act.query as locking_query,
act.client_addr as client_address,
act.usename as username,
act.datname as database_name,
lck.relation as relation_id,
rel.relname
from pg_stat_activity act
join pg_locks lck on act.pid=lck.pid
join pg_class rel on rel.oid=lck.relation
where lck.locktype='relation' and
lck.mode='AccessExclusiveLock';
Image source: http://maxpixel.freegreatpicture.com/Metal-Lock-Protection-Chain-Safety-Security-1114101
Issue re-creation - simulate bulk operation
• Delete rows
• Insert rows
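The slides do not show the statements used for the bulk operation. The sketch below is one hypothetical way to simulate it on the master against a scale-1 pgbench_accounts table (aid values 1 to 100000); the cut-off value and the inserted range are assumptions.

```sql
-- Hypothetical bulk delete: remove most of the table in one statement,
-- leaving a large number of dead tuples for autovacuum to clean up.
DELETE FROM pgbench_accounts
WHERE aid > 20000;

-- Hypothetical bulk insert of a smaller batch of fresh rows.
INSERT INTO pgbench_accounts (aid, bid, abalance, filler)
SELECT g, 1, 0, ''
FROM generate_series(100001, 101000) AS g;
```

Run this while the custom pgbench script is executing against the standby, then wait for autovacuum to process the table on the master.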
Issue re-creation – error reproduced
client 0 sending select pg_sleep(1);
client 0 receiving
Client 0 aborted in state 2: ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding a relation lock for too long.
transaction type: Custom query
Issue re-creation – table stats after test
relpages | reltuples | relname
----------+-----------+------------------
0 | 0 | pgbench_history
1 | 10 | pgbench_tellers
1 | 1 | pgbench_branches
738 | 20997 | pgbench_accounts
(4 rows)
Issue re-creation – Table maintenance stats
relname | n_dead_tup | last_autovacuum | last_vacuum
------------------+------------+-------------------------------+------------------------------
pgbench_tellers | 0 | | 2016-06-24 03:59:23.624436+08
pgbench_history | 0 | | 2016-06-24 03:59:24.084768+08
pgbench_branches | 0 | | 2016-06-24 03:59:23.536748+08
pgbench_accounts | 0 | 2016-06-24 04:23:59.811238+08| 2016-06-24 03:59:23.68088+08
(4 rows)
Image Source:
http://maxpixel.freegreatpicture.com/Supplies-Vacuum-Cleaning-Dust-Buster-Bucket-29040
Issue re-creation – Database Conflicts
   datname   | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
-------------+------------------+------------+----------------+-----------------+----------------
 training_db |                0 |          2 |              0 |               0 |              0
Issue re-creation – Repeated the test
• Increased max_standby_streaming_delay
postgres=# show max_standby_streaming_delay ;
max_standby_streaming_delay
-----------------------------
1min
(1 row)
• Similar outcome, but it took a bit longer to hit the error
VACUUM
• Cleans up dead row versions (tuples)
• Does not generally return space to the OS
• Concurrent access allowed
• In-place vacuum
• No additional space requirement
• If pages at the end of the table are completely emptied, they are truncated and released back to the OS
• The truncation briefly takes an AccessExclusiveLock on the table
VACUUM FULL
• It is not “VACUUM, but better”
• Reclaims the space
• Concurrent access not allowed
• Takes an AccessExclusiveLock on the table
• Needs more storage during the process
• Rewrites the table into a new file
• Takes more time
Vacuum Vs Vacuum Full
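The difference between the two commands can be observed by checking the on-disk size around each one. This is a sketch on the pgbench_accounts test table; the sizes you see depend on how much of the table is dead. Neither command can run inside a transaction block.

```sql
-- On-disk size of the table's main data file before maintenance.
SELECT pg_size_pretty(pg_relation_size('pgbench_accounts')) AS size_before;

-- Plain VACUUM: marks dead tuples reusable and allows concurrent reads
-- and writes. VERBOSE reports removed tuples and any truncated pages.
VACUUM (VERBOSE) pgbench_accounts;

SELECT pg_size_pretty(pg_relation_size('pgbench_accounts')) AS size_after_vacuum;

-- VACUUM FULL: rewrites the table into a new file and holds an
-- AccessExclusiveLock for the whole run, blocking all other access.
VACUUM FULL pgbench_accounts;

SELECT pg_size_pretty(pg_relation_size('pgbench_accounts')) AS size_after_full;
```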
• Collected VACUUM-related statistics from the test and production
environments around the batch schedule
• All the conflicts on the standby involved page clean-up (truncation) on
the master
• The lock taken for that clean-up gets replicated to the standby
• Queries get cancelled on the standby because of the conflict
Week 3 – Further Analysis
Week 3 – Resolution
• Interim resolution
• Switch off autovacuum on the table involved in the batch
• Manual VACUUM FULL during off-peak hours
• Result
• No more query cancellation on standby
• Batch period reduced from 2 hours to 2 minutes
• Long term resolution
• Fix batch process logic
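The interim resolution can be sketched as follows. The table name batch_table is hypothetical; the source does not name the production table. Even with autovacuum disabled for a table, PostgreSQL still forces an autovacuum on it when needed to prevent transaction ID wraparound.

```sql
-- Disable autovacuum (and autoanalyze) for the table touched by the batch.
ALTER TABLE batch_table SET (autovacuum_enabled = false);

-- Verify that the storage parameter is recorded on the table.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'batch_table';

-- During the off-peak window: rewrite the table and refresh planner
-- statistics, since autoanalyze no longer runs for it.
VACUUM (FULL, ANALYZE) batch_table;
```

Because VACUUM FULL blocks all access to the table while it runs, scheduling it in a window with no readers on either the master or the standby matters as much as the command itself.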
• Always set hot_standby_feedback to “on”
• Increasing max_standby_streaming_delay does not solve the
problem
• It only postpones the problem
• It has a negative impact on availability
• Conflicts caused by row versions and conflicts caused by locks are two different errors
• Tune frequently running queries, not just long-running queries
• VACUUM can also shrink a table (truncate empty trailing pages) when possible
• Autovacuum has a lot of knobs that can be tuned at the table level
• Database troubleshooting involves the application and the OS as well
Learning
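As an illustration of the table-level autovacuum knobs mentioned above, the sketch below overrides two of them on the pgbench_accounts test table. The values are illustrative assumptions, not recommendations from the talk. Autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples.

```sql
-- Dead-tuple count at which autovacuum would trigger under the
-- server-wide settings (ignores any per-table overrides).
SELECT relname,
       reltuples,
       current_setting('autovacuum_vacuum_threshold')::int
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * reltuples
         AS dead_tuple_trigger
FROM pg_class
WHERE relname = 'pgbench_accounts';

-- Per-table override: vacuum this table earlier than the global default.
ALTER TABLE pgbench_accounts SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold    = 1000
);

-- Return to the server-wide settings.
ALTER TABLE pgbench_accounts RESET (
    autovacuum_vacuum_scale_factor,
    autovacuum_vacuum_threshold
);
```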
• Vacuum and page reclaim now work better and avoid
conflicting locks in a replication environment
• Better monitoring details are available for blocking sessions
• Have you upgraded yet?
Recent changes in PostgreSQL
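One facility for finding blocking sessions, assuming it is among those the slide has in mind, is pg_blocking_pids() together with the wait_event columns of pg_stat_activity. Both were added in PostgreSQL 9.6, so this sketch does not run on the 9.4.3 server from the story.

```sql
-- Sessions that are currently blocked, with the PIDs blocking them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```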
Twitter - @sameerkasi200x | @ashnikbiz
Email - sameer.kumar@ashnik.com | success@ashnik.com
LinkedIn - https://www.linkedin.com/in/samkumar150288
We are hiring!

 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

PGConf APAC 2018 - Tale from Trenches

  • 1. Tale from Trenches How auto-vacuum, streaming replication, batch query took down availability and performance - Sameer Kumar, Solution Architect, Ashnik
  • 2. About Me! • A Random guy who started his career as an Oracle and DB2 DBA (and yeah a bit of SQL Server too) • Then moved to ‘Ashnik’ and started working with PostgreSQL • And then he fell in love with Open Source! • Twitter - @sameerkasi200x • Apart from PostgreSQL, I also do • noSQL Database (shhhh!) • Docker • Ansible • Chef • Apart from technology I love cycling and photography
  • 3. Disclaimer! • All the images used in this PPT have been used as per the associated attribution and copyright instructions • I take sole responsibility for the content used in this presentation • If you like my talk please tweet • #PGCONFAPAC • @PGCONFAPAC
  • 4. Why I Love PostgreSQL? • “Most Advanced Open Source Database” • A vibrant and active community • Fully ACID compliant • Multi Version Concurrency Control • NoSQL capability • Developer Friendly • Built to be extended ‘easily’
  • 5. What am I not going to talk about? • My employer - Ashnik • I won’t tell you that Ashnik is an Enterprise Open Source Solution provider • My colleagues have great expertise and experience in PostgreSQL • I won’t talk about Postgres deployments Ashnik has done in the BFSI sector • Why should you migrate out of Oracle and SQL Server? • How to ensure high availability with PostgreSQL? • How to scale PostgreSQL? • How to use extensions in PostgreSQL and extend its features? • How to monitor a PostgreSQL setup? • How to go about sharding and scaling PostgreSQL?
  • 6. I will be telling you a story!
  • 7. Once upon a time a large BFSI company migrated to PostgreSQL. And everyone lived happily ever after!
  • 8. Well no! Like any other animal, elephants need a caretaker and tending
  • 9. Let’s begin with the story!
  • 10. Day 0 (GoLive): Architecture • Configuration: • 4 Core CPU and 32GB RAM • PostgreSQL 9.4.3 Installation • HA setup with pgpool
  • 11. Day 1: Issue on production server • Issue: • High CPU usage • 500+ concurrent sessions on server – all facing slow response time
  • 12. Immediate action taken aka Firefighting • Perform a controlled failover to standby server • Capacity upgrade on old production server
  • 13. Day 2: Issue on standby server • Errors on standby server:
      ERROR: canceling statement due to conflict with recovery
      DETAIL: User was holding a relation lock for too long.
    • This is not your usual conflict caused by row clean-up
  • 15. Monitoring the production server • Monitor long running queries • Monitor high CPU queries • top + pg_stat_statements • Time for the batch process has been increasing exponentially • From 15 minutes to now 2 hours
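The slide names the monitoring targets but not the queries used. A minimal sketch of what such checks could look like, assuming the pg_stat_statements extension is installed and preloaded; the `total_time` column name applies to the 9.4 release used here (newer releases, 13 and later, call it `total_exec_time`), and the `LIMIT` is arbitrary:

```sql
-- Statements that consumed the most execution time overall.
-- Frequent cheap queries can rank above rare slow ones here.
SELECT query,
       calls,
       total_time,
       total_time / calls AS mean_time_ms
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;

-- Sessions whose current statement has been running the longest.
SELECT pid,
       usename,
       now() - query_start AS runtime,
       state,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()
ORDER BY runtime DESC;
```

Sorting by `total_time` rather than by mean time is a deliberate choice: it surfaces frequently running queries as well as long-running ones.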
  • 16. Week 2: Identify the bottleneck queries • Identify costly queries • Make select queries run faster – hoping it will reduce chances of conflicts • Tune the queries – to reduce CPU usage • Identify queries causing locks • Tune queries used in batch • Add indexes
  • 17. Week 2: Issue recurrence • Conflict occurred past midnight – low utilization period • Surprising that the issue recurred after the tuning
  • 18. Week 2: Further diagnosis • Understand the nature of application • Logic inside the batch job • Capture queries executing at the time of issue • Set log_autovacuum_min_duration parameter
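The slide does not say which value was chosen for log_autovacuum_min_duration. As a sketch, assuming the goal is to log every autovacuum run while diagnosing, the parameter can be set to 0 on the master and picked up with a configuration reload:

```sql
-- 0 logs every autovacuum action; -1 (the default) disables this logging.
-- The value 0 is an assumption for diagnosis, not taken from the talk.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;

-- The parameter does not need a restart; a reload is enough.
SELECT pg_reload_conf();

-- Confirm the active value.
SHOW log_autovacuum_min_duration;
```

Each logged run reports the pages removed and remaining for the table, which is the detail needed to correlate vacuum activity with the conflicts on the standby.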
  • 19. Week 2 – Issue re-creation: Let’s find the culprit!
  • 20. Issue re-creation – on Master
      postgres=# show autovacuum;
       autovacuum
      ------------
       on
      (1 row)
  • 21. Issue re-creation – on Standby
      postgres=# show hot_standby_feedback ;
       hot_standby_feedback
      ----------------------
       on
      (1 row)

      postgres=# show max_standby_streaming_delay ;
       max_standby_streaming_delay
      -----------------------------
       30s
      (1 row)
  • 22. Issue re-creation – no conflicts
      postgres=# select * from pg_stat_database_conflicts where datname='training_db';
      -[ RECORD 1 ]----+------------
      datid            | 16400
      datname          | training_db
      confl_tablespace | 0
      confl_lock       | 0
      confl_snapshot   | 0
      confl_bufferpin  | 0
      confl_deadlock   | 0
  • 23. Issue re-creation – table stats
       relpages | reltuples |     relname
      ----------+-----------+------------------
              0 |         0 | pgbench_history
              1 |        10 | pgbench_tellers
              1 |         1 | pgbench_branches
           1640 |    100000 | pgbench_accounts
  • 24. Issue re-creation – Table maintenance stats
           relname      | n_dead_tup | last_autovacuum |          last_vacuum
      ------------------+------------+-----------------+-------------------------------
       pgbench_tellers  |          0 |                 | 2016-06-24 03:59:23.624436+08
       pgbench_history  |          0 |                 | 2016-06-24 03:59:24.084768+08
       pgbench_branches |          0 |                 | 2016-06-24 03:59:23.536748+08
       pgbench_accounts |          0 |                 | 2016-06-24 03:59:23.68088+08
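The two stats slides show only query output. Queries along these lines would produce the same columns; the `LIKE 'pgbench%'` filter is an assumption based on the table names shown:

```sql
-- Page and row estimates per table (the "table stats" slide).
SELECT relpages, reltuples, relname
FROM pg_class
WHERE relname LIKE 'pgbench%'
  AND relkind = 'r'          -- ordinary tables only, skip indexes
ORDER BY relpages;

-- Dead tuples and last vacuum times (the "table maintenance stats" slide).
SELECT relname, n_dead_tup, last_autovacuum, last_vacuum
FROM pg_stat_user_tables
WHERE relname LIKE 'pgbench%';
```

Running both before and after the test makes the effect of autovacuum visible: a drop in `relpages` together with a fresh `last_autovacuum` timestamp indicates that pages were released.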
  • 25. Issue re-creation – a custom script for pgbench
      begin transaction;
      select count(*) from pgbench_accounts;
      -- simulate delay
      select pg_sleep(1);
      select * from pgbench_branches;
      select * from pgbench_history ;
      select * from pgbench_tellers;
      end;
  • 26. Issue re-creation – Lock monitoring
      -- list sessions that hold or request an AccessExclusiveLock on a relation
      select act.pid as connection_pid,
             act.query as locking_query,
             act.client_addr as client_address,
             act.usename as username,
             act.datname as database_name,
             lck.relation as relation_id,
             rel.relname
      from pg_stat_activity act
      join pg_locks lck on act.pid=lck.pid
      join pg_class rel on rel.oid=lck.relation
      where lck.locktype='relation'
        and lck.mode='AccessExclusiveLock';
      Image source: http://maxpixel.freegreatpicture.com/Metal-Lock-Protection-Chain-Safety-Security-1114101
  • 27. Issue re-creation - simulate bulk operation • Delete rows • Insert rows
  • 28. Issue re-creation – error reproduced
      client 0 sending select pg_sleep(1);
      client 0 receiving
      Client 0 aborted in state 2: ERROR: canceling statement due to conflict with recovery
      DETAIL: User was holding a relation lock for too long.
      transaction type: Custom query
  • 29. Issue re-creation – table stats after test
       relpages | reltuples |     relname
      ----------+-----------+------------------
              0 |         0 | pgbench_history
              1 |        10 | pgbench_tellers
              1 |         1 | pgbench_branches
            738 |     20997 | pgbench_accounts
      (4 rows)
  • 30. Issue re-creation – Table maintenance stats
           relname      | n_dead_tup |        last_autovacuum        |          last_vacuum
      ------------------+------------+-------------------------------+-------------------------------
       pgbench_tellers  |          0 |                               | 2016-06-24 03:59:23.624436+08
       pgbench_history  |          0 |                               | 2016-06-24 03:59:24.084768+08
       pgbench_branches |          0 |                               | 2016-06-24 03:59:23.536748+08
       pgbench_accounts |          0 | 2016-06-24 04:23:59.811238+08 | 2016-06-24 03:59:23.68088+08
      (4 rows)
      Image Source: http://maxpixel.freegreatpicture.com/Supplies-Vacuum-Cleaning-Dust-Buster-Bucket-29040
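The new last_autovacuum timestamp on pgbench_accounts shows that the bulk change pushed the table past the autovacuum trigger point. A rough sketch for seeing how close each table is to that point, using the documented formula (base threshold plus scale factor times row count); it reads only the global settings and so assumes no per-table overrides are in place:

```sql
-- Autovacuum vacuums a table once n_dead_tup exceeds
--   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::integer
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
         AS autovacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.relname LIKE 'pgbench%';
```

Sampling this around a batch window gives an idea of when autovacuum is likely to start on the tables the batch touches.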
  • 31. Issue re-creation – Database Conflicts
         datname   | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
      -------------+------------------+------------+----------------+-----------------+----------------
       training_db |                0 |          2 |              0 |               0 |              0
  • 32. Issue re-creation – Repeated the test • Increased max_standby_streaming_delay
      postgres=# show max_standby_streaming_delay ;
       max_standby_streaming_delay
      -----------------------------
       1min
      (1 row)
    • Similar outcome, but it took a bit longer to hit the error
  • 33. Vacuum Vs Vacuum Full
      VACUUM: • Clean-up of dead row versions • Does not reclaim the space back to OS • Concurrent access allowed • In-place vacuum • No additional space requirement • If pages at the end of the table are fully freed up, they get released back to OS • Causes Exclusive Lock at page level
      VACUUM FULL: • It is not “VACUUM, but better” • Reclaims the space • Concurrent access not allowed • Uses AccessExclusive Lock • Needs more storage during the process • Moves the table to a new location • More time
  • 34. Week 3 – Further Analysis • Collected VACUUM related statistics from Test and Production environment around the batch schedule • All the conflicts on standby had page clean-up involved on master • The lock taken for page clean-up gets replicated to standby • Query cancellation because of conflict on standby server
  • 35. Week 3 – Resolution • Interim resolution • Switch off auto vacuum on the table involved in batch • Manual vacuum full during off-peak hours • Result • No more query cancellation on standby • Batch period reduced from 2 hours to 2 minutes • Long term resolution • Fix batch process logic
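The interim resolution can be sketched as follows. The production table is not named in the talk, so `pgbench_accounts` stands in for "the table involved in batch"; scheduling the manual step in off-peak hours is left to an external scheduler such as cron:

```sql
-- Stop autovacuum from picking up this one table (hypothetical table name).
ALTER TABLE pgbench_accounts SET (autovacuum_enabled = false);

-- Off-peak maintenance. VACUUM FULL rewrites the table and holds an
-- AccessExclusiveLock for the duration, so nothing else can use the table.
VACUUM (FULL, VERBOSE) pgbench_accounts;

-- Disabling autovacuum for a table also disables its automatic ANALYZE,
-- so planner statistics have to be refreshed manually as well.
ANALYZE pgbench_accounts;

-- Revert once the batch logic is fixed.
-- ALTER TABLE pgbench_accounts RESET (autovacuum_enabled);
```

With autovacuum off for the table, dead rows accumulate until the manual run, which is why this works only as an interim measure.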
  • 36. Learning • Always set hot_standby_feedback to “on” • Increasing max_standby_streaming_delay does not solve the problem • It only postpones the problem • It has negative impact on availability • Conflicts because of row versions and conflicts because of locks are two different errors • Tune frequently running queries and not just long running queries • Vacuum can also shrink pages if necessary • Autovacuum has a lot of knobs that can be tuned at table level • Database troubleshooting involves application and OS as well
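As an illustration of the table-level autovacuum knobs mentioned above, the storage parameters below exist in PostgreSQL; the table name and every value are placeholders chosen for the example, not recommendations from the talk:

```sql
-- Per-table overrides take precedence over the global autovacuum settings.
ALTER TABLE pgbench_accounts SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum after ~5% of rows are dead
    autovacuum_vacuum_threshold    = 1000,  -- plus this fixed number of dead rows
    autovacuum_vacuum_cost_delay   = 10     -- ms pause per cost limit, throttles I/O
);

-- Inspect which overrides a table currently carries.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'pgbench_accounts';
```

Tuning at table level keeps the defaults for the rest of the database while giving a heavily churned batch table its own schedule.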
  • 37. Recent changes in PostgreSQL • Now vacuum and page reclaim work better and avoid conflicting locks in a replication environment • Better monitoring details available for blocking sessions • Have you upgraded yet?
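The slide does not name the monitoring improvements it has in mind. One likely candidate, offered here as an assumption, is the `pg_blocking_pids()` function and the `wait_event` columns added in PostgreSQL 9.6, which make a blocking-session report much shorter than a pg_locks join:

```sql
-- Requires PostgreSQL 9.6 or later.
-- Lists every blocked session together with the PIDs blocking it.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```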
  • 38. Twitter - @sameerkasi200x | @ashnikbiz Email - sameer.kumar@ashnik.com | success@ashnik.com LinkedIn - https://www.linkedin.com/in/samkumar150288 We are hiring!