Advanced pg_stat_statements: Filtering, Regression Testing & more

Talk on how we use pg_stat_statements at https://pganalyze.com/.

Lessons learned and introducing our pg_query library for parsing SQL queries.

  1. 1. Advanced pg_stat_statements: Filtering, Regression Testing & more @LukasFittl
  2. 2. Skilled Developer Amateur Hacker @LukasFittl
  3. 3. pganalyze.com 1.6 million unique queries tracked using pg_stat_statements
  4. 4. Intro pg_stat_statements userid | 10 dbid | 1397527 query | SELECT * FROM x WHERE y = ? calls | 5 total_time | 15.249 rows | 0 shared_blks_hit | 451 shared_blks_read | 41 shared_blks_dirtied | 26 shared_blks_written | 0 local_blks_hit | 0 local_blks_read | 0 local_blks_dirtied | 0 local_blks_written | 0 temp_blks_read | 0 temp_blks_written | 0 blk_read_time | 0 blk_write_time | 0
  5. 5. Intro query | SELECT * FROM x WHERE y = ? calls | 5 total_time | 15.249 Query + Avg Time + Timeframe
  6. 6. Intro
  7. 7. Improving Data Quality pg_query Filtering & Regression Testing
  8. 8. Improving Data Quality pg_query Filtering & Regression Testing
  9. 9. Improving Data Quality: Truncation. SELECT "postgres_settings".* FROM "postgres_settings" WHERE "postgres_settings"."database_id" = $1 AND "postgres_settings"."invalidated_at_snapshot_id" IS NULL AND (id not in (70288,70289,70290,70291,70292,70293,70294,70295,70296,70297,70298 ,70299,70300,70301,70302,70303,70304,70305,70306,70307,70308,70309 ,70310,70311,70312,70313,70314,70315,70316,70317,70318,70319,70320 ,70321,70322,70323,70324,70325,70326,70327,99059,99060,70330,70331 ,70332,70333,70334,70335,70336,70337,70338,99061,70340,70341,70342 ,70343,70344,70345,70346,70347,70348,70349,70350,70351,70352,70353 ,70354,70355,70356,70357,70358,70359,70360,99062,70362,70363,70364 ,70365,70366,70367,70368,70369,70370,70371,70372,70373,70374,70375 ,70376,70377,70378,70379,70380,70381,70382,70383,70384,70385,70386 ,99063,99064,99065,99066,99067,70392,70393,70394,70395,70396,70397 ,70398,70399,70400,70401,70402,70403,70404,70405,99068,70407,70408 ,70409,70410,70411,70412,70413,70414,70415,70416,70417,99069,70419 ,70420,70421,99070,70423,70424,70425,70426,70427,70428,
  10. 10. Improving Data Quality: Race condition during pg_stat_statements_reset(). -[ RECORD 1 ]---+-------------------------------- query | SELECT * FROM x WHERE y = ? calls | 5 total_time | 15.249 -[ RECORD 2 ]---+-------------------------------- query | SELECT * FROM z WHERE a = 123 calls | 50 total_time | 104.19
  11. 11. Improving Data Quality: Lesson learned: avoid frequent pg_stat_statements_reset() calls.
  12. 12. Improving Data Quality: Fingerprinting. SELECT a AS b == SELECT a AS c. Problematic: y IN (?, ?, ?) != y IN (?, ?); SELECT a, b FROM x != SELECT b, a FROM x; DEALLOCATE p141 != DEALLOCATE p150
  13. 13. Improving Data Quality: Limited statistical information. A histogram or MAX(runtime) would be very useful.
  14. 14. Improving Data Quality: pg_stat_plans. A pg_stat_statements variant that differentiates between query plans. It is slower, and should not be used before this bug is fixed: https://github.com/2ndQuadrant/pg_stat_plans/issues/39
  15. 15. Improving Data Quality Filtering & Regression Testing pg_query
  16. 16. Storing & Cleaning pg_stat_statements data pg_query
  17. 17. pg_query: Monitoring Setup. Production Database, Collector, Snapshot {"schema": {"n_live_tup": 75, "relpages": 1, "reltuples": 75.0,…}, "queries": [{..}, {..}]}, Monitoring Database. Steps: Normalize, Parse, Fingerprint, Extract Tables
  18. 18. pg_query queries id | 7053479 database_id | 1 received_query | SELECT * FROM x WHERE y = ? normalized_query | SELECT * FROM x WHERE y = ? created_at | 2014-06-27 16:20:08.334705 updated_at | 2014-06-27 16:20:08.334705 parse_tree | [{"SELECT":{...}] parse_error | parse_warnings | statement_types | {SELECT} truncated | f fingerprint | 00704f1fd8442b7c17821cb8a61856c3d61b330e
  19. 19. pg_query query_snapshots id | 170661585 query_id | 7053479 calls | 29 total_time | 94.38 rows | 29 snapshot_id | 3386118 snapshots id | 3386118 database_id | 408 collected_at | 2014-09-09 20:10:01 submitter | pganalyze-collector 0.6.1 query_source | pg_stat_statements
  20. 20. pg_query Normalize Parse Fingerprint Extract Tables
  21. 21. pg_query: Parsing an SQL query (Normalize, Parse, Fingerprint, Extract Tables)
  22. 22. pg_query: EXPLAIN (PARSETREE TRUE) SELECT * FROM x WHERE y = 1 ({SELECT :distinctClause <> :intoClause <> :targetList ( {RESTARGET :name <> :indirection <> :val {COLUMNREF :fields ({A_STAR}) :location 7} :location 7}) :fromClause ( {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14}) :whereClause {AEXPR :name ("=") :lexpr {COLUMNREF :fields ("y") :location 22} :rexpr {PARAMREF :number 0 :location 26} :location 24} Unfortunately, this doesn't exist.
  23. 23. pg_query Parse Statement raw_parse(..) pg_catalog Rewrite Query Query Planner Execute
  24. 24. pg_query: tree = raw_parser(query_str); str = nodeToString(tree); printf("%s", str); Output: ({SELECT :distinctClause <> :intoClause <> :targetList ( {RESTARGET :name <> :indirection <> :val {COLUMNREF :fields ({A_STAR}) :location 7} :location 7}) :fromClause ( {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14}) :whereClause {AEXPR :name ("=") :lexpr {COLUMNREF :fields ("y") :location 22} :rexpr {PARAMREF :number 0 :location 26} :location 24}
  25. 25. pg_query Parse Statement raw_parse(..) pg_catalog Rewrite Query Query Planner Execute
  26. 26. pg_query: github.com/pganalyze/pg_query. Extension that compiles a full copy of PostgreSQL when you run "gem install pg_query".
  27. 27. pg_query PgQuery._raw_parse("SELECT * FROM x WHERE y = 1") ({SELECT :distinctClause <> :intoClause <> :targetList ( {RESTARGET :name <> :indirection <> :val {COLUMNREF :fields ({A_STAR}) :location 7} :location 7}) :fromClause ( {RANGEVAR :schemaname <> :relname x :inhOpt 2 :relpersistence p :alias <> :location 14}) :whereClause {AEXPR :name ("=") :lexpr {COLUMNREF :fields ("y") :location 22} :rexpr {PARAMREF :number 0 :location 26} :location 24} :groupClause <> :havingClause <> :windowClause <> :valuesLists <> :sortClause <> :limitOffset <> :limitCount <> :lockingClause <> :withClause <>
  28. 28. pg_query nodeToString is incomplete :( PgQuery._raw_parse("CREATE SCHEMA foo") WARNING: 01000: could not dump unrecognized node type: 754
  29. 29. pg_query: Patch to src/backend/nodes/outfuncs.c: generate it automatically, with JSON output.
  30. 30. pg_query PgQuery._raw_parse("SELECT * FROM x WHERE y = 1") [{"SELECT": { "targetList": [{ "RESTARGET": { "val": { "COLUMNREF": { "fields": [{"A_STAR": {}}], "location": 7 } }, "location": 7 } } ], "fromClause": [ { "RANGEVAR": { "relname": "x", "inhOpt": 2, "relpersistence": "p", "location": 14 } } ], "whereClause": { "AEXPR": { "name": [ "=" ], "lexpr": {
  31. 31. pg_query: Parsing a normalized SQL query (Normalize, Parse, Fingerprint, Extract Tables)
  32. 32. pg_query (Parse, Analyze, Plan): EXPLAIN SELECT * FROM x WHERE y = 1 QUERY PLAN --------------------------------------------------------------------- Index Scan using idx_for_y on x (cost=0.15..8.17 rows=1 width=140) Index Cond: (y = 1)
  33. 33. EXPLAIN SELECT * FROM x WHERE y = ? ERROR: syntax error at or near ";" LINE 1: EXPLAIN SELECT * FROM x WHERE y = ?; Parse Analyze Plan pg_query
  34. 34. EXPLAIN SELECT * FROM x WHERE y = ? EXPLAIN SELECT * FROM x WHERE y = $1 ERROR: there is no parameter $1 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1; Parse Analyze Plan pg_query
  35. 35. pg_query: Parser patch to support parsing "?"
  36. 36. pg_query: Downside: breaks the ? operator in some cases. Real fix: don't use ? as a replacement character.
  37. 37. pg_query Fingerprinting Normalize Parse Fingerprint Extract Tables
  38. 38. pg_query > require 'pg_query' > q1 = PgQuery.parse('SELECT a, b FROM x') > q1.fingerprint ["c72f1bc9feda72c0b4ba030eea90b4fed3ac8e86"] > q2 = PgQuery.parse('SELECT b, a FROM x') > q2.fingerprint ["c72f1bc9feda72c0b4ba030eea90b4fed3ac8e86"]
  39. 39. pg_query 40 lines of unit-tested Ruby code
  40. 40. pg_query Extracting Table References Normalize Parse Fingerprint Extract Tables
  41. 41. pg_query > require 'pg_query' > q = PgQuery.parse('SELECT * FROM x') > q.tables ["x"]
  42. 42. pg_query ~90 lines of unit-tested Ruby code
  43. 43. github.com/pganalyze/pg_query pg_query
  44. 44. Improving Data Quality pg_query Filtering & Regression Testing
  45. 45. Filtering Filtering & Regression Testing
  46. 46. Filtering & Regression Testing: monitor.rb. Simple top-like tool that shows pg_stat_statements data: https://gist.github.com/lfittl/301542602607b738b23f
  47. 47. Filtering & Regression Testing monitor.rb -d testdb AVG | QUERY -------------------------------------------------------------------------------- 10.7ms | SELECT oid, typname, typelem, typdelim, typinput FROM pg_type 3.0ms | SET time zone 'UTC' 0.4ms | SELECT a.attname, format_type(a.atttypid, a.atttypmod), pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = ?::regclass AND a.attnum > ? AND NOT a.attisdropped ORDER BY a.attnum 0.2ms | SELECT pg_stat_statements_reset() 0.1ms | SELECT query, calls, total_time FROM pg_stat_statements 0.1ms | SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[?] WHERE cons.contype = ? AND cons.conrelid = ?::regclass 0.0ms | SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in (?,?) AND c.relname = ? AND n.nspname = ANY (current_schemas(?)) 0.0ms | SELECT * FROM posts JOIN users ON (posts.author_id = users.id) WHERE users.login = ?; 0.0ms | SET client_min_messages TO 'panic' 0.0ms | set client_encoding to 'UTF8' 0.0ms | SHOW client_min_messages 0.0ms | SELECT * FROM ad_reels WHERE id = ?; 0.0ms | SELECT * FROM posts WHERE guid = ?; 0.0ms | SELECT ? 0.0ms | SET client_min_messages TO 'warning' 0.0ms | SET standard_conforming_strings = on 0.0ms | SELECT "posts".* FROM "posts" ORDER BY "posts"."id" DESC LIMIT ? 0.0ms | SHOW TIME ZONE
  48. 48. Filtering & Regression Testing monitor.rb -d testdb -t posts AVG | QUERY -------------------------------------------------------------------------------- 0.0ms | SELECT * FROM posts JOIN users ON (posts.author_id = users.id) WHERE users.login = ?; 0.0ms | SELECT * FROM posts WHERE guid = ?; 0.0ms | SELECT "posts".* FROM "posts" ORDER BY "posts"."id" DESC LIMIT ?
  49. 49. Filtering & Regression Testing: if cli.config[:table]; q = PgQuery.parse(query["query"]); next unless q.tables.include?(cli.config[:table]); end
  50. 50. Regression Testing Filtering & Regression Testing
  51. 51. Filtering & Regression Testing: Which query plans are affected by removal of an index? How would execution plans be affected by an upgrade to 9.X?
  52. 52. Filtering & Regression Testing: Regression test based on pg_stat_statements + table statistics (no actual data).
  53. 53. Filtering & Regression Testing: Testing Setup. Production Database, Schema Dump + Table Level Statistics ("n_live_tup": 75, "relpages": 1, "reltuples": 75.0, "stanumbers1": [..], "stavalues1": "{..}", …), Local Test Database, EXPLAIN SELECT FROM x WHERE y = ?
  54. 54. EXPLAIN SELECT * FROM x WHERE y = ? EXPLAIN SELECT * FROM x WHERE y = $1 ERROR: there is no parameter $1 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1; Parse Analyze Plan Filtering & Regression Testing
  55. 55. y = $1 ERROR: there is no parameter $0 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0; Filtering & Regression Testing
  56. 56. y = $1 ERROR: there is no parameter $0 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0; y = NULL QUERY PLAN ---------------------------------------------------------------- Result (cost=0.00..21.60 rows=1 width=40) One-Time Filter: NULL::boolean -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40) Filtering & Regression Testing
  57. 57. y = $1 ERROR: there is no parameter $0 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0; y = NULL QUERY PLAN ---------------------------------------------------------------- Result (cost=0.00..21.60 rows=1 width=40) One-Time Filter: NULL::boolean -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40) y = (SELECT null) ERROR: failed to find conversion function from unknown to integer Filtering & Regression Testing
  58. 58. y = $1 ERROR: there is no parameter $0 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $0; y = NULL QUERY PLAN ---------------------------------------------------------------- Result (cost=0.00..21.60 rows=1 width=40) One-Time Filter: NULL::boolean -> Seq Scan on x (cost=0.00..21.60 rows=1 width=40) y = (SELECT null) ERROR: failed to find conversion function from unknown to integer y = (SELECT null::integer) QUERY PLAN ---------------------------------------------------------------------- Index Scan using idx_for_y on x (cost=0.16..8.18 rows=1 width=144) Index Cond: (y = $0) InitPlan 1 (returns $0) -> Result (cost=0.00..0.01 rows=1 width=0) Filtering & Regression Testing
  59. 59. Filtering & Regression Testing: Finding out the type via pg_prepared_statements. y = $1 ERROR: there is no parameter $1 LINE 1: EXPLAIN SELECT * FROM x WHERE y = $1; PREPARE tmp AS SELECT * FROM x WHERE y = $1; SELECT unnest(parameter_types) AS data_type FROM pg_prepared_statements WHERE name = 'tmp'; DEALLOCATE tmp; data_type ----------- integer
  60. 60. EXPLAIN SELECT * FROM x WHERE y = ? EXPLAIN SELECT * FROM x WHERE y = $0 EXPLAIN SELECT * FROM x WHERE y = ((SELECT null::integer)::integer) QUERY PLAN --------------------------------------------------------------------- Index Scan using idx_for_y on x (cost=0.16..8.18 rows=1 width=144) Index Cond: (y = $0) InitPlan 1 (returns $0) -> Result (cost=0.00..0.01 rows=1 width=0) Parse Analyze Plan Filtering & Regression Testing
  61. 61. Filtering & Regression Testing: Open issue: the planner reads the actual physical size while planning.
  62. 62. github.com/pganalyze/pg_simulator Filtering & Regression Testing
  63. 63. Improving Data Quality pg_query Filtering & Regression Testing
  64. 64. Closing: 9.5 proposal for pg_stat_statements: use $0 instead of ? as the replacement character, making the output parseable again.
  65. 65. Closing: 9.5 proposal for outfuncs.c: generate it automatically from struct definitions, cutting 3000 hand-written lines down to 1000. Add JSON output support.
  66. 66. Closing: 9.X proposal: consider adding a way to get a parse tree more easily, via SQL, a shared library, or a helper tool.
  67. 67. Closing: Tools & libraries available at github.com/pganalyze
  68. 68. @LukasFittl Thank you! github.com/pganalyze pganalyze.com
  69. 69. Backup Slides
  70. 70. Improving Data Quality: Classifying queries. Frequent/OLTP vs. analytical queries.
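
Slide 5 reduces pg_stat_statements to "Query + Avg Time + Timeframe", and slides 10 and 11 warn about resets. One way to get an average for a timeframe without resetting is to subtract two snapshots. The following is a minimal sketch, assuming the counters are cumulative between snapshots; the Snapshot struct and the method name are hypothetical, and only calls and total_time correspond to pg_stat_statements columns.

```ruby
# Hypothetical container for one collected row; not part of any library.
Snapshot = Struct.new(:collected_at, :calls, :total_time)

# Returns the average time per call (in ms) for the interval between two
# cumulative snapshots, or nil when the interval cannot be trusted.
def avg_time_between(earlier, later)
  delta_calls = later.calls - earlier.calls
  delta_time  = later.total_time - earlier.total_time
  # Counters that did not grow mean either no calls happened or the
  # statistics were reset in between, so no average can be computed.
  return nil if delta_calls <= 0 || delta_time < 0
  delta_time / delta_calls
end

s1 = Snapshot.new(Time.utc(2014, 9, 9, 20, 0, 1), 5, 15.249)
s2 = Snapshot.new(Time.utc(2014, 9, 9, 20, 10, 1), 29, 94.38)
avg = avg_time_between(s1, s2)
puts format("%.3f ms per call over %d s", avg, s2.collected_at - s1.collected_at)
```

Returning nil for a shrinking counter is a design choice of this sketch: it drops the interval around a reset instead of reporting a misleading average.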

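Slides 12 and 38 describe fingerprints that stay the same when output aliases or column order change. The sketch below illustrates that idea on a hand-written tree shaped like the JSON output on slide 30. It is not the pg_query implementation: the canonical, fingerprint and select_tree helpers are hypothetical, and the rules (skip location, skip the RESTARGET alias, blank out A_CONST and PARAMREF, sort targetList) are assumptions chosen to match the examples on the slides.

```ruby
require "digest"

# Builds a canonical string for a parse tree made of nested Hashes and Arrays.
def canonical(node, parent_key = nil)
  case node
  when Hash
    parts = node
      .reject { |key, _| key == "location" || (parent_key == "RESTARGET" && key == "name") }
      .sort_by { |key, _| key }
      .map do |key, value|
        # Constants and parameters are replaced by a placeholder.
        body = %w[A_CONST PARAMREF].include?(key) ? "?" : canonical(value, key)
        "#{key}:#{body}"
      end
    "{" + parts.join(",") + "}"
  when Array
    items = node.map { |child| canonical(child, parent_key) }
    # Column order in the target list should not change the fingerprint.
    items = items.sort if parent_key == "targetList"
    "[" + items.join(",") + "]"
  else
    node.inspect
  end
end

def fingerprint(tree)
  Digest::SHA1.hexdigest(canonical(tree))
end

# Hypothetical helper that builds a tiny SELECT tree for the demo.
def select_tree(columns)
  targets = columns.map do |column, output_alias|
    { "RESTARGET" => { "name" => output_alias,
                       "val" => { "COLUMNREF" => { "fields" => [column] } } } }
  end
  [{ "SELECT" => { "targetList" => targets,
                   "fromClause" => [{ "RANGEVAR" => { "relname" => "x" } }] } }]
end

q1 = select_tree([["a", nil], ["b", nil]])
q2 = select_tree([["b", "renamed"], ["a", nil]])
q3 = select_tree([["a", nil], ["c", nil]])
puts fingerprint(q1) == fingerprint(q2) # same columns, different order and alias
puts fingerprint(q1) == fingerprint(q3) # different columns
```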
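Slides 57 to 60 arrive at replacing each placeholder with a typed NULL subquery so that a normalized query can be passed to EXPLAIN. A minimal sketch of that rewrite is shown below. The explainable method is hypothetical, the parameter types are assumed to be known already (for example from pg_prepared_statements, as on slide 59), and the scan is deliberately naive: it only skips single-quoted literals and would also rewrite a genuine ? operator, the downside noted on slide 36.

```ruby
# Replaces each "?" outside single-quoted literals with a typed NULL subquery.
# param_types lists one PostgreSQL type name per placeholder, in order.
def explainable(normalized_query, param_types)
  types = param_types.dup
  in_string = false
  result = String.new
  normalized_query.each_char do |char|
    # A doubled quote ('') toggles twice, which leaves the state unchanged.
    in_string = !in_string if char == "'"
    if char == "?" && !in_string
      type = types.shift
      raise ArgumentError, "more placeholders than types" if type.nil?
      result << "((SELECT null::#{type})::#{type})"
    else
      result << char
    end
  end
  raise ArgumentError, "more types than placeholders" unless types.empty?
  "EXPLAIN #{result}"
end

puts explainable("SELECT * FROM x WHERE y = ?", ["integer"])
```

The subquery form is used instead of a plain NULL because, as slide 56 shows, a literal NULL lets the planner collapse the condition into a one-time filter.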