SlideShare a Scribd company logo
1 of 64
Download to read offline
Indexes in Postgres
(the long story or crocodiles going to the dentist)
Louise Grandjonc
1
About me
Solutions Engineer at Citus Data
Previously lead python developer
Postgres enthusiast
@louisemeta on twitter
www.louisemeta.com
!2
What we’re going to talk about
1. What are indexes for?
2. Pages and CTIDs
3. B-Tree
4. GIN
5. GiST
6. SP-GiST
7. Brin
8. Hash
!3
First things first: the crocodiles
!4
• 250k crocodiles
• 100k birds
• 2M appointments
5
What are indexes for?
Constraints
!6
Some constraints transform into indexes.
- PRIMARY KEY
- UNIQUE
- EXCLUDE USING
"crocodile_pkey" PRIMARY KEY, btree (id)
"crocodile_email_uq" UNIQUE CONSTRAINT, btree (email)
Indexes:
"appointment_pkey" PRIMARY KEY, btree (id)
"appointment_crocodile_id_schedule_excl" EXCLUDE USING gist
(crocodile_id WITH =, schedule WITH &&)
In the crocodile table
In the appointment table
Query optimization
!7
Often the main reason why we create indexes
Why do indexes make queries faster
In an index, tuples (value, pointer) are stored.
Instead of reading the entire table for a value, you just go to the index (kind of like in an
encyclopedia)
8
Pages, heaps and their
pointers
Pages
!9
- PostgreSQL uses pages to store data from indexes or tables
- A page has a fixed size of 8kB
- A page has a header and items
- In an index, each item is a tuple (value, pointer)
- Each item in a page is referenced to with a pointer called ctid
- The ctid consist of two numbers, the number of the page (the block number) and the offset
of the item.
The ctid of the item with value 4 would be (3, 2).
10
pageinspect and gevel
Extensions to look into your index pages
pageinspect, gevel and a bit of python
!11
Page inspect is an extension that allows you to explore a bit what’s inside the pages
Functions for BTree, GIN, BRIN and Hash indexes
Gevel adds functions to GiST, SP-Gist and GIN.
Used them to generate pictures for BTree and GiST
https://github.com/louiseGrandjonc/pageinspect_inspector
12
B-Trees
B-Trees internal data structure - 1
!13
- A BTree in a balanced tree
- All the leaves are at equal distance from the root.
- A parent node can have multiple children minimizing the tree’s depth
- Postgres implements the Lehman & Yao Btree
Let’s say we would like to filter or order on the crocodile’s number of teeth.
CREATE INDEX ON crocodile (number_of_teeth);
B-Trees internal data structure - 2
Metapage
!14
The metapage is always the first page of a BTree index. It contains:
- The block number of the root page
- The level of the root
- A block number for the fast root
- The level of the fast root
B-Trees internal data structure - 2
Metapage
!15
SELECT * FROM bt_metap('crocodile_number_of_teeth_idx');
magic | version | root | level | fastroot | fastlevel
--------+---------+------+-------+----------+-----------
340322 | 2 | 290 | 2 | 290 | 2
(1 row)
Using page inspect, you can get the information on the metapage
B-Trees internal data structure - 3
Pages
!16
The root, the parents, and the leaves are all pages with the same structure.
Pages have:
- A block number, here the root block number is 290
- A high key
- A pointer to the next (right) and previous pages
- Items
B-Trees internal data structure - 4
Pages high key
!17
- High key is specific to Lehman & Yao BTrees
- Any item in the page will have a value lower or equal to the high key
- The root doesn’t have a high key
- The right-most page of a level doesn’t have a high key
And in page 575, there is no high key as it’s the
rightmost page.
In page 3, I will find crocodiles with 3 or less teeth
In page 289, with 31 and less
B-Trees internal data structure - 5
Next and previous pages pointers
!18
- Specificity of the Yao and Lehmann BTree
- Pages in the same level are in a linked list
Very useful for ORDER BY
For example:
SELECT number_of_teeth
FROM crocodile ORDER BY number_of_teeth ASC
Postgres would start at the first leaf page and thanks to the next
page pointer, has directly all rows in the right order.
B-Trees internal data structure - 6
Page inspect for BTree pages
!19
SELECT * FROM bt_page_stats(‘crocodile_number_of_teeth_idx’,
289);
-[ RECORD 1 ]-+-----
blkno | 289
type | i
live_items | 285
dead_items | 0
avg_item_size | 15
page_size | 8192
free_size | 2456
btpo_prev | 3
btpo_next | 575
btpo | 1
btpo_flags | 0
B-Trees internal data structure - 7
Items
!20
- Items have a value and a pointer
- In the parents, the ctid points to the child page
- In the parents, the value is the value of the first item in the child page
B-Trees internal data structure - 8
Items
!21
- In the leaves, the ctid is to the heap tuple in the table
- In the leaves it’s the value of the column(s) of the row
B-Trees internal data structure
To sum it up
!22
- A Btree is a balanced tree. PostgreSQL implements the Lehmann & Yao algorithm
- Metapage contains information on the root and fast root
- Root, parent, and leaves are pages.
- Each level is a linked list making it easier to move from one page to an other within the same level.
- Pages have a high key defining the biggest value in the page
- Pages have items pointing to an other page or the row.

B-Trees - Searching in a BTree
!23
1. Scan keys are created
2. Starting from the root until a leaf page
• Is moving to the right page necessary?
• If the page is a leaf, return the first item with a value
higher or equal to the scan key
• Binary search to find the right path to follow
• Descend to the child page and lock it
SELECT email FROM crocodile WHERE number_of_teeth >= 20;
B-Trees - Scan keys
!24
Postgres uses the query scan to define scankeys.
If possible, redundant keys in your query are eliminated to keep only
the tightest bounds.
The tightest bound is number_of_teeth > 5
SELECT email, number_of teeth FROM crocodile
WHERE number_of_teeth > 4 AND number_of_teeth > 5
ORDER BY number_of_teeth ASC;
email | number_of_teeth
----------------------------------------+-----------------
anne.chow222131@croco.com | 6
valentin.williams222154@croco.com | 6
pauline.lal222156@croco.com | 6
han.yadav232276@croco.com | 6
B-Trees - About read locks
!25
We put a read lock on the currently examined page.
Read locks  ensure that the  records on that page are not
modified while reading it.
There could still be a concurrent insert on a child page causing
a page split.
BTrees - Is moving right necessary?
!26
Concurrent insert while visiting the root:
SELECT email FROM crocodile WHERE number_of_teeth >= 20;
BTrees - Is moving right necessary?
!27
The new high key of child page is 19
So we need to move right to the page 840
B-Trees - Searching in a BTree
!28
1. Scan keys are created
2. Starting from the root until a leaf page
• Is moving to the right page necessary?
• If the page is a leaf, return the first item with a value
higher or equal to the scan key
• Binary search to find the right path to follow
• Descend to the child page and lock it
SELECT email FROM crocodile WHERE number_of_teeth >= 20;
BTrees - Inserting
!29
1. Find the right insert page
2. Lock the page
3. Check constraint
4. Split page if necessary and insert row
5. In case of page split, recursively insert a new
item in the parent level
BTrees -Inserting
Finding the right page
!30
Auto-incremented values:
Primary keys with a sequence for example, like the index crocodile_pkey.
New values will always be inserted in the right-most leaf page.
To avoid using the search algorithm, Postgres caches this page.
Non auto-incremented values:
The search algorithm is used to find the right leaf page.
BTrees -Inserting
Page split
!31
1. Is a split necessary?
If the free space on the target page is lower than the item’s size, then a split is necessary.
2. Finding the split point
Postgres wants to equalize the free space on each page to limit page splits in future inserts.
3. Splitting
BTrees - Deleting
!32
- Items are marked as deleted and will be ignored in future index scans until VACUUM
- A page is deleted only if all its items have been deleted.
- It is possible to end up with a tree with several levels with only one page.
- The fast root is used to optimize the search.
33
GIN
GIN
!34
- GIN (Generalized Inverted Index) 
- Used to index arrays, jsonb, and tsvector (for fulltext search) columns.
- Efficient for <@, &&, @@@ operators
New column healed_teeth (integer[])
 
Here is how to create the GIN index for this column
croco=# SELECT email, number_of_teeth, healed_teeth FROM crocodile WHERE id =1;
-[ RECORD 1 ]---+--------------------------------------------------------
email | louise.grandjonc1@croco.com
number_of_teeth | 58
healed_teeth | {16,11,55,27,22,41,38,2,5,40,52,57,28,50,10,15,1,12,46}
CREATE INDEX ON crocodile USING GIN(healed_teeth);
GIN
How is it different from a BTree? - Keys
!35
- GIN indexes are binary trees
- Just like BTree, their first page is a metapage
First difference: the keys
BTree index on healed_teeth
The indexed values are arrays
Seq Scan on crocodile (cost=…)
Filter: ('{1,2}'::integer[] <@ healed_teeth)
Rows Removed by Filter: 250728
Planning time: 0.157 ms
Execution time: 161.716 ms
(5 rows)
SELECT email FROM crocodile
WHERE ARRAY[1, 2] <@ healed_teeth;
GIN
How is it different from a BTree? - Keys
!36
- In a GIN index, the array is split and each value is an entry
- The values are unique
GIN
How is it different from a BTree? - Keys
!37
Bitmap Heap Scan on crocodile
(cost=516.59..6613.42 rows=54786 width=29)
(actual time=15.960..38.197 rows=73275 loops=1)
Recheck Cond: ('{1,2}'::integer[] <@ healed_teeth)
Heap Blocks: exact=4218
-> Bitmap Index Scan on crocodile_healed_teeth_idx
(cost=0.00..502.90 rows=54786 width=0)
(actual time=15.302..15.302 rows=73275 loops=1)
Index Cond: ('{1,2}'::integer[] <@ healed_teeth)
Planning time: 0.124 ms
Execution time: 41.018 ms
(7 rows)
Seq Scan on crocodile (cost=…)
Filter: ('{1,2}'::integer[] <@ healed_teeth)
Rows Removed by Filter: 250728
Planning time: 0.157 ms
Execution time: 161.716 ms
(5 rows)
GIN
How is it different from a BTree? Leaves
!38
- In a leaf page, the items contain a posting list of pointers to the rows in the table
- If the list can’t fit in the page, it becomes a posting tree
- In the leaf item remains a pointer to the posting tree
GIN
How is it different from a BTree? Pending list
!39
- To optimise inserts, we store the new entries in a pending list (linear list of pages)
- Entries are moved to the main tree on VACUUM or when the list is full
- You can disable the pending list by setting fastupdate to false (on CREATE or ALTER INDEX)
SELECT * FROM gin_metapage_info(get_raw_page('crocodile_healed_teeth_idx', 0));
-[ RECORD 1 ]----+-----------
pending_head | 4294967295
pending_tail | 4294967295
tail_free_size | 0
n_pending_pages | 0
n_pending_tuples | 0
n_total_pages | 358
n_entry_pages | 1
n_data_pages | 356
n_entries | 47
version | 2
GIN
To sum it up
!40
To sum up, a GIN index has:
- A metapage
- A BTree of key entries
- The values are unique in the main binary tree
- The leaves either contain a pointer to a posting tree, or a posting list of heap
pointers
- New rows go into a pending list until it’s full or VACUUM, that list needs to be
scanned while searching the index
41
GIST
GiST - keys
!42
Differences with a BTree index
- Data isn’t ordered
- The key ranges can overlap
Which means that a same value can be inserted in different pages
GiST - keys
!43
Differences with a BTree index
- Data isn’t ordered
- The key ranges can overlap
Which means that a same value can be inserted in different pages
Data isn’t ordered
GiST - keys
!44
A new appointment scheduled from
 August 14th 2014 7:30am to 8:30am
can be inserted in both pages.
CREATE INDEX ON appointment USING GIST(schedule)
Differences with a BTree index
- Data isn’t ordered
- The key ranges can overlap
Which means that a same value can be inserted in different pages
GiST - keys
!45
Differences with a BTree index
- Data isn’t ordered
- The key ranges can overlap
Which means that a same value can be inserted in different pages
A new appointment scheduled from
 August 14th 2014 7:30am to 8:30am
can be inserted in both pages.
CREATE INDEX ON appointment USING GIST(schedule)
GiST
key class functions
!46
GiST allows the development of custom data types with the appropriate access methods.
These functions are key class functions:
Union: used while inserting, if the range changed
Distance: used for ORDER BY and nearest neighbor, calculates the distance to the scan
key
GiST
key class functions - 2
!47
Consistent: returns MAYBE if the range contains the searched value, meaning that rows
could be in the page
Child pages could contain the appointments overlapping
[2018-05-17 08:00:00, 2018-05-17 13:00:00]
Consistent returns MAYBE
GiST - Searching
!48
SELECT c.email, schedule, done, emergency_level
FROM appointment
INNER JOIN crocodile c ON (c.id=crocodile_id)
WHERE schedule && '[2018-05-17 08:00:00,
2018-05-17 13:00:00]'::tstzrange
AND done IS FALSE
ORDER BY schedule DESC LIMIT 3;
1. Create a search queue of pages to explore with the root in it
2. While the search queue isn’t empty, pops a page
1. If the page is a leaf: update the bitmap with CTIDs of rows
2. Else, adds to the search queue the items where Consistent
returned MAYBE
GiST - Inserting
!49
A new item can be inserted in any page.
Penalty: key class function (defined by user) gives a number representing
how bad it would be to insert the value in the child page.
About page split:
Picksplit: makes groups with little distance
Performance of search will depend a lot of Picksplit
GiST - Inserting
!50
A new item can be inserted in any page.
Penalty: key class function (defined by user) gives a number representing
how bad it would be to insert the value in the child page.
About page split:
Picksplit: makes groups with little distance
Performance of search will depend a lot of Picksplit
To sum up
!51
- Useful for overlapping (geometries, array etc.)
- Nearest neighbor
- Can be used for full text search (tsvector, tsquery)
- Any data type can implement GiST as long as a few methods are available
52
SP-GiST
SP-GiST
Internal data structure
!53
- Not a balanced tree
- A same page can’t have inner tuples and leaf tuples
- Keys are decomposed
- In an inner tuple, the value is the prefix
- In a leaf tuple, the value is the rest (postfix)
P
L
A
Page blkno: 1
ABLO
UISE
RIAN
O
D
Page blkno: 8 Page blkno: 4
SP-GiST
Pages
!54
SELECT tid, level, leaf_value FROM spgist_print('crocodile_first_name_idx3') as t
(tid tid, a bool, n int, level int, p tid, pr text, l smallint, leaf_value text) ;
tid | level | leaf_value
----------+-------+------------
…
(4,36) | 2 | ablo
(4,57) | 2 | ustafa
(4,84) | 3 | rian
(4,153) | 3 | uise
…
Here are how the pages are
organized if we look into gevel’s
sp-gist functions for this index
SP-GiST
!55
- Can be used for points
- For text to search for prefix
- For non balanced data structures (k-d trees)
- Like GiST: allows the development of custom data types
56
BRIN
BRIN
Internal data structure
!57
- Block Range Index
- Not a binary tree
- Not even a tree
- Block range: group of pages physically adjacent
- For each block range: the range of values is stored
- BRIN indexes are very small
- Fast scanning on large tables
BRIN
Internal data structure
!58
SELECT * FROM brin_page_items(get_raw_page('appointment_created_at_idx', 2), 'appointment_created_at_idx');
itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
------------+--------+--------+----------+----------+-------------+---------------------------------------------------
1 | 0 | 1 | f | f | f | {2008-03-01 00:00:00-08 .. 2009-07-07 07:30:00-07}
2 | 128 | 1 | f | f | f | {2009-07-07 08:00:00-07 .. 2010-11-12 15:30:00-08}
3 | 256 | 1 | f | f | f | {2010-11-12 16:00:00-08 .. 2012-03-19 23:30:00-07}
4 | 384 | 1 | f | f | f | {2012-03-20 00:00:00-07 .. 2013-07-26 07:30:00-07}
5 | 512 | 1 | f | f | f | {2013-07-26 08:00:00-07 .. 2014-12-01 15:30:00-08}
SELECT id, created_at FROM appointment WHERE ctid='(0, 1)'::tid;
id | created_at
--------+------------------------
101375 | 2008-03-01 00:00:00-08
(1 row)
BRIN
Internal data structure
!59
SELECT * FROM brin_page_items(get_raw_page('crocodile_birthday_idx', 2),
'crocodile_birthday_idx');
itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
------------+--------+--------+----------+----------+-------------+----------------------------
1 | 0 | 1 | f | f | f | {1948-09-05 .. 2018-09-04}
2 | 128 | 1 | f | f | f | {1948-09-07 .. 2018-09-03}
3 | 256 | 1 | f | f | f | {1948-09-05 .. 2018-09-03}
4 | 384 | 1 | f | f | f | {1948-09-05 .. 2018-09-04}
5 | 512 | 1 | f | f | f | {1948-09-05 .. 2018-09-02}
6 | 640 | 1 | f | f | f | {1948-09-09 .. 2018-09-04}
…
(14 rows)
In this case, the values in birthday has no correlation with the physical
location, the index would not speed up the search as all pages would have
to be visited.
BRIN is interesting for data where the value is correlated with the
physical location.
BRIN
Warning on DELETE and INSERT
!60
SELECT * FROM brin_page_items(get_raw_page('appointment_created_at_idx', 2), 'appointment_created_at_idx');
itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
------------+--------+--------+----------+----------+-------------+---------------------------------------------------
1 | 0 | 1 | f | f | f | {2008-03-01 00:00:00-08 .. 2018-07-01 07:30:00-07}
2 | 128 | 1 | f | f | f | {2009-07-07 08:00:00-07 .. 2018-07-01 23:30:00-07}
3 | 256 | 1 | f | f | f | {2010-11-12 16:00:00-08 .. 2012-03-19 23:30:00-07}
4 | 384 | 1 | f | f | f | {2012-03-20 00:00:00-07 .. 2018-07-06 23:30:00-07}
DELETE FROM appointment WHERE created_at >= '2009-07-07' AND created_at < ‘2009-07-08';
DELETE FROM appointment WHERE created_at >= '2012-03-20' AND created_at < ‘2012-03-25';
Deleted and then vacuum on the appointment table
New rows are inserted in the free space after VACUUM
BRIN index has some ranges with big data ranges.
Search will visit a lot of pages.
61
HASH
Hash
Internal data structure
!62
- Only useful if you have a data not fitting
into a page (8kb)
- Only operator is =
- If you use a PG version < 10, it’s just awful
Conclusion
!63
- B-Tree
- Great for <, >, =, >=, <=
- GIN
- Fulltext search, jsonb, arrays
- ADD OPERATORS
- Inserts can be slow because of unicity of the
keys
- BRIN
- Great for huge table with correlation between
value and physical location
- <, >, =, >=, <=
- GiST
- Great for overlapping
- Using key class functions
- Can be implemented for any data type
- SP-Gist
- Also using key class function
- Decomposed keys
- Can be used for non balanced data
structures (k-d trees)
- Hash
- If you have a value > 8kB
- Only for =
Questions
!64
Thanks for your attention
Go read the articles www.louisemeta.com
Now only the ones on BTrees are published,
but I’ll announce the rest on twitter
@louisemeta
Crocodiles by https://www.instagram.com/zimmoriarty/?hl=en

More Related Content

What's hot

PostgreSql query planning and tuning
PostgreSql query planning and tuningPostgreSql query planning and tuning
PostgreSql query planning and tuningFederico Campoli
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Jaime Crespo
 
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우PgDay.Seoul
 
SQL Tutorial - How To Create, Drop, and Truncate Table
SQL Tutorial - How To Create, Drop, and Truncate TableSQL Tutorial - How To Create, Drop, and Truncate Table
SQL Tutorial - How To Create, Drop, and Truncate Table1keydata
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architectureBishal Khanal
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuningelliando dias
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDBMongoDB
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the RoadmapEDB
 
Introduction of sql server indexing
Introduction of sql server indexingIntroduction of sql server indexing
Introduction of sql server indexingMahabubur Rahaman
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresqlbotsplash.com
 
Postgres MVCC - A Developer Centric View of Multi Version Concurrency Control
Postgres MVCC - A Developer Centric View of Multi Version Concurrency ControlPostgres MVCC - A Developer Centric View of Multi Version Concurrency Control
Postgres MVCC - A Developer Centric View of Multi Version Concurrency ControlReactive.IO
 
Triggers in SQL | Edureka
Triggers in SQL | EdurekaTriggers in SQL | Edureka
Triggers in SQL | EdurekaEdureka!
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slidesmetsarin
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsFederico Razzoli
 
Postgresql Database Administration Basic - Day1
Postgresql  Database Administration Basic  - Day1Postgresql  Database Administration Basic  - Day1
Postgresql Database Administration Basic - Day1PoguttuezhiniVP
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
MySQL Query And Index Tuning
MySQL Query And Index TuningMySQL Query And Index Tuning
MySQL Query And Index TuningManikanda kumar
 

What's hot (20)

PostgreSql query planning and tuning
PostgreSql query planning and tuningPostgreSql query planning and tuning
PostgreSql query planning and tuning
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
 
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
[Pgday.Seoul 2017] 6. GIN vs GiST 인덱스 이야기 - 박진우
 
SQL Tutorial - How To Create, Drop, and Truncate Table
SQL Tutorial - How To Create, Drop, and Truncate TableSQL Tutorial - How To Create, Drop, and Truncate Table
SQL Tutorial - How To Create, Drop, and Truncate Table
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
 
PostgreSQL Performance Tuning
PostgreSQL Performance TuningPostgreSQL Performance Tuning
PostgreSQL Performance Tuning
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the Roadmap
 
Introduction of sql server indexing
Introduction of sql server indexingIntroduction of sql server indexing
Introduction of sql server indexing
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
Postgres MVCC - A Developer Centric View of Multi Version Concurrency Control
Postgres MVCC - A Developer Centric View of Multi Version Concurrency ControlPostgres MVCC - A Developer Centric View of Multi Version Concurrency Control
Postgres MVCC - A Developer Centric View of Multi Version Concurrency Control
 
Triggers in SQL | Edureka
Triggers in SQL | EdurekaTriggers in SQL | Edureka
Triggers in SQL | Edureka
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slides
 
InnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick FiguresInnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick Figures
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAs
 
Postgresql
PostgresqlPostgresql
Postgresql
 
Postgresql Database Administration Basic - Day1
Postgresql  Database Administration Basic  - Day1Postgresql  Database Administration Basic  - Day1
Postgresql Database Administration Basic - Day1
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
MySQL Query And Index Tuning
MySQL Query And Index TuningMySQL Query And Index Tuning
MySQL Query And Index Tuning
 
MYSQL.ppt
MYSQL.pptMYSQL.ppt
MYSQL.ppt
 

Similar to Indexes in postgres

A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncCitus Data
 
Indexes in Postgres | PostgreSQL Conference Europe 2018 | Louise Grandjonc
Indexes in Postgres | PostgreSQL Conference Europe 2018 | Louise GrandjoncIndexes in Postgres | PostgreSQL Conference Europe 2018 | Louise Grandjonc
Indexes in Postgres | PostgreSQL Conference Europe 2018 | Louise GrandjoncCitus Data
 
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)Ontico
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMarco Tusa
 
Optimal Binary Search tree ppt seminar.pptx
Optimal Binary Search tree ppt seminar.pptxOptimal Binary Search tree ppt seminar.pptx
Optimal Binary Search tree ppt seminar.pptxssusered44c8
 
Introduction to Search Systems - ScaleConf Colombia 2017
Introduction to Search Systems - ScaleConf Colombia 2017Introduction to Search Systems - ScaleConf Colombia 2017
Introduction to Search Systems - ScaleConf Colombia 2017Toria Gibbs
 
"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - C...
"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - C..."MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - C...
"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - C...CAPSiDE
 
Postgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data ModelsPostgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data ModelsEDB
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MYXPLAIN
 
Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Michael Mathioudakis
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL IndexingBADR
 
It's Not You. It's Your Data Model.
It's Not You. It's Your Data Model.It's Not You. It's Your Data Model.
It's Not You. It's Your Data Model.Alex Powers
 
Esoteric Data structures
Esoteric Data structures Esoteric Data structures
Esoteric Data structures Mugisha Moses
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and developmentWes McKinney
 
Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pino...
Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pino...Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pino...
Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pino...HostedbyConfluent
 
Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Anastasia Lubennikova
 
Basics in algorithms and data structure
Basics in algorithms and data structure Basics in algorithms and data structure
Basics in algorithms and data structure Eman magdy
 

Similar to Indexes in postgres (20)

A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
 
Croco talk pgconfeu
Croco talk pgconfeuCroco talk pgconfeu
Croco talk pgconfeu
 
Indexes in Postgres | PostgreSQL Conference Europe 2018 | Louise Grandjonc
Indexes in Postgres | PostgreSQL Conference Europe 2018 | Louise GrandjoncIndexes in Postgres | PostgreSQL Conference Europe 2018 | Louise Grandjonc
Indexes in Postgres | PostgreSQL Conference Europe 2018 | Louise Grandjonc
 
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
 
Optimal Binary Search tree ppt seminar.pptx
Optimal Binary Search tree ppt seminar.pptxOptimal Binary Search tree ppt seminar.pptx
Optimal Binary Search tree ppt seminar.pptx
 
Introduction to Search Systems - ScaleConf Colombia 2017
Introduction to Search Systems - ScaleConf Colombia 2017Introduction to Search Systems - ScaleConf Colombia 2017
Introduction to Search Systems - ScaleConf Colombia 2017
 
"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - C...
"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - C..."MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - C...
"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - C...
 
Boosting MySQL (for starters)
Boosting MySQL (for starters)Boosting MySQL (for starters)
Boosting MySQL (for starters)
 
Postgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data ModelsPostgres Vision 2018: Five Sharding Data Models
Postgres Vision 2018: Five Sharding Data Models
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6
 
B+ trees and height balance tree
B+ trees and height balance treeB+ trees and height balance tree
B+ trees and height balance tree
 
Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02Modern Database Systems - Lecture 02
Modern Database Systems - Lecture 02
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL Indexing
 
It's Not You. It's Your Data Model.
It's Not You. It's Your Data Model.It's Not You. It's Your Data Model.
It's Not You. It's Your Data Model.
 
Esoteric Data structures
Esoteric Data structures Esoteric Data structures
Esoteric Data structures
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
 
Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pino...
Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pino...Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pino...
Analytics: The Final Data Frontier (or, Why Users Need Your Data and How Pino...
 
Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL. Btree. Explore the heart of PostgreSQL.
Btree. Explore the heart of PostgreSQL.
 
Basics in algorithms and data structure
Basics in algorithms and data structure Basics in algorithms and data structure
Basics in algorithms and data structure
 

More from Louise Grandjonc

Amazing SQL your django ORM can or can't do
Amazing SQL your django ORM can or can't doAmazing SQL your django ORM can or can't do
Amazing SQL your django ORM can or can't doLouise Grandjonc
 
Becoming a better developer with EXPLAIN
Becoming a better developer with EXPLAINBecoming a better developer with EXPLAIN
Becoming a better developer with EXPLAINLouise Grandjonc
 
The amazing world behind your ORM
The amazing world behind your ORMThe amazing world behind your ORM
The amazing world behind your ORMLouise Grandjonc
 
Meetup pg recherche fulltext ES -> PG
Meetup pg recherche fulltext ES -> PGMeetup pg recherche fulltext ES -> PG
Meetup pg recherche fulltext ES -> PGLouise Grandjonc
 

More from Louise Grandjonc (6)

Amazing SQL your django ORM can or can't do
Amazing SQL your django ORM can or can't doAmazing SQL your django ORM can or can't do
Amazing SQL your django ORM can or can't do
 
Pg exercices
Pg exercicesPg exercices
Pg exercices
 
Becoming a better developer with EXPLAIN
Becoming a better developer with EXPLAINBecoming a better developer with EXPLAIN
Becoming a better developer with EXPLAIN
 
The amazing world behind your ORM
The amazing world behind your ORMThe amazing world behind your ORM
The amazing world behind your ORM
 
Conf orm - explain
Conf orm - explainConf orm - explain
Conf orm - explain
 
Meetup pg recherche fulltext ES -> PG
Meetup pg recherche fulltext ES -> PGMeetup pg recherche fulltext ES -> PG
Meetup pg recherche fulltext ES -> PG
 

Recently uploaded

Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmDeepika Walanjkar
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosVictor Morales
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming languageSmritiSharma901052
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfManish Kumar
 
signals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsignals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsapna80328
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 

Recently uploaded (20)

Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitos
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming language
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
 
signals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsignals in triangulation .. ...Surveying
signals in triangulation .. ...Surveying
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 

Indexes in postgres

  • 1. Indexes in Postgres (the long story or crocodiles going to the dentist) Louise Grandjonc 1
  • 2. About me Solutions Engineer at Citus Data Previously lead python developer Postgres enthusiast @louisemeta on twitter www.louisemeta.com !2
  • 3. What we’re going to talk about 1. What are indexes for? 2. Pages and CTIDs 3. B-Tree 4. GIN 5. GiST 6. SP-GiST 7. Brin 8. Hash !3
  • 4. First things first: the crocodiles !4 • 250k crocodiles • 100k birds • 2M appointments
  • 6. Constraints !6 Some constraints transform into indexes. - PRIMARY KEY - UNIQUE - EXCLUDE USING "crocodile_pkey" PRIMARY KEY, btree (id) "crocodile_email_uq" UNIQUE CONSTRAINT, btree (email) Indexes: "appointment_pkey" PRIMARY KEY, btree (id) "appointment_crocodile_id_schedule_excl" EXCLUDE USING gist (crocodile_id WITH =, schedule WITH &&) In the crocodile table In the appointment table
  • 7. Query optimization !7 Often the main reason why we create indexes Why do indexes make queries faster In an index, tuples (value, pointer) are stored. Instead of reading the entire table for a value, you just go to the index (kind of like in an encyclopedia)
  • 8. 8 Pages, heaps and their pointers
  • 9. Pages !9 - PostgreSQL uses pages to store data from indexes or tables - A page has a fixed size of 8kB - A page has a header and items - In an index, each item is a tuple (value, pointer) - Each item in a page is referenced to with a pointer called ctid - The ctid consist of two numbers, the number of the page (the block number) and the offset of the item. The ctid of the item with value 4 would be (3, 2).
  • 10. 10 pageinspect and gevel Extensions to look into your index pages
  • 11. pageinspect, gevel and a bit of python !11 Page inspect is an extension that allows you to explore a bit what’s inside the pages Functions for BTree, GIN, BRIN and Hash indexes Gevel adds functions to GiST, SP-Gist and GIN. Used them to generate pictures for BTree and GiST https://github.com/louiseGrandjonc/pageinspect_inspector
  • 13. B-Trees internal data structure - 1 !13 - A BTree in a balanced tree - All the leaves are at equal distance from the root. - A parent node can have multiple children minimizing the tree’s depth - Postgres implements the Lehman & Yao Btree Let’s say we would like to filter or order on the crocodile’s number of teeth. CREATE INDEX ON crocodile (number_of_teeth);
  • 14. B-Trees internal data structure - 2 Metapage !14 The metapage is always the first page of a BTree index. It contains: - The block number of the root page - The level of the root - A block number for the fast root - The level of the fast root
  • 15. B-Trees internal data structure - 2 Metapage !15 SELECT * FROM bt_metap('crocodile_number_of_teeth_idx'); magic | version | root | level | fastroot | fastlevel --------+---------+------+-------+----------+----------- 340322 | 2 | 290 | 2 | 290 | 2 (1 row) Using page inspect, you can get the information on the metapage
  • 16. B-Trees internal data structure - 3 Pages !16 The root, the parents, and the leaves are all pages with the same structure. Pages have: - A block number, here the root block number is 290 - A high key - A pointer to the next (right) and previous pages - Items
  • 17. B-Trees internal data structure - 4 Pages high key !17 - High key is specific to Lehman & Yao BTrees - Any item in the page will have a value lower or equal to the high key - The root doesn’t have a high key - The right-most page of a level doesn’t have a high key And in page 575, there is no high key as it’s the rightmost page. In page 3, I will find crocodiles with 3 or less teeth In page 289, with 31 and less
  • 18. B-Trees internal data structure - 5 Next and previous pages pointers !18 - Specificity of the Yao and Lehmann BTree - Pages in the same level are in a linked list Very useful for ORDER BY For example: SELECT number_of_teeth FROM crocodile ORDER BY number_of_teeth ASC Postgres would start at the first leaf page and thanks to the next page pointer, has directly all rows in the right order.
  • 19. B-Trees internal data structure - 6 Page inspect for BTree pages !19 SELECT * FROM bt_page_stats(‘crocodile_number_of_teeth_idx’, 289); -[ RECORD 1 ]-+----- blkno | 289 type | i live_items | 285 dead_items | 0 avg_item_size | 15 page_size | 8192 free_size | 2456 btpo_prev | 3 btpo_next | 575 btpo | 1 btpo_flags | 0
  • 20. B-Trees internal data structure - 7 Items !20 - Items have a value and a pointer - In the parents, the ctid points to the child page - In the parents, the value is the value of the first item in the child page
  • 21. B-Trees internal data structure - 8 Items !21 - In the leaves, the ctid is to the heap tuple in the table - In the leaves it’s the value of the column(s) of the row
  • 22. B-Trees internal data structure To sum it up !22 - A Btree is a balanced tree. PostgreSQL implements the Lehmann & Yao algorithm - Metapage contains information on the root and fast root - Root, parent, and leaves are pages. - Each level is a linked list making it easier to move from one page to an other within the same level. - Pages have a high key defining the biggest value in the page - Pages have items pointing to an other page or the row.

  • 23. B-Trees - Searching in a BTree !23 1. Scan keys are created 2. Starting from the root until a leaf page • Is moving to the right page necessary? • If the page is a leaf, return the first item with a value higher or equal to the scan key • Binary search to find the right path to follow • Descend to the child page and lock it SELECT email FROM crocodile WHERE number_of_teeth >= 20;
  • 24. B-Trees - Scan keys !24 Postgres uses the query scan to define scankeys. If possible, redundant keys in your query are eliminated to keep only the tightest bounds. The tightest bound is number_of_teeth > 5 SELECT email, number_of teeth FROM crocodile WHERE number_of_teeth > 4 AND number_of_teeth > 5 ORDER BY number_of_teeth ASC; email | number_of_teeth ----------------------------------------+----------------- anne.chow222131@croco.com | 6 valentin.williams222154@croco.com | 6 pauline.lal222156@croco.com | 6 han.yadav232276@croco.com | 6
  • 25. B-Trees - About read locks !25 We put a read lock on the currently examined page. Read locks  ensure that the  records on that page are not modified while reading it. There could still be a concurrent insert on a child page causing a page split.
  • 26. BTrees - Is moving right necessary? !26 Concurrent insert while visiting the root: SELECT email FROM crocodile WHERE number_of_teeth >= 20;
  • 27. BTrees - Is moving right necessary? !27 The new high key of child page is 19 So we need to move right to the page 840
  • 28. B-Trees - Searching in a BTree !28 1. Scan keys are created 2. Starting from the root until a leaf page • Is moving to the right page necessary? • If the page is a leaf, return the first item with a value higher or equal to the scan key • Binary search to find the right path to follow • Descend to the child page and lock it SELECT email FROM crocodile WHERE number_of_teeth >= 20;
  • 29. BTrees - Inserting !29 1. Find the right insert page 2. Lock the page 3. Check constraint 4. Split page if necessary and insert row 5. In case of page split, recursively insert a new item in the parent level
  • 30. BTrees -Inserting Finding the right page !30 Auto-incremented values: Primary keys with a sequence for example, like the index crocodile_pkey. New values will always be inserted in the right-most leaf page. To avoid using the search algorithm, Postgres caches this page. Non auto-incremented values: The search algorithm is used to find the right leaf page.
  • 31. BTrees -Inserting Page split !31 1. Is a split necessary? If the free space on the target page is lower than the item’s size, then a split is necessary. 2. Finding the split point Postgres wants to equalize the free space on each page to limit page splits in future inserts. 3. Splitting
  • 32. BTrees - Deleting !32 - Items are marked as deleted and will be ignored in future index scans until VACUUM - A page is deleted only if all its items have been deleted. - It is possible to end up with a tree with several levels with only one page. - The fast root is used to optimize the search.
  • 34. GIN !34 - GIN (Generalized Inverted Index)  - Used to index arrays, jsonb, and tsvector (for fulltext search) columns. - Efficient for <@, &&, @@@ operators New column healed_teeth (integer[])   Here is how to create the GIN index for this column croco=# SELECT email, number_of_teeth, healed_teeth FROM crocodile WHERE id =1; -[ RECORD 1 ]---+-------------------------------------------------------- email | louise.grandjonc1@croco.com number_of_teeth | 58 healed_teeth | {16,11,55,27,22,41,38,2,5,40,52,57,28,50,10,15,1,12,46} CREATE INDEX ON crocodile USING GIN(healed_teeth);
  • 35. GIN How is it different from a BTree? - Keys !35 - GIN indexes are binary trees - Just like BTree, their first page is a metapage First difference: the keys BTree index on healed_teeth The indexed values are arrays Seq Scan on crocodile (cost=…) Filter: ('{1,2}'::integer[] <@ healed_teeth) Rows Removed by Filter: 250728 Planning time: 0.157 ms Execution time: 161.716 ms (5 rows) SELECT email FROM crocodile WHERE ARRAY[1, 2] <@ healed_teeth;
  • 36. GIN How is it different from a BTree? - Keys !36 - In a GIN index, the array is split and each value is an entry - The values are unique
  • 37. GIN How is it different from a BTree? - Keys !37 Bitmap Heap Scan on crocodile (cost=516.59..6613.42 rows=54786 width=29) (actual time=15.960..38.197 rows=73275 loops=1) Recheck Cond: ('{1,2}'::integer[] <@ healed_teeth) Heap Blocks: exact=4218 -> Bitmap Index Scan on crocodile_healed_teeth_idx (cost=0.00..502.90 rows=54786 width=0) (actual time=15.302..15.302 rows=73275 loops=1) Index Cond: ('{1,2}'::integer[] <@ healed_teeth) Planning time: 0.124 ms Execution time: 41.018 ms (7 rows) Seq Scan on crocodile (cost=…) Filter: ('{1,2}'::integer[] <@ healed_teeth) Rows Removed by Filter: 250728 Planning time: 0.157 ms Execution time: 161.716 ms (5 rows)
  • 38. GIN How is it different from a BTree? Leaves !38 - In a leaf page, the items contain a posting list of pointers to the rows in the table - If the list can’t fit in the page, it becomes a posting tree - In the leaf item remains a pointer to the posting tree
  • 39. GIN How is it different from a BTree? Pending list !39 - To optimise inserts, we store the new entries in a pending list (linear list of pages) - Entries are moved to the main tree on VACUUM or when the list is full - You can disable the pending list by setting fastupdate to false (on CREATE or ALTER INDEX) SELECT * FROM gin_metapage_info(get_raw_page('crocodile_healed_teeth_idx', 0)); -[ RECORD 1 ]----+----------- pending_head | 4294967295 pending_tail | 4294967295 tail_free_size | 0 n_pending_pages | 0 n_pending_tuples | 0 n_total_pages | 358 n_entry_pages | 1 n_data_pages | 356 n_entries | 47 version | 2
  • 40. GIN To sum it up !40 To sum up, a GIN index has: - A metapage - A BTree of key entries - The values are unique in the main binary tree - The leaves either contain a pointer to a posting tree, or a posting list of heap pointers - New rows go into a pending list until it’s full or VACUUM, that list needs to be scanned while searching the index
  • 42. GiST - keys !42 Differences with a BTree index - Data isn’t ordered - The key ranges can overlap Which means that a same value can be inserted in different pages
  • 43. GiST - keys !43 Differences with a BTree index - Data isn’t ordered - The key ranges can overlap Which means that a same value can be inserted in different pages Data isn’t ordered
  • 44. GiST - keys !44 A new appointment scheduled from  August 14th 2014 7:30am to 8:30am can be inserted in both pages. CREATE INDEX ON appointment USING GIST(schedule) Differences with a BTree index - Data isn’t ordered - The key ranges can overlap Which means that a same value can be inserted in different pages
  • 45. GiST - keys !45 Differences with a BTree index - Data isn’t ordered - The key ranges can overlap Which means that a same value can be inserted in different pages A new appointment scheduled from  August 14th 2014 7:30am to 8:30am can be inserted in both pages. CREATE INDEX ON appointment USING GIST(schedule)
  • 46. GiST key class functions !46 GiST allows the development of custom data types with the appropriate access methods. These functions are key class functions: Union: used while inserting, if the range changed Distance: used for ORDER BY and nearest neighbor, calculates the distance to the scan key
  • 47. GiST key class functions - 2 !47 Consistent: returns MAYBE if the range contains the searched value, meaning that rows could be in the page Child pages could contain the appointments overlapping [2018-05-17 08:00:00, 2018-05-17 13:00:00] Consistent returns MAYBE
  • 48. GiST - Searching !48 SELECT c.email, schedule, done, emergency_level FROM appointment INNER JOIN crocodile c ON (c.id=crocodile_id) WHERE schedule && '[2018-05-17 08:00:00, 2018-05-17 13:00:00]'::tstzrange AND done IS FALSE ORDER BY schedule DESC LIMIT 3; 1. Create a search queue of pages to explore with the root in it 2. While the search queue isn’t empty, pops a page 1. If the page is a leaf: update the bitmap with CTIDs of rows 2. Else, adds to the search queue the items where Consistent returned MAYBE
  • 49. GiST - Inserting !49 A new item can be inserted in any page. Penalty: key class function (defined by user) gives a number representing how bad it would be to insert the value in the child page. About page split: Picksplit: makes groups with little distance Performance of search will depend a lot of Picksplit
  • 50. GiST - Inserting !50 A new item can be inserted in any page. Penalty: key class function (defined by user) gives a number representing how bad it would be to insert the value in the child page. About page split: Picksplit: makes groups with little distance Performance of search will depend a lot of Picksplit
  • 51. To sum up !51 - Useful for overlapping (geometries, array etc.) - Nearest neighbor - Can be used for full text search (tsvector, tsquery) - Any data type can implement GiST as long as a few methods are available
  • 53. SP-GiST Internal data structure !53 - Not a balanced tree - A same page can’t have inner tuples and leaf tuples - Keys are decomposed - In an inner tuple, the value is the prefix - In a leaf tuple, the value is the rest (postfix)
  • 54. P L A Page blkno: 1 ABLO UISE RIAN O D Page blkno: 8 Page blkno: 4 SP-GiST Pages !54 SELECT tid, level, leaf_value FROM spgist_print('crocodile_first_name_idx3') as t (tid tid, a bool, n int, level int, p tid, pr text, l smallint, leaf_value text) ; tid | level | leaf_value ----------+-------+------------ … (4,36) | 2 | ablo (4,57) | 2 | ustafa (4,84) | 3 | rian (4,153) | 3 | uise … Here are how the pages are organized if we look into gevel’s sp-gist functions for this index
  • 55. SP-GiST !55 - Can be used for points - For text to search for prefix - For non balanced data structures (k-d trees) - Like GiST: allows the development of custom data types
  • 57. BRIN Internal data structure !57 - Block Range Index - Not a binary tree - Not even a tree - Block range: group of pages physically adjacent - For each block range: the range of values is stored - BRIN indexes are very small - Fast scanning on large tables
  • 58. BRIN Internal data structure !58 SELECT * FROM brin_page_items(get_raw_page('appointment_created_at_idx', 2), 'appointment_created_at_idx'); itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value ------------+--------+--------+----------+----------+-------------+--------------------------------------------------- 1 | 0 | 1 | f | f | f | {2008-03-01 00:00:00-08 .. 2009-07-07 07:30:00-07} 2 | 128 | 1 | f | f | f | {2009-07-07 08:00:00-07 .. 2010-11-12 15:30:00-08} 3 | 256 | 1 | f | f | f | {2010-11-12 16:00:00-08 .. 2012-03-19 23:30:00-07} 4 | 384 | 1 | f | f | f | {2012-03-20 00:00:00-07 .. 2013-07-26 07:30:00-07} 5 | 512 | 1 | f | f | f | {2013-07-26 08:00:00-07 .. 2014-12-01 15:30:00-08} SELECT id, created_at FROM appointment WHERE ctid='(0, 1)'::tid; id | created_at --------+------------------------ 101375 | 2008-03-01 00:00:00-08 (1 row)
  • 59. BRIN Internal data structure !59 SELECT * FROM brin_page_items(get_raw_page('crocodile_birthday_idx', 2), 'crocodile_birthday_idx'); itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value ------------+--------+--------+----------+----------+-------------+---------------------------- 1 | 0 | 1 | f | f | f | {1948-09-05 .. 2018-09-04} 2 | 128 | 1 | f | f | f | {1948-09-07 .. 2018-09-03} 3 | 256 | 1 | f | f | f | {1948-09-05 .. 2018-09-03} 4 | 384 | 1 | f | f | f | {1948-09-05 .. 2018-09-04} 5 | 512 | 1 | f | f | f | {1948-09-05 .. 2018-09-02} 6 | 640 | 1 | f | f | f | {1948-09-09 .. 2018-09-04} … (14 rows) In this case, the values in birthday has no correlation with the physical location, the index would not speed up the search as all pages would have to be visited. BRIN is interesting for data where the value is correlated with the physical location.
  • 60. BRIN Warning on DELETE and INSERT !60 SELECT * FROM brin_page_items(get_raw_page('appointment_created_at_idx', 2), 'appointment_created_at_idx'); itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value ------------+--------+--------+----------+----------+-------------+--------------------------------------------------- 1 | 0 | 1 | f | f | f | {2008-03-01 00:00:00-08 .. 2018-07-01 07:30:00-07} 2 | 128 | 1 | f | f | f | {2009-07-07 08:00:00-07 .. 2018-07-01 23:30:00-07} 3 | 256 | 1 | f | f | f | {2010-11-12 16:00:00-08 .. 2012-03-19 23:30:00-07} 4 | 384 | 1 | f | f | f | {2012-03-20 00:00:00-07 .. 2018-07-06 23:30:00-07} DELETE FROM appointment WHERE created_at >= '2009-07-07' AND created_at < ‘2009-07-08'; DELETE FROM appointment WHERE created_at >= '2012-03-20' AND created_at < ‘2012-03-25'; Deleted and then vacuum on the appointment table New rows are inserted in the free space after VACUUM BRIN index has some ranges with big data ranges. Search will visit a lot of pages.
  • 62. Hash Internal data structure !62 - Only useful if you have a data not fitting into a page (8kb) - Only operator is = - If you use a PG version < 10, it’s just awful
  • 63. Conclusion !63 - B-Tree - Great for <, >, =, >=, <= - GIN - Fulltext search, jsonb, arrays - ADD OPERATORS - Inserts can be slow because of unicity of the keys - BRIN - Great for huge table with correlation between value and physical location - <, >, =, >=, <= - GiST - Great for overlapping - Using key class functions - Can be implemented for any data type - SP-Gist - Also using key class function - Decomposed keys - Can be used for non balanced data structures (k-d trees) - Hash - If you have a value > 8kB - Only for =
  • 64. Questions !64 Thanks for your attention Go read the articles www.louisemeta.com Now only the ones on BTrees are published, but I’ll announce the rest on twitter @louisemeta Crocodiles by https://www.instagram.com/zimmoriarty/?hl=en