A short introduction to full-text search in the PostgreSQL database system, explaining the mechanism behind the tsvector data type and the built-in functions that use it. It also shows how to integrate this feature with the Django web framework through its object-relational mapper.
2. ● What is full-text search
● How it works in PostgreSQL
○ search
○ ranking
● How to use it in Django
● Questions
Agenda
3. Full text search refers to techniques for searching a
single computer-stored document or a collection in a
full text database.
https://en.wikipedia.org/wiki/Full_text_search
WHAT IS FULL TEXT SEARCH
5. SELECT *
FROM table
WHERE Col1 LIKE '%query%';
WHAT IS FULL TEXT SEARCH
SLOW, EXPENSIVE,
NO ORDERING BY RELEVANCE
● LIKE '%query%' can't use an ordinary B-tree index (leading wildcard)
● Col1 can be very long (e.g. an entire book)
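To illustrate the cost difference, here is a toy sketch in plain Python (not PostgreSQL internals). The `docs` data and the names `scan`, `build_index` and `lookup` are made up for this example. A LIKE-style search has to read every document on every query, while an index built once answers a word lookup directly.

```python
from collections import defaultdict

# Hypothetical sample data, for illustration only.
docs = {
    1: "the rebel heart tour",
    2: "a quiet evening at home",
    3: "heart of gold",
}

def scan(word):
    """LIKE '%word%' style: inspect the full text of every document."""
    return sorted(doc_id for doc_id, text in docs.items() if word in text)

def build_index(documents):
    """Build a word -> set of document ids mapping once, up front."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

index = build_index(docs)

def lookup(word):
    """Indexed search: one dictionary lookup, no document is re-read."""
    return sorted(index.get(word, ()))
```

Both functions return the same ids for a whole word, but `scan` grows with the total amount of text while `lookup` does not.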
6. SELECT to_tsvector(
'english',
'Try not to become a man of success, but rather try to become a man of
value'
);
to_tsvector
----------------------------------------------------------------------
'becom':4,13 'man':6,15 'rather':10 'success':8 'tri':1,11 'valu':17
(1 row)
HOW IT WORKS IN POSTGRESQL
PostgreSQL, please help!
TSVECTOR
Since PostgreSQL 8.3
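The to_tsvector output above can be read as three steps: split the text into words, drop stop words, and reduce the remaining words to lexemes while remembering their positions. Below is a toy sketch of that idea in Python. The stop-word set and the two suffix rules in `toy_stem` are assumptions chosen only to cover this one sentence. PostgreSQL uses real dictionaries and a proper stemmer, not these rules.

```python
import re

# Assumption: a tiny stop-word list, just enough for the sample sentence.
STOP_WORDS = {"a", "not", "to", "of", "but"}

def toy_stem(word):
    """Very rough stand-in for a real stemmer (hypothetical rules)."""
    if word.endswith("y") and len(word) > 2:
        return word[:-1] + "i"   # try -> tri
    if word.endswith("e") and len(word) > 3:
        return word[:-1]         # become -> becom, value -> valu
    return word

def toy_tsvector(text):
    """Map each lexeme to the 1-based positions where it occurs."""
    vector = {}
    words = re.findall(r"[a-z']+", text.lower())
    for position, word in enumerate(words, start=1):
        if word in STOP_WORDS:
            continue  # stop words are skipped but still consume a position
        vector.setdefault(toy_stem(word), []).append(position)
    return vector

sentence = ("Try not to become a man of success, "
            "but rather try to become a man of value")
vector = toy_tsvector(sentence)
```

For the sample sentence, `vector` holds the same lexemes and positions as the PostgreSQL output on this slide.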
7. select to_tsvector('If you can dream it, you can do it') @@ 'dream';
?column?
----------
t
(1 row)
select to_tsvector('It''s kind of fun to do the impossible') @@ 'impossible';
?column?
----------
f
(1 row)
HOW IT WORKS IN POSTGRESQL
Search Operator: @@
8. SELECT 'dream'::tsquery, to_tsquery('dream');
tsquery | to_tsquery
--------------+------------
'dream' | 'dream'
(1 row)
SELECT 'impossible'::tsquery, to_tsquery('impossible');
tsquery | to_tsquery
--------------+------------
'impossible' | 'imposs'
(1 row)
HOW IT WORKS IN POSTGRESQL
TO_TSQUERY function
9. SELECT to_tsvector('It''s kind of fun to do the impossible') @@
to_tsquery('impossible');
?column?
----------
t
(1 row)
HOW IT WORKS IN POSTGRESQL
TO_TSQUERY function
10. SELECT to_tsvector('If the facts don''t fit the theory, change the facts') @@
to_tsquery('! fact');
SELECT to_tsvector('If the facts don''t fit the theory, change the facts') @@
to_tsquery('theory & !fact');
SELECT to_tsvector('If the facts don''t fit the theory, change the facts.') @@
to_tsquery('fiction | theory');
HOW IT WORKS IN POSTGRESQL
Query Operators: ! & |
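The three operators combine like ordinary boolean logic over the set of lexemes in the document. Here is a toy evaluator in Python that sketches this. The function name `matches` and the hand-written lexeme set are assumptions. The sketch works on lexemes that are already normalized (so `theori`, not `theory`) and does not reproduce PostgreSQL's parser.

```python
import re

def matches(lexemes, query):
    """Evaluate a query with ! & | and parentheses against a lexeme set."""
    tokens = re.findall(r"[!&|()]|\w+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():          # lowest precedence: |
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():         # binds tighter than |
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():         # highest precedence: !
        if peek() == "!":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            value = parse_or()
            take()           # consume the closing ")"
            return value
        return take() in lexemes

    return parse_or()

# Assumed lexemes for: If the facts don't fit the theory, change the facts
facts = {"fact", "fit", "theori", "chang"}
```

With this set, `matches(facts, "!fact")` and `matches(facts, "theori & !fact")` are false, while `matches(facts, "fiction | theori")` is true.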
11. SELECT COUNT(*) FROM ticketing_event
WHERE name ILIKE '%madonna%rebel%heart%tour%';
Time: 78,083 ms
HOW IT WORKS IN POSTGRESQL
Some numbers
SELECT COUNT(*) FROM ticketing_event WHERE search_vector @@ 'madonna & rebel &
heart & tour'::tsquery;
Time: 30,065 ms
SELECT COUNT(*) FROM ticketing_event;
count
-------
68889
Time: 11,440 ms
12. SELECT id, vector1
FROM (
SELECT post.id, setweight(to_tsvector(post.title), 'A') ||
setweight(to_tsvector(post.content), 'B') AS vector1
FROM post
) AS weighted
WHERE vector1 @@ to_tsquery('Michael & Jackson')
ORDER BY ts_rank(vector1, to_tsquery('Michael & Jackson')) DESC;
HOW IT WORKS IN POSTGRESQL
Ranking:
SETWEIGHT, TS_RANK functions
13. SELECT ts_rank(to_tsvector('This is an example of document'),
to_tsquery('example')) as relevancy;
relevancy
-----------
0.0607927
(1 row)
SELECT ts_rank(to_tsvector('This is an example of document'),
to_tsquery('example | unknown')) as relevancy;
relevancy
-----------
0.0303964
(1 row)
HOW IT WORKS IN POSTGRESQL
Ranking:
SETWEIGHT, TS_RANK functions
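One way to read the two results above: the second score is exactly half of the first, which fits a score averaged over the number of OR-ed query terms, where 'unknown' contributes nothing. The Python sketch below is a simplified model and not PostgreSQL's implementation. It assumes each lexeme occurs once, that unlabeled lexemes carry the default weight 0.1, and that the score is normalized by pi^2/6. The name `toy_rank` and the lexeme set are hypothetical.

```python
import math

# Assumption: weight of a lexeme that has no A/B/C label.
DEFAULT_WEIGHT = 0.1

def toy_rank(doc_lexemes, query_terms):
    """Simplified score for OR-ed terms, each occurring at most once."""
    if not query_terms:
        return 0.0
    norm = math.pi ** 2 / 6
    matched = sum(
        DEFAULT_WEIGHT / norm for term in query_terms if term in doc_lexemes
    )
    # Averaging over all query terms: unmatched terms dilute the score.
    return matched / len(query_terms)

# Assumed lexemes for: This is an example of document
doc = {"exampl", "document"}
single = toy_rank(doc, ["exampl"])
diluted = toy_rank(doc, ["exampl", "unknown"])
```

Under these assumptions `single` and `diluted` agree with the two relevancy values on this slide to the precision shown.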
14. HOW TO USE IT IN DJANGO
● django-pg-fts
● djorm-ext-pgfulltext
[WIP] Refs #3254 -- Add Full Text Search to contrib.postgres
17. HOW TO USE IT IN DJANGO
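SearchVectorField model field (stored)
from django.contrib.postgres.search import SearchVector, SearchVectorField
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=100)
    content = models.TextField()
    search_vector = SearchVectorField()

# Search against the stored vector
Post.objects.filter(search_vector='Michael Jackson')

# Build a weighted vector: matches in the title ('A') outrank the content ('B')
vector = SearchVector('title', weight='A') + SearchVector('content', weight='B')
post.search_vector = vector
post.save()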
Update SearchVector field in post_save signal
18. HOW TO USE IT IN DJANGO
django.contrib.postgres.search.SearchRank
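from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db import models

# Annotate each post with its relevance to the query
queryset = Post.objects.annotate(
    rank=SearchRank(
        models.F('search_vector'),
        SearchQuery('Michael Jackson')
    ),
)
# Keep only the stronger matches, most relevant first
queryset.filter(rank__gt=0.5).order_by('-rank')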