Dive into full text search with Python
Andrii Soldatenko
18-19 September 2015
@a_soldatenko
About me:
• Lead QA Automation Engineer at
• Backend Python Developer at
• Speaker at PyCon Ukraine 2014
• Speaker at PyCon Belarus 2015
• @a_soldatenko
Preface
Information Explosion
Text Search
grep --ignore-case --recursive foo books/

grep --ignore-case --recursive --file=words.txt books/
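For comparison, the second grep command can be sketched in plain Python. This is a minimal illustration, not how grep is implemented. It assumes UTF-8 text files, and it does a simple case-insensitive substring test per line. `load_words` and `grep_words` are hypothetical helper names.

```python
import os


def load_words(path):
    """Read one search word per line, like grep --file (blank lines are skipped here)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def grep_words(root, words):
    """Yield (path, line_number, line) for every line under root that contains
    any of the words, ignoring case (roughly grep --ignore-case --recursive)."""
    needles = [w.lower() for w in words]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps oddly encoded files from aborting the scan
            with open(path, encoding="utf-8", errors="replace") as f:
                for line_number, line in enumerate(f, start=1):
                    lowered = line.lower()
                    if any(needle in lowered for needle in needles):
                        yield path, line_number, line.rstrip("\n")
```

A call such as `grep_words("books/", load_words("words.txt"))` re-reads every file for every search. That linear scan is the cost a search index is meant to avoid.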
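from functools import reduce
import operator

from django.db.models import Q

Entry.objects.filter(headline__icontains='foo')

with open('words.txt') as f:
    words = [line.strip() for line in f if line.strip()]

# Django has no "icontains_in" lookup: OR together one icontains filter per word
Entry.objects.filter(reduce(operator.or_, (Q(headline__icontains=w) for w in words)))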
Full text search
Search index
Simple sentences
1. The quick brown fox jumped over the lazy dog
2. Quick brown foxes leap over lazy dogs in summer
Inverted index
Term      | Doc_1 | Doc_2
-------------------------
Quick     |       |   X
The       |   X   |
brown     |   X   |   X
dog       |   X   |
dogs      |       |   X
fox       |   X   |
foxes     |       |   X
in        |       |   X
jumped    |   X   |
lazy      |   X   |   X
leap      |       |   X
over      |   X   |   X
quick     |   X   |
summer    |       |   X
the       |   X   |
-------------------------
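The table above can be reproduced with a few lines of Python. This is a minimal sketch, assuming that terms are simply the whitespace-separated words with no normalization. `build_inverted_index` and `print_index` are illustrative names, not part of any search library.

```python
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(docs):
    """Map every whitespace-separated term to the set of document ids containing it.

    No normalization: "Quick" and "quick" stay separate terms, as in the table.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def print_index(index, doc_ids):
    """Print one row per term; Python's default sort puts capitalized terms first."""
    print("Term      | " + " | ".join(f"Doc_{d}" for d in doc_ids))
    for term in sorted(index):
        marks = ["  X  " if d in index[term] else "     " for d in doc_ids]
        print(f"{term:<10}| " + " | ".join(marks))


index = build_inverted_index(DOCS)
print_index(index, sorted(DOCS))
```

Looking up a term is now a single dictionary access instead of a scan over every document.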
Inverted index
Term      | Doc_1 | Doc_2
-------------------------
brown     |   X   |   X
quick     |   X   |
-------------------------
Total     |   2   |   1
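The "Total" row for the query "quick brown" counts how many of the query terms each document contains. Below is a minimal sketch of that count, using the same whitespace-only index as before. `count_matching_terms` is a hypothetical helper. Real engines score with weighted relevance, not raw counts.

```python
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(docs):
    """Whitespace-only inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def count_matching_terms(index, query):
    """Count, per document, how many distinct query terms it contains."""
    totals = defaultdict(int)
    for term in set(query.split()):
        for doc_id in index.get(term, ()):
            totals[doc_id] += 1
    return dict(totals)


index = build_inverted_index(DOCS)
print(sorted(count_matching_terms(index, "quick brown").items()))  # [(1, 2), (2, 1)]
```

Doc_2 scores only 1 because it contains "Quick" rather than "quick". This mismatch is what normalization addresses.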
Inverted index: normalization
Term      | Doc_1 | Doc_2
-------------------------
brown     |   X   |   X
dog       |   X   |   X
fox       |   X   |   X
in        |       |   X
jump      |   X   |   X
lazy      |   X   |   X
over      |   X   |   X
quick     |   X   |   X
summer    |       |   X
the       |   X   |
-------------------------
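Normalization maps related forms onto one term before indexing. Here that means lowercasing (Quick, quick), reducing words to a stem (foxes to fox, jumped to jump), and applying synonyms (leap to jump). The sketch below uses hand-written, hypothetical rule tables that cover only these two sentences. A real analyzer would use a stemmer and a synonym dictionary instead.

```python
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Hypothetical rules written for this example only, standing in for a real
# stemmer (foxes -> fox) and a synonym list (leap -> jump).
STEMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump"}
SYNONYMS = {"leap": "jump"}


def normalize(token):
    term = token.lower()              # Quick -> quick, The -> the
    term = STEMS.get(term, term)      # foxes -> fox, dogs -> dog, jumped -> jump
    return SYNONYMS.get(term, term)   # leap -> jump


def build_normalized_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


index = build_normalized_index(DOCS)
for term in sorted(index):
    print(term, sorted(index[term]))
```

Queries must go through the same `normalize` step. Then "Quick" becomes "quick" and matches both documents.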
Search Engines
PostgreSQL
PostgreSQL: operators for textual data types
--- PostgreSQL has operators for textual data types:
--- LIKE  - match, case-sensitive
--- ILIKE - match, case-insensitive
--- ~     - matches POSIX regular expression, case-sensitive
--- ~*    - matches POSIX regular expression, case-insensitive
select 'foo' LIKE 'foo';          -- true
select 'bar' ILIKE 'BAR';         -- true
select 'abc' LIKE '_b_';          -- true
select 'abc' LIKE 'c';            -- false
select 'abc' ~ 'abc';             -- true
select 'abc' ~ '^a';              -- true
select 'abc' ~ '(b|d)';           -- true
select 'abc' ~ '^(b|c)';          -- false
select 'andrii' ~* '.*Andrii.*';  -- true
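The key difference in these examples is that LIKE must match the whole string, while `~` searches anywhere unless anchored. A minimal Python sketch makes that explicit. It translates `%` and `_` into an anchored regex and ignores LIKE's escape character. Python's `re` also differs from POSIX regular expressions in details, so this is an approximation, not PostgreSQL's implementation.

```python
import re


def like(value, pattern):
    """Approximate SQL LIKE: % = any run of characters, _ = exactly one character.

    The whole value must match. Backslash escapes are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None


def ilike(value, pattern):
    """Case-insensitive variant, approximated by lowercasing both sides."""
    return like(value.lower(), pattern.lower())


def regex_match(value, pattern, case_insensitive=False):
    """Rough analogue of ~ and ~*: unanchored search using Python's re module."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, value, flags) is not None
```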
PostgreSQL: accuracy issue
select 'prone' like '%one%';   -- true
select 'money' like '%one%';   -- true
select 'lonely' like '%one%';  -- true
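The problem is that `'%one%'` tests for a substring, not a word. The sketch below compares the two behaviours. `contains_word` is a hypothetical helper that splits on runs of ASCII letters, a deliberately crude tokenizer.

```python
import re


def contains_substring(text, fragment):
    """What LIKE '%fragment%' does: a plain substring test."""
    return fragment in text


def contains_word(text, word):
    """Token-level match: split into ASCII-letter words and compare whole words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return word.lower() in tokens


for text in ["prone", "money", "lonely", "one more"]:
    print(text, contains_substring(text, "one"), contains_word(text, "one"))
```

Only "one more" contains the word "one". Matching on tokens rather than substrings is the first step toward full text search.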
Full text search in PostgreSQL
1. Creating tokens
2. Converting tokens into lexemes
3. Storing preprocessed documents
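The three steps can be sketched in Python as tokenize, normalize, then store each lexeme with its word positions. The stop list and the plural rule below are hypothetical simplifications. PostgreSQL's english configuration uses a much larger stop-word file and a real stemmer. Still, the sketch reproduces the positional output format of `to_tsvector` on the examples used in these slides.

```python
import re
from collections import defaultdict

# Hypothetical tiny stop list for this sketch only.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the"}


def tokenize(text):
    """Step 1: split text into word tokens."""
    return re.findall(r"\w+", text)


def to_lexeme(token):
    """Step 2: normalize a token (lowercase plus a naive plural rule)."""
    word = token.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def to_vector(text):
    """Step 3: store lexeme -> 1-based positions, skipping stop words."""
    vector = defaultdict(list)
    for position, token in enumerate(tokenize(text), start=1):
        lexeme = to_lexeme(token)
        if lexeme not in STOP_WORDS:
            vector[lexeme].append(position)
    return dict(vector)


def format_vector(vector):
    """Render like PostgreSQL prints a tsvector: 'lexeme':pos1,pos2 ..."""
    return " ".join(
        f"'{lexeme}':{','.join(map(str, positions))}"
        for lexeme, positions in sorted(vector.items())
    )


print(format_vector(to_vector("in the list of stop words")))  # 'list':3 'stop':5 'word':6
```

Stop words are dropped but still consume a position, so the remaining lexemes keep their original word offsets.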
Full text search in PostgreSQL
27 built-in configurations for 10 languages
Support for user-defined FTS configurations
Pluggable dictionaries and parsers
Inverted indexes
Functions to convert normal text to tsvector
explain SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@ 'cat & rat'::tsquery;
-- without EXPLAIN the query returns true
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)

explain SELECT 'fat & cow'::tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::tsvector;
-- without EXPLAIN the query returns false
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)
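The `@@` operator checks a tsquery against the lexemes of a tsvector. Below is a deliberately small Python sketch of that check. It supports either all-`&` or all-`|` queries only, with no parentheses, negation or phrase operators. The `::tsvector` cast above does no normalization, so plain split words serve as lexemes here. `matches` is a hypothetical helper, not a PostgreSQL API.

```python
def matches(query, lexemes):
    """Evaluate a flat tsquery-like string against a set of lexemes.

    "a & b" requires every term; "a | b" requires at least one.
    Mixing & and |, parentheses and ! are not supported in this sketch.
    """
    if "&" in query and "|" in query:
        raise ValueError("mixing & and | is not supported in this sketch")
    if "|" in query:
        return any(term.strip() in lexemes for term in query.split("|"))
    return all(term.strip() in lexemes for term in query.split("&"))


# Like the ::tsvector cast, keep the words as they are (duplicates collapse).
DOCUMENT = set("a fat cat sat on a mat and ate a fat rat".split())

print(matches("cat & rat", DOCUMENT), matches("fat & cow", DOCUMENT))
```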
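PostgreSQL: index management
CREATE FUNCTION notes_vector_update() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        NEW.search_index = to_tsvector('pg_catalog.english', COALESCE(NEW.name, ''));
    END IF;
    IF TG_OP = 'UPDATE' THEN
        -- IS DISTINCT FROM also catches changes to or from NULL, which <> misses
        IF NEW.name IS DISTINCT FROM OLD.name THEN
            NEW.search_index = to_tsvector('pg_catalog.english', COALESCE(NEW.name, ''));
        END IF;
    END IF;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

-- Attach the function (assumes a table named notes with name and search_index columns)
CREATE TRIGGER notes_vector_update_trigger
    BEFORE INSERT OR UPDATE ON notes
    FOR EACH ROW EXECUTE PROCEDURE notes_vector_update();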
PostgreSQL: stopwords
SELECT to_tsvector('english', 'in the list of stop words');
        to_tsvector
----------------------------
 'list':3 'stop':5 'word':6
/usr/pgsql-9.3/share/tsearch_data/english.stop
Django:
Malcolm Tredinnick's Advice on Writing SQL in Django:
“If you need to write advanced SQL you should write it. I would balance that by cautioning against overuse of the raw() and extra() methods.”
PostgreSQL full-text search integration with the Django ORM
https://github.com/linuxlewis/djorm-ext-pgfulltext
from djorm_pgfulltext.models import SearchManager
from djorm_pgfulltext.fields import VectorField
from django.db import models

class Page(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField()
    search_index = VectorField()

    objects = SearchManager(
        fields=('name', 'description'),
        config='pg_catalog.english',  # this is default
        search_field='search_index',  # this is default
        auto_update_search_field=True
    )
To search, just use the search() method of the manager
https://github.com/linuxlewis/djorm-ext-pgfulltext
>>> Page.objects.search("documentation & about")
[<Page: Page: Home page>]
>>> Page.objects.search("about | documentation | django | home", raw=True)
[<Page: Page: Home page>, <Page: Page: About>, <Page: Page: Navigation>]
Second way
class Page(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField()

    objects = SearchManager(fields=None, search_field=None)

>>> Page.objects.search("documentation & about", fields=('name', 'description'))
[<Page: Page: Home page>]
>>> Page.objects.search("about | documentation | django | home", raw=True, fields=('name', 'description'))
[<Page: Page: Home page>, <Page: Page: About>, <Page: Page: Navigation>]
Pros and Cons
Pros:
• Quick implementation
• No extra dependencies
Cons:
• Indexes must be managed manually
• Not as flexible as dedicated search engines
• Tied to PostgreSQL
• No analytics data
• No query DSL, only `&` and `|` queries
• Difficult to manage stop words
ElasticSearch
Who uses ElasticSearch?
ElasticSearch: Quick Intro
Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
ElasticSearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
ElasticSearch: Quick Intro
PUT /haystack/user/1
{
    "first_name" : "Andrii",
    "last_name" : "Soldatenko",
    "age" : 30,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ],
    "likes": [ "python", "django" ]
}
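The same request can be built from Python with only the standard library. This is a minimal sketch that constructs the `PUT /haystack/user/1` request without sending it. It assumes a local node at `http://localhost:9200`, as in the setup slides. The `/{index}/{type}/{id}` path matches the Elasticsearch 1.x API used in this talk. Later major versions removed mapping types, so the path would differ there. `build_index_request` is a hypothetical helper.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: the local node started in the setup slides

DOCUMENT = {
    "first_name": "Andrii",
    "last_name": "Soldatenko",
    "age": 30,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
    "likes": ["python", "django"],
}


def build_index_request(index, doc_type, doc_id, body):
    """Build (but do not send) PUT /{index}/{type}/{id} with a JSON body."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("haystack", "user", 1, DOCUMENT)
# Against a running node, send it with:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```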
ElasticSearch: Locks
• Pessimistic concurrency control
• Optimistic concurrency control
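Pessimistic control locks a record before changing it. Optimistic control takes no locks. Instead, each write states which version it was based on and fails if the version has moved on in the meantime. The ElasticSearch guide describes its own approach as optimistic, using document version numbers. The in-memory class below is a hypothetical sketch of the idea, not the ElasticSearch API.

```python
class VersionConflict(Exception):
    """Raised when a write was based on a stale version."""


class VersionedStore:
    """In-memory store illustrating optimistic concurrency control."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        """Return (version, body); callers keep the version for their next write."""
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        """Write body; if expected_version is given, it must match the current one."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"doc {doc_id}: expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, body)
        return new_version
```

If two clients both read version 1, the first write succeeds and produces version 2. The second write still claims version 1, so it raises `VersionConflict`. That client must re-read and retry instead of silently overwriting the first update.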
ElasticSearch: Setup
#!/bin/bash
VERSION=1.7.1
curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-$VERSION.zip
unzip elasticsearch-$VERSION.zip
cd elasticsearch-$VERSION
# Install the Marvel plugin
./bin/plugin -i elasticsearch/marvel/latest
echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml
# Run ElasticSearch as a background daemon
./bin/elasticsearch -d
ElasticSearch: Setup
$ curl 'http://localhost:9200/?pretty'
{
  "status" : 200,
  "name" : "Dredmund Druid",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
Haystack
Adding search functionality to a simple model
$ cat myapp/models.py
from django.db import models
from django.contrib.auth.models import User

class Page(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=200)
    description = models.TextField()
    pub_date = models.DateTimeField()

    def __unicode__(self):
        return self.name

Haystack: Installation
$ pip install django-haystack
$ cat settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',

    # Added.
    'haystack',

    # Then your usual apps...
    'myapp',
]

Haystack: Installation
$ pip install elasticsearch
$ cat settings.py
...
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
...

Haystack: Creating SearchIndexes
$ cat myapp/search_indexes.py
import datetime

from haystack import indexes

from myapp.models import Page

class PageIndex(indexes.SearchIndex, indexes.Indexable):
    # use_template=True renders the template search/indexes/myapp/page_text.txt
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')

    def get_model(self):
        return Page

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())
Haystack: SearchQuerySet API
from haystack.query import SearchQuerySet
from haystack.inputs import Raw

all_results = SearchQuerySet().all()
hello_results = SearchQuerySet().filter(content='hello')
unfriendly_results = SearchQuerySet().exclude(content='hello').filter(content='world')

# To send unescaped data (only for input you trust; it reaches the engine as-is):
trusted_query = 'hello AND world'  # hypothetical, already-validated query string
sqs = SearchQuerySet().filter(content=Raw(trusted_query))
Keeping data in sync
# Update everything.
./manage.py update_index --settings=settings.prod
# Update everything with lots of information about what's going on.
./manage.py update_index --settings=settings.prod --verbosity=2
# Update everything, cleaning up after deleted models.
./manage.py update_index --remove --settings=settings.prod
# Update everything changed in the last 2 hours.
./manage.py update_index --age=2 --settings=settings.prod
# Update everything between Dec. 1, 2011 & Dec 31, 2011
./manage.py update_index --start='2011-12-01T00:00:00' --end='2011-12-31T23:59:59' --settings=settings.prod
Signals
from django.db import models
from haystack.signals import BaseSignalProcessor


# Ships with Haystack as haystack.signals.RealtimeSignalProcessor; enable it with
# HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor' in settings.py
class RealtimeSignalProcessor(BaseSignalProcessor):
    """
    Allows for observing when saves/deletes fire & automatically updates the
    search engine appropriately.
    """
    def setup(self):
        # Naive (listen to all model saves).
        models.signals.post_save.connect(self.handle_save)
        models.signals.post_delete.connect(self.handle_delete)
        # Efficient would be going through all backends & collecting all models
        # being used, then hooking up signals only for those.

    def teardown(self):
        # Naive (listen to all model saves).
        models.signals.post_save.disconnect(self.handle_save)
        models.signals.post_delete.disconnect(self.handle_delete)
        # Efficient would be going through all backends & collecting all models
        # being used, then disconnecting signals only for those.
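To see the mechanism without Django or a search engine, here is a framework-free sketch of the same shape. `setup()` connects save and delete handlers, `teardown()` disconnects them, and each handler keeps an index in sync. `Signal`, `RealtimeIndexer` and the dict-based "index" are hypothetical stand-ins, not Django's or Haystack's classes. Model instances are represented as plain dicts with `pk` and `text` keys.

```python
class Signal:
    """Minimal stand-in for a Django-style signal: connect, disconnect, send."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        if receiver not in self._receivers:
            self._receivers.append(receiver)

    def disconnect(self, receiver):
        if receiver in self._receivers:
            self._receivers.remove(receiver)

    def send(self, sender, instance):
        for receiver in list(self._receivers):
            receiver(sender=sender, instance=instance)


post_save = Signal()
post_delete = Signal()


class RealtimeIndexer:
    """Keeps a dict 'search index' in sync with saves and deletes."""

    def __init__(self):
        self.index = {}  # (model name, pk) -> indexed text

    def setup(self):
        post_save.connect(self.handle_save)
        post_delete.connect(self.handle_delete)

    def teardown(self):
        post_save.disconnect(self.handle_save)
        post_delete.disconnect(self.handle_delete)

    def handle_save(self, sender, instance):
        self.index[(sender, instance["pk"])] = instance["text"]

    def handle_delete(self, sender, instance):
        self.index.pop((sender, instance["pk"]), None)
```

Every save then costs an extra index write. This is the trade-off against the batch `update_index` commands shown earlier.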
Haystack: Pros and Cons
Pros:
• Easy to set up
• Looks like the Django ORM, but for searches
• Search-engine independent
• Supports 4 engines (ElasticSearch, Solr, Xapian, Whoosh)
Cons:
• Poor SearchQuerySet API
• Difficult to manage stop words
• Loses performance because of the extra layer
• Model-based
Future FTS and Roadmap: Django 1.9
• PostgreSQL Full Text Search (Marc Tamlyn)
https://github.com/django/django/pull/4726
• Custom indexes (Marc Tamlyn)
• etc.
Final Thoughts
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
Thank You
a_soldatenko@wargaming.net
@a_soldatenko
https://asoldatenko.com
We are hiring
a_soldatenko@wargaming.net
Questions?

AD3251-Data Structures  Design-Notes-Searching-Hashing.pdfAD3251-Data Structures  Design-Notes-Searching-Hashing.pdf
AD3251-Data Structures Design-Notes-Searching-Hashing.pdf
 
Fuzzing - A Tale of Two Cultures
Fuzzing - A Tale of Two CulturesFuzzing - A Tale of Two Cultures
Fuzzing - A Tale of Two Cultures
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
Bioinformatica p4-io
Bioinformatica p4-ioBioinformatica p4-io
Bioinformatica p4-io
 
Defensive Programming 2013-03-18
Defensive Programming 2013-03-18Defensive Programming 2013-03-18
Defensive Programming 2013-03-18
 

More from Andrii Soldatenko

Debugging concurrency programs in go
Debugging concurrency programs in goDebugging concurrency programs in go
Debugging concurrency programs in goAndrii Soldatenko
 
Building robust and friendly command line applications in go
Building robust and friendly command line applications in goBuilding robust and friendly command line applications in go
Building robust and friendly command line applications in goAndrii Soldatenko
 
Advanced debugging  techniques in different environments
Advanced debugging  techniques in different environmentsAdvanced debugging  techniques in different environments
Advanced debugging  techniques in different environmentsAndrii Soldatenko
 
Building serverless-applications
Building serverless-applicationsBuilding serverless-applications
Building serverless-applicationsAndrii Soldatenko
 
Building Serverless applications with Python
Building Serverless applications with PythonBuilding Serverless applications with Python
Building Serverless applications with PythonAndrii Soldatenko
 

More from Andrii Soldatenko (6)

Debugging concurrency programs in go
Debugging concurrency programs in goDebugging concurrency programs in go
Debugging concurrency programs in go
 
Building robust and friendly command line applications in go
Building robust and friendly command line applications in goBuilding robust and friendly command line applications in go
Building robust and friendly command line applications in go
 
Advanced debugging  techniques in different environments
Advanced debugging  techniques in different environmentsAdvanced debugging  techniques in different environments
Advanced debugging  techniques in different environments
 
Origins of Serverless
Origins of ServerlessOrigins of Serverless
Origins of Serverless
 
Building serverless-applications
Building serverless-applicationsBuilding serverless-applications
Building serverless-applications
 
Building Serverless applications with Python
Building Serverless applications with PythonBuilding Serverless applications with Python
Building Serverless applications with Python
 

Recently uploaded

Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleanscorenetworkseo
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxeditsforyah
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 

Recently uploaded (20)

Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleans
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptx
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 

Dive into full text search with Python

  • 1. Dive into full text search with Python. Andrii Soldatenko, 18-19 September 2015, @a_soldatenko
  • 2. About me: • Lead QA Automation Engineer at • Backend Python Developer at • Speaker at PyCon Ukraine 2014 • Speaker at PyCon Belarus 2015 • @a_soldatenko
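  • 5. Text Search
    grep --ignore-case --recursive foo books/
    grep --ignore-case --recursive --file=words.txt books/

    # Django ORM equivalent (case-insensitive substring match):
    Entry.objects.filter(headline__icontains='foo')

    # Match any word from a file. Django has no "icontains_in" lookup,
    # so OR together one icontains filter per word:
    import operator
    from functools import reduce
    from django.db.models import Q

    with open('words.txt', 'r') as f:
        words = [line.strip() for line in f if line.strip()]
    Entry.objects.filter(reduce(operator.or_, (Q(headline__icontains=w) for w in words)))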
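The grep and icontains approaches above have to re-read every document for every query. As a rough illustration (not code from the talk; `naive_search` and the in-memory `docs` dict are hypothetical), a linear, case-insensitive substring scan in pure Python looks like this:

```python
def naive_search(docs, words):
    """Return ids of documents that contain any of `words` as a
    case-insensitive substring. Every call scans every document."""
    needles = [w.strip().lower() for w in words if w.strip()]
    return [
        doc_id
        for doc_id, text in docs.items()
        if any(needle in text.lower() for needle in needles)
    ]


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
print(naive_search(docs, ["FOX"]))       # [1, 2]
print(naive_search(docs, ["summer\n"]))  # [2]
```

The cost grows with the number of documents times the number of search words, and substring matching also counts "foxes" as a hit for "fox".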
  • 8. Simple sentences 1. The quick brown fox jumped over the lazy dog 2. Quick brown foxes leap over lazy dogs in summer
  • 9. Inverted index
    Term    | Doc_1 | Doc_2
    --------+-------+------
    Quick   |       |  X
    The     |   X   |
    brown   |   X   |  X
    dog     |   X   |
    dogs    |       |  X
    fox     |   X   |
    foxes   |       |  X
    in      |       |  X
    jumped  |   X   |
    lazy    |   X   |  X
    leap    |       |  X
    over    |   X   |  X
    quick   |   X   |
    summer  |       |  X
    the     |   X   |
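The table above can be produced mechanically. A minimal Python sketch, assuming whitespace tokenization and no normalization at all (the function name `build_inverted_index` is illustrative, not from the talk):

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each whitespace-separated term to the set of document ids
    that contain it. No normalization: 'Quick' and 'quick' stay distinct."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)

# Print the table in the same (ASCII-sorted) order as the slide.
for term in sorted(index):
    row = ["X" if doc_id in index[term] else " " for doc_id in (1, 2)]
    print(f"{term:<8}|   {row[0]}   |  {row[1]}")
```

ASCII sorting puts capitalized terms first, which is why "Quick" and "The" head the slide's table.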
  • 10. Inverted index: searching for "quick brown"
    Term    | Doc_1 | Doc_2
    --------+-------+------
    brown   |   X   |  X
    quick   |   X   |
    --------+-------+------
    Total   |   2   |  1
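The totals row is just a count of matching query terms per document. A small Python sketch of that scoring (a plain match count chosen for illustration, not a real relevance formula such as TF-IDF; `search` is a hypothetical helper):

```python
from collections import Counter


def search(index, query):
    """Score documents by how many query terms they contain.
    This is a plain match count, not a relevance formula such as TF-IDF."""
    scores = Counter()
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return scores.most_common()


# Postings copied from the inverted index on slide 9 (only the terms needed here).
index = {"brown": {1, 2}, "quick": {1}, "Quick": {2}}
print(search(index, "quick brown"))  # [(1, 2), (2, 1)]
```

Doc_2 scores only 1 because its "Quick" is capitalized and is therefore a different term, which is the problem normalization addresses.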
  • 11. Inverted index: normalization
    After normalization:
    Term    | Doc_1 | Doc_2
    --------+-------+------
    brown   |   X   |  X
    dog     |   X   |  X
    fox     |   X   |  X
    in      |       |  X
    jump    |   X   |  X
    lazy    |   X   |  X
    over    |   X   |  X
    quick   |   X   |  X
    summer  |       |  X
    the     |   X   |

    Before normalization:
    Term    | Doc_1 | Doc_2
    --------+-------+------
    Quick   |       |  X
    The     |   X   |
    brown   |   X   |  X
    dog     |   X   |
    dogs    |       |  X
    fox     |   X   |
    foxes   |       |  X
    in      |       |  X
    jumped  |   X   |
    lazy    |   X   |  X
    leap    |       |  X
    over    |   X   |  X
    quick   |   X   |
    summer  |       |  X
    the     |   X   |
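Normalization means both documents and queries pass through the same analyzer before touching the index. The sketch below assumes a hand-written lookup table in place of a real stemmer and synonym filter, purely so that the output matches the normalized table (`NORMALIZE`, `normalize` and `build_index` are illustrative names):

```python
from collections import defaultdict

# Hand-written stand-in for a stemmer plus a synonym filter (illustrative only).
NORMALIZE = {"foxes": "fox", "dogs": "dog", "jumped": "jump", "leap": "jump"}


def normalize(token):
    """Lowercase the token, then map it to its normalized term."""
    token = token.lower()
    return NORMALIZE.get(token, token)


def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return dict(index)


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
index = build_index(docs)

# Queries must go through the same normalization as the documents.
query_terms = [normalize(t) for t in "Quick FOXES".split()]
hits = set.intersection(*(index.get(t, set()) for t in query_terms))
print(sorted(hits))  # [1, 2]
```

A real search engine would use a stemming algorithm and a configurable synonym list rather than a fixed dictionary.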
  • 14. PostgreSQL: operators for textual data types
    -- PostgreSQL has operators for textual data types:
    --   LIKE  - pattern match, case-sensitive
    --   ILIKE - pattern match, case-insensitive
    --   ~     - matches POSIX regular expression, case-sensitive
    --   ~*    - matches POSIX regular expression, case-insensitive
    select 'foo' LIKE 'foo';            -- true
    select 'bar' ILIKE 'BAR';           -- true
    select 'abc' LIKE '_b_';            -- true (LIKE must match the whole string)
    select 'abc' LIKE 'c';              -- false
    select 'abc' ~ 'abc';               -- true
    select 'abc' ~ '^a';                -- true
    select 'abc' ~ '(b|d)';             -- true
    select 'abc' ~ '^(b|c)';            -- false
    select 'andrii' ~* '.*Andrii.*';    -- true
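The key difference between LIKE and the regex operators is that a LIKE pattern has to match the entire string, with `%` and `_` as the only wildcards. A small Python model of those rules (an approximation written for illustration; it ignores LIKE's backslash escape character, and `like` is a hypothetical helper) makes the expected results above easy to check:

```python
import re


def like(value, pattern, case_insensitive=False):
    """Approximate SQL LIKE (or ILIKE when case_insensitive=True).

    '%' matches any run of characters, '_' matches exactly one character,
    and the pattern must match the whole string. The backslash escape
    character is not handled in this sketch.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch(regex, value, flags) is not None


print(like("abc", "_b_"))  # True
print(like("abc", "b"))    # False: 'b' alone does not cover the whole string
print(like("abc", "%b%"))  # True
```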
  • 15. PostgreSQL: accuracy issue
    select 'prone' like '%one%';   -- true
    select 'money' like '%one%';   -- true
    select 'lonely' like '%one%';  -- true
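All three rows match because `%one%` is a plain substring test with no notion of words. A quick Python comparison (illustrative only; splitting on whitespace is a crude stand-in for a real tokenizer) shows how matching whole tokens removes those false positives:

```python
texts = ["prone", "money", "lonely", "the one and only"]

# LIKE '%one%': substring match, no notion of word boundaries.
substring_hits = [t for t in texts if "one" in t]

# Token match: split into words first, then compare whole words.
token_hits = [t for t in texts if "one" in t.lower().split()]

print(substring_hits)  # ['prone', 'money', 'lonely', 'the one and only']
print(token_hits)      # ['the one and only']
```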
  • 16. Full text search in PostgreSQL
    1. Creating tokens
    2. Converting tokens into lexemes
    3. Storing preprocessed documents
  • 17. Full text search in PostgreSQL
    • 27 built-in configurations for 10 languages
    • Support for user-defined FTS configurations
    • Pluggable dictionaries and parsers
    • Inverted indexes
  • 18. Functions to convert normal text to tsvector
    explain SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@
                   'cat & rat'::tsquery;  -- true
                    QUERY PLAN
    ------------------------------------------
     Result  (cost=0.00..0.01 rows=1 width=0)
    (1 row)

    explain SELECT 'fat & cow'::tsquery @@
                   'a fat cat sat on a mat and ate a fat rat'::tsvector;  -- false
                    QUERY PLAN
    ------------------------------------------
     Result  (cost=0.00..0.01 rows=1 width=0)
    (1 row)

    -- A ::tsvector cast does not normalize words (no lowercasing, stemming or
    -- stop-word removal); to_tsvector() does.
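Both queries above test a tsquery against the set of lexemes in a tsvector. A toy Python model of that `@@` check (a sketch for intuition, not how PostgreSQL implements it; it handles only `&` and `|`, with no parentheses, `!` or prefix matching, and `matches` is a hypothetical name):

```python
def matches(tsvector_text, tsquery_text):
    """Toy model of `tsvector @@ tsquery` for queries built only from
    lexemes, '&' and '|'. As in PostgreSQL, '&' binds tighter than '|',
    so the query is an OR of AND-clauses."""
    lexemes = set(tsvector_text.split())
    return any(
        all(term.strip() in lexemes for term in clause.split("&"))
        for clause in tsquery_text.split("|")
    )


doc = "a fat cat sat on a mat and ate a fat rat"
print(matches(doc, "cat & rat"))  # True
print(matches(doc, "fat & cow"))  # False
```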
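  • 19. PostgreSQL: index management
    CREATE FUNCTION notes_vector_update() RETURNS TRIGGER AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            NEW.search_index = to_tsvector('pg_catalog.english', COALESCE(NEW.name, ''));
        END IF;
        IF TG_OP = 'UPDATE' THEN
            -- IS DISTINCT FROM also catches changes to or from NULL, which <> misses
            IF NEW.name IS DISTINCT FROM OLD.name THEN
                NEW.search_index = to_tsvector('pg_catalog.english', COALESCE(NEW.name, ''));
            END IF;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- Attach it as a BEFORE trigger so the NEW row can be modified
    -- (the table name "notes" and trigger name are assumptions):
    CREATE TRIGGER notes_vector_update_trg
        BEFORE INSERT OR UPDATE ON notes
        FOR EACH ROW EXECUTE PROCEDURE notes_vector_update();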
  • 20. PostgreSQL: stopwords
    SELECT to_tsvector('english', 'in the list of stop words');
            to_tsvector
    ----------------------------
     'list':3 'stop':5 'word':6
    Stop-word list: /usr/pgsql-9.3/share/tsearch_data/english.stop
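The output above shows the whole pipeline from slide 16: tokens are lowercased, stop words are dropped (while their positions still count), and the remaining words are reduced to lexemes with their positions. A toy Python version, assuming a tiny hand-picked subset of the English stop-word list and a crude plural-stripping rule in place of PostgreSQL's Snowball stemmer (`toy_to_tsvector` is a hypothetical name):

```python
import re
from collections import defaultdict

# Tiny illustrative subset of PostgreSQL's english.stop list (assumption).
STOP_WORDS = {"a", "and", "in", "of", "on", "the"}


def crude_stem(word):
    # Stand-in for the Snowball English stemmer: only strips a plural 's'.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def toy_to_tsvector(text):
    positions = defaultdict(list)
    for pos, token in enumerate(re.findall(r"\w+", text.lower()), start=1):
        if token in STOP_WORDS:
            continue  # dropped, but its position is still counted
        positions[crude_stem(token)].append(pos)
    return " ".join(
        f"'{lexeme}':{','.join(map(str, p))}"
        for lexeme, p in sorted(positions.items())
    )


print(toy_to_tsvector("in the list of stop words"))  # 'list':3 'stop':5 'word':6
```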
  • 22. Malcolm Tredinnick's advice on writing SQL in Django: “If you need to write advanced SQL you should write it. I would balance that by cautioning against overuse of the raw() and extra() methods.”
  • 23. PostgreSQL full-text search integration with the Django ORM
    https://github.com/linuxlewis/djorm-ext-pgfulltext
    from djorm_pgfulltext.models import SearchManager
    from djorm_pgfulltext.fields import VectorField
    from django.db import models

    class Page(models.Model):
        name = models.CharField(max_length=200)
        description = models.TextField()

        search_index = VectorField()

        objects = SearchManager(
            fields=('name', 'description'),
            config='pg_catalog.english',  # this is the default
            search_field='search_index',  # this is the default
            auto_update_search_field=True,
        )
  • 24. To search, just use the manager's search method
    https://github.com/linuxlewis/djorm-ext-pgfulltext
    >>> Page.objects.search("documentation & about")
    [<Page: Page: Home page>]
    >>> Page.objects.search("about | documentation | django | home", raw=True)
    [<Page: Page: Home page>, <Page: Page: About>, <Page: Page: Navigation>]
  • 25. Second way: pass the fields at query time
    class Page(models.Model):
        name = models.CharField(max_length=200)
        description = models.TextField()

        objects = SearchManager(fields=None, search_field=None)

    >>> Page.objects.search("documentation & about", fields=('name', 'description'))
    [<Page: Page: Home page>]
    >>> Page.objects.search("about | documentation | django | home",
    ...                     raw=True, fields=('name', 'description'))
    [<Page: Page: Home page>, <Page: Page: About>, <Page: Page: Navigation>]
  • 26. Pros and Cons
    Pros:
    • Quick implementation
    • No extra dependencies
    Cons:
    • Indexes must be managed manually
    • Not as flexible as dedicated search engines
    • Tied to PostgreSQL
    • No analytics data
    • No query DSL: only `&` and `|` queries
    • Difficult to manage stop words
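Because queries are limited to `&` and `|`, free-form user input usually has to be turned into that syntax before it reaches to_tsquery; PostgreSQL's built-in plainto_tsquery() does this for AND queries. A Python sketch of the same idea on the application side (the helper name `user_input_to_tsquery` is hypothetical):

```python
import re


def user_input_to_tsquery(text, operator="&"):
    """Turn free-form user input into a simple tsquery string.

    Only runs of word characters are kept, so stray '&', '|', '!', ':' or
    quotes typed by a user cannot cause a tsquery syntax error.
    Returns '' when there are no words at all.
    """
    if operator not in ("&", "|"):
        raise ValueError("operator must be '&' or '|'")
    words = re.findall(r"\w+", text.lower())
    return f" {operator} ".join(words)


print(user_input_to_tsquery("Documentation, about!"))       # documentation & about
print(user_input_to_tsquery("about | django & home", "|"))  # about | django | home
```

Callers still need to handle the empty-string case, since an empty query has nothing to match.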
  • 29. ElasticSearch: Quick Intro
    Relational DB  -> Databases -> Tables -> Rows      -> Columns
    ElasticSearch  -> Indices   -> Types  -> Documents -> Fields
  • 30. ElasticSearch: Quick Intro
    PUT /haystack/user/1
    {
        "first_name" : "Andrii",
        "last_name" :  "Soldatenko",
        "age" :        30,
        "about" :      "I love to go rock climbing",
        "interests":   [ "sports", "music" ],
        "likes":       [ "python", "django" ]
    }
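The same request can be issued from Python with only the standard library. The sketch below builds (but does not send) the `PUT /index/type/id` request used by Elasticsearch 1.x in this talk; later Elasticsearch versions dropped mapping types, so the URL shape is version-specific. `index_document_request` is a hypothetical helper:

```python
import json
import urllib.request


def index_document_request(base_url, index, doc_type, doc_id, doc):
    """Build (but do not send) an HTTP request equivalent to
    `PUT /{index}/{doc_type}/{doc_id}` with `doc` as the JSON body."""
    url = f"{base_url.rstrip('/')}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


doc = {
    "first_name": "Andrii",
    "last_name": "Soldatenko",
    "age": 30,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
    "likes": ["python", "django"],
}
req = index_document_request("http://localhost:9200/", "haystack", "user", 1, doc)
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/haystack/user/1
# With a node running, urllib.request.urlopen(req) would send it.
```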
  • 32. ElasticSearch: Setup
    #!/bin/bash
    VERSION=1.7.1
    curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-$VERSION.zip
    unzip elasticsearch-$VERSION.zip
    cd elasticsearch-$VERSION
    # Download the Marvel plugin
    ./bin/plugin -i elasticsearch/marvel/latest
    echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml
    # Run Elasticsearch as a daemon
    ./bin/elasticsearch -d
  • 33. ElasticSearch: Setup
    $ curl 'http://localhost:9200/?pretty'
    {
      "status" : 200,
      "name" : "Dredmund Druid",
      "cluster_name" : "elasticsearch",
      "version" : {
        "number" : "1.7.1",
        "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
        "build_timestamp" : "2015-07-29T09:54:16Z",
        "build_snapshot" : false,
        "lucene_version" : "4.10.4"
      },
      "tagline" : "You Know, for Search"
    }
  • 35. Adding search functionality to a simple model
    $ cat myapp/models.py
    from django.db import models
    from django.contrib.auth.models import User

    class Page(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE)
        name = models.CharField(max_length=200)
        description = models.TextField()
        pub_date = models.DateTimeField()  # indexed by PageIndex (slide 38)

        def __str__(self):
            return self.name
  • 36. Haystack: Installation
    $ pip install django-haystack
    $ cat settings.py
    INSTALLED_APPS = [
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.sites',
        # Added.
        'haystack',
        # Then your usual apps...
        'myapp',
    ]
  • 37. Haystack: Installation
    $ pip install elasticsearch  # use a client major version matching the server (1.x here)
    $ cat settings.py
    ...
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
            'URL': 'http://127.0.0.1:9200/',
            'INDEX_NAME': 'haystack',
        },
    }
    ...
  • 38. Haystack: Creating SearchIndexes
    $ cat myapp/search_indexes.py
    from django.utils import timezone
    from haystack import indexes
    from myapp.models import Page

    class PageIndex(indexes.SearchIndex, indexes.Indexable):
        # use_template=True renders templates/search/indexes/myapp/page_text.txt
        text = indexes.CharField(document=True, use_template=True)
        author = indexes.CharField(model_attr='user')
        # requires a pub_date DateTimeField on Page
        pub_date = indexes.DateTimeField(model_attr='pub_date')

        def get_model(self):
            return Page

        def index_queryset(self, using=None):
            """Used when the entire index for model is updated."""
            return self.get_model().objects.filter(pub_date__lte=timezone.now())
  • 39. Haystack: SearchQuerySet API
    from haystack.query import SearchQuerySet
    from haystack.inputs import Raw

    all_results = SearchQuerySet().all()
    hello_results = SearchQuerySet().filter(content='hello')
    unfriendly_results = (SearchQuerySet()
                          .exclude(content='hello')
                          .filter(content='world'))

    # To send unescaped data (trusted_query must come from a trusted source):
    sqs = SearchQuerySet().filter(title=Raw(trusted_query))
  • 40. Keeping data in sync
    # Update everything.
    ./manage.py update_index --settings=settings.prod
    # Update everything with lots of information about what's going on.
    ./manage.py update_index --settings=settings.prod --verbosity=2
    # Update everything, cleaning up after deleted models.
    ./manage.py update_index --remove --settings=settings.prod
    # Update everything changed in the last 2 hours.
    ./manage.py update_index --age=2 --settings=settings.prod
    # Update everything between Dec. 1, 2011 & Dec. 31, 2011.
    ./manage.py update_index --start='2011-12-01T00:00:00' --end='2011-12-31T23:59:59' --settings=settings.prod
  • 41. Signals
    # Enable in settings.py:
    # HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
    from django.db import models
    from haystack.signals import BaseSignalProcessor

    class RealtimeSignalProcessor(BaseSignalProcessor):
        """
        Allows for observing when saves/deletes fire & automatically updates the
        search engine appropriately.
        """
        def setup(self):
            # Naive (listen to all model saves).
            models.signals.post_save.connect(self.handle_save)
            models.signals.post_delete.connect(self.handle_delete)
            # Efficient would be going through all backends & collecting all models
            # being used, then hooking up signals only for those.

        def teardown(self):
            # Naive (stop listening to all model saves).
            models.signals.post_save.disconnect(self.handle_save)
            models.signals.post_delete.disconnect(self.handle_delete)
            # Efficient would be going through all backends & collecting all models
            # being used, then disconnecting signals only for those.
  • 42. Haystack: Pros and Cons
    Pros:
    • Easy to set up
    • Looks like the Django ORM, but for searches
    • Search-engine independent
    • Supports 4 engines (Elasticsearch, Solr, Xapian, Whoosh)
    Cons:
    • Poor SearchQuerySet API
    • Difficult to manage stop words
    • Loses some performance because of the extra layer
    • Model-based
  • 43. Future FTS and Roadmap Django 1.9 • PostgreSQL Full Text Search (Marc Tamlyn) https://github.com/django/django/pull/4726 • Custom indexes (Marc Tamlyn) • etc.