Search and information discovery is a huge part of almost any modern site.
Solr is an incredibly powerful search tool that allows us to quickly add advanced search capabilities such as full-text search, faceting, autocomplete and spelling suggestions to our projects without much effort. We will be using 'django-haystack' to communicate between Django and Solr.
1. ADVANCED SEARCH WITH SOLR + DJANGO-HAYSTACK
MARCEL CHASTAIN
LA DJANGO – 2014-09-30
2. WHAT WE’LL COVER
1. THE PITCH:
The Problem With Search
The Solution(s)
Overall Architecture of System with Django/Solr/Haystack
2. THE GOOD STUFF:
Indexing Data for Search
Querying the Search Index
Advanced Search Methods
Resources
11. THE PROBLEM
3. Good search involves lots of challenges
• Stemming:
User Searches For                   Word “Stem”
“argue”, “argues”, “argued”         “argu”
“argument”, “arguments”             “argument”
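To make the stemming table concrete, here is a toy suffix-stripping stemmer. It is only a sketch: the suffix list and the minimum stem length are assumptions chosen to reproduce the table above, and it is not the algorithm Solr's analyzers actually use.

```python
# Toy stemmer: strips one common English suffix from a word.
# Assumption: this suffix list and the 3-letter minimum are illustrative only.
SUFFIXES = ("es", "ed", "e", "s")


def toy_stem(word):
    """Return a crude stem by removing the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = {w: toy_stem(w) for w in
         ["argue", "argues", "argued", "argument", "arguments"]}
```

With a stemmed index, a search for "argued" can match documents that contain "argues", because both reduce to the same stem.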
12. THE PROBLEM
3. Good search involves lots of challenges
And more..!
• Synonyms
• Acronyms
• Non-ASCII characters
• Stop words (“and”, “to”, “a”)
• Calculating relevance
• Performance with millions/billions(!) of documents
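Two of these challenges, non-ASCII characters and stop words, can be sketched in a few lines of standard-library Python. This is a simplified illustration of the kind of normalization a search engine applies at index and query time; the function names and the three-word stop list (taken from the slide) are assumptions, not Solr's implementation.

```python
import unicodedata

# Assumption: a tiny stop-word list, using the three examples from the slide.
STOP_WORDS = {"and", "to", "a"}


def strip_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(query):
    """Lowercase, strip accents, split on whitespace, drop stop words."""
    tokens = strip_accents(query).lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


tokens = normalize("A trip to Café and Zürich")
```

Applying the same normalization to both documents and queries is what lets "cafe" match "Café".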
16. WHAT IS SOLR?
Open-source enterprise search
Java-based
Created in 2004
Built on Apache Lucene
Most popular enterprise search engine
Apache 2.0 License
Built for millions or billions of documents
17. WHAT DOES IT DO?
• Full-text search
• Hit highlighting
• Faceted search
• Clustering/replication/sharding
• Database integration
• Rich document (Word, PDF, etc.) handling
• Geospatial search
• Spelling corrections/suggestions
• … loads and loads more
21. THE GOOD STUFF
INSTALLING, CONFIGURING & USING SOLR/HAYSTACK
22. WHO DOES WHAT
Solr:
• Provides API for submitting to & querying from index
• Stores actual index data
• Manages fields/data types in XML config (‘schema.xml’)
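Solr's query API is plain HTTP, so a query is just a URL. The sketch below builds one without sending it. The host and port (Solr's default 8983), the "collection1" core, and the "text" field name are assumptions for a local example install.

```python
from urllib.parse import urlencode

# Assumption: a local Solr example server with the "collection1" core.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_select_url(query, rows=10):
    """Build a Solr select URL that asks for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# "text" is a hypothetical field name used for illustration.
url = build_select_url("text:argument")
```

Haystack builds and sends requests like this for you, which is why application code rarely talks to Solr directly.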
Haystack:
• Manages connection(s) to Solr
• Provides familiar API for querying
• Uses templates and declarative search index definitions
• Helps generate Solr XML config
• Management commands to index content
• Generic views/forms for common search use-cases
• Hooks into signals to keep data up-to-date
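On the Haystack side, the connection and the signal hooks are both configured in Django's settings.py. A minimal sketch, assuming Solr is running locally on its default port (adjust the URL for your setup):

```python
# settings.py (fragment)

# Tell Haystack which backend to use and where Solr lives.
# Assumption: a local Solr instance on the default port.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        "URL": "http://127.0.0.1:8983/solr",
    },
}

# Optional: update the index whenever a model is saved or deleted.
HAYSTACK_SIGNAL_PROCESSOR = "haystack.signals.RealtimeSignalProcessor"
```

The realtime signal processor is convenient in development. On a busy site you may prefer to leave it out and reindex periodically with the management commands.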
25. 1. SETUP SOLR
(from github repo root)
./solr_download.sh
(or, manually)
wget http://apache.mirrors.pair.com/lucene/solr/4.10.1/solr-4.10.1.tgz
tar -xzvf solr-4.10.1.tgz
ln -s ./solr-4.10.1 ./solr
The one file to care about:
• solr/example/solr/collection1/conf/schema.xml
Stores field definitions and data types. Frequently updated during development.
26. 2. RUN SOLR
(from github repo root)
./solr_start.sh
(or, manually)
cd solr/example && java -jar start.jar
Requires Java 1.7+. To install on Debian/Ubuntu:
sudo apt-get install openjdk-7-jre-headless
32. 7.5 BOOSTING FIELD RELEVANCE
Some fields are simply more relevant!
(Note: changes to field boosts require reindex)
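In Haystack, a field boost is declared on the search index field itself. The sketch below assumes a hypothetical Note model in myapp with a title attribute; the model, the field names, and the boost value are illustrative and not taken from the talk. It needs a Django project with Haystack installed to run.

```python
# myapp/search_indexes.py
from haystack import indexes

from myapp.models import Note  # hypothetical model


class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    # Main document field, rendered from a template
    # (search/indexes/myapp/note_text.txt).
    text = indexes.CharField(document=True, use_template=True)
    # Matches in the title count for more than matches elsewhere.
    # Assumption: the boost value 1.125 is arbitrary.
    title = indexes.CharField(model_attr="title", boost=1.125)

    def get_model(self):
        return Note
```

Because the boost is applied at index time, changing it has no effect on documents that are already indexed, which is why a reindex is required.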
33. 8. CREATE A TEMPLATE FOR INDEXED TEXT
templates/search/indexes/myapp/note_text.txt
34. 9. UPDATE SOLR SCHEMA
(CWD: haystackdemo/demo/)
./manage.py build_solr_schema > ../solr/example/solr/collection1/conf/schema.xml
Which adds:
* Restart Solr for the changes to take effect
41. HAYSTACK COMPONENTS TO EXTEND
• haystack.forms.SearchForm
Django form with an extendable .search() method. Define additional fields on the form, then incorporate them in the .search() method’s logic
• haystack.views.SearchView
Class-based view made to be flexible for common search cases
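Extending SearchForm might look like the sketch below, which follows the pattern shown in Haystack's own documentation. The date fields and the pub_date index field are hypothetical; substitute fields that exist in your own search index. It needs a Django project with Haystack installed to run.

```python
# myapp/forms.py
from django import forms
from haystack.forms import SearchForm


class DateRangeSearchForm(SearchForm):
    # Extra, optional fields on top of the built-in "q" query field.
    start_date = forms.DateField(required=False)
    end_date = forms.DateField(required=False)

    def search(self):
        # Start from the results of the standard keyword search.
        sqs = super(DateRangeSearchForm, self).search()

        if not self.is_valid():
            return self.no_query_found()

        # Assumption: the index defines a "pub_date" field.
        if self.cleaned_data["start_date"]:
            sqs = sqs.filter(pub_date__gte=self.cleaned_data["start_date"])
        if self.cleaned_data["end_date"]:
            sqs = sqs.filter(pub_date__lte=self.cleaned_data["end_date"])

        return sqs
```

The form can then be passed to haystack.views.SearchView through its form_class argument, so the view itself does not need to change.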