Based on the information provided:- Only 1 file (films.json) was posted to the Solr core "films" using the bin/post command. - An error occurred while posting that file, indicating there was an issue with one of the document fields. - When querying for all documents using q=*:*, only 1 document is returned.So in summary, it appears only 1 document was successfully indexed/posted to the Solr core, despite the films.json file containing multiple documents. The error during posting prevented all documents from being added
Similar to Based on the information provided:- Only 1 file (films.json) was posted to the Solr core "films" using the bin/post command. - An error occurred while posting that file, indicating there was an issue with one of the document fields. - When querying for all documents using q=*:*, only 1 document is returned.So in summary, it appears only 1 document was successfully indexed/posted to the Solr core, despite the films.json file containing multiple documents. The error during posting prevented all documents from being added
Similar to Based on the information provided:- Only 1 file (films.json) was posted to the Solr core "films" using the bin/post command. - An error occurred while posting that file, indicating there was an issue with one of the document fields. - When querying for all documents using q=*:*, only 1 document is returned.So in summary, it appears only 1 document was successfully indexed/posted to the Solr core, despite the films.json file containing multiple documents. The error during posting prevented all documents from being added (20)
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Based on the information provided:- Only 1 file (films.json) was posted to the Solr core "films" using the bin/post command. - An error occurred while posting that file, indicating there was an issue with one of the document fields. - When querying for all documents using q=*:*, only 1 document is returned.So in summary, it appears only 1 document was successfully indexed/posted to the Solr core, despite the films.json file containing multiple documents. The error during posting prevented all documents from being added
5. 5
Why should you use Open Source?
• State of the Art Technologies
• Community Support
• Vast Documentation
• Code is accessible
• Customizable
• Mostly free licensing
6. 6
Why should you contribute to Open Source?
• Share Knowledge and Ideas
• Improve established Technologies
• Become part of a Community
• Not only code, all your skills are relevant
• Be useful to the World
8. 8
What is SOLR
• A Search Engine
• A REST-like API
• Built on Lucene
• Open Source
• Blazing-fast
• Scalable
• Fault tolerant
9. 9
Why SOLR
Scalable
Solr scales by distributing work (indexing and query processing) to multiple servers in a cluster.
Ready to deploy
Solr is open source, is easy to install and configure, and provides a preconfigured example to help you get
started.
Optimized for search
Solr is fast and can execute complex queries in subsecond speed, often only tens of milliseconds.
Large volumes of documents
Solr is designed to deal with indexes containing many millions of documents.
Text-centric
Solr is optimized for searching natural-language text, like emails, web pages, resumes, PDF documents,
and social messages such as tweets or blogs.
Results sorted by relevance
Solr returns documents in ranked order based on how relevant each document is to the user’s query.
11. 11
Features overview
• Pagination and sorting
• Faceting
• Autosuggest
• Spell-checking
• Highlighting
• Geospatial search
• More Like This
12. 12
Features overview
• Flexible query support
• Document clustering
• Import rich document formats (PDF, Office…)
• Import data from databases
• Multilingual support
DIH
Data Import Handler
16. 16
Lucene Document
• Documents are the unit of information for
indexing and search
• A Document is a set of fields
• Each field has a name and a value
• All field types must be defined, and all field
names (or dynamic field-naming patterns)
should be specified in Solr’s schema.xml
Seminars
Schema Configuration
• Per collection/index
• Xml file
• Define how the inverted Index will be built
• Fields/Field Types definition
Seminars
Schema Configuration
• Per collection/index
• Xml file
• Define how the inverted Index will be built
• Fields/Field Types definition
DOCUMENT
FIELD
17. 17
Lucene Document – Search problem
The Beginner’s Guide to Buying a House
How to Buy Your First House
Purchasing a Home
Becoming a New Home owner
Buying a New Home
Decorating Your Home
A Fun Guide to Cooking
How to Raise a Child
Buying a New Car
SELECT * FROM Books WHERE Name = 'buying a new home’;
0 results
SELECT * FROM Books
WHERE Name LIKE '%buying%’
AND Name LIKE '%a%’
AND Name LIKE '%home%’;
1 result
Buying a New Home
SELECT * FROM Books
WHERE Name LIKE '%buying%’
OR Name LIKE '%a%’
OR Name LIKE '%home%’;
8 results
A Fun Guide to Cooking, Decorating Your Home, How to Raise a Child, Buying a New Car,
Buying a New Home, The Beginner’s Guide to Buying a House, Purchasing a Home,
Becoming a New Home owner
Unimportant words
Synonyms
Linguistic variations
Ordering
18. 18
Lucene Document – Inverted Index
Doc # Content field Term Doc #
1 A Fun Guide to Cooking a 1,3,4,5,6,7,8
2 Decorating Your Home becoming 8
3 How to Raise a Child beginner’s 6
4 Buying a New Car buy 9
5 Buying a New Home buying 4,5,6
6 The Beginner’s Guide to Buying a House child 3
7 Purchasing a Home cooking 1
8 Becoming a New Home Owner decorating 2
9 How to Buy Your First House home 2,5,7,8
house 6,9
how 3,9
new 4,5,8
purchasing 7
your 2,9
INVERTED
INDEX
19. 19
Searching
TERM DOCS
buying 4,5,6,7,9
home 2,5,6,7,8,9
Unimportant word “a” is skipped
Synonyms purchasing ~ buying
Linguistic variations buy ~ buying
Synonyms house ~ home
(AND) = 5,6,7,9
Buying a New Home
The Beginner’s Guide to Buying a House
Purchasing a Home
How to Buy Your First House
20. 20
Searching operators
• Required terms
• Optional terms
• Negated terms
• Phrases
• Grouped expressions
• Fuzzy matching
• Wildcard
• Range
• Distance
• Proximity
buying AND home
buying OR home
buying NOT home
“buying a home”
(buying OR renting) AND home
offi* off*r off?r
yearsOld:[18-21]
administrator~
“chief officer”~1
21. 21
Relevancy till SOLR 4 (TF/IDF)
A relevancy score for each document is calculated and the search results are sorted from the highest score to the lowest.
Similarity
Term frequency
• A document is more relevant for a particular term if the term appears multiple times
Inverse document frequency
• Measure of how “rare” a search term is, is calculated by finding the document frequency (how many total documents
the search term appears within)
Boosting
• Multiplier in query time to adjust the weight of a field
• title:solr^2.5 description:solr
Normalization factors for fields, queries and coord
Ordering
22. 22
Relevancy from SOLR 6 (BM25)
BM25 improves upon TF/IDF
BM25 stands for “Best Match 25” (25th iteration on TF/IDF)
Includes different factors
• Frequency of a term in all Documents
• Term Frequency in a Document
• Document Length
BM25 limits influence of term frequency:
• less influence of commonwords
With TF/IDF: short fields (title,...) are automatically scored higher
BM25: Scales field length with average
• field length treatment does not automatically boost short fields
Ordering
23. 23
Precision and Recall
Precision is a measure of how “good” each of the results of a query is. A query that returns one single
correct document out of a million other correct documents is still considered perfectly precise.
Recall is a measure of how many of the correct documents are returned. A query that returns one
single correct document out of a million other correct documents is considered a very poor recall
scoring.
>> Precision and Recall balance will improve the quality of your search results.
20 correct documents
Search results containing 10 documents
(8 correct and 2 incorrect)
Precision = 80% (8 / 10)
Recall = 40% (8 / 20)
What is the precision and
recall for the
previous ”buying a home”
sample?
24. 24
Searching at Scale
Scaling SOLR
Solr is able to scale to handle
billions of documents and an
infinite number of queries
by adding servers.
Some limitations
• You can insert, delete, and update documents, but not single fields (easily)
• Solr is not optimized for processing quite long queries (thousands of terms) or returning quite
large result sets to users.
26. 26
Requirements
• Java Runtime Environment 1.8+
$ java -version
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
• Supported Operating Systems
• Linux
• MacOS
• Windows
https://lucene.apache.org/solr/downloads.html
27. 27
Directory layout
bin/
• solr | solr.cmd : start SOLR
• post : posting content to SOLR
• solr.in.sh | solr.in.cmd : configuration
contrib/
• add-ons plugins
dist/
• SOLR Jar files
docs/
• JavaDocs
example/
• CSV, XML and JSON
• DIH for databases
• Word and PDF files
licenses/
• 3rd party libraries
server/
• SOLR Admin UI
• Jetty Libraries
• Log files
• Sample configsets
28. 28
Starting SOLR
• Use the command line interface tool called bin/solr (Linux) or binsolr.cmd (Windows)
$ bin/solr start -p 8983
Waiting up to 180 seconds to see Solr running on port 8983 []
Started Solr server on port 8983 (pid=4521). Happy searching!
• Check if Solr is Running
$ bin/solr status
Found 1 Solr nodes:
Solr process 4521 running on port 8983
{
"solr_home":"/Users/aborroy/Downloads/solr-introduction-university/solr-8.4.1/server/solr",
"version":"8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:28",
"startTime":"2020-03-08T08:13:49.969Z",
"uptime":"0 days, 0 hours, 17 minutes, 56 seconds",
"memory":"91.6 MB (%17.9) of 512 MB"}
30. 30
Creating a new Core
$ bin/solr create -c films
• -c indicates the collection name
Check default fields added by SOLR to the Schema >>>>>>>
Check JSON Data to be posted in example/films/films.json
{
"id": "/en/45_2006",
"directed_by": [
"Gary Lennon"
],
"initial_release_date": "2006-11-30",
"genre": [
"Black comedy",
"Thriller"
],
"name": ".45"
}
31. 31
Posting data
$ bin/post -c films example/films/films.json
Posting files to [base] url http://localhost:8983/solr/films/update...
POSTing file films.json (application/json) to [base]/json/docs
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/films/update/json/docs
SimplePostTool: WARNING: Response: {
"responseHeader":{
"status":400,
"QTime":120},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","java.lang.NumberFormatException"],
"msg":"ERROR: [doc=/en/quien_es_el_senor_lopez] Error adding field 'name'='¿Quién es el señor López?' msg=For input string:
"¿Quién es el señor López?"",
"code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for
URL: http://localhost:8983/solr/films/update/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:00.323
32. 32
How many results were posted?
http://127.0.0.1:8983/solr/films/select?indent=on&q=*:*&wt=json
• q: query event
• fq: filter queries
• sort: asc or desc
• start, rows: offset and number of rows
• fl: list of fields to return
• wt: response in XML or JSON
33. 33
What was wrong?
Check carefully JSON Data to be posted in example/films/films.json
{
"id": "/en/quien_es_el_senor_lopez",
"directed_by": [
"Luis Mandoki"
],
"genre": [
"Documentary film"
],
"name": "u00bfQuiu00e9n es el seu00f1or Lu00f3pez?"
},
http://127.0.0.1:8983/solr/#/films/schema?field=name
34. 34
Auto-Generated SOLR Schema
http://127.0.0.1:8983/solr/#/films/files?file=managed-schema
A single document might
contain multiple values
for this field type
The value of the field
can be used in queries
to retrieve matching
documents (true by
default)
SOLR rejects any
attempts to add a
document which does
not have a value for this
field
The actual value of the
field can be retrieved by
queries
name can contain text!
35. 35
Re-Creating the Core
Deleting core “films”
$ bin/solr delete -c films
Deleting core 'films' using command:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=film
s&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
Creating core “films”
$ bin/solr create -c films
Created new core 'films’
Creating the field “name” for the core “films”
http://127.0.0.1:8983/solr/#/films/schema
36. 36
Posting Data 2
$ bin/post -c films example/films/films.json
Posting files to [base] url http://localhost:8983/solr/films/update...
POSTing file films.json (application/json) to [base]/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:00.417
http://127.0.0.1:8983/solr/films/select?indent=on&q=*:*&wt=json
37. 37
Exploring SOLR Analyzers
• Solr analyzes both index content and query input before matching the results
• The live analysis can be observed by using “Analysis” option from Solr Admin UI
39. 39
Searching
q = genre:Fantasy directed_by:"Robert Zemeckis"
• This query is searching for both genre Fantasy and directed by Robert Zemeckis (OR is default operator)
40. 40
Filtering
q = genre:Fantasy
fq = initial_release_date:[NOW-12YEAR TO *]
• This query is searching for both genre Fantasy in the latest 12 years
41. 41
Sorting
q = *:*
sort = initial_release_date desc
• This query is ordering all the films by release date in descent order
44. 44
Faceting
Multiple fields for faceting
http://127.0.0.1:8983/solr/films/select?facet.field=directed_by_str&facet.field=genre&facet=on&indent=on&q=*:*&wt=js
on