Workshop
Yasas Senarath
Visiting Instructor & Research Assistant
Dept. of Computer Science and Engineering,
University of Moratuwa
Solr
Introduction [Recall]
● Search Platform
● Open-Source
● Search Applications
● Built on top of Lucene
● Why…
○ Enterprise-ready
○ Fast
○ Highly Scalable
● Search + NoSQL
○ Non-relational data storage
Features of Apache Solr [Recall]
● RESTful APIs
○ No Java programming skills required
● Full text search
○ tokens, phrases, spell check, wildcard, and auto-complete
● Enterprise ready
● Flexible and Extensible
● NoSQL database
● Admin Interface
● Highly Scalable
● Text-Centric and Sorted by Relevance
How do Search Engines Work?
Installing Solr
● Go to the Solr website and download the binary release of Solr 8.1.1 (the version used in this workshop)
● Extract the downloaded archive on your system
● In a terminal, change into the extracted Solr folder and run:
○ Unix*: bin/solr start
○ Windows: bin\solr.cmd start
● Go to http://localhost:8983/
Techproducts Example
● Starting Solr with Example
○ Unix*: bin/solr -e techproducts
○ Windows: bin\solr.cmd -e techproducts
● To verify that Solr is running:
○ Unix*: bin/solr status
○ Windows: bin\solr.cmd status
● Access Admin Panel
○ http://localhost:8983/solr/
Adding Documents
● Open example/exampledocs/sd500.xml
● Add files to Solr using post.jar
○ cd example/exampledocs
○ java -Dc=techproducts -jar post.jar sd500.xml
● Two main ways to add documents
○ HTTP (the REST API, which post.jar uses)
○ Native client (e.g. SolrJ for Java)
<add><doc>
<field name="id">9885A004</field>
<field name="name">Canon PowerShot SD500</field>
<field name="manu">Canon Inc.</field>
...
<field name="inStock">true</field>
</doc></add>
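The XML update message above can also be generated programmatically. A minimal sketch using only the Python standard library; `build_add_xml` is a hypothetical helper name, not part of Solr or post.jar, and the resulting string would still have to be posted to the collection's /update handler.

```python
import xml.etree.ElementTree as ET

def build_add_xml(docs):
    """Build a Solr XML update message (<add><doc>...</doc></add>) from dicts.

    A list value becomes repeated <field> elements, as used for multiValued fields.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                # Booleans are written as lowercase "true"/"false", as in sd500.xml
                field.text = str(v).lower() if isinstance(v, bool) else str(v)
    return ET.tostring(add, encoding="unicode")

message = build_add_xml([{
    "id": "9885A004",
    "name": "Canon PowerShot SD500",
    "manu": "Canon Inc.",
    "inStock": True,
}])
print(message)
```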
Searching Overview
● Select API
○ http://localhost:8983/solr/techproducts/select?q=sd500&wt=json
● Need only the name and ID of every out-of-stock product?
○ http://localhost:8983/solr/techproducts/select?q=inStock:false&wt=json&fl=id,name
● Delete the collection (Solr must still be running)
○ Unix*: bin/solr delete -c techproducts
○ Windows: bin\solr.cmd delete -c techproducts
● Shut down Solr
○ Unix*: bin/solr stop
○ Windows: bin\solr.cmd stop
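The select URLs above are plain HTTP GET requests, so any HTTP client can build them. A small Python sketch, assuming a local Solr on the default port; `select_url` and `SOLR_BASE` are illustrative names, not a Solr client API.

```python
from urllib.parse import urlencode

# Assumption: a local Solr on the default port, as started above
SOLR_BASE = "http://localhost:8983/solr"

def select_url(collection, q, **params):
    """Build a /select URL; extra keyword arguments (wt, fl, ...) become parameters.

    ':' and ',' are left unescaped to keep the URLs readable; other special
    characters are percent-encoded and spaces become '+'.
    """
    query = urlencode({"q": q, **params}, safe=":,")
    return f"{SOLR_BASE}/{collection}/select?{query}"

print(select_url("techproducts", "sd500", wt="json"))
print(select_url("techproducts", "inStock:false", wt="json", fl="id,name"))
```

Fetching either URL (e.g. with `urllib.request.urlopen`) requires the techproducts example to be running.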
Basic Solr Concepts
● Inverted Index
● Index consists of one or more Documents
● Document consists of one or more Fields
● Every field has a Field Type
● Schema
○ Before adding documents to Solr, you need to define the schema! (very important)
○ Schema file: schema.xml (named managed-schema when Solr manages the schema)
● Schema declares
○ what kinds of fields there are
○ which field should be used as the unique/primary key
○ which fields are required
○ how to index and search each field
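To make the "inverted index" idea concrete: instead of storing document → words, the index stores word → documents, so a query only needs to look up its terms. A toy Python sketch; the whitespace/lowercase tokenization is a deliberate simplification standing in for a real Solr analyzer, and the sample titles are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it.

    Tokenization is a naive lowercase whitespace split, standing in for a
    real Solr analyzer chain.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_all(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [set(index.get(t, [])) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "Toy Story", 2: "Toy Story 2", 3: "The Mask"}
index = build_inverted_index(docs)
print(index["toy"])                     # [1, 2]
print(search_all(index, "toy story"))   # [1, 2]
print(search_all(index, "the mask"))    # [3]
```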
Basic Solr Concepts [Contd..]
● Field Types
○ float
○ long
○ double
○ date
○ text
● Define new field types!
<fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
</analyzer>
</fieldtype>
Basic Solr Concepts [Contd..]
● Defining a Field
○ name: Name of the field
○ type: Field type
○ indexed: Should this field be added to the inverted index?
○ stored: Should the original value of this field be stored?
○ multiValued: Can this field have multiple values?
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
Example Documents
● Use your own project corpus
● Movie Dataset: URL: https://bit.ly/2JhpEhF
Create a Collection
● Start Solr
○ Unix*: bin/solr start
○ Windows: bin\solr.cmd start
● Create Collection
○ Unix*: bin/solr create -c movies
○ Windows: bin\solr.cmd create -c movies
● Defining Schema
○ Two Approaches
■ Schemaless with “field guessing” feature (Managed Schema)
■ Use schema.xml with custom schema
Custom Schema
● Rename the managed-schema file in the collection's conf directory to schema.xml
● schema.xml
○ <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="tagline" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="overview" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="status" type="text_general" indexed="true" stored="true" multiValued="false"/>
○ <field name="budget" type="plong" indexed="true" stored="true" multiValued="false"/>
○ <field name="popularity" type="pdouble" indexed="true" stored="true" multiValued="false"/>
○ <field name="release_date" type="pdate" indexed="true" stored="true" multiValued="false"/>
○ <field name="revenue" type="plong" indexed="true" stored="true" multiValued="false"/>
○ <field name="runtime" type="pint" indexed="true" stored="true" multiValued="false"/>
○ <field name="vote_average" type="pfloat" indexed="true" stored="true" multiValued="false"/>
○ <field name="vote_count" type="pint" indexed="true" stored="true" multiValued="false"/>
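● solrconfig.xml
○ Use the classic (hand-edited) schema: <schemaFactory class="ClassicIndexSchemaFactory"/>
○ Turn off field guessing: change ${update.autoCreateFields:true} to ${update.autoCreateFields:false}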
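The eleven field declarations above differ only in name and type. A sketch of generating them from a column-to-type mapping; `MOVIE_FIELDS` and `field_xml` are hypothetical names for illustration, and the type choices are copied from the list above.

```python
# Column name -> Solr field type, taken from the schema.xml list above
MOVIE_FIELDS = {
    "title": "text_general",
    "tagline": "text_general",
    "overview": "text_general",
    "status": "text_general",
    "budget": "plong",
    "popularity": "pdouble",
    "release_date": "pdate",
    "revenue": "plong",
    "runtime": "pint",
    "vote_average": "pfloat",
    "vote_count": "pint",
}

def field_xml(name, ftype, indexed=True, stored=True, multi_valued=False):
    """Render one <field/> declaration in the style used in schema.xml."""
    def b(flag):
        return "true" if flag else "false"
    return (f'<field name="{name}" type="{ftype}" indexed="{b(indexed)}" '
            f'stored="{b(stored)}" multiValued="{b(multi_valued)}"/>')

for name, ftype in MOVIE_FIELDS.items():
    print(field_xml(name, ftype))
```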
Add Documents
curl "http://localhost:8983/solr/movies/update?commit=true" --data-binary @example/movies/movies_metadata.csv -H "Content-type:application/csv"
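The same CSV upload can be prepared from Python's standard library. This sketch only builds the POST request (URL, body, Content-type header) mirroring the curl command; `csv_update_request` is a hypothetical helper, and the sample rows are made up for illustration rather than taken from the movie dataset.

```python
import urllib.request

def csv_update_request(collection, csv_bytes, base="http://localhost:8983/solr"):
    """Prepare (but do not send) a POST of CSV data to /update?commit=true."""
    url = f"{base}/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-type": "application/csv"},
        method="POST",
    )

# Made-up sample rows for illustration
sample = b"id,title,budget\n1,Toy Story,1000\n"
req = csv_update_request("movies", sample)
print(req.get_method(), req.full_url)
# Sending it requires a running Solr with the movies collection:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```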
Basic Queries
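Get all documents:
http://localhost:8983/solr/movies/select?q=*:*&wt=json
Search documents containing the phrase “Toy Story” in the “title” field:
http://localhost:8983/solr/movies/select?q=title:%22Toy%20Story%22&wt=json
Search documents containing “Toy Story” in any field:
http://localhost:8983/solr/movies/select?q=Toy%20Story&wt=json (!)
(!) With the custom schema this likely returns nothing: a query without a field prefix searches the default field _text_, which stays empty until a copy field fills it.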
The Fix… (Copy Field)
● Add a Copy Field
<copyField source="*" dest="_text_"/>
● Is it OK? No! It copies every field, numeric ones such as budget included, into _text_
● Copy only a few text fields instead
● Which Fields?
○ Title
○ Tagline
○ Overview
Custom Copy Fields
● Add following to schema.xml
<copyField source="title" dest="_text_"/>
<copyField source="tagline" dest="_text_"/>
<copyField source="overview" dest="_text_"/>
● Note that the destination field must be marked multiValued="true", because it receives values from several source fields
<field name="_text_" type="text_general" indexed="true"
stored="false" multiValued="true"/>
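What copyField does at index time can be mimicked in a few lines: the raw values of the source fields are gathered into one destination field, which is then analyzed with its own field type. A Python sketch under that simplified model; `apply_copy_fields` is a hypothetical name and the document is made-up sample data.

```python
def apply_copy_fields(doc, sources, dest="_text_"):
    """Mimic <copyField source=... dest=...>: gather raw values of the source
    fields into one multi-valued destination field.

    Values are copied before analysis. With several sources, dest receives
    several values, which is why it must be multiValued.
    """
    copied = []
    for src in sources:
        if src in doc:
            value = doc[src]
            copied.extend(value if isinstance(value, list) else [value])
    result = dict(doc)
    if copied:
        result[dest] = copied
    return result

# Made-up sample document for illustration
doc = {"title": "Toy Story", "tagline": "Sample tagline", "budget": 1000}
indexed = apply_copy_fields(doc, ["title", "tagline", "overview"])
print(indexed["_text_"])  # ['Toy Story', 'Sample tagline']
```

Fields missing from a document (here `overview`) are simply skipped, and `budget` is never copied because it is not listed as a source.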
Analyzers
● Analyzers are specified as a child element of <fieldType>
<fieldType name="nametext" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
</fieldType>
● Or build the analyzer from a tokenizer followed by a chain of filters
<fieldType name="nametext" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
● Create a custom text field type: text_title
● Components used in analyzers
○ Tokenization: tokenizer
<tokenizer class="solr.StandardTokenizerFactory"/>
○ Stop words: filter (stopwords.txt)
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
○ Lowercasing: filter
<filter class="solr.LowerCaseFilterFactory"/>
○ Synonyms: filter (synonyms.txt)
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
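An analyzer is a pipeline: the tokenizer produces tokens, then each filter transforms the token list in order. A rough Python model of the tokenizer, stop-word, and lowercase steps; the regex tokenizer only approximates StandardTokenizer, and the stop-word set is an assumed stand-in for stopwords.txt.

```python
import re

STOPWORDS = {"a", "an", "and", "the", "of"}  # assumed stand-in for stopwords.txt

def tokenize(text):
    """Rough stand-in for StandardTokenizer: split on non-alphanumeric characters."""
    return re.findall(r"[A-Za-z0-9]+", text)

def stop_filter(tokens, words=STOPWORDS, ignore_case=True):
    """Drop stop words; ignore_case mirrors ignoreCase="true" in the XML."""
    if ignore_case:
        return [t for t in tokens if t.lower() not in words]
    return [t for t in tokens if t not in words]

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def analyze(text, filters=(stop_filter, lowercase_filter)):
    """Run the tokenizer, then each filter in order, like an <analyzer> chain."""
    tokens = tokenize(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

print(analyze("The Lord of the Rings"))  # ['lord', 'rings']
```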
Filters
Analysis Phases
● Separate Analyzers for Index and Query
<fieldType name="nametext" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
<filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
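Why can the query analyzer be simpler than the index analyzer? If synonyms are expanded at index time, a query using any of the equivalent words already finds the expanded tokens. A simplified Python model of the two chains above; whitespace splitting stands in for StandardTokenizer, and the keep-word and synonym sets are made-up assumptions.

```python
def index_analyzer(text, keep_words, synonyms):
    """Index chain: tokenize -> lowercase -> keep words -> synonym expansion."""
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t in keep_words]
    expanded = []
    for t in tokens:
        expanded.extend(synonyms.get(t, [t]))
    return expanded

def query_analyzer(text):
    """Query chain: tokenize -> lowercase only."""
    return [t.lower() for t in text.split()]

def matches(doc_tokens, query_tokens):
    """A document matches if it contains every query token (AND semantics)."""
    return set(query_tokens) <= set(doc_tokens)

keep = {"toy", "story"}
syns = {"story": ["story", "tale"]}
doc_tokens = index_analyzer("Toy Story", keep, syns)
print(doc_tokens)                                       # ['toy', 'story', 'tale']
print(matches(doc_tokens, query_analyzer("toy tale")))  # True
```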
Synonyms (synonyms.txt)
● Add some synonyms to synonyms.txt (comma-separated terms are treated as equivalent)
○ story, tale, fiction
○ heat, hot, warm
○ se7en, seven, 7
● Spelling correction with synonyms (explicit one-way mapping)
○ stores => stories
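The two line formats in synonyms.txt behave differently: a comma list declares equivalent terms, while `=>` rewrites the left side into the right side only. A simplified Python parser illustrating those rules; it is a sketch, not Solr's own parser, and `parse_synonyms` is a hypothetical name.

```python
def _add_targets(mapping, source, targets):
    """Append targets to mapping[source], skipping duplicates."""
    bucket = mapping.setdefault(source, [])
    for t in targets:
        if t not in bucket:
            bucket.append(t)

def parse_synonyms(lines, expand=True):
    """Parse synonyms.txt-style lines into a {term: [replacement terms]} map.

    "a, b, c" : equivalent terms. With expand=True every term maps to all of
                them; with expand=False every term maps to the first one.
    "a => b"  : explicit mapping; a is replaced by b only.
    Blank lines and '#' comments are skipped.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for source in (t.strip() for t in left.split(",") if t.strip()):
                _add_targets(mapping, source, targets)
        else:
            terms = []
            for t in (t.strip() for t in line.split(",")):
                if t and t not in terms:  # a repeated term adds nothing
                    terms.append(t)
            for t in terms:
                _add_targets(mapping, t, terms if expand else terms[:1])
    return mapping

synonyms = parse_synonyms([
    "story, tale, fiction",
    "se7en, seven, 7",
    "stores => stories",
])
print(synonyms["tale"])    # ['story', 'tale', 'fiction']
print(synonyms["stores"])  # ['stories']
```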
Toy Stories Example
Advanced Queries
● Search for Mask in the title AND hero in the tagline
○ title:Mask AND tagline:hero
○ http://localhost:8983/solr/movies/select?q=title%3AMask%20AND%20tagline%3Ahero
● Search for “The Mask” in the title, or for Mask in the title together with hero in the tagline
○ (title:Mask AND tagline:hero) OR title:"The Mask"
○ http://localhost:8983/solr/movies/select?q=(title%3AMask%20AND%20tagline%3Ahero)%20OR%20title%3A%22The%20Mask%22
● Wildcard matching: search for movies whose title contains a word starting with “the”
○ title:the*
○ http://localhost:8983/solr/movies/select?q=title%3Athe*
○ Because title is a tokenized text field, this matches any title word beginning with “the” (e.g. “theory”), not only titles that start with “The”; a wildcard inside quotes ("the*") is not expanded
● Proximity matching: search for “exorcist” and “spirits” within 4 words of each other in the overview field
○ overview:"exorcist spirits"~4
○ http://localhost:8983/solr/movies/select?q=overview%3A%22exorcist%20spirits%22~4
● Range Queries
○ Inclusive Range Query: Square brackets [ & ]
■ budget:[500000 TO *]
○ Exclusive Range Query: Curly brackets { & }
■ budget:{500000 TO *}
● Boosting a Term with ^
○ Want a term to be more relevant?
■ toy^4 story
● For more about Queries:
○ https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html
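The query strings above have to be percent-encoded before they go into a URL. A Python sketch that reproduces the slide URLs' encoding (spaces as %20, parentheses and `*` left literal) and builds range clauses; `encode_q` and `range_query` are hypothetical helper names.

```python
from urllib.parse import urlencode, quote

def encode_q(q):
    """Encode q like the slide URLs: spaces as %20, '(', ')' and '*' kept literal."""
    return urlencode({"q": q}, safe="()*", quote_via=quote)

def range_query(field, low="*", high="*", inclusive=True):
    """Build a range clause: [..] is inclusive, {..} is exclusive."""
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{low} TO {high}{close_b}"

queries = {
    "and": "title:Mask AND tagline:hero",
    "or": '(title:Mask AND tagline:hero) OR title:"The Mask"',
    "wildcard": "title:the*",
    "proximity": 'overview:"exorcist spirits"~4',
    "range": range_query("budget", 500000),
    "boost": "toy^4 story",
}
for name, q in queries.items():
    print(name, "http://localhost:8983/solr/movies/select?" + encode_q(q))
```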
Advanced Queries
The Schemaless Approach
● Repeat the same steps using the schemaless (managed schema) approach
Questions?

More Related Content

What's hot

Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Analyse OpenLDAP logs with ELK
Analyse OpenLDAP logs with ELKAnalyse OpenLDAP logs with ELK
Analyse OpenLDAP logs with ELKClément OUDOT
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsAlexander Korotkov
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Entity framework code first
Entity framework code firstEntity framework code first
Entity framework code firstConfiz
 
Dao pattern
Dao patternDao pattern
Dao patternciriako
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
How many ways to monitor oracle golden gate-Collaborate 14
How many ways to monitor oracle golden gate-Collaborate 14How many ways to monitor oracle golden gate-Collaborate 14
How many ways to monitor oracle golden gate-Collaborate 14Bobby Curtis
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Applicationsupertom
 
Spring security oauth2
Spring security oauth2Spring security oauth2
Spring security oauth2axykim00
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Julien Le Dem
 
The Hyperledger Indy Public Blockchain Node
The Hyperledger Indy Public Blockchain NodeThe Hyperledger Indy Public Blockchain Node
The Hyperledger Indy Public Blockchain NodeSSIMeetup
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDatabricks
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationDatabricks
 

What's hot (20)

How to Use JSON in MySQL Wrong
How to Use JSON in MySQL WrongHow to Use JSON in MySQL Wrong
How to Use JSON in MySQL Wrong
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Analyse OpenLDAP logs with ELK
Analyse OpenLDAP logs with ELKAnalyse OpenLDAP logs with ELK
Analyse OpenLDAP logs with ELK
 
Session and cookies ,get and post methods
Session and cookies ,get and post methodsSession and cookies ,get and post methods
Session and cookies ,get and post methods
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Entity framework code first
Entity framework code firstEntity framework code first
Entity framework code first
 
Dao pattern
Dao patternDao pattern
Dao pattern
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Introducing Kogito
Introducing KogitoIntroducing Kogito
Introducing Kogito
 
How many ways to monitor oracle golden gate-Collaborate 14
How many ways to monitor oracle golden gate-Collaborate 14How many ways to monitor oracle golden gate-Collaborate 14
How many ways to monitor oracle golden gate-Collaborate 14
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
Spring security oauth2
Spring security oauth2Spring security oauth2
Spring security oauth2
 
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
Efficient Data Storage for Analytics with Parquet 2.0 - Hadoop Summit 2014
 
Hibernate
HibernateHibernate
Hibernate
 
The Hyperledger Indy Public Blockchain Node
The Hyperledger Indy Public Blockchain NodeThe Hyperledger Indy Public Blockchain Node
The Hyperledger Indy Public Blockchain Node
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the Hood
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 

Similar to Solr workshop

Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksSearching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksAlexandre Rafalovitch
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHPPaul Borgermans
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Websolutions Agency
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverLucidworks (Archived)
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django applicationbangaloredjangousergroup
 
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Kai Chan
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataJihoon Son
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentAlkacon Software GmbH & Co. KG
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrJayesh Bhoyar
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampKais Hassan, PhD
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Kai Chan
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6DEEPAK KHETAWAT
 

Similar to Solr workshop (20)

Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksSearching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 
Apache solr liferay
Apache solr liferayApache solr liferay
Apache solr liferay
 
Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8Using Search API, Search API Solr and Facets in Drupal 8
Using Search API, Search API Solr and Facets in Drupal 8
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
 
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Manticore 6.pdf
Manticore 6.pdfManticore 6.pdf
Manticore 6.pdf
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
 
Python for web security - beginner
Python for web security - beginnerPython for web security - beginner
Python for web security - beginner
 
Nzitf Velociraptor Workshop
Nzitf Velociraptor WorkshopNzitf Velociraptor Workshop
Nzitf Velociraptor Workshop
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6
 
Solr5
Solr5Solr5
Solr5
 

More from Yasas Senarath

Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisYasas Senarath
 
Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data Yasas Senarath
 
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Yasas Senarath
 
Affect Level Opinion Mining
Affect Level Opinion MiningAffect Level Opinion Mining
Affect Level Opinion MiningYasas Senarath
 
Data science / Big Data
Data science / Big DataData science / Big Data
Data science / Big DataYasas Senarath
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep LearningYasas Senarath
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysisYasas Senarath
 

More from Yasas Senarath (7)

Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment Analysis
 
Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data
 
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
 
Affect Level Opinion Mining
Affect Level Opinion MiningAffect Level Opinion Mining
Affect Level Opinion Mining
 
Data science / Big Data
Data science / Big DataData science / Big Data
Data science / Big Data
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep Learning
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
 

Recently uploaded

How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPTiSEO AI
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideStefan Dietze
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfFIDO Alliance
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jNeo4j
 

Recently uploaded (20)

How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 

Solr workshop

  • 1. Solr Workshop
    Yasas Senarath, Visiting Instructor & Research Assistant
    Dept. of Computer Science and Engineering, University of Moratuwa
  • 2. Introduction [Recall]
    ● Search platform
    ● Open source
    ● Used to build search applications
    ● Built on top of Lucene
    ● Why Solr?
      ○ Enterprise-ready
      ○ Fast
      ○ Highly scalable
    ● Search + NoSQL
      ○ Non-relational data storage
  • 3. Features of Apache Solr [Recall]
    ● RESTful APIs
      ○ No Java programming skills required
    ● Full-text search
      ○ Tokens, phrases, spell check, wildcards, and auto-complete
    ● Enterprise-ready
    ● Flexible and extensible
    ● NoSQL database
    ● Admin interface
    ● Highly scalable
    ● Text-centric, with results sorted by relevance
  • 4. How Do Search Engines Work?
  • 5. Installing Solr
    ● Go to the Solr website and download the binary release of Solr 8.1.1 (the latest version at the time of the workshop)
    ● Extract the downloaded archive on your system
    ● In a terminal, change to the extracted Solr folder and run:
      ○ Unix*: bin/solr start
      ○ Windows: bin\solr.cmd start
    ● Go to http://localhost:8983/
  • 6. Techproducts Example
    ● Start Solr with the example
      ○ Unix*: bin/solr -e techproducts
      ○ Windows: bin\solr.cmd -e techproducts
    ● To verify that Solr is running:
      ○ Unix*: bin/solr status
      ○ Windows: bin\solr.cmd status
    ● Access the Admin panel
      ○ http://localhost:8983/solr/
  • 7. Adding Documents
    ● Open example/exampledocs/sd500.xml
    ● Add the file to Solr using post.jar
      ○ cd example/exampledocs
      ○ java -Dc=techproducts -jar post.jar sd500.xml
    ● Two main ways to add documents
      ○ HTTP
      ○ Native client
    <add><doc>
      <field name="id">9885A004</field>
      <field name="name">Canon PowerShot SD500</field>
      <field name="manu">Canon Inc.</field>
      ...
      <field name="inStock">true</field>
    </doc></add>
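The `<add><doc>` message above can also be generated programmatically. This is a minimal sketch using only Python's standard `xml.etree.ElementTree`; the helper name `build_add_xml` is hypothetical, and the field values are the ones shown on the slide. It only builds the XML string; sending it to the update handler is still done with post.jar, curl, or an HTTP client of your choice.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr XML update message: <add><doc><field name=...>...</field></doc></add>.

    Hypothetical helper for illustration; each doc is a dict of field name -> value.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value is written as repeated <field> elements (multi-valued field).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                # Write Python booleans as the literal strings "true"/"false".
                field.text = str(v).lower() if isinstance(v, bool) else str(v)
    return ET.tostring(add, encoding="unicode")


payload = build_add_xml([{
    "id": "9885A004",
    "name": "Canon PowerShot SD500",
    "manu": "Canon Inc.",
    "inStock": True,
}])
print(payload)
```

ElementTree escapes characters such as `&` and `<` in field values, which is easy to get wrong when concatenating XML strings by hand.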
  • 8. Searching Overview
    ● Select API command
      ○ http://localhost:8983/solr/techproducts/select?q=sd500&wt=json
    ● Need only the id and name of each matching document? Use fl (here for out-of-stock products)
      ○ http://localhost:8983/solr/techproducts/select?q=inStock:false&wt=json&fl=id,name
    ● Delete the collection (Solr must still be running)
      ○ Unix*: bin/solr delete -c techproducts
      ○ Windows: bin\solr.cmd delete -c techproducts
    ● Shut down
      ○ Unix*: bin/solr stop
      ○ Windows: bin\solr.cmd stop
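Hand-typed select URLs are easy to break (a stray space or a split parameter is enough). A small sketch, assuming the default local address `http://localhost:8983/solr`, builds them with Python's `urllib.parse.urlencode`, which percent-encodes characters such as `:` and `,`; the helper name `select_url` is hypothetical. Since Solr 7, JSON is already the default response format, so `wt=json` is kept only to mirror the slide.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # default local Solr address used in the slides


def select_url(collection, q, **params):
    """Build a /select URL for a collection; urlencode percent-encodes each value."""
    query = {"q": q, "wt": "json", **params}
    return f"{SOLR_BASE}/{collection}/select?{urlencode(query)}"


url1 = select_url("techproducts", "sd500")
url2 = select_url("techproducts", "inStock:false", fl="id,name")
print(url1)
print(url2)
```

`urlencode` encodes spaces as `+` and `:` as `%3A`; Solr decodes both forms, so these URLs are equivalent to the hand-written ones above.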
  • 9. Basic Solr Concepts
    ● Inverted index: maps each term to the documents that contain it
    ● An index consists of one or more documents
    ● A document consists of one or more fields
    ● Every field has a field type
    ● Schema
      ○ Before adding documents to Solr, you need to specify the schema! (very important)
      ○ Schema file: schema.xml
    ● The schema declares
      ○ what kinds of fields there are
      ○ which field should be used as the unique/primary key
      ○ which fields are required
      ○ how to index and search each field
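To make the inverted index concrete, here is a toy sketch in Python. It is not how Lucene stores its index; terms are produced by simply lowercasing and splitting on whitespace, a crude stand-in for Solr's analysis chain, and the three sample titles are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Toy inverted index: term -> sorted ids of the documents containing it.

    Terms come from lowercasing and splitting on whitespace, a crude stand-in
    for Solr's analysis chain.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {"1": "Toy Story", "2": "Toy Soldiers", "3": "A Christmas Story"}
index = build_inverted_index(docs)
print(index["toy"])    # ['1', '2']
print(index["story"])  # ['1', '3']

# An AND query is an intersection of posting lists.
both = sorted(set(index["toy"]) & set(index["story"]))
print(both)            # ['1']
```

Looking up a term is a single dictionary access instead of a scan over every document, which is why search engines index text this way.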
  • 10. Basic Solr Concepts [Contd.]
    ● Field types
      ○ float
      ○ long
      ○ double
      ○ date
      ○ text
    ● Define new field types!
    <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
      </analyzer>
    </fieldtype>
  • 11. Basic Solr Concepts [Contd.]
    ● Defining a field
      ○ name: name of the field
      ○ type: field type
      ○ indexed: should this field be added to the inverted index (i.e. be searchable)?
      ○ stored: should the original value of this field be stored (i.e. be returned in results)?
      ○ multiValued: can this field have multiple values?
    <field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
  • 12. Example Documents
    ● Use your own project corpus
    ● Movie dataset: https://bit.ly/2JhpEhF
  • 13. Create a Collection
    ● Start Solr
      ○ Unix*: bin/solr start
      ○ Windows: bin\solr.cmd start
    ● Create a collection
      ○ Unix*: bin/solr create -c movies
      ○ Windows: bin\solr.cmd create -c movies
    ● Defining the schema
      ○ Two approaches
        ■ Schemaless, with the “field guessing” feature (managed schema)
        ■ schema.xml with a custom schema
  • 14. Custom Schema
    ● Rename the managed-schema file in the collection's conf folder to schema.xml
    ● schema.xml
      ○ <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="tagline" type="text_general" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="overview" type="text_general" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="status" type="text_general" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="budget" type="plong" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="popularity" type="pdouble" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="release_date" type="pdate" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="revenue" type="plong" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="runtime" type="pint" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="vote_average" type="pfloat" indexed="true" stored="true" multiValued="false"/>
      ○ <field name="vote_count" type="pint" indexed="true" stored="true" multiValued="false"/>
    ● solrconfig.xml
      ○ Use the classic schema.xml: <schemaFactory class="ClassicIndexSchemaFactory"/>
      ○ Turn off field guessing by changing ${update.autoCreateFields:true} to ${update.autoCreateFields:false}
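Typing eleven nearly identical `<field/>` lines invites typos. As a convenience, this Python sketch generates them from a name-to-type mapping copied from the list above; `MOVIE_FIELDS` and `field_xml` are hypothetical names, and the output still has to be pasted into schema.xml by hand.

```python
from xml.sax.saxutils import quoteattr

# Field name -> Solr field type, copied from the schema.xml listing above.
MOVIE_FIELDS = {
    "title": "text_general",
    "tagline": "text_general",
    "overview": "text_general",
    "status": "text_general",
    "budget": "plong",
    "popularity": "pdouble",
    "release_date": "pdate",
    "revenue": "plong",
    "runtime": "pint",
    "vote_average": "pfloat",
    "vote_count": "pint",
}


def field_xml(name, ftype, indexed=True, stored=True, multi_valued=False):
    """Render one <field/> line; quoteattr escapes and double-quotes attribute values."""
    def flag(v):
        return "true" if v else "false"
    return (f"<field name={quoteattr(name)} type={quoteattr(ftype)} "
            f'indexed="{flag(indexed)}" stored="{flag(stored)}" '
            f'multiValued="{flag(multi_valued)}"/>')


lines = [field_xml(n, t) for n, t in MOVIE_FIELDS.items()]
print("\n".join(lines))
```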
  • 15. Add Documents
    ● Post the CSV file to the update handler with curl:
      curl "http://localhost:8983/solr/movies/update?commit=true" --data-binary @example/movies/movies_metadata.csv -H "Content-type:application/csv"
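The same upload can be scripted without curl. This sketch uses Python's standard `urllib.request` to prepare the POST that mirrors the curl command; `csv_update_request` is a hypothetical helper, and the one-row CSV is made-up sample data. The request is only built here; the commented lines show how it could be sent to a running Solr.

```python
import urllib.request


def csv_update_request(collection, csv_bytes, commit=True):
    """Prepare (but do not send) a POST of CSV data to Solr's /update handler."""
    url = f"http://localhost:8983/solr/{collection}/update"
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


sample = b"id,title,budget\n1,Example Movie,1000000\n"  # hypothetical sample row
req = csv_update_request("movies", sample)
print(req.full_url, req.get_method(), req.get_header("Content-type"))

# To actually send it (requires a running Solr):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

For a large file such as movies_metadata.csv you would read it with `open(path, "rb").read()` and pass the bytes in the same way.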
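  • 16. Basic Queries
    ● Get all documents:
      http://localhost:8983/solr/movies/select?q=*:*&wt=json
    ● Search documents containing “Toy Story” in the “title” field:
      http://localhost:8983/solr/movies/select?q=title:%22Toy%20Story%22&wt=json
    ● Search documents containing “Toy Story” (no field given):
      http://localhost:8983/solr/movies/select?q=Toy%20Story&wt=json (!)
      ○ This finds nothing yet: terms without a field prefix are searched in the default field _text_, which the custom schema does not populate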
  • 17. The Fix… (Copy Field)
    ● Add a copy field
      <copyField source="*" dest="_text_"/>
    ● Is that OK? No! It copies every field, so everything is indexed twice
    ● Copy only a few fields
    ● Which fields?
      ○ title
      ○ tagline
      ○ overview
  • 18. Custom Copy Fields
    ● Add the following to schema.xml
      <copyField source="title" dest="_text_"/>
      <copyField source="tagline" dest="_text_"/>
      <copyField source="overview" dest="_text_"/>
    ● Note that the destination must be marked multiValued="true", since it receives values from several fields
      <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
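What the three copyField rules do to a document can be simulated in a few lines of Python. This is a toy model only (Solr applies copyField internally at index time); `apply_copy_fields` is a hypothetical name and the movie values are made up. It shows why the destination ends up holding a list of values, and that fields not listed (such as budget) are not copied.

```python
def apply_copy_fields(doc, sources, dest="_text_"):
    """Simulate <copyField source="..." dest="_text_"/> rules for one document.

    Toy illustration only: Solr applies copyField internally at index time.
    """
    out = dict(doc)
    copied = [doc[src] for src in sources if src in doc]
    if copied:
        # Several sources feed one destination, hence multiValued="true".
        out[dest] = copied
    return out


# Hypothetical document; values are made up for illustration.
movie = {
    "title": "Example Movie",
    "tagline": "An example tagline",
    "overview": "An example overview.",
    "budget": 1000000,
}
indexed = apply_copy_fields(movie, ["title", "tagline", "overview"])
print(indexed["_text_"])
```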
  • 19. Analyzers
    ● Analyzers are specified as a child of the <fieldType> element
      <fieldType name="nametext" class="solr.TextField">
        <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
      </fieldType>
    ● Or build one from simple processing steps: a tokenizer followed by filters
      <fieldType name="nametext" class="solr.TextField">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory"/>
          <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
      </fieldType>
  • 20. Filters
    ● Create a custom text field type: text_title
    ● Components used in the analyzer
      ○ Tokenize: tokenizer
        <tokenizer class="solr.StandardTokenizerFactory"/>
      ○ Stopwords: filter (stopwords.txt)
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      ○ Lowercase: filter
        <filter class="solr.LowerCaseFilterFactory"/>
      ○ Synonyms: filter (synonyms.txt)
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
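The text_title chain can be pictured as a pipeline where each step transforms a list of tokens. This Python sketch runs the steps in the order listed above; it is a simplified model, not Solr's implementation: the regex tokenizer only approximates StandardTokenizer, token positions are ignored (the real synonym graph filter places synonyms at the same position), and `STOPWORDS` and `SYNONYMS` stand in for hypothetical contents of stopwords.txt and synonyms.txt.

```python
import re

STOPWORDS = {"a", "an", "the", "of"}                # hypothetical stopwords.txt
SYNONYMS = {"story": ["story", "tale", "fiction"]}  # hypothetical synonyms.txt entry


def tokenize(text):
    # Rough stand-in for StandardTokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def stop_filter(tokens, ignore_case=True):
    return [t for t in tokens if (t.lower() if ignore_case else t) not in STOPWORDS]


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def synonym_filter(tokens):
    # expand="true": each token is replaced by all of its equivalent terms.
    out = []
    for t in tokens:
        out.extend(SYNONYMS.get(t, [t]))
    return out


def analyze(text):
    """Tokenizer, then stop, lowercase and synonym filters, in that order."""
    return synonym_filter(lowercase_filter(stop_filter(tokenize(text))))


print(analyze("The Story of Toys"))  # ['story', 'tale', 'fiction', 'toys']
```

Because the stop filter uses ignoreCase, "The" is removed even though lowercasing happens later in the chain.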
  • 21. Analysis Phases
    ● Separate analyzers for index and query
      <fieldType name="nametext" class="solr.TextField">
        <analyzer type="index">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
          <filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldType>
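Why can the query analyzer be shorter than the index analyzer? A query term matches only if the query analyzer produces a token that the index analyzer also produced. In this simplified Python model, synonyms are expanded at index time, so a plain lowercased query for "Tale" still matches a document titled "Toy Story". `KEEP_WORDS` and `SYNS` are hypothetical contents of keepwords.txt and syns.txt, and the regex tokenizer is only a stand-in for StandardTokenizer.

```python
import re

KEEP_WORDS = {"toy", "story", "tale"}   # hypothetical keepwords.txt
SYNS = {"story": ["story", "tale"]}     # hypothetical syns.txt


def index_analyzer(text):
    tokens = [t.lower() for t in re.findall(r"\w+", text)]  # tokenizer + lowercase
    tokens = [t for t in tokens if t in KEEP_WORDS]         # KeepWordFilter
    return [s for t in tokens for s in SYNS.get(t, [t])]    # index-time synonyms


def query_analyzer(text):
    return [t.lower() for t in re.findall(r"\w+", text)]    # tokenizer + lowercase only


indexed_terms = set(index_analyzer("A Toy Story"))
print(sorted(indexed_terms))                                 # ['story', 'tale', 'toy']
print([t in indexed_terms for t in query_analyzer("Tale")])  # [True]
```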
  • 22. Synonyms (synonyms.txt)
    ● Add some synonyms to synonyms.txt
      ○ story, tale, fiction
      ○ heat, hot, warm
      ○ se7en, seven, 7
    ● Spelling correction with synonyms
      ○ stores => stories
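synonyms.txt uses two basic rule styles: a comma-separated list of equivalent terms, and an explicit `=>` mapping. The Python sketch below parses both into a term-to-replacements map to show how the rules above are read. It is a simplified, hypothetical parser (`parse_synonyms` is not a Solr API) that skips comments and blank lines but ignores escaping and multi-word synonyms.

```python
def parse_synonyms(lines):
    """Parse the two basic synonyms.txt rule styles into term -> replacements.

    - "a, b, c" : equivalent terms (with expand=true each maps to all of them)
    - "a => b"  : explicit mapping, a is replaced by b
    Simplified sketch: no escaping, no multi-word synonyms.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",")]
            for src in left.split(","):
                mapping[src.strip()] = targets
        else:
            terms = [t.strip() for t in line.split(",")]
            for term in terms:
                mapping[term] = terms
    return mapping


rules = parse_synonyms([
    "story, tale, fiction",
    "heat, hot, warm",
    "se7en, seven, 7",
    "stores => stories",
])
print(rules["tale"])    # ['story', 'tale', 'fiction']
print(rules["stores"])  # ['stories']
```

The explicit mapping is what turns a misspelling such as "stores" into "stories" without also mapping "stories" back to "stores".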
  • 24. Advanced Queries
    ● Search for Mask in title AND hero in tagline
      ○ title:Mask AND tagline:hero
      ○ http://localhost:8983/solr/movies/select?q=title%3AMask%20AND%20tagline%3Ahero
    ● Search for “The Mask” in title, or Mask in title with hero in tagline
      ○ (title:Mask AND tagline:hero) OR title:"The Mask"
      ○ http://localhost:8983/solr/movies/select?q=(title%3AMask%20AND%20tagline%3Ahero)%20OR%20title%3A%22The%20Mask%22
    ● Wildcard matching: search movies whose title contains a word starting with “the”
      ○ title:the*
      ○ http://localhost:8983/solr/movies/select?q=title%3Athe*
    ● Proximity matching: search for “exorcist” and “spirits” within 4 words of each other in the overview field
      ○ overview:"exorcist spirits"~4
      ○ http://localhost:8983/solr/movies/select?q=overview%3A%22exorcist%20spirits%22~4
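The encoded URLs above can be produced from the readable queries instead of being typed by hand. This sketch uses Python's `urllib.parse.quote`, leaving `(`, `)`, `*` and `~` unencoded so the output matches the slide URLs; `select_url` and `BASE` are hypothetical names assuming the local movies collection.

```python
from urllib.parse import quote

BASE = "http://localhost:8983/solr/movies/select?q="


def select_url(q):
    """Percent-encode a standard-query-parser string (spaces, quotes, colons)."""
    return BASE + quote(q, safe="()*~")


queries = {
    "and": "title:Mask AND tagline:hero",
    "or": '(title:Mask AND tagline:hero) OR title:"The Mask"',
    "wildcard": "title:the*",
    "proximity": 'overview:"exorcist spirits"~4',
}

for name, q in queries.items():
    print(name, select_url(q))
```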
  • 25. Advanced Queries
    ● Range queries
      ○ Inclusive range query: square brackets [ and ]
        ■ budget:[500000 TO *]
      ○ Exclusive range query: curly brackets { and }
        ■ budget:{500000 TO *}
    ● Boosting a term with ^
      ○ Want a term to be more relevant?
        ■ toy^4 story
    ● For more about queries:
      ○ https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html
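When queries are assembled in application code, small helpers keep the bracket syntax consistent. This is a hypothetical Python sketch (`range_query` and `boost` are illustrative names, not a Solr client API) that emits the same strings shown above; `*` stands for an open end of the range.

```python
def range_query(field, lower="*", upper="*", inclusive=True):
    """Build a range clause: [a TO b] is inclusive, {a TO b} is exclusive."""
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{lower} TO {upper}{close_b}"


def boost(term, factor):
    """Attach a boost factor to a term, e.g. toy^4."""
    return f"{term}^{factor}"


print(range_query("budget", 500000))                   # budget:[500000 TO *]
print(range_query("budget", 500000, inclusive=False))  # budget:{500000 TO *}
print(boost("toy", 4) + " story")                      # toy^4 story
```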
  • 26. The Schemaless Approach
    ● Let's repeat the same steps using the schemaless approach
  • 27. Questions?
    Yasas Senarath, Visiting Instructor & Research Assistant
    Dept. of Computer Science and Engineering, University of Moratuwa