Faceted Search with Lucene
Shai Erera
Researcher, IBM
Who Am I
• Working at IBM – Information Retrieval Research
• Lucene/Solr committer and PMC member
• http://shaierera.blogspot.com
• shaie@apache.org
Lucene Facets 101
Faceted Search
• Technique for accessing documents that were classified into a taxonomy of categories
  – Flat: Author/John Doe, Tags/Lucene, Popularity/High
  – Hierarchical: Computers/Software/Information Retrieval/Fulltext/Apache Lucene (ODP)
• Quick overview of the breakdown of the search results
  – How many documents are in category Committed Paths/lucene/core vs. Committed Paths/lucene/facet
• Simplifies interaction with the search application
  – Drill down to issues that were updated in the past 2 days by clicking a link
  – No knowledge of the search syntax or index schema required

http://jirasearch.mikemccandless.com
Lucene Facets
• Contributed by IBM in 2011, released in 3.4.0
• Major changes since 4.1.0
  – NRT support
  – Nearly 400% search speedups
  – Complete API revamp
  – New features (SortedSet, range faceting, drill-sideways)
• Two main indexing-time modes
  – Taxonomy-based: hierarchical facets, managed by a sidecar index, low NRT reopen cost
  – SortedSetDocValues: flat facets only, no sidecar index, higher NRT reopen cost
• Runtime modes
  – Range facets (on NumericDocValues fields)
• Other implementations: Solr, ElasticSearch, Bobo Browse
Lucene Facet Components
• TaxonomyWriter/Reader
  – Manage the taxonomy information
• FacetFields
  – Adds facet information to documents (DocValues fields, drilldown terms)
• FacetRequest
  – Defines which facets to aggregate and the FacetsAggregator (aggregation function)
• FacetsCollector
  – Collects matching documents and computes the top-K categories for each facet request (invokes FacetsAccumulator)
• DrillDownQuery / DrillSideways
  – Execute drilldown and drill-sideways requests
Sample Code – Indexing
// Builds the taxonomy as documents are indexed; thread-safe, so use a single instance
// (declared as DirectoryTaxonomyWriter so it can later open an NRT DirectoryTaxonomyReader)
DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
// Adds facet information to a document; can be initialized once per thread
FacetFields facetFields = new FacetFields(taxoWriter);
// List of categories to add to the document
List<CategoryPath> cats = new ArrayList<CategoryPath>();
cats.add(new CategoryPath("Author", "Erik Hatcher"));
cats.add(new CategoryPath("Author/Otis Gospodnetić", '/'));
cats.add(new CategoryPath("Pub Date", "2004", "December", "1"));
Document bookDoc = new Document();
bookDoc.add(new TextField("title", "lucene in action", Store.YES));
// Add the category fields (DocValues, postings)
facetFields.addFields(bookDoc, cats);
// Index the document
indexWriter.addDocument(bookDoc);
Sample Code – Search
// Open an NRT TaxonomyReader
TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);
// Define the facets to aggregate (top-10 categories for each)
FacetSearchParams fsp = new FacetSearchParams();
fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Author"), 10));
fsp.addFacetRequest(new CountFacetRequest(new CategoryPath("Pub Date"), 10));

// Collect both top-K facets and top-N matching documents
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true);
FacetsCollector fc = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);
Query q = new TermQuery(new Term("title", "lucene"));
searcher.search(q, MultiCollector.wrap(tdc, fc));
// Traverse the top facets
for (FacetResult fres : fc.getFacetResults()) {
  FacetResultNode root = fres.getFacetResultNode();
  System.out.println(root.label + " (" + root.value + ")");
  for (FacetResultNode cat : root.getSubResults()) {
    // components[0] is the dimension (e.g. "Author"); components[1] is the child's label
    System.out.println("  " + cat.label.components[1] + " (" + cat.value + ")");
  }
}
Drilldown and Drill-Sideways
• Drilldown adds a filter to the search
  – Multiple categories can be OR'd

// Drilldown: filter results to "Component/core/index";
// all other "Component/*" and "Component/core/*" categories get count 0
Query base = new MatchAllDocsQuery();
DrillDownQuery ddq = new DrillDownQuery(facetIndexingParams, base);
ddq.add(new CategoryPath("Component/core/index", '/'));

• Drill-sideways allows drilling down, yet still aggregates "sideways" categories

// Drill-sideways: drill down on "Component/core/index";
// other "Component/*" and "Component/core/*" categories are counted too
DrillSideways ds = new DrillSideways(searcher, taxoReader);
DrillSidewaysResult sidewaysRes = ds.search(null, ddq, 10, fsp);
http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html
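The difference between the two counting semantics can be shown without the Lucene API. This is a minimal, flat (non-hierarchical) sketch under stated assumptions: each document is modeled as a hypothetical Map from dimension to a single value, and the class and method names are illustrative only. Drilldown counts "Component" only over documents matching every filter; drill-sideways ignores the filter on "Component" while counting that dimension, so sibling values still get counts, but the other filters (here "Status") still apply.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy, flat model of drilldown vs. drill-sideways counting; not Lucene's implementation. */
final class DrillSidewaysSketch {

  /** Counts values of {@code dim} over the docs that match every filter. */
  static Map<String, Integer> drillDownCounts(List<Map<String, String>> docs,
                                              Map<String, String> filters, String dim) {
    return countMatching(docs, filters, null, dim);
  }

  /** Counts values of {@code dim} over the docs that match every filter except the one on {@code dim}. */
  static Map<String, Integer> drillSidewaysCounts(List<Map<String, String>> docs,
                                                  Map<String, String> filters, String dim) {
    return countMatching(docs, filters, dim, dim);
  }

  private static Map<String, Integer> countMatching(List<Map<String, String>> docs,
      Map<String, String> filters, String skipDim, String dim) {
    Map<String, Integer> counts = new TreeMap<>();
    for (Map<String, String> doc : docs) {
      if (matches(doc, filters, skipDim) && doc.containsKey(dim)) {
        counts.merge(doc.get(dim), 1, Integer::sum);
      }
    }
    return counts;
  }

  /** True if the doc satisfies all filters, ignoring the filter on {@code skipDim} (may be null). */
  private static boolean matches(Map<String, String> doc, Map<String, String> filters, String skipDim) {
    for (Map.Entry<String, String> f : filters.entrySet()) {
      if (f.getKey().equals(skipDim)) continue;
      if (!f.getValue().equals(doc.get(f.getKey()))) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    List<Map<String, String>> docs = List.of(
        Map.of("Component", "core/index", "Status", "Open"),
        Map.of("Component", "core/search", "Status", "Open"),
        Map.of("Component", "facet", "Status", "Closed"));
    Map<String, String> filters = Map.of("Component", "core/index", "Status", "Open");
    System.out.println(drillDownCounts(docs, filters, "Component"));     // {core/index=1}
    System.out.println(drillSidewaysCounts(docs, filters, "Component")); // {core/index=1, core/search=1}
  }
}
```

The "facet" document is excluded from the sideways counts because it fails the "Status" filter, which is exactly the behavior a user expects when clicking a sibling value.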
Dynamic Facets
• Range facets on NumericDocValues fields
  – Define the buckets of interest at query time
  – Supports arbitrary ValueSources (Lucene 4.6.0)

// Aggregate matching documents into buckets; the bounds leave no gap between
// buckets (with 21 and 100 exclusive, those two values would land in no bucket)
RangeAccumulator a = new RangeAccumulator(new RangeFacetRequest<LongRange>("field",
    new LongRange("1-5", 1L, true, 5L, true),
    new LongRange("6-20", 6L, true, 20L, true),
    new LongRange("21-100", 21L, true, 100L, true),
    new LongRange("over 100", 100L, false, Long.MAX_VALUE, true)));
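To make the inclusive/exclusive bucket semantics concrete, here is a small sketch of range counting over plain long values. It mirrors the (label, min, minInclusive, max, maxInclusive) arguments of the LongRange calls above, but the Range record and count method are hypothetical stand-ins, not the facet module's API (the record needs Java 16+).

```java
/** Toy range-bucket counter mirroring LongRange-style bounds; not Lucene's RangeAccumulator. */
final class RangeBucketsSketch {

  /** One bucket; each bound is inclusive or exclusive as flagged. */
  record Range(String label, long min, boolean minInclusive, long max, boolean maxInclusive) {
    boolean accept(long v) {
      boolean aboveMin = minInclusive ? v >= min : v > min;
      boolean belowMax = maxInclusive ? v <= max : v < max;
      return aboveMin && belowMax;
    }
  }

  /** Counts, per bucket, how many values fall into it; overlapping buckets may both count a value. */
  static long[] count(long[] values, Range... ranges) {
    long[] counts = new long[ranges.length];
    for (long v : values) {
      for (int i = 0; i < ranges.length; i++) {
        if (ranges[i].accept(v)) counts[i]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Range[] ranges = {
        new Range("1-5", 1L, true, 5L, true),
        new Range("6-20", 6L, true, 20L, true),
        new Range("21-100", 21L, true, 100L, true),
        new Range("over 100", 100L, false, Long.MAX_VALUE, true)};
    long[] counts = count(new long[] {3, 5, 6, 21, 100, 101, 5000}, ranges);
    // prints "1-5 (2)", "6-20 (1)", "21-100 (2)", "over 100 (2)", one per line
    for (int i = 0; i < ranges.length; i++) {
      System.out.println(ranges[i].label() + " (" + counts[i] + ")");
    }
  }
}
```

Running the same values against a "21-100" bucket with both bounds exclusive shows why the bound flags matter: 21 and 100 are then counted by no bucket at all.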
Facet Associations
• Not all facets are created equal
  – Categories added by an automatic categorization system, e.g. Category/Apache Lucene (0.74) (confidence level is 0.74)
  – Important metadata about the facet, e.g. Contracts/US ($5M) (total $$$ generated from contracts)
  – Complex structures, e.g. Users/Shai Erera (lastAccess=YYYY/MM/DD, numUpdates=8…)
• Categories can have values associated with them per document
  – They are later aggregated by these values
  – NOTE: not the same as NumericDocValuesFields (a value per category in a document, rather than per document)!
• Facet associations are completely customizable – encoded as a byte[] per document

http://shaierera.blogspot.com/2013/01/facet-associations.html
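Because associations are stored as a byte[] per document, both the encoding and the aggregation function are up to the application. A minimal sketch, assuming a hypothetical layout of (int ordinal, float value) pairs written with java.nio.ByteBuffer and a sum aggregation; the class is illustrative and Lucene's own association encoding may differ.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy per-document association payload: (ordinal, float) pairs in a byte[]. Hypothetical layout. */
final class AssociationSketch {

  /** Packs parallel arrays of category ordinals and float associations into one byte[]. */
  static byte[] encode(int[] ordinals, float[] values) {
    ByteBuffer buf = ByteBuffer.allocate(ordinals.length * (Integer.BYTES + Float.BYTES));
    for (int i = 0; i < ordinals.length; i++) {
      buf.putInt(ordinals[i]).putFloat(values[i]);
    }
    return buf.array();
  }

  /** Decodes each matching document's byte[] and sums the association value per ordinal. */
  static Map<Integer, Float> sumAssociations(Iterable<byte[]> matchingDocs) {
    Map<Integer, Float> sums = new TreeMap<>();
    for (byte[] payload : matchingDocs) {
      ByteBuffer buf = ByteBuffer.wrap(payload);
      while (buf.remaining() >= Integer.BYTES + Float.BYTES) {
        int ord = buf.getInt();
        float value = buf.getFloat();
        sums.merge(ord, value, Float::sum);
      }
    }
    return sums;
  }

  public static void main(String[] args) {
    // Ordinal 7 stands in for a category such as Category/Apache Lucene; values are confidences.
    byte[] doc1 = encode(new int[] {7, 9}, new float[] {0.74f, 0.5f});
    byte[] doc2 = encode(new int[] {7}, new float[] {0.25f});
    System.out.println(sumAssociations(List.of(doc1, doc2)));
  }
}
```

Swapping Float::sum for Math::max would turn the same loop into "highest confidence per category"; that flexibility is the point of keeping the payload opaque.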
More Features
• Complements
  – Holds the count of each category in memory, per IndexReader
  – When the number of search results is >50% of the index, counts the "complement set"
  – Useful for "overview" queries, e.g. MatchAllDocsQuery
• Sampling
  – Aggregates a sampled set of the search results
  – Optionally re-counts the top-K facets for accurate values
• Partitions
  – Partition the taxonomy space to control memory usage during faceted search
  – Useful for very big taxonomies (10s of millions of categories)
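The complement trick relies on per-reader total counts being cached: when most documents match, counting the few non-matching documents and subtracting from the totals touches less data. A toy sketch of that idea over in-memory arrays, where each document's ordinals are an int[]; class and method names are hypothetical.

```java
import java.util.Arrays;

/** Toy complement counting over in-memory arrays; not Lucene's implementation. */
final class ComplementSketch {

  /** Total count per ordinal over all docs; a real system would compute this once per reader and cache it. */
  static int[] totalCounts(int[][] docOrdinals, int numOrdinals) {
    int[] totals = new int[numOrdinals];
    for (int[] ords : docOrdinals) {
      for (int ord : ords) totals[ord]++;
    }
    return totals;
  }

  /** Counts ordinals of the matching docs, switching to the complement when more than 50% match. */
  static int[] count(int[][] docOrdinals, boolean[] matching, int[] totals) {
    int numMatching = 0;
    for (boolean m : matching) {
      if (m) numMatching++;
    }
    boolean useComplement = 2 * numMatching > docOrdinals.length;
    int[] counts = new int[totals.length];
    for (int doc = 0; doc < docOrdinals.length; doc++) {
      // Normal mode visits the matching docs; complement mode visits only the non-matching ones.
      if (matching[doc] != useComplement) {
        for (int ord : docOrdinals[doc]) counts[ord]++;
      }
    }
    if (useComplement) {
      for (int ord = 0; ord < counts.length; ord++) counts[ord] = totals[ord] - counts[ord];
    }
    return counts;
  }

  public static void main(String[] args) {
    int[][] docs = {{0, 1}, {0}, {1, 2}, {0, 2}};
    int[] totals = totalCounts(docs, 3);
    // 3 of 4 docs match, so only doc 3 is visited and its counts are subtracted from the totals
    System.out.println(Arrays.toString(count(docs, new boolean[] {true, true, true, false}, totals))); // [2, 2, 1]
  }
}
```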
Lucene Facets Under the Hood
The Taxonomy Index
• The taxonomy maps categories to integer codes (referred to as ordinals)
  – Kind of like a Map<CategoryPath,Integer>, with hierarchy support
  – Provides taxonomy browsing services
  – DirectoryTaxonomyWriter maintains it as a sidecar Lucene index
• Categories are broken down into their path components, e.g. Date/2012/March/20 becomes:
  – Date, with ordinal=1
  – Date/2012, with ordinal=2
  – Date/2012/March, with ordinal=3
  – Date/2012/March/20, with ordinal=4
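A toy, in-memory version of this prefix-to-ordinal assignment, assuming categories are '/'-joined strings and ordinal 0 is reserved for the root so the numbering matches the example above. The class is hypothetical; DirectoryTaxonomyWriter persists the mapping in a sidecar index rather than a HashMap, but the parent links it records are the same idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy taxonomy: assigns an ordinal to every path prefix and records each node's parent. */
final class TaxonomySketch {
  private final Map<String, Integer> ordinals = new HashMap<>();
  private final List<Integer> parents = new ArrayList<>();

  TaxonomySketch() {
    ordinals.put("", 0); // ordinal 0 is the root, so the first real category gets ordinal 1
    parents.add(-1);     // the root has no parent
  }

  /** Adds a category and all its missing prefixes; returns the ordinal of the leaf. */
  int addCategory(String... components) {
    int parent = 0;
    StringBuilder path = new StringBuilder();
    for (String component : components) {
      if (path.length() > 0) path.append('/');
      path.append(component);
      String key = path.toString();
      Integer ord = ordinals.get(key);
      if (ord == null) {           // new prefix: next free ordinal, linked to its parent
        ord = parents.size();
        ordinals.put(key, ord);
        parents.add(parent);
      }
      parent = ord;
    }
    return parent;
  }

  /** Returns the ordinal of a '/'-joined path, or -1 if it is not in the taxonomy. */
  int getOrdinal(String path) {
    return ordinals.getOrDefault(path, -1);
  }

  int getParent(int ordinal) {
    return parents.get(ordinal);
  }

  public static void main(String[] args) {
    TaxonomySketch taxo = new TaxonomySketch();
    int leaf = taxo.addCategory("Date", "2012", "March", "20");
    System.out.println(leaf + " parent=" + taxo.getParent(leaf)); // 4 parent=3
  }
}
```

Adding Date/2012/April afterwards only creates one new ordinal (5), whose parent is the existing Date/2012 (2).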
The Search Index
• Categories are added as drilldown terms, e.g. for Date/2012/March/20:
  – $facets:Date
  – $facets:Date/2012
  – …
• All category ordinals associated with the document are added as a BinaryDocValuesField
  – The ordinals of all path components are added, not just those of the leaves
  – Encoded as VInt + gap for efficient compression and speed
    • Other compression methods were attempted, but were slower to decode (LUCENE-4609)
  – Used during faceted search to read all the associated ordinals and aggregate accordingly (e.g. count)
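The "VInt + gap" encoding of a document's ordinals can be sketched as follows: sort the ordinals, store each as the difference from the previous one, and write each difference as a variable-length int (7 bits per byte, high bit set while more bytes follow). This illustrates the technique only; it is not the source of the default DGapVInt encoder, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Sketch of gap + VInt encoding for a document's category ordinals (non-negative ints). */
final class DGapVIntSketch {

  /** Sorts the ordinals, replaces each by its gap from the previous one, and writes the gaps as VInts. */
  static byte[] encode(int[] ordinals) {
    int[] sorted = ordinals.clone();
    Arrays.sort(sorted);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : sorted) {
      int gap = ord - prev;
      prev = ord;
      while ((gap & ~0x7F) != 0) {     // more than 7 bits left: emit low 7 bits with continuation bit
        out.write((gap & 0x7F) | 0x80);
        gap >>>= 7;
      }
      out.write(gap);                  // last byte has the high bit clear
    }
    return out.toByteArray();
  }

  /** Reverses encode(): reads the VInts and accumulates the gaps back into ordinals. */
  static int[] decode(byte[] bytes) {
    int[] result = new int[bytes.length]; // each ordinal takes at least one byte
    int count = 0, pos = 0, prev = 0;
    while (pos < bytes.length) {
      int gap = 0, shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        gap |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += gap;
      result[count++] = prev;
    }
    return Arrays.copyOf(result, count);
  }

  public static void main(String[] args) {
    byte[] encoded = encode(new int[] {301, 300});
    System.out.println(encoded.length + " bytes -> " + Arrays.toString(decode(encoded))); // 3 bytes -> [300, 301]
  }
}
```

Because a document's path-component ordinals tend to be close together, most gaps fit in a single byte: {300, 301} takes 3 bytes here versus 8 bytes as two raw ints.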
SortedSet Facets
• SortedSetFacetFields adds SortedSetDocValuesFields and drilldown terms to documents
• Local-segment SortedSet ordinals are mapped to global ones through SortedSetDocValuesReaderState
• Use SortedSetDocValuesAccumulator to accumulate SortedSet facets
• Advantages:
  – Taxonomy representation requires less RAM (flat taxonomy)
  – No sidecar index
  – Tie-breaks by label-sort order
• Disadvantages:
  – No full taxonomy (flat facets only)
  – Overall uses more RAM (local-to-global ordinal mapping)
  – Adds NRT reopen cost
  – Slower than taxonomy-based facets
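The local-to-global mapping that SortedSet facets depend on can be sketched like this: merge every segment's sorted labels into one sorted global list, then map each segment's local ordinal (its index into that segment's sorted labels) to the global index. A hypothetical sketch: labels are compared as Java Strings for simplicity, and the per-segment int[] maps are the extra RAM listed above as a disadvantage.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Toy local-to-global ordinal mapping across segments; not SortedSetDocValuesReaderState itself. */
final class GlobalOrdinalMapSketch {

  /** Merged, sorted label space across all segments; a label's index is its global ordinal. */
  static String[] globalTerms(String[][] segmentTerms) {
    TreeSet<String> all = new TreeSet<>();
    for (String[] terms : segmentTerms) all.addAll(Arrays.asList(terms));
    return all.toArray(new String[0]);
  }

  /** Maps each local ordinal of one segment (index into its sorted labels) to a global ordinal. */
  static int[] localToGlobal(String[] sortedSegmentTerms, String[] globalTerms) {
    int[] map = new int[sortedSegmentTerms.length];
    for (int local = 0; local < map.length; local++) {
      map[local] = Arrays.binarySearch(globalTerms, sortedSegmentTerms[local]);
    }
    return map;
  }

  /** Adds one segment's per-local-ordinal counts into the global count array. */
  static void addSegmentCounts(int[] localCounts, int[] localToGlobal, int[] globalCounts) {
    for (int local = 0; local < localCounts.length; local++) {
      globalCounts[localToGlobal[local]] += localCounts[local];
    }
  }

  public static void main(String[] args) {
    String[][] segs = {{"Author/Bob", "Author/Zoe"}, {"Author/Alice", "Author/Bob"}};
    String[] global = globalTerms(segs); // [Author/Alice, Author/Bob, Author/Zoe]
    int[] globalCounts = new int[global.length];
    addSegmentCounts(new int[] {2, 1}, localToGlobal(segs[0], global), globalCounts);
    addSegmentCounts(new int[] {4, 3}, localToGlobal(segs[1], global), globalCounts);
    System.out.println(Arrays.toString(globalCounts)); // [4, 5, 1]
  }
}
```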
Global Ordinals
• Per-segment integer codes (as used by the SortedSet approach) are less efficient
  – The same category gets different ordinals in different segments
  – Requires an in-memory code map (e.g. local-to-global): more RAM and less scalable
  – Top-K is resolved on the String representation of categories: more CPU
• Global ordinals allow efficient per-segment faceting and aggregation
  – No translation maps required (no extra RAM, highly scalable)
  – Aggregation and top-K computation are done on integer codes
• But they do not play well with IndexWriter.addIndexes(Directory…)
  – Must use IndexWriter.addIndexes(IndexReader…), so that the ordinals in the input search index are mapped to the destination's
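With global ordinals, counting can use one int[] indexed by ordinal for every segment, and labels only need to be resolved for the K winners. A minimal sketch, assuming an ordinal-to-label array is available; the class name is hypothetical, and ties are broken in favor of the smaller ordinal purely as an illustrative choice.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Toy aggregation on global ordinals: count into an int[], select top-K, resolve labels last. */
final class TopKOrdinalsSketch {

  static String[] topK(int[][] matchingDocOrdinals, String[] labels, int k) {
    int[] counts = new int[labels.length];
    for (int[] ords : matchingDocOrdinals) {
      for (int ord : ords) counts[ord]++;
    }
    // Min-heap on count; on equal counts the larger ordinal sits at the head and is evicted first.
    PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) ->
        counts[a] != counts[b] ? Integer.compare(counts[a], counts[b]) : Integer.compare(b, a));
    for (int ord = 0; ord < counts.length; ord++) {
      if (counts[ord] == 0) continue;
      heap.offer(ord);
      if (heap.size() > k) heap.poll();
    }
    // Only the K surviving ordinals are turned into labels, best first.
    String[] result = new String[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      int ord = heap.poll();
      result[i] = labels[ord] + " (" + counts[ord] + ")";
    }
    return result;
  }

  public static void main(String[] args) {
    int[][] docs = {{0, 1}, {1}, {1, 2}, {2, 3}};
    String[] labels = {"A", "B", "C", "D"};
    System.out.println(Arrays.toString(topK(docs, labels, 2))); // [B (3), C (2)]
  }
}
```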
Two-Phase Aggregation
• FacetsCollector works in two steps:
  – Collects matching documents (and optionally their scores)
  – Invokes FacetsAccumulator to accumulate the top-K facets
• Performance tests show that this improves faceted search (LUCENE-4600)
  – Locality of reference?
• Useful for Sampling and Complements
  – Hard to do otherwise
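A toy illustration of the two-step flow: the collector only records matching doc ids, and aggregation then walks them in increasing doc-id order. Having the complete hit set before aggregating is what makes Complements (compare hits.cardinality() to maxDoc) and Sampling (aggregate a subset) straightforward. The names are hypothetical, an IntPredicate stands in for the query, and java.util.BitSet stands in for Lucene's own bit sets.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Toy two-phase faceting: phase 1 records hits, phase 2 aggregates them. */
final class TwoPhaseSketch {

  /** Phase 1: record matching doc ids only, as a collector's collect(doc) would. */
  static BitSet collect(int maxDoc, IntPredicate query) {
    BitSet hits = new BitSet(maxDoc);
    for (int doc = 0; doc < maxDoc; doc++) {
      if (query.test(doc)) hits.set(doc);
    }
    return hits;
  }

  /** Phase 2: iterate the collected docs in increasing order and count their ordinals. */
  static int[] accumulate(BitSet hits, int[][] docOrdinals, int numOrdinals) {
    int[] counts = new int[numOrdinals];
    for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
      for (int ord : docOrdinals[doc]) counts[ord]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    int[][] docOrdinals = {{0}, {0, 1}, {1}, {2}};
    BitSet hits = collect(docOrdinals.length, doc -> doc % 2 == 1); // docs 1 and 3 match
    System.out.println(Arrays.toString(accumulate(hits, docOrdinals, 3))); // [1, 1, 1]
  }
}
```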
FacetIndexingParams
• Determines how facets are encoded
  – Partition size
  – Facet delimiter character (for drilldown terms, default \u001F)
  – CategoryListParams
• CategoryListParams holds parameters for a category list
  – Encoder/Decoder (default DGapVInt)
  – OrdinalPolicy (how path components are encoded): ALL_PARENTS, NO_PARENTS and ALL_BUT_DIMENSION (default)
• CategoryListParams can be used to group facets together
  – Default: all facets are put in the same "category list" (i.e. one BinaryDocValues field)
  – Expert: separate categories by dimension into different category lists
    • Useful when sets of categories are always aggregated together, but not with other categories
• FacetIndexingParams are currently not recorded per segment, so be careful when changing them on an existing index!
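To illustrate the three OrdinalPolicy values, the sketch below returns which path-component ordinals would be stored for one category, reusing the Date/2012/March/20 ordinals (1 to 4) from the taxonomy example. The enum is a stand-in mirroring the names above, not Lucene's own class. Under NO_PARENTS only the leaf is stored, so parent counts have to be derived at search time; ALL_BUT_DIMENSION skips only the dimension's own ordinal.

```java
import java.util.Arrays;

/** Sketch of which path-component ordinals each policy would store for one category. */
final class OrdinalPolicySketch {

  /** Stand-in mirroring the policy names; not Lucene's CategoryListParams.OrdinalPolicy. */
  enum Policy { ALL_PARENTS, NO_PARENTS, ALL_BUT_DIMENSION }

  /**
   * @param pathOrdinals ordinals of the path components, from the dimension down to the leaf
   *                     (e.g. Date=1, Date/2012=2, Date/2012/March=3, Date/2012/March/20=4)
   */
  static int[] ordinalsToEncode(int[] pathOrdinals, Policy policy) {
    switch (policy) {
      case ALL_PARENTS:
        return pathOrdinals.clone();                                // dimension, parents and leaf
      case NO_PARENTS:
        return new int[] {pathOrdinals[pathOrdinals.length - 1]};   // leaf only
      case ALL_BUT_DIMENSION:
        return Arrays.copyOfRange(pathOrdinals, 1, pathOrdinals.length); // everything but the dimension
      default:
        throw new IllegalArgumentException("unknown policy " + policy);
    }
  }

  public static void main(String[] args) {
    int[] path = {1, 2, 3, 4};
    for (Policy p : Policy.values()) {
      System.out.println(p + " " + Arrays.toString(ordinalsToEncode(path, p)));
    }
  }
}
```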
Questions?

More Related Content

What's hot

An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaObjectRocket
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginnersNeil Baker
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...Rahul K Chauhan
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
 
Java Annotation Processing: A Beginner Walkthrough
Java Annotation Processing: A Beginner WalkthroughJava Annotation Processing: A Beginner Walkthrough
Java Annotation Processing: A Beginner WalkthroughMahfuz Islam Bhuiyan
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solrlucenerevolution
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into ElasticsearchKnoldus Inc.
 
JSON Data Parsing in Snowflake (By Faysal Shaarani)
JSON Data Parsing in Snowflake (By Faysal Shaarani)JSON Data Parsing in Snowflake (By Faysal Shaarani)
JSON Data Parsing in Snowflake (By Faysal Shaarani)Faysal Shaarani (MBA)
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchpmanvi
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013larsgeorge
 
Elastic Search
Elastic SearchElastic Search
Elastic SearchNavule Rao
 
Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfSease
 

What's hot (20)

An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
Encores
EncoresEncores
Encores
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...What I learnt: Elastic search & Kibana : introduction, installtion & configur...
What I learnt: Elastic search & Kibana : introduction, installtion & configur...
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Java Annotation Processing: A Beginner Walkthrough
Java Annotation Processing: A Beginner WalkthroughJava Annotation Processing: A Beginner Walkthrough
Java Annotation Processing: A Beginner Walkthrough
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solr
 
Deep Dive Into Elasticsearch
Deep Dive Into ElasticsearchDeep Dive Into Elasticsearch
Deep Dive Into Elasticsearch
 
JSON Data Parsing in Snowflake (By Faysal Shaarani)
JSON Data Parsing in Snowflake (By Faysal Shaarani)JSON Data Parsing in Snowflake (By Faysal Shaarani)
JSON Data Parsing in Snowflake (By Faysal Shaarani)
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
How to Use JSON in MySQL Wrong
How to Use JSON in MySQL WrongHow to Use JSON in MySQL Wrong
How to Use JSON in MySQL Wrong
 
Introduction To Apache Lucene
Introduction To Apache LuceneIntroduction To Apache Lucene
Introduction To Apache Lucene
 
Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
 

Viewers also liked

The Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik SeeleyThe Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik Seeleylucenerevolution
 
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsApproaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsLucidworks
 
How to Wield Kentico 9 in the Real World
How to Wield Kentico 9 in the Real WorldHow to Wield Kentico 9 in the Real World
How to Wield Kentico 9 in the Real WorldBrian McKeiver
 
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid DynamicsFaceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid DynamicsLucidworks
 
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...Earley Information Science
 
Are users really ready for faceted search?
Are users really ready for faceted search?Are users really ready for faceted search?
Are users really ready for faceted search?epek
 
Extending facet search to the general web
Extending facet search to the general webExtending facet search to the general web
Extending facet search to the general web祺傑 林
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
An Introduction to Solr
An Introduction to SolrAn Introduction to Solr
An Introduction to Solrtomhill
 
Faceted Classification System in Libraries
Faceted Classification System in LibrariesFaceted Classification System in Libraries
Faceted Classification System in LibrariesLaura Loveday Maury
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Lucidworks
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...Grokking VN
 
Comparative study of major classification schemes
Comparative study of major classification schemesComparative study of major classification schemes
Comparative study of major classification schemesNadeem Nazir
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy SokolenkoProvectus
 

Viewers also liked (20)

The Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik SeeleyThe Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik Seeley
 
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsApproaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
 
How to Wield Kentico 9 in the Real World
How to Wield Kentico 9 in the Real WorldHow to Wield Kentico 9 in the Real World
How to Wield Kentico 9 in the Real World
 
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid DynamicsFaceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
 
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
 
Are users really ready for faceted search?
Are users really ready for faceted search?Are users really ready for faceted search?
Are users really ready for faceted search?
 
Faceted Navigation
Faceted NavigationFaceted Navigation
Faceted Navigation
 
Extending facet search to the general web
Extending facet search to the general webExtending facet search to the general web
Extending facet search to the general web
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
An Introduction to Solr
An Introduction to SolrAn Introduction to Solr
An Introduction to Solr
 
Faceted Classification System in Libraries
Faceted Classification System in LibrariesFaceted Classification System in Libraries
Faceted Classification System in Libraries
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
TechTalk #13 Grokking: Marrying Elasticsearch with NLP to solve real-world se...
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Comparative study of major classification schemes
Comparative study of major classification schemesComparative study of major classification schemes
Comparative study of major classification schemes
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 

Similar to Faceted Search with Lucene

(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol ValidationBIOVIA
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event PresentationChartio
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)maclean liu
 
Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Manish kumar
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friendslucenerevolution
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsJulien Nioche
 
Search Me: Using Lucene.Net
Search Me: Using Lucene.NetSearch Me: Using Lucene.Net
Search Me: Using Lucene.Netgramana
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...Cloudera, Inc.
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyRobert Viseur
 
DSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDuraSpace
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…Aaron Shilo
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Oracle Database Overview
Oracle Database OverviewOracle Database Overview
Oracle Database Overviewhonglee71
 

Similar to Faceted Search with Lucene (20)

Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
21 domino mohan-1
21 domino mohan-121 domino mohan-1
21 domino mohan-1
 
Lucene and MySQL
Lucene and MySQLLucene and MySQL
Lucene and MySQL
 
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
(ATS6-PLAT02) Accelrys Catalog and Protocol Validation
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event Presentation
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
 
Lucene basics
Lucene basicsLucene basics
Lucene basics
 
Oracle by Muhammad Iqbal
Oracle by Muhammad IqbalOracle by Muhammad Iqbal
Oracle by Muhammad Iqbal
 
Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Search Me: Using Lucene.Net
Search Me: Using Lucene.NetSearch Me: Using Lucene.Net
Search Me: Using Lucene.Net
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
 
DSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/ExportDSpace 4.2 Transmission: Import/Export
DSpace 4.2 Transmission: Import/Export
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…Getting to know oracle database objects iot, mviews, clusters and more…
Getting to know oracle database objects iot, mviews, clusters and more…
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Oracle Database Overview
Oracle Database OverviewOracle Database Overview
Oracle Database Overview
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene

State of the Art Logging. Kibana4Solr is Here!

Building Client-side Search Applications with Solr

Integrate Solr with real-time stream processing applications

Scaling Solr with SolrCloud

Administering and Monitoring SolrCloud Clusters

Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled

Using Solr to Search and Analyze Logs

Enhancing relevancy through personalization & semantic search

Real-time Inverted Search in the Cloud Using Lucene and Storm

Solr's Admin UI - Where does the data come from?

Schemaless Solr and the Solr Schema REST API

High Performance JSON Search and Relational Faceted Browsing with Lucene

Turning search upside down

Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...

Shrinking the haystack wes caldwell - final

The First Class Integration of Solr with Hadoop

A Novel methodology for handling Document Level Security in Search Based Appl...

How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 


Faceted Search with Lucene

  • 2. Faceted Search with Lucene
      Shai Erera, Researcher, IBM
  • 3. Who Am I
      • Working at IBM – Information Retrieval Research
      • Lucene/Solr committer and PMC member
      • http://shaierera.blogspot.com
      • shaie@apache.org
  • 5. Faceted Search
      • Technique for accessing documents that were classified into a taxonomy of categories
          – Flat: Author/John Doe, Tags/Lucene, Popularity/High
          – Hierarchical: Computers/Software/Information Retrieval/Fulltext/Apache Lucene (ODP)
      • Quick overview of the breakdown of the search results
          – How many documents are in category Committed Paths/lucene/core vs. Committed Paths/lucene/facet
      • Simplifies interaction with the search application
          – Drill down to issues that were updated in the "Past 2 days" by clicking a link
          – No knowledge of the search syntax or index schema required
      • http://jirasearch.mikemccandless.com
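The counting behind such an overview can be sketched with a toy in-memory model. This is plain Java with no Lucene dependency; `FacetCountSketch`, `count` and `children` are hypothetical names, and each matching document is assumed to carry its categories as slash-separated path strings. Every matching document increments its categories and all of their parent prefixes (once per document), which is what lets a UI show both `Committed Paths/lucene` and its children.

```java
import java.util.*;

// Toy in-memory model of facet counting (hypothetical names; not Lucene's API).
class FacetCountSketch {
  // Counts each category path, plus every parent prefix, once per matching document.
  static Map<String, Integer> count(List<List<String>> matchingDocs) {
    Map<String, Integer> counts = new TreeMap<>();
    for (List<String> docCategories : matchingDocs) {
      Set<String> seen = new HashSet<>(); // avoids double-counting prefixes shared within one doc
      for (String path : docCategories) {
        String[] parts = path.split("/");
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
          if (i > 0) prefix.append('/');
          prefix.append(parts[i]);
          seen.add(prefix.toString());
        }
      }
      for (String category : seen) counts.merge(category, 1, Integer::sum);
    }
    return counts;
  }

  // Returns the counts of the direct children of a parent category.
  static Map<String, Integer> children(Map<String, Integer> counts, String parent) {
    int childDepth = parent.split("/").length + 1;
    Map<String, Integer> result = new TreeMap<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      String path = e.getKey();
      if (path.startsWith(parent + "/") && path.split("/").length == childDepth) {
        result.put(path, e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<String>> hits = Arrays.asList(
        Arrays.asList("Committed Paths/lucene/core"),
        Arrays.asList("Committed Paths/lucene/core", "Committed Paths/lucene/facet"),
        Arrays.asList("Committed Paths/lucene/facet"));
    Map<String, Integer> counts = count(hits);
    // prints {Committed Paths/lucene/core=2, Committed Paths/lucene/facet=2}
    System.out.println(children(counts, "Committed Paths/lucene"));
  }
}
```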
  • 6. Lucene Facets
      • Contributed by IBM in 2011, released in 3.4.0
      • Major changes since 4.1.0+
          – NRT support
          – Nearly 400% search speedups
          – Complete API revamp
          – New features (SortedSet, range faceting, drill-sideways)
      • Two main indexing-time modes
          – Taxonomy-based: hierarchical facets, managed by a sidecar index, low NRT reopen cost
          – SortedSetDocValues: flat facets only, no sidecar index, higher NRT reopen cost
      • Runtime modes
          – Range facets (on NumericDocValues fields)
      • Other implementations: Solr, ElasticSearch, Bobo Browse
  • 7. Lucene Facet Components
      • TaxonomyWriter/Reader
          – Manage the taxonomy information
      • FacetFields
          – Add facets information to documents (DocValues fields, drilldown terms)
      • FacetRequest
          – Defines which facets to aggregate and the FacetsAggregator (aggregation function)
      • FacetsCollector
          – Collects matching documents and computes the top-K categories for each facet request (invokes FacetsAccumulator)
      • DrillDownQuery / DrillSideways
          – Execute drilldown and drill-sideways requests
  • 8. Sample Code – Indexing
      // Builds the taxonomy as documents are indexed; thread-safe, use a single instance.
      // Declared as DirectoryTaxonomyWriter so it can also open an NRT DirectoryTaxonomyReader.
      DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
      // Adds facets information to a document; can be initialized once per thread
      FacetFields facetFields = new FacetFields(taxoWriter);
      // List of categories to add to the document
      List<CategoryPath> cats = new ArrayList<CategoryPath>();
      cats.add(new CategoryPath("Author", "Erik Hatcher"));
      cats.add(new CategoryPath("Author/Otis Gospodnetić", '/'));
      cats.add(new CategoryPath("Pub Date", "2004", "December", "1"));
      Document bookDoc = new Document();
      bookDoc.add(new TextField("title", "lucene in action", Store.YES));
      // Add the category fields (DocValues, postings)
      facetFields.addFields(bookDoc, cats);
      // Index the document
      indexWriter.addDocument(bookDoc);
  • 9. Sample Code – Search
      // Open an NRT TaxonomyReader on the DirectoryTaxonomyWriter used for indexing
      TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoWriter);
      // Define the facets to aggregate (top-10 categories for each)
      FacetSearchParams fsp = new FacetSearchParams(
          new CountFacetRequest(new CategoryPath("Author"), 10),
          new CountFacetRequest(new CategoryPath("Pub Date"), 10));
      // Collect both top-K facets and top-N matching documents
      TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true);
      FacetsCollector fc = FacetsCollector.create(fsp, searcher.getIndexReader(), taxoReader);
      Query q = new TermQuery(new Term("title", "lucene"));
      searcher.search(q, MultiCollector.wrap(tdc, fc));
      // Traverse the top facets
      for (FacetResult fres : fc.getFacetResults()) {
        FacetResultNode root = fres.getFacetResultNode();
        // value is a double, so format it with %.0f rather than %d
        System.out.println(String.format("%s (%.0f)", root.label, root.value));
        for (FacetResultNode cat : root.subResults) {
          // components[0] is the dimension (e.g. "Author"); components[1] is the child's label
          System.out.println("  " + cat.label.components[1] + " (" + cat.value + ")");
        }
      }
  • 10. Drilldown and Drill-Sideways
      • Drilldown adds a filter to the search
          – Multiple categories can be OR'd
          // Drilldown: filter results to "Component/core/index";
          // all other "Component/*" and "Component/core/*" get count 0
          // (facetIndexingParams: the FacetIndexingParams used at indexing time)
          Query base = new MatchAllDocsQuery();
          DrillDownQuery ddq = new DrillDownQuery(facetIndexingParams, base);
          ddq.add(new CategoryPath("Component/core/index", '/'));
      • Drill-sideways allows drilldown, yet still aggregates the "sideways" categories
          // Drill-sideways: drill down on "Component/core/index";
          // other "Component/*" and "Component/core/*" are counted too
          DrillSideways ds = new DrillSideways(searcher, taxoReader);
          DrillSidewaysResult sidewaysRes = ds.search(null, ddq, 10, fsp);
      • http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html
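The difference between the two modes can be shown with a toy model over flat, single-valued dimensions (an assumption; real dimensions may be hierarchical and multi-valued). `DrillSidewaysSketch` and its methods are hypothetical and only mimic the counting semantics, not how Lucene's `DrillSideways` computes them: under drill-sideways, a constrained dimension is counted as if its own constraint were removed, while every other dimension still sees the fully filtered hits.

```java
import java.util.*;

// Toy model of drilldown vs. drill-sideways counting over flat, single-valued
// dimensions (hypothetical names; not Lucene's DrillSideways implementation).
class DrillSidewaysSketch {
  // True if the doc satisfies every constraint except the one on ignoredDim (may be null).
  static boolean matches(Map<String, String> doc, Map<String, String> constraints, String ignoredDim) {
    for (Map.Entry<String, String> c : constraints.entrySet()) {
      if (c.getKey().equals(ignoredDim)) continue;
      if (!c.getValue().equals(doc.get(c.getKey()))) return false;
    }
    return true;
  }

  // Counts the values of dim over the docs that pass the (possibly relaxed) constraints.
  static Map<String, Integer> count(List<Map<String, String>> docs, String dim,
                                    Map<String, String> constraints, String ignoredDim) {
    Map<String, Integer> counts = new TreeMap<>();
    for (Map<String, String> doc : docs) {
      String value = doc.get(dim);
      if (value != null && matches(doc, constraints, ignoredDim)) counts.merge(value, 1, Integer::sum);
    }
    return counts;
  }

  // Drilldown: every dimension is counted over the fully filtered hits.
  static Map<String, Integer> drillDown(List<Map<String, String>> docs, String dim,
                                        Map<String, String> constraints) {
    return count(docs, dim, constraints, null);
  }

  // Drill-sideways: a dimension is counted as if its own constraint were absent.
  static Map<String, Integer> drillSideways(List<Map<String, String>> docs, String dim,
                                            Map<String, String> constraints) {
    return count(docs, dim, constraints, dim);
  }

  // Builds a doc (or a constraint map) from alternating key/value strings.
  static Map<String, String> doc(String... keyValues) {
    Map<String, String> m = new HashMap<>();
    for (int i = 0; i + 1 < keyValues.length; i += 2) m.put(keyValues[i], keyValues[i + 1]);
    return m;
  }

  public static void main(String[] args) {
    List<Map<String, String>> docs = Arrays.asList(
        doc("Component", "core/index", "Status", "Open"),
        doc("Component", "core/search", "Status", "Open"),
        doc("Component", "facet", "Status", "Closed"),
        doc("Component", "core/index", "Status", "Closed"));
    Map<String, String> drill = doc("Component", "core/index");
    System.out.println(drillDown(docs, "Component", drill));     // {core/index=2}
    System.out.println(drillSideways(docs, "Component", drill)); // {core/index=2, core/search=1, facet=1}
  }
}
```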
  • 11. Dynamic Facets
      • Range facets on NumericDocValues fields
          – Define the buckets of interest at query time
          – Supports any arbitrary ValueSource (Lucene 4.6.0)
          // Aggregate matching documents into buckets
          // ("21-100" is inclusive at both ends so that 21 and 100 are not left out of every bucket)
          RangeAccumulator a = new RangeAccumulator(new RangeFacetRequest<LongRange>("field",
              new LongRange("1-5", 1L, true, 5L, true),
              new LongRange("6-20", 6L, true, 20L, true),
              new LongRange("21-100", 21L, true, 100L, true),
              new LongRange("over 100", 100L, false, Long.MAX_VALUE, true)));
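Whether each bound is inclusive decides which bucket, if any, a boundary value lands in. The sketch below is a hypothetical plain-Java bucket counter whose `Bucket` constructor mirrors the argument order of the `LongRange` calls above; it is not Lucene's class. It counts a value into every bucket that accepts it, so overlapping or gapped bucket definitions show up directly in the counts.

```java
import java.util.*;

// Toy range-bucket counter (hypothetical; mirrors the label/min/minInclusive/max/maxInclusive
// arguments of the LongRange instances above, but is not Lucene's class).
class RangeBucketSketch {
  static final class Bucket {
    final String label;
    final long min, max;
    final boolean minInclusive, maxInclusive;

    Bucket(String label, long min, boolean minInclusive, long max, boolean maxInclusive) {
      this.label = label;
      this.min = min;
      this.minInclusive = minInclusive;
      this.max = max;
      this.maxInclusive = maxInclusive;
    }

    boolean accepts(long v) {
      boolean aboveMin = minInclusive ? v >= min : v > min;
      boolean belowMax = maxInclusive ? v <= max : v < max;
      return aboveMin && belowMax;
    }
  }

  // Counts every value into each bucket that accepts it; buckets may overlap or leave gaps.
  static Map<String, Integer> count(long[] values, List<Bucket> buckets) {
    Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the buckets' declaration order
    for (Bucket b : buckets) counts.put(b.label, 0);
    for (long v : values) {
      for (Bucket b : buckets) {
        if (b.accepts(v)) counts.merge(b.label, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Bucket> buckets = Arrays.asList(
        new Bucket("1-5", 1L, true, 5L, true),
        new Bucket("6-20", 6L, true, 20L, true),
        new Bucket("21-100", 21L, true, 100L, true),
        new Bucket("over 100", 100L, false, Long.MAX_VALUE, true));
    long[] values = {1, 5, 6, 20, 21, 100, 101, 5000};
    System.out.println(count(values, buckets)); // {1-5=2, 6-20=2, 21-100=2, over 100=2}
  }
}
```

With both `21-100` bounds exclusive, the values 21 and 100 would fall into no bucket at all, which is easy to miss when only looking at the labels.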
  • 12. Facet Associations
      • Not all facets are created equal
          – Categories added by an automatic categorization system, e.g. Category/Apache Lucene (0.74) (confidence level is 0.74)
          – Important metadata about the facet, e.g. Contracts/US ($5M) (total $$$ generated from contracts)
          – Complex structures, e.g. Users/Shai Erera (lastAccess=YYYY/MM/DD, numUpdates=8…)
      • Categories can have values associated with them per document
          – They are later aggregated by these values
          – NOTE: these are not NumericDocValuesFields!
      • Facet associations are completely customizable: they are encoded as a byte[] per document
      • http://shaierera.blogspot.com/2013/01/facet-associations.html
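One way to picture "a byte[] per document" is a payload of (ordinal, value) pairs that is decoded and summed per category at search time. The fixed 8-byte layout (an int ordinal followed by a float value) and the names `AssociationSketch`, `encode` and `sum` are assumptions for illustration, not Lucene's actual association encoding.

```java
import java.nio.ByteBuffer;
import java.util.*;

// Toy per-document association payload (hypothetical layout; not Lucene's encoding).
class AssociationSketch {
  // Encodes one document's (ordinal, value) pairs: 4-byte int ordinal + 4-byte float value each.
  static byte[] encode(int[] ordinals, float[] values) {
    ByteBuffer buf = ByteBuffer.allocate(ordinals.length * 8);
    for (int i = 0; i < ordinals.length; i++) {
      buf.putInt(ordinals[i]);
      buf.putFloat(values[i]);
    }
    return buf.array();
  }

  // Sums the associated values per ordinal across the matching documents' payloads.
  static Map<Integer, Float> sum(List<byte[]> matchingDocPayloads) {
    Map<Integer, Float> sums = new TreeMap<>();
    for (byte[] payload : matchingDocPayloads) {
      ByteBuffer buf = ByteBuffer.wrap(payload);
      while (buf.remaining() >= 8) {
        int ord = buf.getInt();
        float value = buf.getFloat();
        sums.merge(ord, value, Float::sum);
      }
    }
    return sums;
  }

  public static void main(String[] args) {
    // Hypothetical ordinals: 5 = Contracts/US, 7 = Contracts/UK; values in $M.
    List<byte[]> hits = Arrays.asList(
        encode(new int[]{5, 7}, new float[]{5.0f, 1.5f}),
        encode(new int[]{5}, new float[]{2.0f}));
    System.out.println(sum(hits)); // {5=7.0, 7=1.5}
  }
}
```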
  • 13. More Features
      • Complements
          – Holds the count of each category in memory, per IndexReader
          – When the number of search results is >50% of the index, count the "complement set" (the non-matching documents) instead
          – Useful for "overview" queries, e.g. MatchAllDocsQuery
      • Sampling
          – Aggregate a sampled set of the search results
          – Optionally re-count the top-K facets for accurate values
      • Partitions
          – Partition the taxonomy space to control memory usage during faceted search
          – Useful for very big taxonomies (10s of millions of categories)
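The complement trick fits in a few lines: with per-category totals cached for the whole index, a large result set is counted by subtracting the non-matching documents from those totals instead of adding up the matching ones. `ComplementSketch` is hypothetical plain Java; the 50% threshold follows the slide, and `docOrdinals[doc]` (each document's category ordinals) is an assumed in-memory stand-in for the index.

```java
import java.util.Arrays;

// Toy model of complement counting (hypothetical names; not Lucene's implementation).
// docOrdinals[doc] lists the category ordinals of each document.
class ComplementSketch {
  // Per-ordinal counts over the included docs (include == null means every doc).
  static int[] countDocs(int[][] docOrdinals, boolean[] include, int numOrdinals) {
    int[] counts = new int[numOrdinals];
    for (int doc = 0; doc < docOrdinals.length; doc++) {
      if (include == null || include[doc]) {
        for (int ord : docOrdinals[doc]) counts[ord]++;
      }
    }
    return counts;
  }

  // totals would be computed once per reader and cached; here it is simply passed in.
  static int[] count(int[][] docOrdinals, boolean[] matches, int[] totals) {
    int numMatches = 0;
    for (boolean m : matches) if (m) numMatches++;
    if (numMatches * 2 <= matches.length) {
      return countDocs(docOrdinals, matches, totals.length); // small result set: count directly
    }
    // Large result set: start from the cached totals and subtract the non-matching docs.
    int[] counts = totals.clone();
    for (int doc = 0; doc < docOrdinals.length; doc++) {
      if (!matches[doc]) {
        for (int ord : docOrdinals[doc]) counts[ord]--;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    int[][] docOrdinals = {{1, 2}, {1, 3}, {1, 2}, {1, 3}, {1, 2}};
    int[] totals = countDocs(docOrdinals, null, 4);
    boolean[] matches = {true, true, true, true, false}; // 80% of the index matches
    System.out.println(Arrays.toString(count(docOrdinals, matches, totals))); // [0, 4, 2, 2]
  }
}
```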
  • 15. The Taxonomy Index
      • The taxonomy maps categories to integer codes (referred to as ordinals)
          – Kind of like a Map<CategoryPath,Integer>, with hierarchy support
          – Provides taxonomy browsing services
          – DirectoryTaxonomyWriter is managed as a sidecar Lucene index
      • Categories are broken down into their path components, e.g. Date/2012/March/20 becomes:
          – Date, with ordinal=1
          – Date/2012, with ordinal=2
          – Date/2012/March, with ordinal=3
          – Date/2012/March/20, with ordinal=4
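The ordinal assignment above can be modeled with a map from path to ordinal plus a parent array. This `TaxonomySketch` is a hypothetical, single-threaded, in-memory stand-in, not `DirectoryTaxonomyWriter`; reserving ordinal 0 for the root is an assumption consistent with `Date` receiving ordinal 1 on the slide.

```java
import java.util.*;

// Toy taxonomy: path -> ordinal, plus each ordinal's parent (hypothetical; not Lucene's API).
class TaxonomySketch {
  private final Map<String, Integer> ordinals = new HashMap<>();
  private final List<Integer> parents = new ArrayList<>();

  TaxonomySketch() {
    ordinals.put("", 0); // ordinal 0 is reserved for the root
    parents.add(-1);     // the root has no parent
  }

  // Adds a category and any missing path prefixes; returns the leaf's ordinal.
  int addCategory(String path) {
    int parent = 0;
    StringBuilder prefix = new StringBuilder();
    for (String part : path.split("/")) {
      if (prefix.length() > 0) prefix.append('/');
      prefix.append(part);
      String key = prefix.toString();
      Integer ord = ordinals.get(key);
      if (ord == null) { // new component: next free ordinal, remember its parent
        ord = parents.size();
        ordinals.put(key, ord);
        parents.add(parent);
      }
      parent = ord;
    }
    return parent;
  }

  int ordinalOf(String path) {
    Integer ord = ordinals.get(path);
    return ord == null ? -1 : ord;
  }

  int parentOf(int ordinal) {
    return parents.get(ordinal);
  }

  public static void main(String[] args) {
    TaxonomySketch taxo = new TaxonomySketch();
    int leaf = taxo.addCategory("Date/2012/March/20");
    System.out.println(leaf);                                // 4
    System.out.println(taxo.ordinalOf("Date/2012"));         // 2
    System.out.println(taxo.parentOf(leaf));                 // 3
    System.out.println(taxo.addCategory("Date/2012/April")); // 5 (reuses Date and Date/2012)
  }
}
```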
  • 16. The Search Index
      • Categories are added as drilldown terms, e.g. for Date/2012/March/20:
          – $facets:Date
          – $facets:Date/2012
          – …
      • All category ordinals associated with the document are added as a BinaryDocValuesField
          – The ordinals of all path components are added, not just the leaves'
          – Encoded as VInt + gap for efficient compression and speed
              • Other compression methods were attempted, but were slower to decode (LUCENE-4609)
          – Used during faceted search to read all the associated ordinals and aggregate accordingly (e.g. count)
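Gap plus VInt encoding pays off because a document's sorted ordinals produce small deltas, and small numbers fit in a single byte. The sketch below uses the common VInt layout (7 payload bits per byte, low-order bits first, high bit set when more bytes follow) and assumes non-negative ordinals; `DGapVIntSketch` and its methods are hypothetical, and Lucene's actual encoder classes may differ in detail.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Toy d-gap + VInt codec for one document's ordinals (hypothetical; not Lucene's encoder).
class DGapVIntSketch {
  static byte[] encode(int[] ordinals) {
    int[] sorted = ordinals.clone();
    Arrays.sort(sorted); // gaps are only small (and non-negative) on sorted input
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : sorted) {
      int gap = ord - prev;
      prev = ord;
      while ((gap & ~0x7F) != 0) { // more than 7 bits left: emit low 7 bits with continuation bit
        out.write((gap & 0x7F) | 0x80);
        gap >>>= 7;
      }
      out.write(gap);
    }
    return out.toByteArray();
  }

  static int[] decode(byte[] bytes) {
    int[] buf = new int[bytes.length]; // at most one value per byte
    int n = 0, prev = 0, i = 0;
    while (i < bytes.length) {
      int gap = 0, shift = 0;
      byte b;
      do {
        b = bytes[i++];
        gap |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += gap; // undo the d-gap
      buf[n++] = prev;
    }
    return Arrays.copyOf(buf, n);
  }

  public static void main(String[] args) {
    byte[] encoded = encode(new int[]{300, 1, 2, 3, 4});
    System.out.println(encoded.length);                   // 6 (gaps 1,1,1,1 then 296 in two bytes)
    System.out.println(Arrays.toString(decode(encoded))); // [1, 2, 3, 4, 300]
  }
}
```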
  • 17. SortedSet Facets
      • SortedSetFacetFields adds SortedSetDocValuesFields and drilldown terms to documents
      • Per-segment SortedSet ordinals are mapped to global ones through SortedSetDocValuesReaderState
      • Use SortedSetDocValuesAccumulator to accumulate SortedSet facets
      • Advantages:
          – Taxonomy representation requires less RAM (flat taxonomy)
          – No sidecar index
          – Tie-breaks by label-sort order
      • Disadvantages:
          – No full (hierarchical) taxonomy
          – Overall uses more RAM (local-to-global ordinal mapping)
          – Adds NRT reopen cost
          – Slower than taxonomy-based facets
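The local-to-global mapping can be pictured as one int array per segment. In this hypothetical sketch (not `SortedSetDocValuesReaderState`), each segment's terms are assumed sorted and unique, so a term's local ordinal is its index in the segment's array and its global ordinal is its index in the sorted union of all segments. These per-segment arrays are the extra RAM the disadvantages list refers to.

```java
import java.util.*;

// Toy local-to-global ordinal mapping (hypothetical names; not Lucene's classes).
class GlobalOrdinalMapSketch {
  // Returns, per segment, an array mapping local ordinal -> global ordinal.
  static int[][] buildMaps(List<String[]> segmentTerms) {
    // The global ordinal space is the sorted union of all segments' terms.
    TreeSet<String> union = new TreeSet<>();
    for (String[] terms : segmentTerms) union.addAll(Arrays.asList(terms));
    List<String> global = new ArrayList<>(union);
    int[][] maps = new int[segmentTerms.size()][];
    for (int s = 0; s < maps.length; s++) {
      String[] terms = segmentTerms.get(s);
      maps[s] = new int[terms.length];
      for (int local = 0; local < terms.length; local++) {
        maps[s][local] = Collections.binarySearch(global, terms[local]); // global list is sorted
      }
    }
    return maps;
  }

  // Aggregates per-segment hits (given as local ordinals) into one global count array.
  static int[] count(int[][] maps, List<int[]> segmentHitOrds, int numGlobal) {
    int[] counts = new int[numGlobal];
    for (int s = 0; s < maps.length; s++) {
      for (int localOrd : segmentHitOrds.get(s)) counts[maps[s][localOrd]]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String[]> segments = Arrays.asList(
        new String[]{"Author/Bob", "Author/Lisa"},
        new String[]{"Author/Frank", "Author/Lisa"});
    int[][] maps = buildMaps(segments);
    System.out.println(Arrays.deepToString(maps)); // [[0, 2], [1, 2]]
    // Segment 0 hits Bob and Lisa; segment 1 hits Lisa (both given as local ordinals).
    int[] counts = count(maps, Arrays.asList(new int[]{0, 1}, new int[]{1}), 3);
    System.out.println(Arrays.toString(counts)); // [1, 0, 2]
  }
}
```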
  • 18. Global Ordinals
      • Per-segment integer codes (as used by the SortedSet approach) are less efficient
          – Different ordinals for the same categories across segments
          – Require an in-memory code map (e.g. local-to-global): more RAM and less scalable
          – Top-K is resolved on the String representation of categories: more CPU
      • Global ordinals allow efficient per-segment faceting and aggregation
          – No translation maps required (no extra RAM, highly scalable)
          – Aggregation and top-K computation are done on integer codes
      • But they do not play well with IndexWriter.addIndexes(Directory…)
          – Must use IndexWriter.addIndexes(IndexReader…), so that the ordinals in the input index are mapped to the destination's
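When a source index is added to a destination whose taxonomy assigned different ordinals, every source ordinal has to be translated through its category label. This hypothetical sketch (not Lucene's API) assumes the source labels are listed in ordinal order, so parents come before children, and that destination ordinals are dense, so the next free ordinal equals the map size.

```java
import java.util.*;

// Toy ordinal remapping for merging a source index into a destination taxonomy
// (hypothetical names; not Lucene's API).
class OrdinalRemapSketch {
  // Maps each source ordinal to a destination ordinal, adding unknown categories to dest.
  static int[] buildMap(String[] srcLabels, Map<String, Integer> dest) {
    int[] map = new int[srcLabels.length];
    for (int ord = 0; ord < srcLabels.length; ord++) {
      Integer destOrd = dest.get(srcLabels[ord]);
      if (destOrd == null) {
        destOrd = dest.size(); // next free ordinal in a dense destination taxonomy
        dest.put(srcLabels[ord], destOrd);
      }
      map[ord] = destOrd;
    }
    return map;
  }

  // Rewrites one document's ordinals; re-sorts so gap encoding still sees increasing values.
  static int[] remap(int[] docOrdinals, int[] map) {
    int[] out = new int[docOrdinals.length];
    for (int i = 0; i < docOrdinals.length; i++) out[i] = map[docOrdinals[i]];
    Arrays.sort(out);
    return out;
  }

  public static void main(String[] args) {
    Map<String, Integer> dest = new HashMap<>();
    dest.put("", 0);
    dest.put("Author", 1);
    dest.put("Author/Lisa", 2);
    String[] srcLabels = {"", "Author", "Author/Bob", "Author/Lisa"};
    int[] map = buildMap(srcLabels, dest);
    System.out.println(Arrays.toString(map));                         // [0, 1, 3, 2]
    System.out.println(Arrays.toString(remap(new int[]{1, 2}, map))); // [1, 3]
  }
}
```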
  • 19. Two-Phase Aggregation
      • FacetsCollector works in two steps:
          – Collects matching documents (and optionally their scores)
          – Invokes FacetsAccumulator to accumulate the top-K facets
      • Performance tests show that this improves faceted search (LUCENE-4600)
          – Locality of reference?
      • Useful for Sampling and Complements
          – Hard to do otherwise
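The two steps can be sketched with a `java.util.BitSet` standing in for the collected hits. `TwoPhaseSketch` and its methods are hypothetical; the `sample` step shows why having the full hit set before accumulation makes sampling straightforward, though this every-Nth sampler is a crude stand-in for whatever sampling Lucene actually performs.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.function.IntPredicate;

// Toy two-phase faceting (hypothetical names; not Lucene's FacetsCollector).
class TwoPhaseSketch {
  // Phase 1: record matching doc IDs, as a collector would during search.
  static BitSet collect(int maxDoc, IntPredicate query) {
    BitSet hits = new BitSet(maxDoc);
    for (int doc = 0; doc < maxDoc; doc++) {
      if (query.test(doc)) hits.set(doc);
    }
    return hits;
  }

  // Optional step between the phases: keep every step-th hit (a crude sampler).
  static BitSet sample(BitSet hits, int step) {
    BitSet sampled = new BitSet();
    int i = 0;
    for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1), i++) {
      if (i % step == 0) sampled.set(doc);
    }
    return sampled;
  }

  // Phase 2: walk the collected docs in increasing doc-ID order and count their ordinals.
  static int[] accumulate(BitSet hits, int[][] docOrdinals, int numOrdinals) {
    int[] counts = new int[numOrdinals];
    for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
      for (int ord : docOrdinals[doc]) counts[ord]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    int[][] docOrdinals = {{1}, {1, 2}, {2}, {1}, {2}, {1, 2}};
    BitSet hits = collect(docOrdinals.length, doc -> doc % 2 == 0); // docs 0, 2, 4
    System.out.println(Arrays.toString(accumulate(hits, docOrdinals, 3)));            // [0, 1, 2]
    System.out.println(Arrays.toString(accumulate(sample(hits, 2), docOrdinals, 3))); // [0, 1, 1]
  }
}
```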
  • 20. FacetIndexingParams
      • Determine how facets are encoded
          – Partition size
          – Facet delimiter character (for drilldown terms, default \u001F)
          – CategoryListParams
      • CategoryListParams holds parameters for a category list
          – Encoder/Decoder (default DGapVInt)
          – OrdinalPolicy (how path components are encoded): ALL_PARENTS, NO_PARENTS and ALL_BUT_DIMENSION (default)
      • CategoryListParams can be used to group facets together
          – Default: all facets are put in the same "category list" (i.e. one BinaryDocValues field)
          – Expert: separate categories by dimension into different category lists
              • Useful when sets of categories are always aggregated together, but not with other categories
      • FacetIndexingParams are currently not recorded per segment, so be careful if you suddenly change them!
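The three ordinal policies differ only in which prefixes of a category's path get their ordinals written. The enum and method below are hypothetical stand-ins that mirror the policy names above; they are not Lucene's `OrdinalPolicy` classes, and they work on path strings rather than ordinals for readability.

```java
import java.util.*;

// Toy illustration of which path components each ordinal policy encodes
// (hypothetical enum and names mirroring the policy names above; not Lucene's classes).
class OrdinalPolicySketch {
  enum Policy { ALL_PARENTS, NO_PARENTS, ALL_BUT_DIMENSION }

  // Returns the category paths whose ordinals would be written for one category.
  static List<String> pathsToEncode(String category, Policy policy) {
    String[] parts = category.split("/");
    List<String> prefixes = new ArrayList<>();
    StringBuilder path = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) path.append('/');
      path.append(parts[i]);
      prefixes.add(path.toString());
    }
    if (policy == Policy.NO_PARENTS) {
      return Collections.singletonList(prefixes.get(prefixes.size() - 1)); // leaf only
    }
    if (policy == Policy.ALL_BUT_DIMENSION) {
      return new ArrayList<>(prefixes.subList(1, prefixes.size())); // drop the dimension itself
    }
    return prefixes; // ALL_PARENTS: the dimension, every intermediate parent and the leaf
  }

  public static void main(String[] args) {
    for (Policy p : Policy.values()) {
      System.out.println(p + ": " + pathsToEncode("Date/2012/March/20", p));
    }
    // ALL_PARENTS: [Date, Date/2012, Date/2012/March, Date/2012/March/20]
    // NO_PARENTS: [Date/2012/March/20]
    // ALL_BUT_DIMENSION: [Date/2012, Date/2012/March, Date/2012/March/20]
  }
}
```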