The Many Facets of Apache Solr
          Yonik Seeley, Lucid Imagination
     yonik@lucidimagination.com, Oct 20 2011
What I Will Cover

§    What is Faceted Search
§    Solr’s Faceted Search
§    Tips & Tricks
§    Performance & Algorithms




My Background
§  Creator of Solr
§  Co-founder of Lucid Imagination
§  Expertise: Distributed Search systems and
    performance
§  Lucene/Solr committer, a member of the
    Lucene PMC,
    member of the Apache Software
    Foundation
§  Work: CNET Networks, BEA, Telcordia,
    among others
§  M.S. in Computer Science, Stanford
What is Faceted Search




Faceted Search Example

§  Manufacturer is a facet, a way of categorizing the results
§  Canon, Sony, and Nikon are constraints, or facet values
§  The facet count or constraint count shows how many results match each value
§  The breadcrumb trail shows what constraints have already been applied and
    allows for their removal
Key Elements of Faceted Search
§  No hierarchy of options is enforced
   •  Users can apply facet constraints in any order
   •  Users can remove facet constraints in any order
§  No surprises
   •  The user is only given facets and constraints
      that make sense in the context of the items they
      are looking at
   •  The user always knows what to expect before
      they apply a constraint
§  Also known as guided navigation, faceted
    navigation, faceted browsing, parametric search

Solr’s Faceted Search




Field Faceting
§  Specifies a Field to be used as a Facet
§  Uses each term indexed in that Field as a
    Constraint
§  Field must be indexed
§  Can be used multiple times for multiple fields

  q                    =   iPhone
  fq                   =   inStock:true
  facet                =   true
  facet.field          =   color
  facet.field          =   category
Field Faceting Response
http://localhost:8983/solr/select?q=iPhone&fq=inStock:true
&facet=true&facet.field=color&facet.field=category


<lst name="facet_counts">
 <lst name="facet_fields">
  <lst name="color">
    <int name="red">17</int>
    <int name="green">6</int>
    <int name="blue">2</int>
    <int name="yellow">2</int>
  </lst>
  <lst name="category">
    <int name="accessories">16</int>
    <int name="electronics">11</int>
  </lst>
Or if you prefer JSON…
http://localhost:8983/solr/select?q=iPhone&fq=inStock:true
&facet=true&facet.field=color&facet.field=category&wt=json


"facet_counts":{
  "facet_fields":{
    "color":[
      "red",17,
      "green",6,
      "blue",2,
      "yellow",2],
    "category":[
      "accessories",16,
      "electronics",11]
Applying Constraints
  Assume the user clicked on “red”…
  Simply add another filter query to apply that constraint:

http://localhost:8983/solr/select?
q=iPhone&fq=inStock:true&fq=color:red&facet=true&
facet.field=color&facet.field=category

   facet.field=color is now redundant and can be removed
   (assuming a single-valued field)
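The request and response handling on the preceding slides can be sketched in Python as follows. This is an illustration only: the helper names facet_pairs and apply_constraint are hypothetical, and dropping the facet.field for the clicked field assumes that field is single-valued.

```python
from urllib.parse import urlencode

def facet_pairs(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))

def apply_constraint(params, field, value):
    """Return new params with fq=field:value added and facet.field=field dropped.

    Dropping the facet.field is only appropriate for a single-valued field.
    """
    kept = [(k, v) for k, v in params if (k, v) != ("facet.field", field)]
    return kept + [("fq", "%s:%s" % (field, value))]

# A list of tuples (not a dict) keeps the repeated fq and facet.field keys.
params = [
    ("q", "iPhone"),
    ("fq", "inStock:true"),
    ("facet", "true"),
    ("facet.field", "color"),
    ("facet.field", "category"),
    ("wt", "json"),
]

# The user clicked on "red".
narrowed = apply_constraint(params, "color", "red")
url = "http://localhost:8983/solr/select?" + urlencode(narrowed)
print(url)

# The "color" list copied from the JSON response shown earlier.
print(facet_pairs(["red", 17, "green", 6, "blue", 2, "yellow", 2]))
```

Using a list of pairs rather than a dictionary matters here because fq and facet.field are both allowed to repeat.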
facet.field Options
§  facet.prefix - Restricts the possible constraints to only
    indexed values with a specified prefix.
§  facet.mincount=0 - Restricts the constraints returned to
    those containing a minimum number of documents in the
    result set.
§  facet.sort=count - The ordering of constraints: count or
    index
§  facet.offset=0 - Indicates how many constraints in the
    specified sort ordering should be skipped in the response.
§  facet.limit=100 - The number of constraints to return
§  facet.missing=false – Return the number of docs with no
    value in the field
facet.query
§  Specifies a query string to be used as a Facet
    Constraint
§  Typically used multiple times to get multiple
    (discrete) sets
§  Any type of query supported


 facet.query = rank:[* TO 20]
 facet.query = rank:[21 TO *]
facet.query Results

<result numFound="27" ... />
...
<lst name="facet_counts">
 <lst name="facet_queries">
  <int name="rank:[* TO 20]">2</int>
  <int name="rank:[21 TO *]">15</int>
 </lst>
 ...
Spatial faceting

  q=*:*&facet=true
  pt=45.15,-93.85               (the lat,lon center point to search from)
  sfield=store                  (name of the field containing lat+lon data)
  facet.query={!geofilt d=5}    (geofilt is the geospatial query type)
  facet.query={!geofilt d=10}

"facet_counts":{
   "facet_queries":{
     "{!geofilt d=5}":3,
     "{!geofilt d=10}":6},
Range Faceting
§  Simpler than a sequence of facet.query params

http://...&facet=true
&facet.range=price
&facet.range.start=0
&facet.range.end=500
&facet.range.gap=50

"facet_counts":{
 "facet_ranges":{
   "price":{
     "counts":[
       "0.0",5,
       "50.0",2,
       "100.0",0,
       "150.0",2,
       "200.0",0,
       "250.0",1,
       "300.0",2,
       "350.0",2,
       "400.0",0,
       "450.0",1],
     "gap":50.0,
     "start":0.0,
     "end":500.0}}}}
Date Faceting
§  facet.date is deprecated; use facet.range on a date field instead
§  Creates Constraints based on evenly sized date ranges using
    the Gregorian Calendar
§  Ranges are specified using "Date Math" so they do what you mean (DWIM)
    in spite of variable-length months and leap years


  facet.range                    =   pubdate
  facet.range.start              =   NOW/YEAR-1YEAR
  facet.range.end                =   NOW/MONTH+1MONTH
  facet.range.gap                =   +1MONTH
Date Faceting Results
 "facet_counts":{
   "facet_ranges":{
      "pubdate":{
        "counts":[
          "2010-01-01T00:00:00Z",4,
          "2010-02-01T00:00:00Z",6,
          "2010-03-01T00:00:00Z",0,
          "2010-04-01T00:00:00Z",13,
[…]
          "2011-09-01T00:00:00Z",5,
          "2011-10-01T00:00:00Z",2],
        "gap":"+1MONTH",
        "start":"2010-01-01T00:00:00Z",
        "end":"2011-11-01T00:00:00Z"}}}
Range Faceting Options
§  facet.range.hardend=false - Determines what effective
    end value is used when the specified "start" and "end"
    don't divide into even "gap" sized buckets; true means
    the last Constraint range may be shorter than the others,
    false (the default) extends it to a full "gap" past "end"
§  facet.range.other=none - Allows you to specify what
    other Constraints you are interested in besides the
    generated ranges: before, after, between, none, all
§  facet.range.include=lower - Specifies which bounds are
    inclusive vs exclusive: lower, upper, edge, outer, all
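A minimal sketch of how the bucket boundaries could be derived from start, end, and gap for a numeric field. It assumes the behaviour described in the Solr documentation, where hardend=true caps the last bucket at end and hardend=false lets the last bucket run a full gap wide. The function name range_buckets is hypothetical, and the include/other options are not modelled.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return [(lower, upper), ...] bucket bounds for numeric range faceting."""
    buckets = []
    lower = start
    while lower < end:
        upper = lower + gap
        if hardend and upper > end:
            upper = end  # hardend=true: truncate the last bucket at "end"
        buckets.append((lower, upper))
        lower = upper
    return buckets

# start=0, end=120, gap=50 does not divide evenly.
print(range_buckets(0, 120, 50))                # [(0, 50), (50, 100), (100, 150)]
print(range_buckets(0, 120, 50, hardend=True))  # [(0, 50), (50, 100), (100, 120)]
```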
Pivot Faceting (trunk)
§  Computes a Matrix of Constraint Counts across multiple
    Facet Fields
§  Syntax: facet.pivot=field1,field2,field3,…


facet.pivot=cat,inStock
                      #docs  #docs w/         #docs w/
                             inStock:true     inStock:false
cat:electronics       14     10               4
cat:memory            3      3                0
cat:connector         2      0                2
cat:graphics card     2      0                2
cat:hard drive        2      2                0
Pivot Faceting
   http://...&facet=true&facet.pivot=cat,popularity

   Annotations: the first "count":14 is 14 docs w/ cat==electronics;
   the nested "count":5 is 5 docs w/ cat==electronics && popularity==6

"facet_counts":{
  "facet_pivot":{
    "cat,popularity":[{
      "field":"cat",
      "value":"electronics",
      "count":14,
      "pivot":[{
          "field":"popularity",
          "value":6,
          "count":5},
        {
          "field":"popularity",
          "value":7,
          "count":4},
        {
          "field":"popularity",
          "value":1,
          "count":2}]},
      {
      "field":"cat",
      "value":"memory",
      "count":3,
      "pivot":[]},
      […]
Tips & Tricks




term QParser
§  Default Query Parser does special things with whitespace
    and punctuation
§  Problematic when "filtering" on Facet Field Constraints
    that contain whitespace, punctuation, or other reserved
    characters.
§  Use the term parser to filter on an exact Term


 fq = {!term f=category}Books & Magazines
                         OR
 fq = {!term f=category v=$t}
  t = Books & Magazines
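When sending such a filter over HTTP, the constraint value also has to be URL-encoded, since characters like & and spaces are significant in a query string. A small sketch, assuming a client that builds the query string itself; term_filter is a hypothetical helper name.

```python
from urllib.parse import urlencode, parse_qsl

def term_filter(field, value):
    """Build an fq value that matches one exact indexed term via the term parser."""
    return "{!term f=%s}%s" % (field, value)

params = [
    ("q", "*:*"),
    ("fq", term_filter("category", "Books & Magazines")),
]

# urlencode percent-encodes the braces, '!', '&' and spaces for us.
query_string = urlencode(params)
print(query_string)

# Decoding the query string gives back the original fq value unchanged.
print(parse_qsl(query_string))
```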
Taxonomy Facets
§  What if your documents are organized in a taxonomy?
Taxonomy Facets: Data
§  Flattened Data
   Doc#1: NonFic > Law
   Doc#2: NonFic > Sci
   Doc#3: NonFic > Hist
          NonFic > Sci > Phys
§  Indexed Terms (prepend number of nodes in path segment)

Doc#1: 1/NonFic, 2/NonFic/Law
Doc#2: 1/NonFic, 2/NonFic/Sci
Doc#3: 1/NonFic, 2/NonFic/Hist,
       2/NonFic/Sci, 3/NonFic/Sci/Phys
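The mapping from flattened paths to indexed terms can be sketched as below. This would typically run in the indexing client before documents are sent to Solr; the function names and the " > " separator handling are assumptions made for the example.

```python
def taxonomy_terms(path, sep=" > "):
    """Expand one path into depth-prefixed terms, one per ancestor level."""
    parts = path.split(sep)
    return ["%d/%s" % (depth, "/".join(parts[:depth]))
            for depth in range(1, len(parts) + 1)]

def index_terms(paths):
    """Collect the terms for all of a document's paths, without duplicates."""
    seen = []
    for path in paths:
        for term in taxonomy_terms(path):
            if term not in seen:
                seen.append(term)
    return seen

# Doc#3 from the slide has two paths.
print(index_terms(["NonFic > Hist", "NonFic > Sci > Phys"]))
# ['1/NonFic', '2/NonFic/Hist', '2/NonFic/Sci', '3/NonFic/Sci/Phys']
```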
Taxonomy Facets: Initial Query

facet.field    = category
facet.prefix   = 2/NonFic
facet.mincount = 1

<result numFound="164" ...
<lst name="facet_fields">
 <lst name="category">
   <int name="2/NonFic/Sci">2</int>
   <int name="2/NonFic/Hist">1</int>
   <int name="2/NonFic/Law">1</int>
Taxonomy Facets: Drill Down

fq               = {!term f=category}2/NonFic/Sci
facet.field      = category
facet.prefix     = 3/NonFic/Sci
facet.mincount   = 1


<result numFound="2" ...
<lst name="facet_fields">
 <lst name="category">
   <int name="3/NonFic/Sci/Phys">1</int>
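The drill-down parameters follow mechanically from the term the user clicked: filter on that exact term, then facet one level deeper under the same path. A sketch of that step, with drill_down as a hypothetical helper and category as the field name from the slides:

```python
def drill_down(term, field="category"):
    """Build the request params for drilling into a depth-prefixed taxonomy term."""
    depth, path = term.split("/", 1)
    return [
        ("fq", "{!term f=%s}%s" % (field, term)),          # restrict to the clicked node
        ("facet.field", field),
        ("facet.prefix", "%d/%s" % (int(depth) + 1, path)),  # children are one level deeper
        ("facet.mincount", "1"),
    ]

for name, value in drill_down("2/NonFic/Sci"):
    print(name, "=", value)
```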
Multi-Select Faceting
http://search.lucidimagination.com
§  Very generic support
   •  Reuses localParams syntax {!name=val}
   •  Ability to tag filters
   •  Ability to exclude certain filters when faceting, by tag

  q=index replication
  facet=true
  fq={!tag=pr}project:(lucene OR solr)
  facet.field={!ex=pr}project
  facet.field={!ex=src}source
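The effect of tagging and excluding filters can be illustrated with a small in-memory simulation. This is not how Solr implements it; the documents, the filter predicate, and the facet function are all made up for the example, and only the tag name pr comes from the slide.

```python
from collections import Counter

# Hypothetical documents.
docs = [
    {"project": "lucene", "source": "wiki"},
    {"project": "solr", "source": "wiki"},
    {"project": "solr", "source": "email"},
    {"project": "nutch", "source": "email"},
]

# Tagged filters: tag -> predicate. "pr" mirrors fq={!tag=pr}project:(lucene OR solr).
filters = {"pr": lambda d: d["project"] in ("lucene", "solr")}

def facet(field, exclude=()):
    """Count field values over docs that pass every filter whose tag is not excluded."""
    active = [f for tag, f in filters.items() if tag not in exclude]
    return Counter(d[field] for d in docs if all(f(d) for f in active))

# Like facet.field={!ex=pr}project: counts ignore the project filter,
# so "nutch" is still offered even though it is currently filtered out.
print(facet("project", exclude=("pr",)))

# No exclusion: counts reflect all active filters.
print(facet("source"))
```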
  
Same Facet, Different Exclusions
  §  A key can be specified for a facet to change the
      name used to identify it in the response.
          q   =   Hot Rod
         fq   =   {!df=colors tag=cx}purple green
facet.field   =   {!key=all_colors ex=cx}colors
facet.field   =   {!key=overlap_colors}colors

"facet_counts":{
  "facet_fields":{
    "all_colors":[
      "red",19,
      "green",6,
      "blue",2],
    "overlap_colors":[
      "red",7,
      "green",6,
      "blue",1]
  }
}
“Pretty” facet.field Terms
      §  Field Faceting uses Indexed Terms
      §  Leverage copyField and TokenFilters that will
          give you good looking Constraints

<tokenizer class="solr.PatternTokenizerFactory"
           pattern="(,|;)\s*" />
<filter class="solr.PatternReplaceFilterFactory"
        pattern="\s+" replacement=" " />
<filter class="solr.PatternReplaceFilterFactory"
        pattern=" and " replacement=" &amp; " />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.CapitalizationFilterFactory"
        onlyFirstWord="false" />
“Pretty” facet.field Results
{"id" : "MyExampleDoc",
 "category" : " books
 and magazines;
computers, "
}

copyField in schema:
  <copyField source="category" dest="category_pretty"/>

"facet_counts":{
  "facet_fields":{
    "category_pretty":[
      "Books & Magazines",1,
      "Computers",1]
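To see why the messy category value ends up as two clean constraints, the analysis chain can be approximated step by step in Python: split on commas and semicolons, collapse whitespace, replace " and " with " & ", trim, and capitalize every word. This is only an emulation with the standard re module, not Solr's analysis code, and pretty_terms is a hypothetical name.

```python
import re

def pretty_terms(raw):
    """Approximate the tokenizer and filter chain from the slides on one field value."""
    tokens = re.split(r"[,;]\s*", raw)           # tokenizer: split on , or ; and following whitespace
    out = []
    for tok in tokens:
        tok = re.sub(r"\s+", " ", tok)           # collapse runs of whitespace (including newlines)
        tok = tok.replace(" and ", " & ")        # " and " becomes " & "
        tok = tok.strip()                        # trim leading and trailing whitespace
        tok = " ".join(w.capitalize() for w in tok.split(" "))  # capitalize every word
        if tok:                                  # drop empty tokens
            out.append(tok)
    return out

print(pretty_terms(" books \n and magazines; \ncomputers, "))
# ['Books & Magazines', 'Computers']
```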
facet.field Labels
  §  facet.query params are echoed verbatim when
      returning the constraint counts
  §  Optionally, one can declare a facet.query in
      solrconfig.xml and include a "label" that the
      presentation layer can parse out for display.
      ("label" has no meaning to Solr)

facet.query = {!label='Hot!' }
              +pop:[1 TO *]
              +pub_date:[NOW/DAY-1DAY TO *]

"facet_queries" : {
  "{!label='Hot!' } pop:[1 TO *] ..." : 15
}
Performance




facet.method

name                     facet.method               description                      memory                          CPU
enum                     enum                       Iterates over terms,             filter-per-term in the          ~O(nTerms)
                                                    calculating set intersections    filterCache
field cache              fc (single-valued field)   Iterates over documents,         Lucene FieldCache Entry…        O(nDocs)
                                                    counting terms                   int[maxDoc]+terms
UnInvertedField          fc (multi-valued field)    Iterates over documents,         O(maxDoc * num terms per doc)   ~O(nDocs)
                                                    counting terms
Per-segment field cache  fcs                        Like field-cache, just better    Lucene FieldCache Entries…      O(nDocs)+O(nTerms)
                                                    for NRT since FieldCacheEntry    int[maxDoc]+terms
                                                    is at segment level.
facet.method=fc (single-valued field)

CPU=O(nDocs in base set)
Mem=int[maxDoc]+unique_values

 q=Juggernaut
 &facet=true
 &facet.field=hero

[Figure: single-valued field-cache faceting]
§  Documents matching the base query (Juggernaut): 0, 2, 7
§  Lucene FieldCache Entry (StringIndex) for the hero field
   •  order: for each doc, an index into the lookup array
      (values shown: 5, 3, 5, 1, 4, 5, 2, 1)
   •  lookup: the string values
      ((null), batman, flash, spiderman, superman, wolverine)
§  Each matching doc is looked up in order and increments the matching
    slot of the accumulator (values shown: 0, 1, 0, 0, 0, 2)
§  Priority queue of the top constraints (shown: flash, 5; Batman, 3)
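The single-valued fc algorithm can be sketched in a few lines: look up each matching document's ordinal, bump a counter, then pick the top terms. The arrays below are read off the figure for this slide; the function name facet_fc is hypothetical, and heapq stands in for the priority queue.

```python
import heapq

# lookup[ord] is the string value; ord 0 means "no value" for the document.
lookup = [None, "batman", "flash", "spiderman", "superman", "wolverine"]
# order[doc] is the ordinal of that document's value.
order = [5, 3, 5, 1, 4, 5, 2, 1]
# Documents matching the base query.
base_docs = [0, 2, 7]

def facet_fc(base_docs, order, lookup, limit=10):
    """Count terms by iterating over documents: O(nDocs in base set)."""
    counts = [0] * len(lookup)            # the accumulator, one slot per unique value
    for doc in base_docs:
        counts[order[doc]] += 1
    # Skip slot 0 (missing value) and zero counts, then take the top `limit`.
    candidates = [(c, lookup[i]) for i, c in enumerate(counts) if i != 0 and c > 0]
    top = heapq.nlargest(limit, candidates)
    return [(term, c) for c, term in top]

print(facet_fc(base_docs, order, lookup))  # [('wolverine', 2), ('batman', 1)]
```

The per-document work is a single array read and an increment, which is why the cost depends on the size of the base set rather than on the number of unique terms.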
facet.method=fcs (trunk)
(per-segment single-valued)

[Figure: per-segment field-cache faceting]
§  Each segment (Segment1 to Segment4) has its own FieldCache Entry and
    its own accumulator (accumulator1 to accumulator4)
§  Docs in the Base DocSet (0, 2, 7) are looked up in their segment's
    FieldCache Entry and increment (inc) that segment's accumulator
§  Each segment is counted by its own thread (thread1 to thread4)
§  A FieldCache + accumulator merger (Priority queue) combines the
    per-segment counts into the final Priority queue (flash, 5; Batman, 3)
facet.method=fcs
§  Controllable multi-threading
   facet.method=fcs
   facet.field={!threads=4}myfield

§  Disadvantages
   •  Larger memory use (FieldCaches + accumulators)
   •  Slower (extra FieldCache merge step needed) – O(nTerms)
§  Advantages
   •  Rebuilds FieldCache entries only for new segments (NRT friendly)
   •  Multi-threaded
Per-segment faceting performance comparison
Test index: 10M documents, 18 segments, single valued field

A   Base DocSet=100 docs, facet.field on a field with 100,000 unique terms

    Time for request*           facet.method=fc        facet.method=fcs
    static index                3 ms                   244 ms
    quickly changing index      1388 ms                267 ms

B   Base DocSet=1,000,000 docs, facet.field on a field with 100 unique terms

    Time for request*           facet.method=fc        facet.method=fcs
    static index                26 ms                  34 ms
    quickly changing index      741 ms                 94 ms

    *complete request time, measured externally
facet.method=fc (multi-valued field)
§  UnInvertedField - like single-valued FieldCache algorithm, but
    with multi-valued FieldCache
§  Good for many unique terms, relatively few values per doc
   •  Best case: 50x faster, 5x smaller than “enum” (100K unique values,
      1-5 per doc)
   •  O(n_docs), but optimization to count the inverse when n>maxDoc/2
§  Memory efficient
   •  Terms in a document are delta coded variable width ords (vints)
   •  Ord list for document packed in an int or in a shared byte[]
   •  Hybrid approach: “big terms” that match >5% of index use
      filterCache instead
   •  Only 1/128th of string values in memory
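The "delta coded variable width ords" idea can be illustrated with a generic variable-length integer encoding: store the gaps between a document's sorted term ordinals, using 7 bits per byte plus a continuation bit. This is a sketch of the general technique only; the actual byte layout used by UnInvertedField may differ, and the function names are hypothetical.

```python
def encode_ords(ords):
    """Delta-encode sorted ordinals as vints (7 data bits per byte, high bit = more)."""
    out = bytearray()
    prev = 0
    for ord_ in sorted(ords):
        delta = ord_ - prev
        prev = ord_
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            delta >>= 7
        out.append(delta)                      # final byte, continuation bit clear
    return bytes(out)

def decode_ords(data):
    """Reverse encode_ords: rebuild the ordinals from the delta vints."""
    ords, value, shift, prev = [], 0, 0, 0
    for b in data:
        value |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += value
            ords.append(prev)
            value, shift = 0, 0
    return ords

encoded = encode_ords([5, 305, 306])
print(encoded.hex())         # 05ac0201
print(decode_ords(encoded))  # [5, 305, 306]
```

Three ordinals fit in four bytes here, which is small enough to pack into a single int, in line with the packing mentioned in the bullets above.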
Faceting: fieldValueCache
§  Implicit cache with UnInvertedField entries
   •  Not autowarmed – use static warming request
   •  http://localhost:8983/solr/admin/stats.jsp (mem size,
      time to create, etc)


item_cat:
{field=cat,memSize=5376,tindexSize=52,time=2,phase1=2,
nTerms=16,bigTerms=10,termInstances=6,uses=44}
Multi-valued faceting: facet.method=enum
§  facet.method=enum
§  For each term in field:
   •  Retrieve filter
   •  Calculate intersection size

[Figure: enum faceting on the hero field]
§  Lucene Inverted Index (on disk):
      batman      1 3 5 8
      flash       0 1 5
      spiderman   2 4 7
      superman    0 6 9
      wolverine   1 2 7 8
§  Docs matching base query: 0 1 5 9
§  Solr filterCache (in memory): hero:batman = 1 3 5 8, hero:flash = 0 1 5
§  The intersection count of each filter with the base docs goes into the
    Priority queue (batman=2)
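The enum method boils down to one set intersection per term. A sketch using Python sets in place of cached filters, with the posting lists and base documents taken from the figure for this slide; facet_enum is a hypothetical name, and the filterCache and term.df short-circuiting are not modelled.

```python
# Posting lists per term, standing in for filterCache entries.
index = {
    "batman": {1, 3, 5, 8},
    "flash": {0, 1, 5},
    "spiderman": {2, 4, 7},
    "superman": {0, 6, 9},
    "wolverine": {1, 2, 7, 8},
}
# Documents matching the base query.
base = {0, 1, 5, 9}

def facet_enum(index, base, mincount=1):
    """Count terms by iterating over terms: one intersection per term, O(nTerms)."""
    counts = {}
    for term, docs in index.items():
        n = len(docs & base)          # intersection size with the base doc set
        if n >= mincount:
            counts[term] = n
    # Highest count first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(facet_enum(index, base))
# [('flash', 3), ('batman', 2), ('superman', 2), ('wolverine', 1)]
```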
facet.method=enum

§    O(n_terms_in_field)
§    Short circuits based on term.df
§    filterCache entries int[ndocs] or BitSet(maxDoc)
§    Size filterCache appropriately
      •  Either autowarm filterCache, or use static warming
         queries (via newSearcher event) in solrconfig.xml
§  facet.enum.cache.minDf - prevent filterCache use
    for small terms
      •  Also useful for huge index w/ no time constraints
Q&A




 45

More Related Content

What's hot

Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...HostedbyConfluent
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Eric Sun
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkIlya Ganelin
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Apache Flink Worst Practices
Apache Flink Worst PracticesApache Flink Worst Practices
Apache Flink Worst PracticesKonstantin Knauf
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiDatabricks
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 

What's hot (20)

Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
 
Rds data lake @ Robinhood
Rds data lake @ Robinhood Rds data lake @ Robinhood
Rds data lake @ Robinhood
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Apache Flink Worst Practices
Apache Flink Worst PracticesApache Flink Worst Practices
Apache Flink Worst Practices
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin HuaiA Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 

Viewers also liked

Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)
Native Code, Off-Heap Data & JSON Facet API for Solr (Heliosearch)Yonik Seeley
 
Grouping and Joining in Lucene/Solr
The Many Facets of Apache Solr - Yonik Seeley

  • 1. The Many Facets of Apache Solr Yonik Seeley, Lucid Imagination yonik@lucidimagination.com, Oct 20 2011
  • 2. What I Will Cover
      §  What is Faceted Search
      §  Solr’s Faceted Search
      §  Tips & Tricks
      §  Performance & Algorithms
  • 3. My Background
      §  Creator of Solr
      §  Co-founder of Lucid Imagination
      §  Expertise: Distributed Search systems and performance
      §  Lucene/Solr committer, a member of the Lucene PMC, member of the Apache Software Foundation
      §  Work: CNET Networks, BEA, Telcordia, among others
      §  M.S. in Computer Science, Stanford
  • 4. What is Faceted Search
  • 5. Faceted Search Example
      §  Manufacturer is a facet, a way of categorizing the results
      §  Canon, Sony, and Nikon are constraints, or facet values
      §  The facet count or constraint count shows how many results match each value
      §  The breadcrumb trail shows what constraints have already been applied and allows for their removal
  • 6. Key Elements of Faceted Search
      §  No hierarchy of options is enforced
         •  Users can apply facet constraints in any order
         •  Users can remove facet constraints in any order
      §  No surprises
         •  The user is only given facets and constraints that make sense in the context of the items they are looking at
         •  The user always knows what to expect before they apply a constraint
      §  Also known as guided navigation, faceted navigation, faceted browsing, parametric search
  • 8. Field Faceting
      §  Specifies a Field to be used as a Facet
      §  Uses each term indexed in that Field as a Constraint
      §  Field must be indexed
      §  Can be used multiple times for multiple fields
        q = iPhone
        fq = inStock:true
        facet = true
        facet.field = color
        facet.field = category
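The parameters on this slide can be assembled into a request URL. The following is only a minimal sketch: the helper name `field_facet_url` is hypothetical, and the base URL is taken from the example request on the next slide.

```python
from urllib.parse import urlencode

def field_facet_url(base_url, query, filters, facet_fields):
    """Build a Solr select URL with field faceting enabled (hypothetical helper)."""
    params = [("q", query)]
    # fq may be repeated: one entry per filter query
    params += [("fq", fq) for fq in filters]
    params.append(("facet", "true"))
    # facet.field may be repeated: one entry per facet
    params += [("facet.field", name) for name in facet_fields]
    return base_url + "?" + urlencode(params)

url = field_facet_url(
    "http://localhost:8983/solr/select",
    "iPhone",
    ["inStock:true"],
    ["color", "category"],
)
print(url)
```

Passing a list of pairs to `urlencode` keeps the repeated `fq` and `facet.field` keys and percent-encodes reserved characters such as the colon in `inStock:true`.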
  • 9. Field Faceting Response
      http://localhost:8983/solr/select?q=iPhone&fq=inStock:true&facet=true&facet.field=color&facet.field=category
      <lst name="facet_counts">
        <lst name="facet_fields">
          <lst name="color">
            <int name="red">17</int>
            <int name="green">6</int>
            <int name="blue">2</int>
            <int name="yellow">2</int>
          </lst>
          <lst name="category">
            <int name="accessories">16</int>
            <int name="electronics">11</int>
          </lst>
        </lst>
      </lst>
  • 10. Or if you prefer JSON…
      http://localhost:8983/solr/select?q=iPhone&fq=inStock:true&facet=true&facet.field=color&facet.field=category&wt=json
      "facet_counts":{
        "facet_fields":{
          "color":[
            "red",17,
            "green",6,
            "blue",2,
            "yellow",2],
          "category":[
            "accessories",16,
            "electronics",11]}}
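In the JSON form each facet arrives as a flat list that alternates value and count. A small sketch of turning that list into pairs on the client side follows; the helper name `facet_pairs` is an assumption, and the sample string is the slide's excerpt wrapped in braces so that it parses as a standalone JSON document.

```python
import json

# The slide's excerpt, wrapped in an outer object so it is valid JSON on its own.
sample = """
{"facet_counts": {"facet_fields": {
    "color": ["red", 17, "green", 6, "blue", 2, "yellow", 2],
    "category": ["accessories", 16, "electronics", 11]}}}
"""

def facet_pairs(flat):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))

fields = json.loads(sample)["facet_counts"]["facet_fields"]
colors = facet_pairs(fields["color"])
categories = facet_pairs(fields["category"])
print(colors)
print(categories)
```

Keeping the pairs in a list rather than a dict preserves the order in which Solr returned the constraints.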
  • 11. Applying Constraints
      Assume the user clicked on “red”…
      Simply add another filter query to apply that constraint:
      http://localhost:8983/solr/select?q=iPhone&fq=inStock:true&fq=color:red&facet=true&facet.field=color&facet.field=category
      Remove the redundant facet.field (assuming a single-valued field)
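Applying and removing constraints can be modelled on the client as edits to the list of `fq` values. This is a sketch under that assumption only; `apply_constraint` and `remove_constraint` are hypothetical names, not part of Solr.

```python
def apply_constraint(filters, field, value):
    """Return a new filter list with field:value added (no duplicates)."""
    fq = f"{field}:{value}"
    return filters if fq in filters else filters + [fq]

def remove_constraint(filters, field, value):
    """Return a new filter list without field:value, e.g. for a breadcrumb click."""
    fq = f"{field}:{value}"
    return [existing for existing in filters if existing != fq]

filters = ["inStock:true"]
filters = apply_constraint(filters, "color", "red")       # user clicks "red"
after_removal = remove_constraint(filters, "inStock", "true")
print(filters)
print(after_removal)
```

Because each constraint is an independent `fq` entry, constraints can be removed in any order, which matches the "no hierarchy" property described earlier in the talk.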
  • 12. facet.field Options
      §  facet.prefix - Restricts the possible constraints to only indexed values with a specified prefix.
      §  facet.mincount=0 - Restricts the constraints returned to those containing a minimum number of documents in the result set.
      §  facet.sort=count - The ordering of constraints: count or index
      §  facet.offset=0 - Indicates how many constraints in the specified sort ordering should be skipped in the response.
      §  facet.limit=100 - The number of constraints to return
      §  facet.missing=false - Return the number of docs with no value in the field
  • 13. facet.query
      §  Specifies a query string to be used as a Facet Constraint
      §  Typically used multiple times to get multiple (discrete) sets
      §  Any type of query supported
        facet.query = rank:[* TO 20]
        facet.query = rank:[21 TO *]
  • 14. facet.query Results
      <result numFound="27" ... />
      ...
      <lst name="facet_counts">
        <lst name="facet_queries">
          <int name="rank:[* TO 20]">2</int>
          <int name="rank:[21 TO *]">15</int>
        </lst>
      ...
  • 15. Spatial faceting
        q=*:*&facet=true
        pt=45.15,-93.85                (the lat,lon center point to search from)
        sfield=store                   (name of the field containing lat+lon data)
        facet.query={!geofilt d=5}     (geospatial query type)
        facet.query={!geofilt d=10}
      "facet_counts":{
        "facet_queries":{
          "{!geofilt d=5}":3,
          "{!geofilt d=10}":6},
  • 16. Range Faceting
      §  Simpler than a sequence of facet.query params
        http://...&facet=true
          &facet.range=price
          &facet.range.start=0
          &facet.range.end=500
          &facet.range.gap=50
      "facet_counts":{
        "facet_ranges":{
          "price":{
            "counts":[
              "0.0",5,
              "50.0",2,
              "100.0",0,
              "150.0",2,
              "200.0",0,
              "250.0",1,
              "300.0",2,
              "350.0",2,
              "400.0",0,
              "450.0",1],
            "gap":50.0,
            "start":0.0,
            "end":500.0}}}}
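Each entry in "counts" names only the lower bound of its bucket; the upper bound follows from "gap". A sketch of expanding the response into explicit ranges, using the numbers from the slide (the helper name `range_buckets` is hypothetical):

```python
# The "price" object from the slide, as a Python dict.
price = {
    "counts": ["0.0", 5, "50.0", 2, "100.0", 0, "150.0", 2, "200.0", 0,
               "250.0", 1, "300.0", 2, "350.0", 2, "400.0", 0, "450.0", 1],
    "gap": 50.0,
    "start": 0.0,
    "end": 500.0,
}

def range_buckets(range_facet):
    """Expand a numeric range facet into (lower, upper, count) tuples."""
    gap = range_facet["gap"]
    counts = range_facet["counts"]
    buckets = []
    for lower, count in zip(counts[0::2], counts[1::2]):
        low = float(lower)  # lower bounds arrive as strings
        buckets.append((low, low + gap, count))
    return buckets

buckets = range_buckets(price)
for low, high, count in buckets:
    print(f"{low:6.1f} to {high:6.1f}: {count}")
```

This assumes every bucket is exactly one gap wide, which holds here because 500 divides evenly by 50.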
  • 17. Date Faceting
      §  facet.date is deprecated, use facet.range on a date field now
      §  Creates Constraints based on evenly sized date ranges using the Gregorian Calendar
      §  Ranges are specified using "Date Math" so they DWIM ("do what I mean") in spite of variable length months and leap years
        facet.range = pubdate
        facet.range.start = NOW/YEAR-1YEAR
        facet.range.end = NOW/MONTH+1MONTH
        facet.range.gap = +1MONTH
  • 18. Date Faceting Results
      "facet_counts":{
        "facet_ranges":{
          "pubdate":{
            "counts":[
              "2010-01-01T00:00:00Z",4,
              "2010-02-01T00:00:00Z",6,
              "2010-03-01T00:00:00Z",0,
              "2010-04-01T00:00:00Z",13,
              […]
              "2011-09-01T00:00:00Z",5,
              "2011-10-01T00:00:00Z",2],
            "gap":"+1MONTH",
            "start":"2010-01-01T00:00:00Z",
            "end":"2011-11-01T00:00:00Z"}}}
  • 19. Range Faceting Options
    §  facet.range.hardend=false - Determines what effective end value is used when the specified "start" and "end" don't divide into even "gap" sized buckets; false means the last Constraint range keeps the full "gap" width and may extend past "end", while true means it stops at "end" and may be shorter than the others
    §  facet.range.other=none - Allows you to specify what other Constraints you are interested in besides the generated ranges: before, after, between, none, all
    §  facet.range.include=lower - Specifies what bounds are inclusive vs exclusive: lower, upper, edge, outer, all
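The effect of facet.range.hardend is easiest to see on a range that the gap does not divide evenly. The sketch below only models the bucket boundaries on plain numbers; it is an illustration written for this transcript, not Solr's code, and `range_buckets` is a hypothetical name.

```python
def range_buckets(start, end, gap, hardend=False):
    """Return (low, high) bounds for each generated range constraint."""
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            # hardend=true: clip the last bucket at "end", so it may be shorter.
            high = end
        buckets.append((low, high))
        low = high
    return buckets

soft = range_buckets(0, 120, 50)                 # last bucket runs past end
hard = range_buckets(0, 120, 50, hardend=True)   # last bucket is clipped
```

With start=0, end=120 and gap=50, the default produces a last bucket of 100 to 150, while hardend=true produces 100 to 120.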
  • 20. Pivot Faceting (trunk)
    §  Computes a Matrix of Constraint Counts across multiple Facet Fields
    §  Syntax: facet.pivot=field1,field2,field3,…
    facet.pivot=cat,inStock
                       | #docs | #docs w/ inStock:true | #docs w/ inStock:false
    cat:electronics    | 14    | 10                    | 4
    cat:memory         | 3     | 3                     | 0
    cat:connector      | 2     | 0                     | 2
    cat:graphics card  | 2     | 0                     | 2
    cat:hard drive     | 2     | 2                     | 0
  • 21. Pivot Faceting
    http://...&facet=true&facet.pivot=cat,popularity
    "facet_counts":{
      "facet_pivot":{
        "cat,popularity":[{
          "field":"cat",
          "value":"electronics",
          "count":14,                  (14 docs w/ cat==electronics)
          "pivot":[{
            "field":"popularity",
            "value":6,
            "count":5},                (5 docs w/ cat==electronics && popularity==6)
           {
            "field":"popularity",
            "value":7,
            "count":4},
    (continued)
           {
            "field":"popularity",
            "value":1,
            "count":2}]},
         {
          "field":"cat",
          "value":"memory",
          "count":3,
          "pivot":[]},
         […]
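Because each pivot entry nests its children under "pivot", a short recursive walk is enough to turn the response into flat rows. This is a sketch under the assumption that the response has already been parsed into Python dicts; the data below is an abridged copy of the slide's example and `flatten_pivot` is a hypothetical helper.

```python
def flatten_pivot(nodes, path=()):
    """Yield (path, count) for every node in a parsed facet_pivot tree."""
    for node in nodes:
        here = path + ((node["field"], node["value"]),)
        yield here, node["count"]
        # Leaf entries may carry an empty "pivot" list or none at all.
        yield from flatten_pivot(node.get("pivot", []), here)

# Abridged from the slide: only two of the popularity entries are kept.
pivot = [
    {"field": "cat", "value": "electronics", "count": 14, "pivot": [
        {"field": "popularity", "value": 6, "count": 5},
        {"field": "popularity", "value": 7, "count": 4},
    ]},
    {"field": "cat", "value": "memory", "count": 3, "pivot": []},
]

rows = list(flatten_pivot(pivot))
```

Each row pairs a path such as (("cat", "electronics"), ("popularity", 6)) with its count, which maps directly onto a cell of the matrix view.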
  • 23. term QParser
    §  Default Query Parser does special things with whitespace and punctuation
    §  Problematic when "filtering" on Facet Field Constraints that contain whitespace, punctuation, or other reserved characters.
    §  Use the term parser to filter on an exact Term
    fq = {!term f=category}Books & Magazines
    OR
    fq = {!term f=category v=$t}
    t = Books & Magazines
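The second form on this slide, where the value is passed in a separate parameter, keeps the raw constraint out of the local-params string, so the client only has to URL-encode it. A minimal sketch follows; `term_filter` is a hypothetical helper and the parameter name `t` simply follows the slide.

```python
from urllib.parse import urlencode, parse_qs

def term_filter(field, value, param="t"):
    """Return request params that filter on one exact indexed term."""
    return [("fq", "{!term f=%s v=$%s}" % (field, param)), (param, value)]

# The ampersand and spaces are escaped by urlencode, not by query syntax.
query_string = urlencode([("q", "*:*")] + term_filter("category", "Books & Magazines"))
decoded = parse_qs(query_string)
```

If several term filters are sent in one request, each would need its own parameter name in place of `t`.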
  • 24. Taxonomy Facets
    §  What If Your Documents Are Organized in a Taxonomy?
  • 25. Taxonomy Facets: Data
    §  Flattened Data
       Doc#1: NonFic > Law
       Doc#2: NonFic > Sci
       Doc#3: NonFic > Hist
              NonFic > Sci > Phys
    §  Indexed Terms (prepend number of nodes in path segment)
       Doc#1: 1/NonFic, 2/NonFic/Law
       Doc#2: 1/NonFic, 2/NonFic/Sci
       Doc#3: 1/NonFic, 2/NonFic/Hist,
              2/NonFic/Sci, 3/NonFic/Sci/Phys
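The indexed terms can be generated mechanically from the flattened paths: emit one term per ancestor, prefixed with its depth. The sketch below is one way to do that at indexing time; the function names are hypothetical and the " > " separator follows the slide's notation.

```python
def taxonomy_terms(path):
    """Expand 'A > B > C' into depth-prefixed terms: 1/A, 2/A/B, 3/A/B/C."""
    parts = [p.strip() for p in path.split(">")]
    return ["%d/%s" % (depth, "/".join(parts[:depth]))
            for depth in range(1, len(parts) + 1)]

def index_terms(paths):
    """Collect the terms for all of a document's paths, dropping duplicates."""
    seen = []
    for path in paths:
        for term in taxonomy_terms(path):
            if term not in seen:
                seen.append(term)
    return seen

# Doc#3 from the slide has two category paths.
doc3_terms = index_terms(["NonFic > Hist", "NonFic > Sci > Phys"])
```

For Doc#3 this yields 1/NonFic, 2/NonFic/Hist, 2/NonFic/Sci and 3/NonFic/Sci/Phys, matching the slide.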
  • 26. Taxonomy Facets: Initial Query
    facet.field = category
    facet.prefix = 2/NonFic
    facet.mincount = 1
    <result numFound="164" ...
    <lst name="facet_fields">
      <lst name="category">
        <int name="2/NonFic/Sci">2</int>
        <int name="2/NonFic/Hist">1</int>
        <int name="2/NonFic/Law">1</int>
  • 27. Taxonomy Facets: Drill Down
    fq = {!term f=category}2/NonFic/Sci
    facet.field = category
    facet.prefix = 3/NonFic/Sci
    facet.mincount = 1
    <result numFound="2" ...
    <lst name="facet_fields">
      <lst name="category">
        <int name="3/NonFic/Sci/Phys">1</int>
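The drill-down request can be derived from the constraint the user clicked: filter on that exact term, then facet one level deeper under the same path. The following sketch assumes the depth-prefixed term format from the earlier slide; `drill_down` is a hypothetical helper.

```python
def drill_down(term, field="category"):
    """Build the params for drilling into a selected taxonomy constraint."""
    depth, _, path = term.partition("/")
    return {
        "fq": "{!term f=%s}%s" % (field, term),
        "facet.field": field,
        # Children live one level deeper, under the same path.
        "facet.prefix": "%d/%s" % (int(depth) + 1, path),
        "facet.mincount": 1,
    }

params = drill_down("2/NonFic/Sci")
```

For the selected term 2/NonFic/Sci this reproduces the parameters shown on the slide.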
  • 28. Multi-Select Faceting http://search.lucidimagination.com
    §  Very generic support
       •  Reuses localParams syntax {!name=val}
       •  Ability to tag filters
       •  Ability to exclude certain filters when faceting, by tag
    q=index replication
    facet=true
    fq={!tag=pr}project:(lucene OR solr)
    facet.field={!ex=pr}project
    facet.field={!ex=src}source
  • 29. Same Facet, Different Exclusions
    §  A key can be specified for a facet to change the name used to identify it in the response.
    q = Hot Rod
    fq = {!df=colors tag=cx}purple green
    facet.field = {!key=all_colors ex=cx}colors
    facet.field = {!key=overlap_colors}colors
    "facet_counts":{
      "facet_fields":{
        "all_colors":[
          "red",19,
          "green",6,
          "blue",2],
        "overlap_colors":[
          "red",7,
          "green",6,
          "blue",1]
      }
    }
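The difference between the two keys can be modelled in a few lines: the facet that excludes tag cx is counted as if that filter were absent, while the other facet is counted over the filtered set. The sketch below uses a tiny made-up document set and assumes the filter "purple green" matches documents containing either color; none of it is Solr code.

```python
from collections import Counter

# Made-up documents for illustration only.
docs = [
    {"id": 1, "colors": ["red", "green"]},
    {"id": 2, "colors": ["green"]},
    {"id": 3, "colors": ["red", "blue"]},
    {"id": 4, "colors": ["purple", "red"]},
]

# One tagged filter, assumed to mean "purple OR green".
filters = {"cx": lambda d: bool({"purple", "green"} & set(d["colors"]))}

def facet_counts(docs, filters, field, exclude=()):
    """Count field values over docs that pass every filter whose tag is not excluded."""
    active = [f for tag, f in filters.items() if tag not in exclude]
    counts = Counter()
    for doc in docs:
        if all(f(doc) for f in active):
            counts.update(doc[field])
    return counts

all_colors = facet_counts(docs, filters, "colors", exclude=("cx",))
overlap_colors = facet_counts(docs, filters, "colors")
```

Here all_colors still reports blue, even though no filtered document contains it, which is what lets a multi-select UI keep offering the unselected choices.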
  • 30. “Pretty” facet.field Terms
    §  Field Faceting uses Indexed Terms
    §  Leverage copyField and TokenFilters that will give you good looking Constraints
    <tokenizer class="solr.PatternTokenizerFactory" pattern="(,|;)\s*" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " />
    <filter class="solr.PatternReplaceFilterFactory" pattern=" and " replacement=" &amp; " />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.CapitalizationFilterFactory" onlyFirstWord="false" />
  • 31. “Pretty” facet.field Results
    {"id" : "MyExampleDoc",
     "category" : " books
       and magazines;
       computers, "
    }
    copyField in schema:
    <copyField source="category" dest="category_pretty"/>
    "facet_counts":{
      "facet_fields":{
        "category_pretty":[
          "Books & Magazines",1,
          "Computers",1]
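To see why the messy category value ends up as two clean constraints, the analysis steps can be imitated outside Solr: split on commas and semicolons, collapse whitespace, replace " and " with " & ", trim, and capitalize each word. The sketch below is only an approximation in plain Python, not the Solr analysis chain itself; in particular, dropping empty tokens is an assumption.

```python
import re

def pretty_terms(raw):
    """Approximate the tokenizer and filter steps on one raw field value."""
    terms = []
    # [,;]\s* splits at the same places as the (,|;)\s* tokenizer pattern.
    for token in re.split(r"[,;]\s*", raw):
        token = re.sub(r"\s+", " ", token)       # collapse runs of whitespace
        token = token.replace(" and ", " & ")    # prettier conjunction
        token = token.strip()                    # trim leading/trailing spaces
        if token:                                # assumption: empty tokens are dropped
            terms.append(" ".join(w.capitalize() for w in token.split(" ")))
    return terms

raw_category = " books \n and magazines; \n computers, "
terms = pretty_terms(raw_category)
```

Run on the example document's category value, this gives the two constraints shown in the facet response.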
  • 32. facet.field Labels
    §  facet.query params are echoed verbatim when returning the constraint counts
    §  Optionally, one can declare a facet.query in solrconfig.xml and include a "label" that the presentation layer can parse out for display. ("label" has no meaning to Solr.)
    facet.query = {!label='Hot!' }
                  +pop:[1 TO *]
                  +pub_date:[NOW/DAY-1DAY TO *]
    "facet_queries" : {
      " {!label='Hot!' } pop:[1 TO *] ... " : 15
    }
  • 34. facet.method
    name                    | facet.method             | description                                                                     | memory                                       | CPU
    enum                    | enum                     | Iterates over terms, calculating set intersections                              | filter-per-term in the filterCache           | ~O(nTerms)
    field cache             | fc (single-valued field) | Iterates over documents, counting terms                                         | Lucene FieldCache Entry… int[maxDoc]+terms   | O(nDocs)
    UnInvertedField         | fc (multi-valued field)  | Iterates over documents, counting terms                                         | O(maxDoc * num terms per doc)                | ~O(nDocs)
    Per-segment field cache | fcs                      | Like field-cache, just better for NRT since FieldCacheEntry is at segment level | Lucene FieldCache Entries… int[maxDoc]+terms | O(nDocs) + O(nTerms)
  • 35. facet.method=fc (single-valued field)
    Mem=int[maxDoc]+unique_values   CPU=O(nDocs in base set)
    q=Juggernaut&facet=true&facet.field=hero
    [Diagram: for each of the documents matching the base query, "order" in the Lucene FieldCache Entry (StringIndex) for the hero field gives an index into the "lookup" array of string values: (null), batman, flash, spiderman, superman, wolverine. The accumulator slot at that index is incremented, and a priority queue collects the top constraints (flash, 5; Batman, 3).]
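The counting loop on this slide can be sketched as follows: one array lookup and one increment per matching document, then a pass over the accumulator to pick the top constraints. The lookup values come from the slide; the per-document `order` array and the base document set are made up so that the result matches the slide's priority queue. This is an illustration, not Solr's implementation.

```python
import heapq

# lookup[0] is reserved for documents with no value in the field.
lookup = [None, "batman", "flash", "spiderman", "superman", "wolverine"]
# Hypothetical ordinals for docs 0..9: order[doc] indexes into lookup.
order = [2, 1, 2, 0, 5, 2, 1, 2, 2, 1]
# Hypothetical base set: every doc except doc 4 matches the query.
base_docs = [0, 1, 2, 3, 5, 6, 7, 8, 9]

def facet_fc(order, lookup, base_docs, limit=2):
    """Count terms by walking the base docs, then keep the top `limit`."""
    accumulator = [0] * len(lookup)
    for doc in base_docs:
        accumulator[order[doc]] += 1
    # Skip slot 0 (missing value) and terms that were never hit.
    candidates = [(count, term)
                  for term, count in zip(lookup[1:], accumulator[1:]) if count > 0]
    return [(term, count) for count, term in heapq.nlargest(limit, candidates)]

top = facet_fc(order, lookup, base_docs)
```

The work is proportional to the size of the base set plus the number of unique values, which is the Mem/CPU trade-off stated at the top of the slide.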
  • 36. facet.method=fcs (trunk) (per-segment single-valued)
    [Diagram: Segment1 through Segment4 each have their own FieldCache Entry and their own accumulator (accumulator1 through accumulator4). thread1 through thread4 each walk the Base DocSet for one segment, doing a lookup and an inc on that segment's accumulator. A FieldCache + accumulator merger (priority queue) then combines the per-segment results into the final priority queue (flash, 5; Batman, 3).]
  • 37. facet.method=fcs
    §  Controllable multi-threading
       facet.method=fcs
       facet.field={!threads=4}myfield
    §  Disadvantages
       •  Larger memory use (FieldCaches + accumulators)
       •  Slower (extra FieldCache merge step needed) - O(nTerms)
    §  Advantages
       •  Rebuilds FieldCache entries only for new segments (NRT friendly)
       •  Multi-threaded
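The per-segment variant can be sketched as the single-valued counting loop run once per segment, followed by a merge keyed on the term string, since ordinals are only meaningful within their own segment. The segment data below is made up, and the thread pool only illustrates the structure; this is not Solr's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Made-up segments: each has its own lookup array and per-doc ordinals.
segments = [
    {"lookup": [None, "batman", "flash"], "order": [1, 2, 2, 0]},
    {"lookup": [None, "flash", "superman"], "order": [1, 1, 2]},
    {"lookup": [None, "batman"], "order": [1, 1]},
]
# Base DocSet split per segment, using segment-local doc ids.
base_docs = [[0, 1, 2, 3], [0, 1], [0, 1]]

def count_segment(segment, docs):
    """Per-segment accumulator: one slot per segment-local ordinal."""
    acc = [0] * len(segment["lookup"])
    for doc in docs:
        acc[segment["order"][doc]] += 1
    return acc

def facet_fcs(segments, base_docs, threads=4):
    with ThreadPoolExecutor(max_workers=threads) as pool:
        accumulators = list(pool.map(count_segment, segments, base_docs))
    # Merge step: the extra O(nTerms) work listed under "Disadvantages".
    merged = Counter()
    for segment, acc in zip(segments, accumulators):
        for term, count in zip(segment["lookup"][1:], acc[1:]):
            if count:
                merged[term] += count
    return merged.most_common()

result = facet_fcs(segments, base_docs)
```

When a new segment appears, only that segment's entry needs to be built, which is the NRT advantage the slide refers to.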
  • 38. Per-segment faceting performance comparison
    Test index: 10M documents, 18 segments, single valued field
    A: Base DocSet=100 docs, facet.field on a field with 100,000 unique terms
       Time for request*      | facet.method=fc | facet.method=fcs
       static index           | 3 ms            | 244 ms
       quickly changing index | 1388 ms         | 267 ms
    B: Base DocSet=1,000,000 docs, facet.field on a field with 100 unique terms
       Time for request*      | facet.method=fc | facet.method=fcs
       static index           | 26 ms           | 34 ms
       quickly changing index | 741 ms          | 94 ms
    *complete request time, measured externally
  • 39. facet.method=fc (multi-valued field)
    §  UnInvertedField - like single-valued FieldCache algorithm, but with multi-valued FieldCache
    §  Good for many unique terms, relatively few values per doc
       •  Best case: 50x faster, 5x smaller than “enum” (100K unique values, 1-5 per doc)
       •  O(n_docs), but optimization to count the inverse when n>maxDoc/2
    §  Memory efficient
       •  Terms in a document are delta coded variable width ords (vints)
       •  Ord list for document packed in an int or in a shared byte[]
       •  Hybrid approach: “big terms” that match >5% of index use filterCache instead
       •  Only 1/128th of string values in memory
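The "delta coded variable width ords" bullet can be illustrated with a generic encoder: sort a document's term ordinals, store the gaps between them, and write each gap in 7-bit groups with a continuation bit. The byte layout below is an assumption chosen for illustration and is not claimed to match UnInvertedField's actual format.

```python
def encode_ords(ords):
    """Delta-code sorted ordinals, each delta as a variable-length integer."""
    out = bytearray()
    prev = 0
    for ord_ in sorted(ords):
        delta = ord_ - prev
        prev = ord_
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)  # low 7 bits, high bit = "more follows"
            delta >>= 7
        out.append(delta)
    return bytes(out)

def decode_ords(data):
    """Inverse of encode_ords: rebuild the sorted ordinal list."""
    ords, prev, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            ords.append(prev)
            value, shift = 0, 0
    return ords

encoded = encode_ords([3, 100000, 100005])
```

Three ordinals that would take 12 bytes as plain 32-bit ints fit in 5 bytes here, because small ordinals and small gaps between neighbours need only one byte each.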
  • 40. facet.method=fc fieldValueCache §  Implicit cache with UnInvertedField entries •  Not autowarmed – use static warming request •  http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc)
  • 41. Faceting: fieldValueCache §  Implicit cache with UnInvertedField entries •  Not autowarmed – use static warming request •  http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc) item_cat: {field=cat,memSize=5376,tindexSize=52,time=2,phase1=2, nTerms=16,bigTerms=10,termInstances=6,uses=44}
  • 42. Multi-valued faceting: facet.method=enum
    §  facet.method=enum
    §  For each term in field:
       •  Retrieve filter
       •  Calculate intersection size
    [Diagram: the Lucene Inverted Index (on disk) holds the postings batman: 1 3 5 8; flash: 0 1 5; spiderman: 2 4 7 9; superman: 0 6 9; wolverine: 1 2 7 8. The Solr filterCache (in memory) holds the filters hero:batman and hero:flash. Each filter is intersected with the docs matching the base query (0 1 5) to get an intersection count, which feeds the priority queue (batman=2).]
  • 43. facet.method=enum
    §  O(n_terms_in_field)
    §  Short circuits based on term.df
    §  filterCache entries int[ndocs] or BitSet(maxDoc)
    §  Size filterCache appropriately
       •  Either autowarm filterCache, or use static warming queries (via newSearcher event) in solrconfig.xml
    §  facet.enum.cache.minDf - prevent filterCache use for small terms
       •  Also useful for huge index w/ no time constraints
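The enum method can be sketched as a loop over the field's terms that intersects each term's document set with the base set, skipping terms whose document frequency is already below the requested minimum. The postings and base set below are read from the diagram on the previous slide; plain Python sets stand in for filterCache entries, and the function is an illustration rather than Solr's code.

```python
import heapq

# Postings as shown in the enum diagram.
inverted_index = {
    "batman": [1, 3, 5, 8],
    "flash": [0, 1, 5],
    "spiderman": [2, 4, 7, 9],
    "superman": [0, 6, 9],
    "wolverine": [1, 2, 7, 8],
}
base_docs = {0, 1, 5}

def facet_enum(inverted_index, base_docs, limit=2, mincount=1):
    """Count each term by intersecting its doc set with the base set."""
    counts = []
    for term, postings in inverted_index.items():
        if len(postings) < mincount:
            continue  # short circuit: df is too low to ever reach mincount
        n = len(base_docs.intersection(postings))
        if n >= mincount:
            counts.append((n, term))
    return [(term, n) for n, term in heapq.nlargest(limit, counts)]

top = facet_enum(inverted_index, base_docs)
```

The cost grows with the number of terms in the field rather than with the size of the base set, which is why this method suits fields with few unique values.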