SlideShare a Scribd company logo
1 of 56
Download to read offline
Solr@
Things I’m not going to
     talk about:
     A/B Testing
        i18n
Continuos Deployment
About
 Us
10+ Million Listings
     500 qps
Architecture
 Overview
Architecture Overview
Thrift
Architecture Overview
Thrift
      struct Listing {
         1: i64 listing_id
     }

     struct ListingResults {
         1: i64 count,
         2: list<Listing> listings
     }

     service Search {
         ListingResults search(1:string query)
     }
Architecture Overview
Thrift
Generated Java server code:
 public class Search {

   public interface Iface {

     public ListingResults search(String query) throws TException;

     }


Generated PHP client code:
 class SearchClient implements SearchIf {

    /**...**/
    public function search($query)
    {
      $this->send_search($query);
      return $this->recv_search();
    }
Architecture Overview
Thrift
Why use Thrift?
    • Service Encapsulation
    • Reduced Network Traffic
Architecture Overview
Thrift
Why only return IDs?
    • Index Size
    • Easy to scale PK lookups
The Search Server
Architecture Overview
Search Server


 • Identical Code + Hardware
 • Roles/Behavior controlled by Env variables
 • Single Java Process
 • Solr running as a Jetty Servlet
 • Thrift Servers
 • Smoker
Architecture Overview
Search Server




Master-specific processes:
 • Incremental Indexer
 • External File Field Updaters
Load Balancing
Load Balancing
Thrift TSocketPool
Load Balancing
Thrift TSocketPool
Load Balancing
Thrift TSocketPool
Load Balancing
Server Affinity
Load Balancing
    Server Affinity Algorithm
$serversNew = array();
                                              [“host2”, “host3”, “host1”, “host4”]
$numServers = count($servers);

while($numServers > 0) {
   // Take the first 4 chars of the md5sum of the server count
   // and the query, mod the available servers
   $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%($numServers);
   $keySet = array_keys($servers);
   $serverId = $keySet[$key];

    // Push the chosen server onto the new list and remove it
    // from the initial list
    array_push($serversNew, $servers[$serverId]);
    unset($servers[$serverId]);
    --$numServers;
}
Load Balancing
Server Affinity Algorithm
              $key = hexdec(substr(md5($query),0,4))




  “jewelry”                 [“host2”, “host3”, “host1”, “host4”]

  “scarf”                   [“host2”, “host3”, “host1”, “host4”]
Load Balancing
Server Affinity Algorithm
      $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%(count($servers));




  “jewelry”                            [“host2”, “host3”, “host1”, “host4”]

  “scarf”                              [“host2”, “host1”, “host4”, “host3”]
Load Balancing
Server Affinity Results


      2%        20%
Load Balancing
Server Affinity Caveats

     • Stemming / Analysis
     • Be wary of query distribution
Replication
Replication
The Problem
Replication
The Problem
Replication
Multicast Rsync?
Replication
Multicast Rsync?
[15:25]  <engineer> patrick: i'm gonna test multi-rsyncing some indexes
from host1 to host2 and host3 in prod. I'll be watching the graphs and
what not, but let me know if you see anything funky with the network
[15:26]  <patrick> ok
....

[15:31]  <keyur> is the site down?
Replication
Multicast Rsync?
Hmm...Bit Torrent?
Replication
Bit Torrent POC
Using BitTornado:
Replication
Bit Torrent + Solr
Fork of TTorent: https://github.com/etsy/ttorrent

                            Multi-File Support
                       Performance Enhancements
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Solr InterOp
QParsers
“writing query strings
   is for suckers”
Solr InterOp
QParsers

  http://host:8393/solr/person/select/?q=_query_:%22{!dismax
  %20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf
  %20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq}
  %22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq=
  %22giovanni%20fernandez-kincade
 %22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_s
 yn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez-
 kincade&lqf=last_name^3
Solr InterOp
QParsers

   http://host:8393/solr/person/select/?q={!personrealqp}giovanni
  %20fernandez-kincade
Solr InterOp
QParsers

 class PersonNameRealQParser extends QParser {
   public PersonNameRealQParser(String qstr, SolrParams localParams,
       SolrParams params, SolrQueryRequest req) {
     super(qstr, localParams, params, req);
   }
Solr InterOp
   QParsers
  @Override
  public Query parse() throws ParseException {
    TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));
    exactFullNameQuery.setBoost(4.0f);

    String[] userQueryTerms = qstr.split("s+");
    Query firstLastQuery = null;

    if (2 == userQueryTerms.length)
      firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);
    else
      firstLastQuery = parseAsFirstOrLast(userQueryTerms);

    DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);
    realNameQuery.add(exactFullNameQuery);
    realNameQuery.add(firstLastQuery);

    return realNameQuery;
  }
Solr InterOp
QParsers
The QParserPlugin that returns our new QParser:
  public class PersonNameRealQParserPlugin extends QParserPlugin {
   public static final String NAME = "personrealqp";

   @Override
   public void init(NamedList args) {}

   @Override
   public QParser createParser(String qstr, SolrParams localParams,
       SolrParams params, SolrQueryRequest req) {
     return new PersonNameRealQParser(qstr, localParams, params, req);
   }
 }
Solr InterOp
QParsers

Registering the plugin in solrconfig.xml:

   <queryParser name="personrealqp"
      class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
Custom Stemmer
Solr InterOp
Custom Stemmer
Solr InterOp
Custom Stemmer

 banded, banding, birding, bouldering, bounded, buffing, bundler, canning,
carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned,
lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter,
roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher,
                      strapped, threaded, yellowing
Solr InterOp
Custom Stemmer
First we extend KStemmer and intercept stem calls:
  public class LStemmer extends KStemmer {

     /**.....**/

      @Override
      String stem(String term) {
          String override = overrideStemTransformations.get(term);
          if(override != null) return override;
          return super.stem(term);
      }
  }
Solr InterOp
 Custom Stemmer
Then create a TokenFilter that uses the new Stemmer:
 final class LStemFilter extends TokenFilter {

   /**.....**/        
   protected LStemFilter(TokenStream input, int cacheSize) {
     super(input);
     stemmer = new LStemmer(cacheSize);
   }
        
   @Override
   public boolean incrementToken() throws IOException {
     /**....**/
   }
Solr InterOp
Custom Stemmer
Create a FilterFactory that exposes it:
       public class LStemFilterFactory extends BaseTokenFilterFactory {
        private int cacheSize = 20000;
        
        @Override
        public void init(Map<String, String> args) {
          super.init(args);
         String cacheSizeStr = args.get("cacheSize");
         if (cacheSizeStr != null) {
          cacheSize = Integer.parseInt(cacheSizeStr);
         }
       }
        
        @Override
       public TokenStream create(TokenStream in) {
        return new LStemFilter(in, cacheSize);
       }
     }
Solr InterOp
Custom Stemmer
And finally plug it into your analysis chain:

 <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
       words="solr/common/conf/stopwords.txt"/>
    <filter class="com.etsy.solr.analysis.LStemFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
Thanks!

More Related Content

Viewers also liked

Viewers also liked (6)

B2B, B2C, B2B2C, marketplace: tutte le sfumature dell’eCommerce per creare va...
B2B, B2C, B2B2C, marketplace: tutte le sfumature dell’eCommerce per creare va...B2B, B2C, B2B2C, marketplace: tutte le sfumature dell’eCommerce per creare va...
B2B, B2C, B2B2C, marketplace: tutte le sfumature dell’eCommerce per creare va...
 
Lazada E-commerce Statistics Across Southeast Asia
Lazada E-commerce Statistics Across Southeast AsiaLazada E-commerce Statistics Across Southeast Asia
Lazada E-commerce Statistics Across Southeast Asia
 
Lazada vs Matahari Mall Business Models
Lazada vs Matahari Mall Business ModelsLazada vs Matahari Mall Business Models
Lazada vs Matahari Mall Business Models
 
Introduction to Search Systems - ScaleConf Colombia 2017
Introduction to Search Systems - ScaleConf Colombia 2017Introduction to Search Systems - ScaleConf Colombia 2017
Introduction to Search Systems - ScaleConf Colombia 2017
 
Etsy Case Study
Etsy Case StudyEtsy Case Study
Etsy Case Study
 
Sap sales funnel examples
Sap sales funnel examplesSap sales funnel examples
Sap sales funnel examples
 

More from lucenerevolution

Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
lucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Solr at Etsy - Giovanni Fernandez-Kincade

  • 2. Things I’m not going to talk about: A/B Testing i18n Continuos Deployment
  • 4.
  • 5.
  • 9. Architecture Overview Thrift struct Listing { 1: i64 listing_id } struct ListingResults { 1: i64 count, 2: list<Listing> listings } service Search { ListingResults search(1:string query) }
  • 10. Architecture Overview Thrift Generated Java server code: public class Search { public interface Iface { public ListingResults search(String query) throws TException; } Generated PHP client code: class SearchClient implements SearchIf { /**...**/ public function search($query) { $this->send_search($query); return $this->recv_search(); }
  • 11. Architecture Overview Thrift Why use Thrift? • Service Encapsulation • Reduced Network Traffic
  • 12. Architecture Overview Thrift Why only return IDs? • Index Size • Easy to scale PK lookups
  • 14. Architecture Overview Search Server • Identical Code + Hardware • Roles/Behavior controlled by Env variables • Single Java Process • Solr running as a Jetty Servlet • Thrift Servers • Smoker
  • 15. Architecture Overview Search Server Master-specific processes: • Incremental Indexer • External File Field Updaters
  • 21. Load Balancing Server Affinity Algorithm $serversNew = array(); [“host2”, “host3”, “host1”, “host4”] $numServers = count($servers); while($numServers > 0) { // Take the first 4 chars of the md5sum of the server count // and the query, mod the available servers $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%($numServers); $keySet = array_keys($servers); $serverId = $keySet[$key]; // Push the chosen server onto the new list and remove it // from the initial list array_push($serversNew, $servers[$serverId]); unset($servers[$serverId]); --$numServers; }
  • 22. Load Balancing Server Affinity Algorithm $key = hexdec(substr(md5($query),0,4)) “jewelry” [“host2”, “host3”, “host1”, “host4”] “scarf” [“host2”, “host3”, “host1”, “host4”]
  • 23. Load Balancing Server Affinity Algorithm $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%(count($servers)); “jewelry” [“host2”, “host3”, “host1”, “host4”] “scarf” [“host2”, “host1”, “host4”, “host3”]
  • 25. Load Balancing Server Affinity Caveats • Stemming / Analysis • Be wary of query distribution
  • 30. Replication Multicast Rsync? [15:25]  <engineer> patrick: i'm gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. I'll be watching the graphs and what not, but let me know if you see anything funky with the network [15:26]  <patrick> ok .... [15:31]  <keyur> is the site down?
  • 34. Replication Bit Torrent + Solr Fork of TTorent: https://github.com/etsy/ttorrent Multi-File Support Performance Enhancements
  • 41. “writing query strings is for suckers”
  • 42.
  • 43. Solr InterOp QParsers http://host:8393/solr/person/select/?q=_query_:%22{!dismax %20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf %20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq} %22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq= %22giovanni%20fernandez-kincade %22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_s yn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez- kincade&lqf=last_name^3
  • 44. Solr InterOp QParsers http://host:8393/solr/person/select/?q={!personrealqp}giovanni %20fernandez-kincade
  • 45. Solr InterOp QParsers class PersonNameRealQParser extends QParser {    public PersonNameRealQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {      super(qstr, localParams, params, req);    }
  • 46. Solr InterOp QParsers @Override   public Query parse() throws ParseException { TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));     exactFullNameQuery.setBoost(4.0f);     String[] userQueryTerms = qstr.split("s+");     Query firstLastQuery = null;     if (2 == userQueryTerms.length)       firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);     else       firstLastQuery = parseAsFirstOrLast(userQueryTerms);     DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);     realNameQuery.add(exactFullNameQuery);     realNameQuery.add(firstLastQuery);     return realNameQuery;   }
  • 47. Solr InterOp QParsers The QParserPlugin that returns our new QParser: public class PersonNameRealQParserPlugin extends QParserPlugin {    public static final String NAME = "personrealqp";    @Override    public void init(NamedList args) {}    @Override    public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {      return new PersonNameRealQParser(qstr, localParams, params, req);    } }
  • 48. Solr InterOp QParsers Registering the plugin in solrconfig.xml: <queryParser name="personrealqp" class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
  • 51. Solr InterOp Custom Stemmer banded, banding, birding, bouldering, bounded, buffing, bundler, canning, carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned, lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter, roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher, strapped, threaded, yellowing
  • 52. Solr InterOp Custom Stemmer First we extend KStemmer and intercept stem calls: public class LStemmer extends KStemmer { /**.....**/      @Override      String stem(String term) {          String override = overrideStemTransformations.get(term);          if(override != null) return override;          return super.stem(term);      } }
  • 53. Solr InterOp Custom Stemmer Then create a TokenFilter that uses the new Stemmer: final class LStemFilter extends TokenFilter { /**.....**/         protected LStemFilter(TokenStream input, int cacheSize) { super(input); stemmer = new LStemmer(cacheSize); }          @Override public boolean incrementToken() throws IOException { /**....**/ }
  • 54. Solr InterOp Custom Stemmer Create a FilterFactory that exposes it: public class LStemFilterFactory extends BaseTokenFilterFactory { private int cacheSize = 20000;      @Override public void init(Map<String, String> args) { super.init(args);      String cacheSizeStr = args.get("cacheSize");      if (cacheSizeStr != null) {       cacheSize = Integer.parseInt(cacheSizeStr);      }    }      @Override    public TokenStream create(TokenStream in) {     return new LStemFilter(in, cacheSize);    } }
  • 55. Solr InterOp Custom Stemmer And finally plug it into your analysis chain: <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ASCIIFoldingFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="solr/common/conf/stopwords.txt"/> <filter class="com.etsy.solr.analysis.LStemFilterFactory" /> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer>