INCORPORATING PROBABILISTIC RETRIEVAL KNOWLEDGE INTO TFIDF-BASED SEARCH ENGINE
Alex Lin
Senior Architect
Intelligent Mining
alin at IntelligentMinining.com
Overview of Retrieval Models
  Boolean Retrieval
  Vector Space Model
  Probabilistic Model
  Language Model
Boolean Retrieval
  lincoln AND NOT (car AND automobile)
  The earliest model, and still in use today
  Results are easy to explain to users: a document is returned exactly when it satisfies the query
  Highly efficient computationally
  Major drawback: no sophisticated ranking; a document either matches the query or it does not
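A minimal sketch, assuming a toy in-memory inverted index (the documents below are hypothetical), of how the example query can be evaluated with set operations. The result is an unordered set: nothing in the model says which matching document should come first, which is the ranking drawback noted above.

```python
# Hypothetical toy collection: doc id -> text.
docs = {
    1: "abraham lincoln was the sixteenth president",
    2: "the lincoln car is a luxury automobile",
    3: "lincoln nebraska is a state capital",
    4: "a used car for sale",
}


def build_index(collection):
    """Map each term to the set of doc ids whose text contains it."""
    index = {}
    for doc_id, text in collection.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index


index = build_index(docs)
all_ids = set(docs)


def postings(term):
    """Doc ids containing term (empty set if the term is unseen)."""
    return index.get(term, set())


# lincoln AND NOT (car AND automobile)
result = postings("lincoln") & (all_ids - (postings("car") & postings("automobile")))
print(sorted(result))  # [1, 3]
```

Doc 2 is excluded because it contains both "car" and "automobile"; docs 1 and 3 match equally, with no score to order them.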
Vector Space Model
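  [Figure: Doc1, Doc2 and the Query plotted as vectors in term space, with axes labelled Term2 and Term3]

$$\cos(D_i, Q) = \frac{\sum_{j=1}^{t} d_{ij}\, q_j}{\sqrt{\sum_{j=1}^{t} d_{ij}^2 \cdot \sum_{j=1}^{t} q_j^2}}$$

  where d_ij is the weight of term j in document D_i, q_j is the weight of term j in the query, and t is the number of terms.

  Major flaw: the model gives no guidance on how term weighting and ranking relate to relevance.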
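The cosine formula translates directly into code. A sketch, assuming documents and the query are already represented as equal-length lists of term weights (the weights below are made up); the model itself does not say how those weights should be chosen, which is the flaw the slide points out.

```python
import math


def cosine(doc, query):
    """Cos(D_i, Q) for term-weight vectors d_ij (doc) and q_j (query) over t terms."""
    if len(doc) != len(query):
        raise ValueError("doc and query must have the same number of terms t")
    dot = sum(d * q for d, q in zip(doc, query))
    norm = math.sqrt(sum(d * d for d in doc) * sum(q * q for q in query))
    return dot / norm if norm else 0.0  # treat an all-zero vector as no similarity


# Hypothetical weights over three terms (Term1, Term2, Term3).
vectors = {"Doc1": [0.5, 0.8, 0.3], "Doc2": [0.9, 0.4, 0.1]}
query = [1.5, 1.0, 0.0]

ranking = sorted(vectors, key=lambda name: cosine(vectors[name], query), reverse=True)
print(ranking)  # ['Doc2', 'Doc1']
```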
Probabilistic Retrieval Model

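  [Figure: a document D is assigned either to the Relevant class, with probability P(R|D), or to the Non-Relevant class, with probability P(NR|D)]

  Bayes' Rule:
$$P(R \mid D) = \frac{P(D \mid R)\,P(R)}{P(D)}$$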
Probabilistic Retrieval Model
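$$P(R \mid D) = \frac{P(D \mid R)\,P(R)}{P(D)} \qquad P(NR \mid D) = \frac{P(D \mid NR)\,P(NR)}{P(D)}$$

  Both posteriors share the denominator P(D), so comparing them reduces to comparing the numerators:
  If $P(D \mid R)\,P(R) > P(D \mid NR)\,P(NR)$, then classify D as relevant.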
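A small sketch of the decision rule with hypothetical probabilities. It also expands P(D) by total probability (R and NR partition the documents), so the posterior can be checked against the rule. The example shows that a likelihood 20 times higher under R can still lose to a low prior P(R).

```python
def posterior_relevant(p_d_given_r, p_d_given_nr, p_r):
    """P(R|D) via Bayes' rule; P(D) = P(D|R)P(R) + P(D|NR)P(NR) since R and NR partition."""
    p_nr = 1.0 - p_r
    p_d = p_d_given_r * p_r + p_d_given_nr * p_nr
    return p_d_given_r * p_r / p_d


def is_relevant(p_d_given_r, p_d_given_nr, p_r):
    """Decision rule: relevant iff P(D|R)P(R) > P(D|NR)P(NR); P(D) cancels."""
    return p_d_given_r * p_r > p_d_given_nr * (1.0 - p_r)


# Hypothetical likelihoods: D is 20x more likely under R than under NR.
print(is_relevant(0.02, 0.001, p_r=0.01))  # False: the low prior wins
print(is_relevant(0.02, 0.001, p_r=0.10))  # True
```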
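Estimate P(D|R) and P(D|NR)
  Define $D = (d_1, d_2, \ldots, d_t)$, where $d_i = 1$ if term i occurs in D and $d_i = 0$ otherwise. Under term independence:
$$P(D \mid R) = \prod_{i=1}^{t} P(d_i \mid R) \qquad P(D \mid NR) = \prod_{i=1}^{t} P(d_i \mid NR)$$
  Binary Independence Model: term independence + binary features in documents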
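Working in log space keeps the product of many small per-term probabilities from underflowing. A minimal sketch, assuming the per-term probabilities P(d_i = 1 | class) are already estimated; the terms and numbers below are hypothetical.

```python
import math


def log_likelihood(doc_terms, vocabulary, p_occurs):
    """log P(D | class) under the Binary Independence Model.

    doc_terms: terms present in D (d_i = 1); other vocabulary terms have d_i = 0.
    p_occurs: term -> P(d_i = 1 | class).
    """
    total = 0.0
    for term in vocabulary:
        p = p_occurs[term]
        # Independence: multiply per-term probabilities, i.e. add their logs.
        total += math.log(p if term in doc_terms else 1.0 - p)
    return total


# Hypothetical per-term estimates for the relevant and non-relevant classes.
vocabulary = ["buzz", "lightyear", "toy"]
p_rel = {"buzz": 0.8, "lightyear": 0.7, "toy": 0.4}
p_nonrel = {"buzz": 0.1, "lightyear": 0.05, "toy": 0.3}
doc = {"buzz", "lightyear"}

print(math.exp(log_likelihood(doc, vocabulary, p_rel)))     # about 0.336 = 0.8 * 0.7 * 0.6
print(math.exp(log_likelihood(doc, vocabulary, p_nonrel)))  # about 0.0035 = 0.1 * 0.05 * 0.7
```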
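Likelihood Ratio
  Classify D as relevant when the likelihood ratio exceeds the ratio of priors:
$$\frac{P(D \mid R)}{P(D \mid NR)} > \frac{P(NR)}{P(R)}$$
  p_i: probability that term i occurs in a relevant document
  s_i: probability that term i occurs in a non-relevant document
$$\frac{P(D \mid R)}{P(D \mid NR)} = \prod_{i:d_i=1} \frac{p_i}{s_i} \cdot \prod_{i:d_i=0} \frac{1-p_i}{1-s_i} \;\stackrel{\text{rank}}{=}\; \sum_{i:d_i=1} \log \frac{p_i (1-s_i)}{s_i (1-p_i)}$$
  The last step takes logs and drops $\sum_{i} \log\frac{1-p_i}{1-s_i}$, which is the same for every document, so the ranking is unchanged. Assuming $p_i = s_i$ for terms not in the query and estimating $p_i = (r_i + 0.5)/(R + 1)$ and $s_i = (n_i - r_i + 0.5)/(N - R + 1)$ gives the ranking function:
$$\sum_{i:d_i=q_i=1} \log \frac{(r_i + 0.5)/(R - r_i + 0.5)}{(n_i - r_i + 0.5)/(N - n_i - R + r_i + 0.5)}$$
  N: total number of documents in the collection
  n_i: number of documents that contain term i
  r_i: number of relevant documents that contain term i
  R: total number of relevant documents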
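The final weight can be computed per term from four counts. A sketch, assuming the counts r_i and n_i for each term are available (the statistics below are hypothetical), with the score summed over terms present in both query and document.

```python
import math


def rsj_weight(r_i, n_i, R, N):
    """Term weight from the likelihood-ratio derivation (0.5-smoothed estimates).

    r_i: relevant docs containing term i   n_i: docs containing term i
    R:   relevant docs                     N:   docs in the collection
    """
    relevant_odds = (r_i + 0.5) / (R - r_i + 0.5)
    nonrelevant_odds = (n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)
    return math.log(relevant_odds / nonrelevant_odds)


def bim_score(doc_terms, query_terms, stats, R, N):
    """Sum of weights over terms present in both query and document (d_i = q_i = 1).

    stats: term -> (r_i, n_i).
    """
    return sum(rsj_weight(*stats[t], R, N) for t in query_terms if t in doc_terms)


# Hypothetical statistics for a 1000-document collection.
N = 1000
print(rsj_weight(0, 10, 0, N))   # no relevance info: log((N - n_i + 0.5) / (n_i + 0.5))
print(rsj_weight(8, 50, 10, N))  # term found in 8 of 10 known relevant docs
```

With no relevance information (R = r_i = 0) the weight reduces to log((N - n_i + 0.5)/(n_i + 0.5)), an IDF-like quantity; it becomes negative for terms that occur in more than half of the collection.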
Combine with BM25 Ranking Algorithm
  BM25 extends the scoring function of the binary independence model with document and query term-frequency weights.
  It performs very well in TREC experiments.


$$R(q, D) = \sum_{i \in Q} \log \frac{(r_i + 0.5)/(R - r_i + 0.5)}{(n_i - r_i + 0.5)/(N - n_i - R + r_i + 0.5)} \cdot \frac{(k_1 + 1) f_i}{K + f_i} \cdot \frac{(k_2 + 1)\, qf_i}{k_2 + qf_i}$$
$$K = k_1 \left( (1 - b) + b \cdot \frac{dl}{avgdl} \right)$$
  k_1, k_2, b: tuning parameters
  f_i: frequency of term i in the document
  qf_i: frequency of term i in the query
  dl: document length
  avgdl: average document length in the data set
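A direct transcription of the scoring function, as a sketch. The parameter defaults are assumptions: k1 = 1.2 and b = 0.75 are commonly cited values and k2 = 100 is an arbitrary pick; all three should be tuned on real data. The collection statistics below are hypothetical.

```python
import math


def bm25(query_tf, doc_tf, dl, avgdl, stats, R, N, k1=1.2, k2=100.0, b=0.75):
    """R(q, D): RSJ weight x document-TF factor x query-TF factor, summed over query terms.

    query_tf: term -> qf_i          doc_tf: term -> f_i
    stats:    term -> (r_i, n_i)    dl, avgdl: document / average length
    """
    K = k1 * ((1 - b) + b * dl / avgdl)
    score = 0.0
    for term, qf in query_tf.items():
        f = doc_tf.get(term, 0)
        if f == 0:
            continue  # the document-TF factor is 0, so the term adds nothing
        r_i, n_i = stats[term]
        idf = math.log(((r_i + 0.5) / (R - r_i + 0.5)) /
                       ((n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)))
        score += idf * ((k1 + 1) * f / (K + f)) * ((k2 + 1) * qf / (k2 + qf))
    return score


# Hypothetical collection statistics with no relevance information (R = r_i = 0).
stats = {"buzz": (0, 40), "lightyear": (0, 5)}
query = {"buzz": 1, "lightyear": 1}
doc = {"buzz": 3, "lightyear": 2}
short = bm25(query, doc, dl=100, avgdl=100, stats=stats, R=0, N=1000)
long_ = bm25(query, doc, dl=300, avgdl=100, stats=stats, R=0, N=1000)
print(short > long_)  # True: same counts in a longer document score lower when b > 0
```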
Weighted Fields Boolean Search
   doc-id   field0   field1   …   text
   1
   2
   3
   …
   n

$$R(q, D) = \sum_{i \in q} \sum_{f \in \text{fields}} w_f\, m_i$$
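The slide does not define m_i. A minimal sketch, assuming m_i is 1 when query term i occurs in field f of the document and 0 otherwise; the field weights and documents are hypothetical.

```python
def weighted_field_score(query_terms, doc_fields, field_weights):
    """R(q, D) = sum over query terms i and fields f of w_f * m_i.

    Assumption: m_i = 1 if term i occurs in field f of D, else 0.
    """
    score = 0.0
    for term in query_terms:
        for field, weight in field_weights.items():
            tokens = doc_fields.get(field, "").lower().split()
            if term in tokens:  # m_i = 1 for this (term, field) pair
                score += weight
    return score


# Hypothetical weights: earlier fields count more than the body text.
field_weights = {"field0": 3.0, "field1": 2.0, "text": 1.0}
doc1 = {"field0": "Lightyear", "field1": "Buzz", "text": "buzz lightyear toy story"}
doc2 = {"field0": "Toy Story", "field1": "Woody", "text": "woody and buzz lightyear"}

query = ["buzz", "lightyear"]
print(weighted_field_score(query, doc1, field_weights))  # 7.0
print(weighted_field_score(query, doc2, field_weights))  # 2.0
```

Both documents contain both query terms, but doc1 ranks higher because the terms also appear in its higher-weighted fields.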
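Apply Probabilistic Knowledge into Fields
  [Figure: an index table with columns doc-id, field0, field1, …, Text, under an arrow running from Higher (field0) to Lower (Text); document 1 has "Lightyear" in field0 and "Buzz" in field1. Beside it, a document is assigned to Relevant, P(R|D), or Non-Relevant, P(NR|D).]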
Use the Knowledge during Ranking
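  [Table: columns doc-id, field0, field1, …, Text; document 1 has "Lightyear" in field0 and "Buzz" in field1]

  The goal is:
$$\log P(D \mid R) = \log \prod_{i=1}^{t} P(d_i \mid R) = \sum_{i=1}^{t} \log P(d_i \mid R) \approx \sum_{i \in q} \sum_{f \in F} w_f\, m_i$$
  where the field weights w_f are learnable.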
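The slide marks the approximation as learnable without saying how. One possible approach, which is an assumption and not the author's stated method: treat each (query, document) pair from a search log as a labelled example (1 = relevant, e.g. clicked), use x_f = number of query terms matched in field f as features so that the linear score sum_f w_f x_f equals the double sum in the goal, and fit w_f by logistic regression with plain stochastic gradient ascent. The examples are hypothetical.

```python
import math


def field_features(query_terms, doc_fields, field_names):
    """x_f = number of query terms matched in field f, so sum_f w_f * x_f = sum_i sum_f w_f * m_i."""
    return [sum(1 for t in query_terms if t in doc_fields.get(f, "").lower().split())
            for f in field_names]


def train_field_weights(examples, n_fields, lr=0.1, epochs=500):
    """Fit w_f (plus a bias) by logistic regression using stochastic gradient ascent.

    examples: list of (feature vector, label) with label 1 = relevant, 0 = not.
    """
    w = [0.0] * n_fields
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            z = bias + sum(wf * xf for wf, xf in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(relevant)
            err = y - p                     # gradient of the log-likelihood w.r.t. z
            bias += lr * err
            w = [wf + lr * err * xf for wf, xf in zip(w, x)]
    return w, bias


fields = ["field0", "field1", "text"]
doc1 = {"field0": "Lightyear", "field1": "Buzz", "text": "buzz lightyear toy story"}
print(field_features(["buzz", "lightyear"], doc1, fields))  # [1, 1, 2]

# Hypothetical log-derived examples: clicked results tend to match in field0.
examples = [([1, 0, 1], 1), ([1, 1, 1], 1), ([1, 0, 0], 1),
            ([0, 0, 1], 0), ([0, 1, 1], 0), ([0, 0, 2], 0)]
weights, bias = train_field_weights(examples, n_fields=len(fields))
print(dict(zip(fields, weights)))
```

The bias term only shifts every document's score equally, so ranking uses the learned w_f alone.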
Comparison of Approaches
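  TF-IDF:
$$R_{TF\text{-}IDF} = tf_{ik} \cdot idf_k = \frac{f_{ik}}{\sum_{j=1}^{t} f_{ij}} \cdot \log \frac{N}{n_k}$$
  BM25:
$$R_{bm25}(q, D) = \frac{(k_1 + 1) f_i}{K + f_i} \cdot \frac{(k_2 + 1)\, qf_i}{k_2 + qf_i}, \qquad K = k_1 \left( (1 - b) + b \cdot \frac{dl}{avgdl} \right)$$
$$R(q, D) = \sum_{i \in Q} \underbrace{\log \frac{(r_i + 0.5)/(R - r_i + 0.5)}{(n_i - r_i + 0.5)/(N - n_i - R + r_i + 0.5)}}_{\text{IDF}} \cdot \underbrace{\frac{(k_1 + 1) f_i}{K + f_i} \cdot \frac{(k_2 + 1)\, qf_i}{k_2 + qf_i}}_{\text{TF}}$$
  Field-weighted:
$$R(q, D) = \sum_{i \in q} \underbrace{\sum_{f \in F} w_f\, m_i}_{\text{IDF}} \cdot \underbrace{\frac{(k_1 + 1) f_i}{K + f_i} \cdot \frac{(k_2 + 1)\, qf_i}{k_2 + qf_i}}_{\text{TF}}$$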
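To make the comparison concrete, a sketch that computes the TF-IDF weight and the field-weighted score side by side. Assumptions not stated on the slides: m_i is 1 when term i occurs in field f; f_i and dl are counted over all fields combined; the parameter defaults and example data are hypothetical. The last function shows the field part sum_f w_f m_i taking the place of the IDF factor while the BM25 TF factors are kept.

```python
import math


def tfidf(term, doc_tf, N, doc_freq):
    """tf_ik * idf_k: count normalized by document length, times log(N / n_k)."""
    total = sum(doc_tf.values())
    if total == 0 or doc_freq.get(term, 0) == 0:
        return 0.0
    return doc_tf.get(term, 0) / total * math.log(N / doc_freq[term])


def bm25_tf(f, qf, dl, avgdl, k1=1.2, k2=100.0, b=0.75):
    """The two BM25 term-frequency factors (the part labelled TF)."""
    K = k1 * ((1 - b) + b * dl / avgdl)
    return ((k1 + 1) * f / (K + f)) * ((k2 + 1) * qf / (k2 + qf))


def field_weighted_score(query_tf, doc_fields, field_weights, avgdl):
    """Sum over i in q of (sum over f of w_f * m_i) times the BM25 TF factors."""
    tokens = {f: text.lower().split() for f, text in doc_fields.items()}
    all_tokens = [t for toks in tokens.values() for t in toks]
    dl = len(all_tokens)
    score = 0.0
    for term, qf in query_tf.items():
        f_i = all_tokens.count(term)
        if f_i == 0:
            continue
        # Field part replaces IDF: add w_f for each field where the term matches (m_i = 1).
        field_part = sum(w for f, w in field_weights.items() if term in tokens.get(f, []))
        score += field_part * bm25_tf(f_i, qf, dl, avgdl)
    return score


# Hypothetical data.
print(tfidf("buzz", {"buzz": 2, "toy": 2}, N=100, doc_freq={"buzz": 10}))  # 0.5 * log(10)
field_weights = {"field0": 3.0, "field1": 2.0, "text": 1.0}
doc1 = {"field0": "Lightyear", "field1": "Buzz", "text": "buzz lightyear toy story"}
print(field_weighted_score({"buzz": 1, "lightyear": 1}, doc1, field_weights, avgdl=6))  # about 9.625
```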
Other Considerations
  This is not a formal model
  Requires user relevance feedback (e.g., from search logs)
  Harder to apply to real-time search queries
  How to prevent Love/Hate attacks
Thank you

