Is This Entity Relevant
          to Your Needs?
                       David Carmel
            IBM Research - Haifa, Israel




IBM Research - Haifa                       © 2012 IBM Corporation
IBM Research - Haifa


Outline
    Some Open Questions in Entity-Oriented Search (EoS)
        What makes an entity relevant to the user's needs?
        Is it the same relevance that the IR community deals with?
        Can we adopt existing IR models in this new area?
    The classical model of relevance in IR
        User-based relevance
        Topic-based relevance (Aboutness)
        Similarity-based relevance measurements
    Supporting evidence as an indication of relevance
        For Q&A
        For EoS
    Relevance estimation approaches for EoS
    Exploration & Discovery in EoS
    Summary

2                       Is This entity Relevant?                    © 2012 IBM Corporation
IBM Research - Haifa


  Entity Oriented Search (EoS)

     When people use retrieval systems they are often not searching for
     documents or text passages

     Often named entities play a central role in answering such information
     needs
          persons, organizations, locations, products…



     At least 20-30% of the queries submitted
     to Web search engines are simply named entities

   ~71% of Web search queries contain
   named entities
(Named entity recognition in query, Guo et al., SIGIR '09)




 3                         Is This entity Relevant?                © 2012 IBM Corporation
IBM Research - Haifa


    Popular Entity Oriented Search tools
     Product Search
         On-line Shopping (books, movies, electronic devices…)
               Amazon, eBay…
         Travel (places, hotels, flights…)
               Yahoo! Travel, Kayak…
         Multi-media (Music, Video, Images…)
                Last.fm, YouTube, Flickr…
     People Search
         Expert Search (for a specific topic)
               LinkedIn, ArnetMiner…
         Friends (colleagues, other people with mutual interests,
         lost friends …)
               Facebook…

     Location Search
         Addresses
         Businesses
         Proximity Search (find sites close to the searcher’s current
         location)

4                         Is This entity Relevant?                      © 2012 IBM Corporation
IBM Research - Haifa




5                    Is This entity Relevant?   © 2012 IBM Corporation
IBM Research - Haifa


 Expert Search

The task:
     Identify people who are
     knowledgeable on a
     specific topic
     Find people who have
     skills and experience
     on a given topic

     How can “knowledgeable” be measured?
     How should persons be ranked, in response
     to a query, such that those with relevant
     expertise are ranked first?

 6                        Is This entity Relevant?   © 2012 IBM Corporation
IBM Research - Haifa



Do those entities satisfy our needs?
    What makes an entity relevant to the user’s need?
    What is the meaning of relevance in this context?
    Is it the same relevance that the IR community has dealt with for many
    decades in the context of document retrieval?
    Can we adopt existing IR models in this new area of entity-oriented
    search in a straightforward manner?


    In this talk I’ll try to deal with some of those questions

    I’ll review how the same questions are handled in related
    areas (especially in Q&A)

    I’ll raise some research directions that might lead to a better
    understanding of the concept of relevance in EoS
7                       Is This entity Relevant?                 © 2012 IBM Corporation
IBM Research - Haifa


What is an Entity?
    Entity: an object or a “thing” that can be uniquely identified in the world
         An entity must be distinguished from other entities
         Can be anything (including an abstract thing!)
    Attributes: Used to describe entities
         An attribute contains a single piece of information
         Key - A minimal set of attributes that uniquely identifies an entity
    Entity set: a set of entities of the same type and attributes


                 [ER diagram: an Actor entity with attributes id, name, birthday, and address]



8                       Is This entity Relevant?                          © 2012 IBM Corporation
IBM Research - Haifa



What is a Relationship?
     Relationship: Association among two or more entities
             A relationship may also have attributes
     Relationship Set: Set of relationships of the same type



                 [ER diagram: a Prescription relationship (with a Date attribute) connecting
                  Patient, Physician, and Medication entities; Patient and Physician have id
                  and name attributes, Medication has code and name]



9                           Is This entity Relevant?                         © 2012 IBM Corporation
IBM Research - Haifa



Example: ERD for Social Search in the Enterprise




                 [ER diagram for social search in the enterprise, with entities linked by relationships such as Creator]




10                    Is This entity Relevant?     © 2012 IBM Corporation
IBM Research - Haifa


Entity Relationship Graph (ERG)
     Represents
          Entity instances as graph nodes
          Binary relationships as (weighted) edges
                   N-ary relations are broken into binary ones
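To make the graph construction concrete, here is a minimal Python sketch (not the system described in the talk) of an entity relationship graph: nodes are entity instances, binary relationships are weighted edges, and an n-ary relation is decomposed into binary ones. The class name and the example relation are hypothetical.

```python
# A minimal sketch of an entity relationship graph with weighted binary edges.
from itertools import combinations
from collections import defaultdict

class EntityRelationshipGraph:
    def __init__(self):
        # adjacency map: node -> {neighbor: weight}
        self.edges = defaultdict(dict)

    def add_relation(self, entities, weight=1.0):
        # an n-ary relation is broken into binary edges between
        # every pair of participating entities
        for u, v in combinations(entities, 2):
            self.edges[u][v] = self.edges[u].get(v, 0.0) + weight
            self.edges[v][u] = self.edges[v].get(u, 0.0) + weight

erg = EntityRelationshipGraph()
# e.g. a Prescription relation among a patient, a physician, and a medication
erg.add_relation(["patient:42", "physician:7", "medication:aspirin"])
print(erg.edges["patient:42"])
```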




11                         Is This entity Relevant?              © 2012 IBM Corporation
IBM Research - Haifa


Entity Oriented Search (EoS)


   [Architecture diagram: entity relationship data is analyzed into entities and
   relations and stored in an entity relationship index; at runtime a query is
   matched against the index, and related entities and relationships are returned
   for ranking, navigation, and exploration]

     Query Examples:
     • Nikon D40
     • Teammates of Michael Schumacher
     • “Data mining”
     (Free text, entity, or hybrid query)




12                          Is This entity Relevant?                                   © 2012 IBM Corporation
The concept of Relevance in IR




IBM Research - Haifa                           © 2012 IBM Corporation
IBM Research - Haifa


 The Classical Concept of Relevance in IR (Saracevic76, Mizzaro96)

     Problem (P): The user has a problem to solve or an aim to achieve.

     Information Need (IN): The user builds a mental, implicit representation of P
     (may be incorrect or incomplete).

     Request (R): The user expresses IN explicitly, usually in natural language
     (sometimes with the help of an intermediary).

     Query (Q): Formalization; R is translated into a formal query understandable
     by the search system.

     Judgment (J): The same user judges the RELEVANCE of the search results.


14                        Is This entity Relevant?                                © 2012 IBM Corporation
IBM Research - Haifa


User-based (Subjective) Relevance
     Relevance is a dynamic concept that depends on the
     user’s subjective judgment

     Subjective relevance judgment may depend on:
         User’s characteristics and perceptions
             Gender, age, education, income, occupation…
             Preferences, interests
             State of mind
         The context of search
             Level of the user’s expertise (regarding the topic of interest)
             Current time
             Current location
         Session status
             Dependencies between retrieved items and the specific query,
             and across sequential queries during the session


15                        Is This entity Relevant?                              © 2012 IBM Corporation
IBM Research - Haifa


Topical-based relevance judgment
     How well the topic of the information retrieved matches the topic
     of the request
         An object is objectively relevant to a request if
         it deals with the topic of the request (Aboutness)


     TREC working definition for relevance assessment:
      If you are writing a report on the topic and would use the
          information contained in the document in the report –
          then the document is considered relevant to the topic…

     A document is judged relevant if any piece of it is relevant regardless of
     how small that piece is in relation to the rest of the document




16                       Is This entity Relevant?                  © 2012 IBM Corporation
IBM Research - Haifa



Probability Ranking Principle
     Given a set of documents that “match” the entity-oriented query
       How do we rank them for the user?

 The Probability Ranking Principle (PRP) for Document Retrieval
     (Robertson 71):
     ``If a retrieval system's response to each request is a ranking of the documents
     in the collection in order of decreasing probability of relevance to the user who
     submitted the request,
            where the probabilities are estimated as accurately as possible on the basis of whatever data
          have been made available to the system for this purpose,
     the overall effectiveness of the system to its user will be the best…''

                Pr(R = 1 | d, q)   (documents)                 Pr(R = 1 | e, q)   (entities)
       We need a reliable and coherent methodology for measuring
       the probability of relevance of an entity to a query

17                        Is This entity Relevant?                                       © 2012 IBM Corporation
IBM Research - Haifa


 Relevance estimation in classic Document Retrieval
      Most relevance approximation approaches for
      document retrieval are based on measuring
      some kind of similarity between the user's query
      and retrieved documents
          Vector Space:
              The Cosine of the angle between two vectors
          Concept space:
              similarity in the latent concept space
                • e.g. LDA, LSI, ESA
          Language models:
              Similarity between the
              documents and the query term distributions

Can we use similar approaches for EoS?
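As a reminder of what such similarity-based estimation looks like in practice, here is a minimal sketch of cosine similarity over bag-of-words term-frequency vectors; real systems would use tf-idf or language-model weighting, and the example strings are hypothetical.

```python
# Cosine similarity between two term-frequency vectors (toy example).
import math
from collections import Counter

def cosine_similarity(query: str, document: str) -> float:
    q, d = Counter(query.lower().split()), Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("entity oriented search",
                        "search over entities and relations"))
```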

 18                        Is This entity Relevant?         © 2012 IBM Corporation
IBM Research - Haifa


Entity Similarity
       While similarity plays a central role in document retrieval for relevance
       estimation, many relevant entities are not similar to the queried entity
             At least according to standard definitions of similarity


       This problem is well known in the Question Answering domain
             The answer is not necessarily “similar” to the question
             The supportive passage is not always similar to the question

             Example: Who killed JFK?
     John F. Kennedy (JFK), the thirty-fifth President of the United States, was
     assassinated at 12:30 p.m. Central Standard Time (18:30 UTC) on Friday,
     November 22, 1963, in Dealey Plaza, Dallas, Texas.

     The ten-month investigation of the Warren Commission of 1963–1964 concluded
     that the President was assassinated by Lee Harvey Oswald.

19                          Is This entity Relevant?                               © 2012 IBM Corporation
IBM Research - Haifa


Relevance Judgment in Question Answering
     In QA we usually assume a question that identifies the information need
     “precisely”
          Who was the first American in space?
          How many calories are there in a Big Mac?
          How many Grand Slam titles did Bjorn Borg win?

     When will an answer be considered relevant to the question?
          It must be correct!
                i.e. it must have supporting evidence (from reliable sources)

      A prominent factor in answering a question is not so much finding a candidate answer as
          validating whether that answer is correct
                Therefore supporting evidence is essential

     Assessment instructions from the TREC’s QA track:
          Assessors read each candidate answer and make a binary
          decision as to whether or not the candidate is actually an
          answer to the question
           in the context provided by the supportive document
20                         Is This entity Relevant?                              © 2012 IBM Corporation
IBM Research - Haifa


 What do you mean the answer is correct?
As in document retrieval, correctness/relevance in QA might be subjective
    and user dependent
    Where is the Taj Mahal?
                Agra, India?          (the famous temple)
                Atlantic City, NJ?    (the casino?)

      In TREC, it is common to consider each candidate answer with (relevant) supporting
      evidence as a correct one

      This leads to an understanding of how various candidate answers can be ranked:
           i.e. relevance judgment is transformed into judging the relevance of the
           supporting evidence
      This approach can be applied to entity-oriented search
           Rank retrieved entities according to the amount and quality of their
           supporting evidence!
         Entity ranking should be based on the evidence supporting
         their relevance to the query
 21                         Is This entity Relevant?                             © 2012 IBM Corporation
Relevance Estimation Approaches
                       for EoS




IBM Research - Haifa                    © 2012 IBM Corporation
IBM Research - Haifa



The Expert Profile-based Approach (Craswell et al., 2001):
     Represent each person by a virtual document (a profile)
         Employee directory (in the enterprise)
         Concatenating all existing passages mentioning the person

     Rank those profiles according to their relevance to the query
             Using standard IR ranking techniques


     The person’s profile can naturally be used as supporting evidence for their
     expertise


     Difficulties:
         Co-reference resolution and name disambiguation
         Privacy concerns
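Here is a minimal sketch of the profile-based idea (hypothetical data and scoring, not Craswell et al.'s actual system): concatenate all passages that mention a person into one virtual "profile" document, then rank profiles with a standard IR-style score.

```python
# Profile-based expert search: build one virtual document per person, then rank.
from collections import Counter

passages = {   # hypothetical passages mentioning each person
    "alice": ["alice published a survey on entity retrieval",
              "alice built the entity ranking prototype"],
    "bob":   ["bob maintains the web crawler infrastructure"],
}

def build_profiles(passages):
    # one concatenated virtual document per person
    return {person: " ".join(texts) for person, texts in passages.items()}

def score(query, profile, k=1.2):
    # a toy tf-saturation score standing in for a standard IR ranking function
    tf = Counter(profile.split())
    return sum(tf[t] / (tf[t] + k) for t in query.split())

profiles = build_profiles(passages)
query = "entity ranking"
for person, prof in sorted(profiles.items(), key=lambda kv: -score(query, kv[1])):
    print(person, round(score(query, prof), 3))
```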

23                        Is This entity Relevant?                   © 2012 IBM Corporation
IBM Research - Haifa


EoS: Voting approach (Balog06, MacDonald09)
     Any relevant document is a “voter” for the entities it mentions / relates-to

                 [Diagram: query q retrieves documents d1, d2, d3, which mention entities p1, p2, p3]

                 $\mathrm{Score}(p, q) = \sum_{d} \mathrm{Score}(d, q) \cdot \mathrm{Score}(p, d)$

      What is the rationale behind this?
          An entity mentioned many times in relevant (top-retrieved) documents
       is more likely to be relevant to the given topic
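A minimal sketch of the voting computation, with hypothetical document scores and entity mentions: each top-retrieved document votes for the entities it mentions, weighted by its own retrieval score Score(d, q) and the entity-document association Score(p, d).

```python
# Voting approach: Score(p, q) = sum_d Score(d, q) * Score(p, d).
from collections import defaultdict

def vote_for_entities(doc_scores, doc_entities, mention_weight):
    entity_scores = defaultdict(float)
    for doc, d_score in doc_scores.items():
        for entity in doc_entities.get(doc, []):
            entity_scores[entity] += d_score * mention_weight.get((entity, doc), 1.0)
    return sorted(entity_scores.items(), key=lambda kv: -kv[1])

doc_scores = {"d1": 2.3, "d2": 1.7, "d3": 0.9}              # Score(d, q)
doc_entities = {"d1": ["p1", "p2"], "d2": ["p2"], "d3": ["p2", "p3"]}
mention_weight = {("p2", "d1"): 0.5}                        # Score(p, d), e.g. mention frequency
print(vote_for_entities(doc_scores, doc_entities, mention_weight))
```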

24                         Is This entity Relevant?                         © 2012 IBM Corporation
IBM Research - Haifa



Relevance Propagation (Serdyukov 2008)
     We should also consider entities that are indirectly related to the query
       Relevance is propagated through the entity relationship graph

         [Diagram: relevance propagates from query q through documents d1–d4 to
          entities p1–p3, and on to entity p4, which is only indirectly related to the query]


      How should relevance be propagated through the graph?

25                       Is This entity Relevant?                           © 2012 IBM Corporation
IBM Research - Haifa


Proximity in the Entity Relationship Graph - Random walks
      Random walk approach
       The relationship strength between two
      nodes is reflected by the probability that a
      random surfer who starts at one node will
      visit the second one during the walk

      Justification
          The more paths that connect the two entities in the graph,
          the higher the probability that the surfer will visit the target entity,
          and the higher the relationship strength between the two

      Popular Random Walk Approaches
          SimRank(u,v):
              How soon two random surfers (starting at u, v) are expected to meet at the same node
          Random Walk with Restart (RWR):
              The surfer has a fixed restart probability to return to the source
          Lazy Random Walk:
              The surfer has a fixed probability of halting the walk at each step
          Effective Conductance:
              Only simple (cycle-free) paths; treating edges as resistors
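As an illustration of one of these approaches, here is a minimal sketch of Random Walk with Restart over a small hypothetical entity relationship graph; the visiting probabilities serve as a proximity (relationship-strength) estimate from the source node.

```python
# Random Walk with Restart (RWR) by power iteration over a weighted graph.
def rwr(graph, source, restart=0.15, iterations=50):
    nodes = list(graph)
    prob = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            out = graph.get(u, {})
            total = sum(out.values())
            for v, w in out.items():
                # follow an outgoing edge with probability proportional to its weight
                nxt[v] += (1 - restart) * prob[u] * (w / total)
        nxt[source] += restart          # the surfer restarts at the source
        prob = nxt
    return prob                         # visiting probabilities ~ relationship strength

graph = {                               # hypothetical query/document/entity graph
    "q":  {"d1": 1, "d2": 1},
    "d1": {"q": 1, "p1": 1, "p2": 2},
    "d2": {"q": 1, "p2": 1},
    "p1": {"d1": 1},
    "p2": {"d1": 2, "d2": 1},
}
print(sorted(rwr(graph, "q").items(), key=lambda kv: -kv[1]))
```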


 26                          Is This entity Relevant?                                            © 2012 IBM Corporation
IBM Research - Haifa



Markov Random Fields for EoS (Raviv, Carmel, Kurland, 2012)




                            Q =< {q1...qn }, T >

             P( E | Q)                 ∑
                                  P∈{ D ,T , N }
                                                   λE P( EP | Q)
                                                     P
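A minimal sketch of the top-level combination: the entity score is a weighted sum of the document, type, and name components. The mixture weights and component scorers below are placeholders, not the values or estimators from Raviv, Carmel, and Kurland.

```python
# Combined entity score: weighted sum over the D (document), T (type), N (name) components.
def entity_score(entity, query, scorers, lambdas):
    return sum(lambdas[p] * scorers[p](entity, query) for p in ("D", "T", "N"))

lambdas = {"D": 0.8, "T": 0.1, "N": 0.1}        # hypothetical mixture weights
scorers = {
    "D": lambda e, q: 0.42,                     # stand-ins for P(E_D | Q),
    "T": lambda e, q: 0.30,                     # P(E_T | Q), and P(E_N | Q)
    "N": lambda e, q: 0.15,
}
print(entity_score("Nikon D40", "digital slr cameras", scorers, lambdas))
```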



27                    Is This entity Relevant?                     © 2012 IBM Corporation
IBM Research - Haifa



MRF-based Entity Document Scoring                                                 P(ED|Q)
     We consider three types of cliques
       Full Independent
       Sequential dependent
       Full dependent
     The feature function over cliques
       measures how well the clique's terms represent the entity document
       Based on Dirichlet smoothed language model
                   $f^T_D(q_i, E_D) = \log\!\left(\frac{tf(q_i, E_D) + \mu \cdot cf(q_i)/|C|}{|E_D| + \mu}\right)$
       For dependent models we replace qi with
         #1(qi..qi+k) and #uwN({qi,.. qj}) respectively
     The entity document scoring function aggregates the feature functions
     over all clique types

                               $P(E_D \mid Q) \;\propto \sum_{I \in \{T, O, U\}} \lambda^{I}_{E_D} \sum_{c \in I_{E_D}} f^{I}_{D}(c)$
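A minimal sketch of the single-term (full-independence) feature and its aggregation, with hypothetical counts; the Dirichlet prior mu and the lambda weight here are illustrative.

```python
# Dirichlet-smoothed term feature and its aggregation into an entity-document score.
import math

def term_feature(tf, doc_len, cf, collection_len, mu=2500.0):
    # f_D^T(q_i, E_D) = log( (tf + mu * cf/|C|) / (|E_D| + mu) )
    return math.log((tf + mu * cf / collection_len) / (doc_len + mu))

def entity_document_score(query_terms, entity_doc_tf, doc_len,
                          collection_tf, collection_len, lam=1.0, mu=2500.0):
    # full-independence variant: sum the single-term features, weighted by lambda
    return lam * sum(
        term_feature(entity_doc_tf.get(t, 0), doc_len,
                     collection_tf.get(t, 1), collection_len, mu)
        for t in query_terms)

entity_doc_tf = {"nikon": 4, "d40": 2}                    # term counts in the entity document
collection_tf = {"nikon": 1200, "d40": 90, "camera": 5000}
print(entity_document_score(["nikon", "d40", "camera"], entity_doc_tf,
                            doc_len=350, collection_tf=collection_tf,
                            collection_len=10_000_000))
```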


28                       Is This entity Relevant?                                       © 2012 IBM Corporation
IBM Research - Haifa



Entity type Scoring                         P(ET|Q)
       We measure the “similarity” between the query type and the entity type

              $P(E_T \mid Q) = f_T(c) = \log\!\left(\frac{e^{-\alpha\, d(Q_T, E_T)}}{\sum_{E' \in R} e^{-\alpha\, d(Q_T, E'_T)}}\right)$
     d(QT,ET) - the type distance,
     is domain dependent
     In our experiments we
     measured the distance in the
     Wikipedia category graph
          The minimal path length
          between all pairs of the
          query and the entity’s
          page categories
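A minimal sketch of the type-distance component: breadth-first search for the minimal path length between the query's target type and the entity's categories, over a tiny hypothetical category graph standing in for the Wikipedia category graph.

```python
# Type distance d(Q_T, E_T) as the shortest path between category sets.
from collections import deque
import math

def shortest_path(graph, sources, targets):
    targets = set(targets)
    seen, frontier = set(sources), deque((s, 0) for s in sources)
    while frontier:
        node, dist = frontier.popleft()
        if node in targets:
            return dist
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return math.inf

category_graph = {                      # hypothetical undirected category links
    "cameras": ["digital cameras", "optics"],
    "digital cameras": ["cameras", "nikon cameras"],
    "nikon cameras": ["digital cameras"],
    "optics": ["cameras"],
}
d = shortest_path(category_graph, ["cameras"], ["nikon cameras"])
print(d, math.exp(-0.5 * d))            # distance and its exp(-alpha*d) weight
```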


29                        Is This entity Relevant?                                 © 2012 IBM Corporation
IBM Research - Haifa



Entity Name Scoring P(EN|Q)
     We measure the dependency between the query term(s) and
     the entity name
      Globally
          Measure the proximity between the query term(s) and the entity name in
        the whole collection
            • We use pointwise mutual information (PMI) – the likelihood of finding
            one term in proximity to another term
      Locally
          Measure the proximity between the query terms and the entity name in the
        top retrieved documents
                      $P(E_N \mid Q) = \sum_{X \in A} \lambda^{X}_{E_N} \sum_{c \in X_{E_N}} f^{X}_{N}(c)$

                      $A = \{S, T, O, U, PMI_T, PMI_O, PMI_U\}$
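A minimal sketch of the global PMI signal between a query term and an entity name, computed from hypothetical co-occurrence counts over text windows in a collection.

```python
# Pointwise mutual information: PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ).
import math

def pmi(cooccur_count, term_count, name_count, total_windows):
    p_xy = cooccur_count / total_windows
    p_x = term_count / total_windows
    p_y = name_count / total_windows
    return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

# e.g. how often "teammate" appears near "Michael Schumacher" in text windows
print(pmi(cooccur_count=150, term_count=4_000, name_count=9_000,
          total_windows=5_000_000))
```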

30                       Is This entity Relevant?                           © 2012 IBM Corporation
IBM Research - Haifa


   Experimental Results over INEX Entity track (2007-2009)
   [Bar charts: three panels (Full Independence, Sequential Dependence, Full Dependence)
    comparing S(ED), S(ED,ET), S(ED,ET,EN), and the top INEX run over the
    2007, 2008, and 2009 tracks; scores range from 0 to roughly 0.4]

     Results are improved significantly when type and name scoring were added

     Final results are superior to the top INEX results of 2007 and 2008,
     and comparable to 2009

     Dependence models have not improved over the independence model??
  31                                 Is This entity Relevant?                                             © 2012 IBM Corporation
IBM Research - Haifa



Exploratory EoS
     When only an entity is given as input, the information need is quite fuzzy
         Any related entity has a potential to be relevant
               Therefore any related entity should be retrieved!
         High diversity in search results (entity types, relationship types)

     How can we help the user find the most relevant answers?

     Iterative IR – let the user navigate and explore the ER graph
         Facet search (see the sketch after this list):
               Categorize the search results according to their facets (entity types/attributes..)
               Let the user drill down: restrict retrieved entities to a specific facet
               NOTE: We still need to rank the search results within each facet!
         Graph navigation:
               Let the user explore the graph by using a retrieved entity as a pivot to a new
               search
               Query reformulation
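The sketch referenced above: group a mixed, ranked entity result set by a facet (here, entity type), keep the ranking within each facet, and let the user drill down to one facet. The result data is hypothetical.

```python
# Facet search over a mixed entity result set.
from collections import defaultdict

results = [  # (entity, type, score) as returned by the ranker
    ("Luis Suarez", "person", 3.2),
    ("Social Software community", "community", 2.9),
    ("Enterprise 2.0 blog", "blog", 2.7),
    ("Ethan (sample colleague)", "person", 2.1),
]

def facet(results, key_index=1):
    groups = defaultdict(list)
    for row in sorted(results, key=lambda r: -r[2]):   # keep ranking inside each facet
        groups[row[key_index]].append(row)
    return groups

def drill_down(results, facet_value, key_index=1):
    return [r for r in sorted(results, key=lambda r: -r[2])
            if r[key_index] == facet_value]

print({k: [r[0] for r in v] for k, v in facet(results).items()})
print(drill_down(results, "person"))
```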


32                       Is This entity Relevant?                                  © 2012 IBM Corporation
IBM Research - Haifa


Search over Social Media Data (SaND) – (Carmel 2009, Guy 2010)
     SaND provides social aggregation over social
     data

     SaND builds an entity-entity relationship
     matrix that maps a given entity to all related
     entities, weighted by their relationship
     strength
        Direct relations of a user to:
              document – as an author, tagger
            and commenter
              another user – as a friend or as a
            manager/employee
              tag – one she used, or one she was tagged with by others
              group –as a member/owner
        Indirect relations:
              Two entities are indirectly related if
            both are directly related to the same
            entity

     The overall relationship strength between two
     entities is determined by a linear combination
     of their direct and indirect relationship
     strengths
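A minimal sketch of such a combination, with hypothetical weights and data: the strength between two entities is alpha times their direct strength plus (1 - alpha) times an indirect strength accumulated over entities they are both directly related to.

```python
# SaND-style relationship strength: linear combination of direct and indirect strengths.
def relationship_strength(direct, a, b, alpha=0.7):
    direct_ab = direct.get((a, b), 0.0)
    # indirect: both a and b are directly related to the same entity
    neighbors_a = {y: w for (x, y), w in direct.items() if x == a}
    neighbors_b = {y: w for (x, y), w in direct.items() if x == b}
    indirect_ab = sum(neighbors_a[e] * neighbors_b[e]
                      for e in neighbors_a.keys() & neighbors_b.keys())
    return alpha * direct_ab + (1 - alpha) * indirect_ab

direct = {  # (entity, related entity) -> direct relationship strength
    ("luis", "doc1"): 1.0, ("luis", "tag:social"): 0.8,
    ("ethan", "doc1"): 0.6, ("ethan", "tag:social"): 0.5,
}
print(relationship_strength(direct, "luis", "ethan"))
```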

33                         Is This entity Relevant?    © 2012 IBM Corporation
IBM Research - Haifa


                                Search for the term ‘social’




                  Related People – Ranked list of
                  people that are related to the topic
                  and to the result set, in one or more
                  relationship types (author,
                  commenter, tagger, etc.)


                                                  Results contain different types of
                                                  entities – Blogs, Communities,
                                                  bookmarked documents, etc.
                                                  Popular, higher-ranked results
                                                  appear higher in the result set.

Related Tags – Ranked tag cloud for this result set.
34                     Is This entity Relevant?                            © 2012 IBM Corporation
IBM Research - Haifa




 Narrowing the search to Luis Suarez’ related results

 Hovering over a result highlights the related people and tags




35                      Is This entity Relevant?                              © 2012 IBM Corporation
IBM Research - Haifa

                            Viewing results for query ‘social’
                            and person ‘Luis Suarez’




                      Viewing Luis’ business card, and
                      results related to him




36                    Is This entity Relevant?                   © 2012 IBM Corporation
IBM Research - Haifa


Summary
     In this talk we raised several questions related to the concept of relevance
     in EoS:
         What makes an entity relevant to the user’s need?
         What is the meaning of relevance in this context?
         Is it the same notion of relevance used in document retrieval?
     We argue that the relevance of an entity can be estimated according to the
     supporting evidence provided by the search system
     We reviewed common EoS retrieval techniques:
         The profile-based approach
         The voting approach
         Relevance propagation
     We discussed several examples of EoS systems and how relevance
     estimation can be applied in these domains
     We claimed that the scale and diversity of EoS search results demand
     exploratory search techniques such as facet search and graph
     navigation
37                       Is This entity Relevant?                         © 2012 IBM Corporation
IBM Research - Haifa



Open Questions and Challenges
     Entity Similarity
          While in document retrieval similarity plays a central role
          in relevance judgment, entity similarity measurement
          should still be better understood
              Attribute based similarity, Evidence based similarity
              Graph proximity
              Hybrid approaches
          The clustering hypothesis:
               Are two “similar” entities likely to be relevant to the same information need?

     Challenges
          To what extent are relevant entities indeed similar to each other,
          and according to which similarity measure?
          Relevance propagation: What relationship types provide effective relevance
          propagation channels?
              Do your friends inherit your own expertise?
              Which relationship types contribute to relevance propagation?

38                         Is This entity Relevant?                                © 2012 IBM Corporation
IBM Research - Haifa




     Thank You!

     Questions?
39                    Is This entity Relevant?   © 2012 IBM Corporation
Is This Entity Relevant
          to Your Needs?
                       David Carmel
            IBM Research - Haifa, Israel




IBM Research - Haifa                       © 2012 IBM Corporation
