SlideShare a Scribd company logo
1 of 29
Download to read offline
Dr. Searcher and Mr. Browser:
       A unified hyperlink-click graph
Barbara Poblete, Carlos Castillo, Aristides Gionis
    University Pompeu Fabra, Yahoo! Research
                                Barcelona, Spain
                                      CIKM 2008
In this talk




       New unified web graph: hyperlink-click graph
       Union of the hyperlink graph and the click graph
       Random walk generates robust results that match or are
       better than the results obtained on either individual graph
Motivation



     Two types of web graphs: click and hyperlink graph
     Represent two most common tasks of users:
     searching and browsing
     Correspond to two prototypical ways of looking for
     information: asking and exploring
     Edges capture semantic relations among nodes:
     e.g., similarity and authority endorsement
Motivation...


  Drawbacks:
    § Hyperlink graph: adversarial increase in PageRank scores,
      i.e., spam or link farms
    § Click graph: sparsity, inherent bias in web-search engine
      ranking, dependency on textual match and click spam

  Our claim: Hyperlink and click graphs are complementary and
  can be used to alleviate the shortcomings of each other
Motivation...


  Drawbacks:
    § Hyperlink graph: adversarial increase in PageRank scores,
      i.e., spam or link farms
    § Click graph: sparsity, inherent bias in web-search engine
      ranking, dependency on textual match and click spam

  Our claim: Hyperlink and click graphs are complementary and
  can be used to alleviate the shortcomings of each other
Contribution

  Introduce the hyperlink-click graph,
  union of hyperlink and click graph
  Consider random walk on the hyperlink-click graph

    1   Model user behavior more accurately
    2   Show that ranking with the hyperlink-click graph is similar
        to the best performance by either of the two graphs
    3   The unified graph compensates where either graph
        performs poorly
    4   more robust and fail-safe
Applications



  Uses of the hyperlink-click graph:

      Ranking of documents
      Query ranking and query recommendation
      Similarity search
      Spam detection
Web graphs
                               hyperlink-click graph


    clicked                     clicked     neighbor documents
    queries                   documents




              frequency(q)




                click graph

                                            hyperlink graph
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click graph
      PC = αNC + (1 − α)1C
  Random walk on the hyperlink-click graph
      PHC = αβNC + α(1 − β)NH + (1 − α)1
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click graph
      PC = αNC + (1 − α)1C
  Random walk on the hyperlink-click graph
      PHC = αβNC + α(1 − β)NH + (1 − α)1
Random walks on web graphs


  Random walk on the hyperlink graph
      PH = αNH + (1 − α)1H
  Random walk on the click graph
      PC = αNC + (1 − α)1C
  Random walk on the hyperlink-click graph
      PHC = αβNC + α(1 − β)NH + (1 − α)1
Web graphs
                               hyperlink-click graph


    clicked                     clicked     neighbor documents
    queries                   documents




              frequency(q)




                click graph

                                            hyperlink graph
Evaluation and results




      Validate utility of the random walk scores on the
      hyperlink-click graph
      Compare scores with those produced from the hyperlink
      and the click graph
Evaluation and results...



   We focus on two tasks in which a good ranking method
   should perform well:

       ranking high-quality documents and
       ranking pairs of documents

   The evaluation is centered on analyzing the dissimilarities
   among the different models
Evaluation and results...


   Dataset:
       Yahoo! query log
       9 K seed documents extracted from the query log
       - documents with at least 10 clicks
       61 K queries with at least one click to the seed documents
       Using a web crawl expanded to 144 million documents
       The expansion considers all in- and out-neighbors of all
       documents in the seed set seed set.
Evaluation and results...




   Two types of data:
       Without sponsored results: query log only with
       algorithmic results
       With sponsored results:
Evaluation and results...


   Random walk evaluation
      we compute scores on the complete datasets, but
      we evaluate only on documents in the intersection of the
      three graphs
      Two tasks:
        1   ranking high-quality documents (dmoz),
        2   ranking pairs of documents (user evaluation).
Evaluation and results...


   dmoz:
   ΠZ : Our first measure is the normalized sum of the π scores
        of DZ documents.

                                   d∈DZπ(d)
                          ΠZ =
                                  d∈DC π(d)

   ΓZ : Goodman-Kruskal Gamma Γ measure between the
        rankings
                               D −A
                           Γ=
                               D +A
Evaluation and results...


   User study:

       13 users
       1 710 accessments
       users expressed not neutral opinion in 32% of document
       pairs
       Use Γ to measure pairs of documents on which rankings
       agree with users
Evaluation and results...


   dmoz:
      Macro-evaluation: captures the overall scores of
      high-quality documents
       ΠZ and ΓZ are computed considering all the documents in
       the evaluation set
       Micro-evaluation: at query level
       ΠZ and ΓZ in the evaluation set are reduced to only those
       documents clicked from a particular query.

   (User study: micro-evaluation)
Evaluation and results...


   dmoz:
      Macro-evaluation: captures the overall scores of
      high-quality documents
       ΠZ and ΓZ are computed considering all the documents in
       the evaluation set
       Micro-evaluation: at query level
       ΠZ and ΓZ in the evaluation set are reduced to only those
       documents clicked from a particular query.

   (User study: micro-evaluation)
Evaluation and results...


   dmoz
     metric                        macro
              without sponsored results with sponsored results
       ΓZ         GC ≈ GHC > GH           GC > GHC > GH
       ΠZ         GH ≈ GHC > GC           GH ≈ GHC > GC

     metric                        micro
              without sponsored results with sponsored results
       ΓZ         GC ≈ GHC > GH           GHC ≈ GC > GH
       ΠZ         GHC ≈ GC > GH           GHC ≈ GH ≈ GC
Evaluation and results...




   User study

      metric    without sponsored results   with sponsored results
        Γ           GC > GHC > GH             GH > GHC > GC
Evaluation and results...
   Exclude documents with very similar scores
Evaluation and results...

         Summary of ΓZ values for dmoz:
         1.000
                                                                                                   click graph
                                                                                                   hyperlink-click graph
                                                                                                   hyperlink graph
         0.800




         0.600




         0.400
 value




         0.200




         0.000




         -0.200
                  Macro (with variations)   Macro (no variations)   Micro (with variations)   Micro (no variations)
Evaluation and results...

         Summary of ΠZ values for dmoz:
         0.900
                                                                                                  click graph
                                                                                                  hyperlink-click graph
         0.800                                                                                    hyperlink graph




         0.700



         0.600
 value




         0.500



         0.400



         0.300



         0.200
                 Macro (with variations)   Macro (no variations)   Micro (with variations)   Micro (no variations)
Conclusions



      We study random walk on a unified web graph
      Intended to model user searching and browsing behavior
      Several combinations of metrics are used for evaluation

  Experimental evaluation shows:
      The unified graph is always close to the best performance
      of either the click or the hyperlink graph
Future work



     Analyze how to deal with the inherent bias that exists in
     any ranking technique based on usage mining
     Study other web mining applications
       1   Link and click spam detection
       2   Similarity search
       3   Query recommendation
Thank you!

More Related Content

Viewers also liked

Kdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iKdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iLaks Lakshmanan
 
Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivLaks Lakshmanan
 
Kdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiKdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiLaks Lakshmanan
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaMuhammad Imran
 
What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...Carlos Castillo (ChaTo)
 
Emotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaEmotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaDavid Laniado
 
Kdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiKdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiLaks Lakshmanan
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...Artificial Intelligence Institute at UofSC
 

Viewers also liked (12)

Kdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-iKdd12 tutorial-inf-part-i
Kdd12 tutorial-inf-part-i
 
Kdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-ivKdd12 tutorial-inf-part-iv
Kdd12 tutorial-inf-part-iv
 
Kdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-iiKdd12 tutorial-inf-part-ii
Kdd12 tutorial-inf-part-ii
 
Extracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social MediaExtracting Information Nuggets from Disaster-Related Messages in Social Media
Extracting Information Nuggets from Disaster-Related Messages in Social Media
 
What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...What to Expect When the Unexpected Happens: Social Media Communications Acros...
What to Expect When the Unexpected Happens: Social Media Communications Acros...
 
Fairness-Aware Data Mining
Fairness-Aware Data MiningFairness-Aware Data Mining
Fairness-Aware Data Mining
 
Crisis Computing
Crisis ComputingCrisis Computing
Crisis Computing
 
Emotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of WikipediaEmotions and dialogue in a peer-production community: the case of Wikipedia
Emotions and dialogue in a peer-production community: the case of Wikipedia
 
Kdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iiiKdd12 tutorial-inf-part-iii
Kdd12 tutorial-inf-part-iii
 
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
SIAM SDM2014 tutorial - Social Media and Web of Data to Assist Crisis Respons...
 
Social Media Mining and Retrieval
Social Media Mining and RetrievalSocial Media Mining and Retrieval
Social Media Mining and Retrieval
 
Discrimination Discovery
Discrimination DiscoveryDiscrimination Discovery
Discrimination Discovery
 

Similar to Dr. Searcher and Mr. Browser: A unified hyperlink-click graph

Web Crawling and Reinforcement Learning
Web Crawling and Reinforcement LearningWeb Crawling and Reinforcement Learning
Web Crawling and Reinforcement LearningFrancesco Gadaleta
 
Web Page Ranking using Machine Learning
Web Page Ranking using Machine LearningWeb Page Ranking using Machine Learning
Web Page Ranking using Machine LearningPradip Rahul
 
page rank explication et exemple formule
page rank explication et exemple  formulepage rank explication et exemple  formule
page rank explication et exemple formuleRamiHarrathi1
 
Markov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdfMarkov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdfrayyverma
 
Optimized interleaving for online retrieval evaluation
Optimized interleaving for online retrieval evaluationOptimized interleaving for online retrieval evaluation
Optimized interleaving for online retrieval evaluationHan Jiang
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Margaret Wang
 
Linkanalysis handout
Linkanalysis handoutLinkanalysis handout
Linkanalysis handoutcsedays
 
Page rank talk at NTU-EE
Page rank talk at NTU-EEPage rank talk at NTU-EE
Page rank talk at NTU-EEPing Yeh
 
Search engine page rank demystification
Search engine page rank demystificationSearch engine page rank demystification
Search engine page rank demystificationRaja R
 
PageRank & Searching
PageRank & SearchingPageRank & Searching
PageRank & Searchingrahulbindra
 
Graph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsGraph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsWQ Fan
 
Click Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsClick Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsAleksandr Chuklin
 
Page rank by university of michagain.ppt
Page rank by university of michagain.pptPage rank by university of michagain.ppt
Page rank by university of michagain.pptrayyverma
 
Internet 信息检索中的数学
Internet 信息检索中的数学Internet 信息检索中的数学
Internet 信息检索中的数学Xu jiakon
 

Similar to Dr. Searcher and Mr. Browser: A unified hyperlink-click graph (20)

Web Crawling and Reinforcement Learning
Web Crawling and Reinforcement LearningWeb Crawling and Reinforcement Learning
Web Crawling and Reinforcement Learning
 
Web Page Ranking using Machine Learning
Web Page Ranking using Machine LearningWeb Page Ranking using Machine Learning
Web Page Ranking using Machine Learning
 
page rank explication et exemple formule
page rank explication et exemple  formulepage rank explication et exemple  formule
page rank explication et exemple formule
 
Markov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdfMarkov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdf
 
Optimized interleaving for online retrieval evaluation
Optimized interleaving for online retrieval evaluationOptimized interleaving for online retrieval evaluation
Optimized interleaving for online retrieval evaluation
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461
 
Linkanalysis handout
Linkanalysis handoutLinkanalysis handout
Linkanalysis handout
 
Social (1)
Social (1)Social (1)
Social (1)
 
Page rank talk at NTU-EE
Page rank talk at NTU-EEPage rank talk at NTU-EE
Page rank talk at NTU-EE
 
Dm page rank
Dm page rankDm page rank
Dm page rank
 
Search engine page rank demystification
Search engine page rank demystificationSearch engine page rank demystification
Search engine page rank demystification
 
PageRank & Searching
PageRank & SearchingPageRank & Searching
PageRank & Searching
 
Graph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsGraph Neural Networks for Recommendations
Graph Neural Networks for Recommendations
 
Page rank2
Page rank2Page rank2
Page rank2
 
GitConnect
GitConnectGitConnect
GitConnect
 
Click Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsClick Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval Metrics
 
Mazhiming
MazhimingMazhiming
Mazhiming
 
Page rank by university of michagain.ppt
Page rank by university of michagain.pptPage rank by university of michagain.ppt
Page rank by university of michagain.ppt
 
Internet 信息检索中的数学
Internet 信息检索中的数学Internet 信息检索中的数学
Internet 信息检索中的数学
 
I04015559
I04015559I04015559
I04015559
 

More from Carlos Castillo (ChaTo)

Finding High Quality Content in Social Media
Finding High Quality Content in Social MediaFinding High Quality Content in Social Media
Finding High Quality Content in Social MediaCarlos Castillo (ChaTo)
 
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Carlos Castillo (ChaTo)
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Carlos Castillo (ChaTo)
 

More from Carlos Castillo (ChaTo) (20)

Finding High Quality Content in Social Media
Finding High Quality Content in Social MediaFinding High Quality Content in Social Media
Finding High Quality Content in Social Media
 
When no clicks are good news
When no clicks are good newsWhen no clicks are good news
When no clicks are good news
 
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)
 
Big Crisis Data for ISPC
Big Crisis Data for ISPCBig Crisis Data for ISPC
Big Crisis Data for ISPC
 
Databeers: Big Crisis Data
Databeers: Big Crisis DataDatabeers: Big Crisis Data
Databeers: Big Crisis Data
 
Observational studies in social media
Observational studies in social mediaObservational studies in social media
Observational studies in social media
 
Natural experiments
Natural experimentsNatural experiments
Natural experiments
 
Content-based link prediction
Content-based link predictionContent-based link prediction
Content-based link prediction
 
Link prediction
Link predictionLink prediction
Link prediction
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Graph Partitioning and Spectral Methods
Graph Partitioning and Spectral MethodsGraph Partitioning and Spectral Methods
Graph Partitioning and Spectral Methods
 
Finding Dense Subgraphs
Finding Dense SubgraphsFinding Dense Subgraphs
Finding Dense Subgraphs
 
Graph Evolution Models
Graph Evolution ModelsGraph Evolution Models
Graph Evolution Models
 
Link-Based Ranking
Link-Based RankingLink-Based Ranking
Link-Based Ranking
 
Text Indexing / Inverted Indices
Text Indexing / Inverted IndicesText Indexing / Inverted Indices
Text Indexing / Inverted Indices
 
Indexing
IndexingIndexing
Indexing
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
K-Means Algorithm
K-Means AlgorithmK-Means Algorithm
K-Means Algorithm
 

Recently uploaded

2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Dr. Searcher and Mr. Browser: A unified hyperlink-click graph

  • 1. Dr. Searcher and Mr. Browser: A unified hyperlink-click graph Barbara Poblete, Carlos Castillo, Aristides Gionis University Pompeu Fabra, Yahoo! Research Barcelona, Spain CIKM 2008
  • 2. In this talk New unified web graph: hyperlink-click graph Union of the hyperlink graph and the click graph Random walk generates robust results that match or are better than the results obtained on either individual graph
  • 3. Motivation Two types of web graphs: click and hyperlink graph Represent two most common tasks of users: searching and browsing Correspond to two prototypical ways of looking for information: asking and exploring Edges capture semantic relations among nodes: e.g., similarity and authority endorsement
  • 4. Motivation... Drawbacks: § Hyperlink graph: adversarial increase in PageRank scores, i.e., spam or link farms § Click graph: sparsity, inherent bias in web-search engine ranking, dependency on textual match and click spam Our claim: Hyperlink and click graphs are complementary and can be used to alleviate the shortcomings of each other
  • 5. Motivation... Drawbacks: § Hyperlink graph: adversarial increase in PageRank scores, i.e., spam or link farms § Click graph: sparsity, inherent bias in web-search engine ranking, dependency on textual match and click spam Our claim: Hyperlink and click graphs are complementary and can be used to alleviate the shortcomings of each other
  • 6. Contribution Introduce the hyperlink-click graph, union of hyperlink and click graph Consider random walk on the hyperlink-click graph 1 Model user behavior more accurately 2 Show that ranking with the hyperlink-click graph is similar to the best performance by either of the two graphs 3 The unified graph compensates where either graph performs poorly 4 more robust and fail-safe
  • 7. Applications Uses of the hyperlink-click graph: Ranking of documents Query ranking and query recommendation Similarity search Spam detection
  • 8. Web graphs hyperlink-click graph clicked clicked neighbor documents queries documents frequency(q) click graph hyperlink graph
  • 9. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  • 10. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  • 11. Random walks on web graphs Random walk on the hyperlink graph PH = αNH + (1 − α)1H Random walk on the click graph PC = αNC + (1 − α)1C Random walk on the hyperlink-click graph PHC = αβNC + α(1 − β)NH + (1 − α)1
  • 12. Web graphs hyperlink-click graph clicked clicked neighbor documents queries documents frequency(q) click graph hyperlink graph
  • 13. Evaluation and results Validate utility of the random walk scores on the hyperlink-click graph Compare scores with those produced from the hyperlink and the click graph
  • 14. Evaluation and results... We focus on two tasks in which a good ranking method should perform well: ranking high-quality documents and ranking pairs of documents The evaluation is centered on analyzing the dissimilarities among the different models
  • 15. Evaluation and results... Dataset: Yahoo! query log 9 K seed documents extracted from the query log - documents with at least 10 clicks 61 K queries with at least one click to the seed documents Using a web crawl expanded to 144 million documents The expansion considers all in- and out-neighbors of all documents in the seed set seed set.
  • 16. Evaluation and results... Two types of data: Without sponsored results: query log only with algorithmic results With sponsored results:
  • 17. Evaluation and results... Random walk evaluation we compute scores on the complete datasets, but we evaluate only on documents in the intersection of the three graphs Two tasks: 1 ranking high-quality documents (dmoz), 2 ranking pairs of documents (user evaluation).
  • 18. Evaluation and results... dmoz: ΠZ : Our first measure is the normalized sum of the π scores of DZ documents. d∈DZπ(d) ΠZ = d∈DC π(d) ΓZ : Goodman-Kruskal Gamma Γ measure between the rankings D −A Γ= D +A
  • 19. Evaluation and results... User study: 13 users 1 710 accessments users expressed not neutral opinion in 32% of document pairs Use Γ to measure pairs of documents on which rankings agree with users
  • 20. Evaluation and results... dmoz: Macro-evaluation: captures the overall scores of high-quality documents ΠZ and ΓZ are computed considering all the documents in the evaluation set Micro-evaluation: at query level ΠZ and ΓZ in the evaluation set are reduced to only those documents clicked from a particular query. (User study: micro-evaluation)
  • 21. Evaluation and results... dmoz: Macro-evaluation: captures the overall scores of high-quality documents ΠZ and ΓZ are computed considering all the documents in the evaluation set Micro-evaluation: at query level ΠZ and ΓZ in the evaluation set are reduced to only those documents clicked from a particular query. (User study: micro-evaluation)
  • 22. Evaluation and results... dmoz metric macro without sponsored results with sponsored results ΓZ GC ≈ GHC > GH GC > GHC > GH ΠZ GH ≈ GHC > GC GH ≈ GHC > GC metric micro without sponsored results with sponsored results ΓZ GC ≈ GHC > GH GHC ≈ GC > GH ΠZ GHC ≈ GC > GH GHC ≈ GH ≈ GC
  • 23. Evaluation and results... User study metric without sponsored results with sponsored results Γ GC > GHC > GH GH > GHC > GC
  • 24. Evaluation and results... Exclude documents with very similar scores
  • 25. Evaluation and results... Summary of ΓZ values for dmoz: 1.000 click graph hyperlink-click graph hyperlink graph 0.800 0.600 0.400 value 0.200 0.000 -0.200 Macro (with variations) Macro (no variations) Micro (with variations) Micro (no variations)
  • 26. Evaluation and results... Summary of ΠZ values for dmoz: 0.900 click graph hyperlink-click graph 0.800 hyperlink graph 0.700 0.600 value 0.500 0.400 0.300 0.200 Macro (with variations) Macro (no variations) Micro (with variations) Micro (no variations)
  • 27. Conclusions We study random walk on a unified web graph Intended to model user searching and browsing behavior Several combinations of metrics are used for evaluation Experimental evaluation shows: The unified graph is always close to the best performance of either the click or the hyperlink graph
  • 28. Future work Analyze how to deal with the inherent bias that exists in any ranking technique based on usage mining Study other web mining applications 1 Link and click spam detection 2 Similarity search 3 Query recommendation