Local Ranking Problem on the BrowseGraph
Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco
1
Local Ranking Problem
“when the centrality-like ranks computed on a local graph differ from the ones on the global graph”
[Figure: example graph with per-node scores, shown for the local graph and for the global graph]
- Bressan et al. in WWW 2013, “The Power of Local Information in PageRank”
- Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and reverse PageRank”
- Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”
2
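A minimal sketch can make the definition concrete: compute PageRank on a toy global graph and on the local subgraph induced by a subset of its nodes, then compare the two rankings on the shared nodes with Kendall-tau. The toy graph, the function names, the damping factor of 0.85, and the tau-a variant (no tie correction) are all assumptions chosen for illustration; none of them comes from the slides.

```python
from itertools import combinations


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted digraph {src: {dst: weight}}."""
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        dangling = 0.0
        for u in nodes:
            out = adj.get(u, {})
            total = sum(out.values())
            if total == 0:
                dangling += rank[u]  # no out-links: spread this mass uniformly
                continue
            for v, w in out.items():
                new[v] += damping * rank[u] * w / total
        for u in nodes:
            new[u] += damping * dangling / n
        rank = new
    return rank


def induced_subgraph(adj, keep):
    """Keep only the nodes in `keep` and the edges between them."""
    return {u: {v: w for v, w in adj.get(u, {}).items() if v in keep} for u in keep}


def kendall_tau(a, b):
    """Tau-a between two score dicts, computed on their common keys."""
    common = sorted(set(a) & set(b))
    score = 0
    for x, y in combinations(common, 2):
        s = (a[x] - a[y]) * (b[x] - b[y])
        score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    pairs = len(common) * (len(common) - 1) // 2
    return score / pairs if pairs else 0.0


# Hypothetical global graph; weights play the role of transition counts.
global_graph = {
    "a": {"b": 3, "c": 1},
    "b": {"c": 2},
    "c": {"a": 4, "d": 1},
    "d": {"a": 1, "e": 5},
    "e": {"c": 2},
}
local_nodes = {"a", "b", "c"}

global_rank = pagerank(global_graph)
local_rank = pagerank(induced_subgraph(global_graph, local_nodes))
tau = kendall_tau(local_rank, global_rank)
print(f"Kendall-tau(local, global) on shared nodes = {tau:.3f}")
```

A tau close to 1 means the local ranking already agrees with the global one; lower values indicate that the missing nodes and edges change the order.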
The BrowseGraph
“a graph where nodes are webpages and edges are browsing transitions”
[Figure: construction of the BrowseGraph from user sessions, i.e. user navigation (e.g. Flickr)]
3
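As a rough illustration of the construction, the sketch below counts consecutive page views in each session as weighted edges. The session data and the name `build_browse_graph` are hypothetical, and details such as session splitting or filtering of reloads are not covered by the slides and are left out.

```python
from collections import Counter


def build_browse_graph(sessions):
    """Count page-to-page transitions over all sessions.

    Returns a Counter mapping (source_page, target_page) -> number of transitions.
    """
    edges = Counter()
    for pages in sessions:
        # Each pair of consecutive pages in a session is one browsing transition.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical sessions: each is the ordered list of pages one user visited.
sessions = [
    ["home", "photo1", "photo2"],
    ["home", "photo1"],
    ["photo2", "photo1", "photo2"],
]

browse_graph = build_browse_graph(sessions)
for (src, dst), weight in sorted(browse_graph.items()):
    print(f"{src} -> {dst}: {weight}")
```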
Centrality Metrics applied to the BrowseGraph
Increasing popularity in recent years
- Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”
- Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”
- Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”
Provide higher-quality rankings compared to standard hyperlink graphs
- Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance”
4
Local Ranking Problem on the BrowseGraph: WHY?
5
Local Ranking Problem on the BrowseGraph: WHY?
Image ranking in Flickr (SIGIR 2012)
We compared different ranking approaches on the BrowseGraph (PageRank and BrowseRank, among others)
How much could our ranking vary if we had more information (i.e. more nodes)?
6
BrowseGraph and ReferrerGraphs
ReferrerGraphs: domain-dependent BrowseGraphs
Construct different BrowseGraphs based on the referrer domain
Recommend news articles following the ReferrerGraphs
[Figure: BrowseGraph, Twitter ReferrerGraph, Facebook ReferrerGraph]
Can we rely on centrality-based algorithms to infer news importance?
7
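One way to picture the split into ReferrerGraphs is to tag each session with the domain the user arrived from and to accumulate transitions per domain. This is only a sketch under that assumption: the slides do not say how sessions are attributed to a referrer, and the session data and the name `build_referrer_graphs` are made up for the example.

```python
from collections import Counter, defaultdict


def build_referrer_graphs(sessions):
    """Build one BrowseGraph per referrer domain, plus the global BrowseGraph.

    `sessions` is an iterable of (referrer_domain, [page, page, ...]) pairs.
    """
    local_graphs = defaultdict(Counter)
    global_graph = Counter()
    for referrer, pages in sessions:
        for edge in zip(pages, pages[1:]):
            local_graphs[referrer][edge] += 1  # transition seen via this referrer
            global_graph[edge] += 1            # same transition in the global graph
    return dict(local_graphs), global_graph


# Hypothetical sessions tagged with the domain the user arrived from.
sessions = [
    ("twitter.com", ["news1", "news2", "news3"]),
    ("facebook.com", ["news1", "news4"]),
    ("twitter.com", ["news2", "news3"]),
    ("facebook.com", ["news4", "news2", "news3"]),
]

referrer_graphs, global_graph = build_referrer_graphs(sessions)
for referrer, graph in sorted(referrer_graphs.items()):
    print(referrer, dict(graph))
```

By construction every ReferrerGraph is a subgraph of the global BrowseGraph, which is what makes the local-versus-global comparison meaningful.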
Local Ranking Problem on the BrowseGraph
- Study of the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment)
- How to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
- Discover the referrer domain when it is not available (not discussed in the presentation; please see the paper)
8
Local Ranking Problem on the BrowseGraph
Yahoo News BrowseGraph: ~500M pageviews
[Figure labels: Social Networks; Search Engines; News; Homepage]
1. Construct the BrowseGraph (our “global graph”)
2. Construct the ReferrerGraphs (our “local graphs”)
9
Subgraph Comparison
Very different dimensions
Very well connected (also Reddit, the smallest one)
10
Subgraph Comparison
Cross-distance: Kendall-tau among common nodes (min overlap 1k)
In general the similarities are very low (<0.3), suggesting different content or different user interests
Search engines are the most similar (>0.5)
11
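For reference, the standard Kendall-tau coefficient between two rankings of the same n nodes can be written as follows, where C is the number of concordant pairs (ordered the same way in both rankings) and D the number of discordant pairs. This is the tau-a form without tie correction; the slides do not state which variant was used, so treat the choice as an assumption.

```latex
\tau = \frac{C - D}{\binom{n}{2}} = \frac{2\,(C - D)}{n\,(n - 1)}
```

The value ranges from -1 (reversed order) to 1 (identical order), so the reported similarities below 0.3 indicate largely unrelated rankings.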
Growing Rings Experiment
Study of the LRP on the BrowseGraph by incrementally expanding the local graph
1. For each ReferrerGraph
2. Compare the PageRank values with the global ones (Kendall-tau)
3. Expand with the next neighborhood of nodes
4. Iterate until the Kendall-tau gets close to 1
K(local+0, global) ~0.307
K(local+1, global) ~0.524
K(local+2, global) ~0.740
K(local+3, global) ~0.912
12
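The four steps can be sketched as a loop. This is an illustrative reading rather than the authors' implementation: it assumes that a "ring" is every node one hop away (in either direction) from the current local graph, and that Kendall-tau is always measured on the nodes of the original local graph. The toy graph and all function names are hypothetical.

```python
from itertools import combinations


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted digraph {src: {dst: weight}}."""
    nodes = sorted(set(adj) | {v for out in adj.values() for v in out})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        dangling = 0.0
        for u in nodes:
            out = adj.get(u, {})
            total = sum(out.values())
            if total == 0:
                dangling += rank[u]
                continue
            for v, w in out.items():
                new[v] += damping * rank[u] * w / total
        rank = {u: new[u] + damping * dangling / n for u in nodes}
    return rank


def kendall_tau(a, b, nodes):
    """Tau-a between two score dicts, restricted to `nodes`."""
    score = pairs = 0
    for x, y in combinations(sorted(nodes), 2):
        s = (a[x] - a[y]) * (b[x] - b[y])
        score += (s > 0) - (s < 0)
        pairs += 1
    return score / pairs if pairs else 0.0


def induced_subgraph(adj, keep):
    return {u: {v: w for v, w in adj.get(u, {}).items() if v in keep} for u in keep}


def next_ring(adj, keep):
    """Nodes outside `keep` that are one hop (in or out) from a node in `keep`."""
    ring = set()
    for u, out in adj.items():
        for v in out:
            if u in keep and v not in keep:
                ring.add(v)
            elif v in keep and u not in keep:
                ring.add(u)
    return ring


def growing_rings(adj, seed, max_steps=10):
    """Return [(step, local_size, tau)] while expanding `seed` ring by ring."""
    global_rank = pagerank(adj)
    keep = set(seed)
    history = []
    for step in range(max_steps + 1):
        local_rank = pagerank(induced_subgraph(adj, keep))
        history.append((step, len(keep), kendall_tau(local_rank, global_rank, seed)))
        ring = next_ring(adj, keep)
        if not ring:  # the local graph has absorbed everything reachable
            break
        keep |= ring
    return history


# Hypothetical global graph and local seed nodes.
global_graph = {
    "a": {"b": 3, "c": 1},
    "b": {"c": 2, "d": 1},
    "c": {"a": 4, "e": 2},
    "d": {"f": 1, "a": 2},
    "e": {"c": 1, "g": 3},
    "f": {"d": 2},
    "g": {"a": 1},
}
history = growing_rings(global_graph, seed={"a", "b", "c"})
for step, size, tau in history:
    print(f"K(local+{step}, global): {size} nodes, tau = {tau:.3f}")
```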
Growing Rings Experiment
Referrer-based (RB): the 7 ReferrerGraphs (Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing)
Same size referrer-based (SRB): to measure the impact of the graph size
Random (R): 7 random graphs reflecting the size of the original RB graphs
13
Growing Rings Experiment
[Figure panel: ReferrerGraphs]
14
Growing Rings Experiment
[Figure panels: ReferrerGraphs; same size RGs; Random]
15
Growing Rings Experiment with Selection of Nodes
How does the expansion influence convergence if only a few, more representative nodes are selected?
Hypothesis 1: adding all the nodes means adding more information, so it should lead to faster convergence (Boldi et al. [6] in the paper)
Hypothesis 2: the most representative nodes bring less noise and therefore quicker convergence (Cho et al. [13] in the paper)
16
Growing Rings Experiment with Selection of Nodes
[Figure legend: 5, 10, 30, 50, 100]
Fewer, more representative nodes lead to a better estimation of PageRank values in the first iteration
In the long run, expansions with the highest number of nodes present the best convergence
17
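A selective expansion step might look like the sketch below. The slides do not say how "representative" nodes are chosen, so the criterion used here (total transition weight linking a candidate to the current local graph) is purely a stand-in assumption, as are the toy graph and the function names.

```python
def ring_scores(adj, keep):
    """Score each node outside `keep` by the transition weight linking it to `keep`."""
    scores = {}
    for u, out in adj.items():
        for v, w in out.items():
            if u in keep and v not in keep:
                scores[v] = scores.get(v, 0) + w
            elif v in keep and u not in keep:
                scores[u] = scores.get(u, 0) + w
    return scores


def expand_with_top_k(adj, keep, k):
    """Add only the k best-scoring neighbours (ties broken by node name)."""
    scores = ring_scores(adj, keep)
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(keep) | set(ranked[:k])


# Hypothetical global graph {src: {dst: transition_count}} and local nodes.
global_graph = {
    "a": {"b": 3, "c": 1},
    "b": {"c": 2, "d": 1},
    "c": {"a": 4, "e": 2},
    "d": {"f": 1, "a": 2},
    "e": {"c": 1, "g": 3},
    "f": {"d": 2},
    "g": {"a": 1},
}
local_nodes = {"a", "b", "c"}

scores = ring_scores(global_graph, local_nodes)
expanded = expand_with_top_k(global_graph, local_nodes, k=2)
print("candidate scores:", scores)
print("local graph after one selective expansion:", sorted(expanded))
```

Varying `k` reproduces the trade-off described on the slide: a small `k` adds little noise per step, while a large `k` approaches the full-ring expansion.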
Growing Rings Expansion with Selected Nodes
~1 or 2 steps can be enough to estimate the PageRank score of the global graph
Predicting Kendall-tau Distance
Can we estimate the “distance” between the local and global PageRank considering only information available in the local graph?
18
Predicting Kendall-tau Distance
Can we estimate the distance between the local and global PageRank considering only information available in the local graph?
Hypothesis: some structural properties of the graph could be good proxies for the Kendall-tau difference between local and global ranks.
19
Predicting Kendall-tau Distance
Training Set Construction
Jackknife resampling of the ReferrerGraph (1%, 5%, 10%, 20%)
Kendall-tau distance between the ReferrerGraph and the reduced subgraphs
[Figure label: homepage]
20
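The resampling step could be sketched as follows, assuming the percentages refer to the share of nodes removed at random from a ReferrerGraph (the slides do not say whether nodes or edges are dropped). Each reduced subgraph would then be paired with its Kendall-tau distance from the full graph to form one training instance. The ring-shaped toy graph, the function name, and the fixed random seed are hypothetical.

```python
import random


def reduced_subgraphs(edges, fractions=(0.01, 0.05, 0.10, 0.20), seed=0):
    """Yield (fraction, dropped_nodes, reduced_edges) for each removal fraction.

    `edges` maps (src, dst) -> weight. Nodes are dropped uniformly at random.
    """
    nodes = sorted({n for edge in edges for n in edge})
    rng = random.Random(seed)
    for fraction in fractions:
        n_drop = max(1, round(fraction * len(nodes)))
        dropped = set(rng.sample(nodes, n_drop))
        kept = {
            (u, v): w
            for (u, v), w in edges.items()
            if u not in dropped and v not in dropped
        }
        yield fraction, dropped, kept


# Hypothetical ReferrerGraph: a directed ring over 100 pages.
ring = {(i, (i + 1) % 100): 1 for i in range(100)}
samples = list(reduced_subgraphs(ring))
for fraction, dropped, kept in samples:
    print(f"removed {fraction:.0%} of nodes: {len(dropped)} dropped, {len(kept)} edges kept")
```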
Predicting Kendall-tau Distance
Extract Structural Properties of each Graph
We compute 62 structural graph metrics for each training instance
- Size and Connectivity (S): basic statistics
- Assortativity (A): tendency of nodes with a certain degree to be linked with nodes of similar degree
- Degree (D): statistics on the degree distribution
- Weighted degree (W): same as degree, but considering the weight on edges (transitions)
- Local PageRank (P): statistics on the PageRank values
- Closeness centralization (C): statistics on the distance (number of hops)
• A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”
• S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”
21
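To give a feel for the Degree (D) and Weighted degree (W) families, the sketch below derives a few summary statistics from a weighted edge list. The particular statistics (mean, maximum, population standard deviation) and the feature names are assumptions for illustration; the slides do not list the 62 metrics individually.

```python
from statistics import mean, pstdev


def degree_features(edges):
    """Summary statistics of plain and weighted in-/out-degree.

    `edges` maps (src, dst) -> weight (number of transitions).
    """
    nodes = sorted({n for edge in edges for n in edge})
    series = {
        "in_degree": dict.fromkeys(nodes, 0),
        "out_degree": dict.fromkeys(nodes, 0),
        "weighted_in_degree": dict.fromkeys(nodes, 0),
        "weighted_out_degree": dict.fromkeys(nodes, 0),
    }
    for (src, dst), weight in edges.items():
        series["out_degree"][src] += 1
        series["in_degree"][dst] += 1
        series["weighted_out_degree"][src] += weight
        series["weighted_in_degree"][dst] += weight

    features = {}
    for name, per_node in series.items():
        values = list(per_node.values())
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_max"] = max(values)
        features[f"{name}_std"] = pstdev(values)
    return features


# Hypothetical local graph with three pages.
edges = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2}
features = degree_features(edges)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Everything here is computed from the local graph alone, which is the point of the prediction task: no global information is needed to build the feature vector.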
Predicting Kendall-tau Distance
Regression Analysis (RF) with five-fold CV over 10 iterations
Weighted degree: most predictive features
~ better than using all the features
Assortativity: less predictive power
~ too many features and too little training data?
22
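Assuming "five-fold CV over 10 iterations" means five-fold cross-validation repeated ten times with a fresh shuffle each time, the split bookkeeping can be sketched as below. The regressor itself (RF, presumably a Random Forest) is deliberately left out; any model could be fitted on `train` and scored on `test`. The function name and the sample count of 23 are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition for every repetition
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold, test in enumerate(folds):
            train = [i for other in folds if other is not test for i in other]
            yield repeat, fold, train, test


# Hypothetical training set of 23 graph instances.
splits = list(repeated_kfold(n_samples=23))
print(f"{len(splits)} train/test splits in total")
repeat, fold, train, test = splits[0]
print(f"repeat {repeat}, fold {fold}: {len(train)} training and {len(test)} test instances")
```

Averaging the error over all 50 splits gives a more stable estimate than a single five-fold run, which matters when training data is scarce.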
Predicting Kendall-tau Distance
Most important features in weighted degree: features based on the distribution of in- and out-degree
- very straightforward to compute
- information always available in the local graph
23
Predicting Kendall-tau Distance
Can we estimate the distance between the local and global PageRank considering only information available in the local graph?
YES. With just a few structural properties of the local graph.
24
Summary
- How the LRP behaves on the BrowseGraph: expanding the local graph with the whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”)
- It is possible to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
25
Local Ranking Problem on the BrowseGraph
Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco
Thanks.
26

Presentation @SIGIR2015
