Presentation @SIGIR2015

Local Ranking Problem on the BrowseGraph
Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco

“when the centrality-like ranks computed on a local graph differ from the ones on the global graph”
[Figure: example graph with a score on each node]
- Bressan et al. in WWW 2013, “The Power of Local Information in PageRank” 
- Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and Reverse PageRank”
- Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”
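A toy illustration of the problem, not taken from the papers above: the sketch below runs a plain power-iteration PageRank on a small hypothetical graph and on the subgraph induced by three of its nodes. The graph, the `pagerank` and `induced_subgraph` helpers, and the damping factor of 0.85 are all assumptions made for the example.

```python
def pagerank(graph, damping=0.85, iters=100):
    """Plain power-iteration PageRank on {node: [successors]}."""
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        nxt = {u: (1 - damping + damping * dangling) / n for u in nodes}
        for u, succ in graph.items():
            for v in succ:
                nxt[v] += damping * rank[u] / len(succ)
        rank = nxt
    return rank


def induced_subgraph(graph, keep):
    """Keep only the nodes in `keep` and the edges between them."""
    return {u: [v for v in graph.get(u, []) if v in keep] for u in keep}


# Hypothetical global graph: a <-> b, c -> a, plus ten outside pages linking to b.
global_graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
for i in range(10):
    global_graph[f"x{i}"] = ["b"]

local_nodes = {"a", "b", "c"}
local_rank = pagerank(induced_subgraph(global_graph, local_nodes))
global_rank = pagerank(global_graph)

local_order = sorted(local_nodes, key=local_rank.get, reverse=True)
global_order = sorted(local_nodes, key=global_rank.get, reverse=True)
print(local_order)   # ['a', 'b', 'c']
print(global_order)  # ['b', 'a', 'c']
```

The local graph cannot see the ten outside pages that link to `b`, so it ranks `a` first, while the global graph ranks `b` first.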

The BrowseGraph
“a graph where nodes are webpages and edges are browsing transitions”
[Diagram: user navigation (e.g. Flickr), user session, BrowseGraph construction]
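A minimal sketch of the construction, assuming each user session is available as an ordered list of page identifiers. The `build_browsegraph` name, the session format, and the choice of transition counts as edge weights are assumptions for illustration.

```python
from collections import defaultdict


def build_browsegraph(sessions):
    """Count page-to-page transitions; edge weight = number of transitions seen."""
    graph = defaultdict(lambda: defaultdict(int))
    for pages in sessions:
        # Consecutive pages in a session form one browsing transition.
        for src, dst in zip(pages, pages[1:]):
            graph[src][dst] += 1
    return {src: dict(dsts) for src, dsts in graph.items()}


# Hypothetical sessions, one list of visited pages per user session.
sessions = [
    ["/home", "/photo/1", "/photo/2"],
    ["/home", "/photo/1", "/photo/3"],
    ["/photo/2", "/photo/1"],
]
browsegraph = build_browsegraph(sessions)
print(browsegraph["/home"])     # {'/photo/1': 2}
print(browsegraph["/photo/1"])  # {'/photo/2': 1, '/photo/3': 1}
```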

Centrality Metrics Applied to the BrowseGraph
Increasing popularity in recent years
- Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”
- Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”
- Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”
Provide higher-quality rankings compared to standard hyperlink graphs
- Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance”

Local Ranking Problem on the BrowseGraph: WHY?
Image Ranking in Flickr (SIGIR 2012)

We compared different ranking approaches on the BrowseGraph (PageRank and BrowseRank, among others)
How much could our ranking vary if we had more information (i.e. nodes)?

BrowseGraph and ReferrerGraphs
ReferrerGraphs: domain-dependent BrowseGraphs
Construct different BrowseGraphs based on the referrer domain
Recommend news articles following the ReferrerGraphs
[Diagram: BrowseGraph, Twitter ReferrerGraph, Facebook ReferrerGraph]
Can we rely on centrality-based algorithms to infer news importance?
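A sketch of how sessions could be split into one graph per referrer domain, assuming each session records the URL the user arrived from. The `(referrer_url, pages)` session format and both function names are hypothetical, and stripping a leading `www.` is a simplification.

```python
from collections import defaultdict
from urllib.parse import urlparse


def referrer_domain(url):
    """Host part of the referrer URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def build_referrer_graphs(sessions):
    """sessions: iterable of (referrer_url, [page, page, ...]) pairs."""
    graphs = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for referrer, pages in sessions:
        graph = graphs[referrer_domain(referrer)]
        for src, dst in zip(pages, pages[1:]):
            graph[src][dst] += 1
    return {
        domain: {src: dict(dsts) for src, dsts in graph.items()}
        for domain, graph in graphs.items()
    }


# Hypothetical sessions tagged with the page the user came from.
sessions = [
    ("https://www.facebook.com/", ["/news/1", "/news/2"]),
    ("https://twitter.com/some_user", ["/news/1", "/news/3"]),
    ("https://www.facebook.com/groups", ["/news/1", "/news/2", "/news/4"]),
]
referrer_graphs = build_referrer_graphs(sessions)
print(sorted(referrer_graphs))  # ['facebook.com', 'twitter.com']
```

Each ReferrerGraph built this way is a subgraph of the BrowseGraph obtained from all sessions together.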

Local Ranking Problem on the BrowseGraph
Study of the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment)
How to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
Discover the referrer domain when it is not available (not discussed in the presentation; please see the paper)

Local Ranking Problem on the BrowseGraph
[Diagram: Social Networks, Search Engines, and News Homepage feeding the Yahoo News BrowseGraph, ~500M pageviews]
1. Construct the BrowseGraph (our “global graph”)
2. Construct the ReferrerGraphs (our “local graphs”)

Subgraph Comparison
Very different sizes
Very well connected (including Reddit, the smallest one)

Subgraph Comparison
Pairwise Kendall-tau among common nodes (minimum overlap 1k nodes)
In general the similarities are very low (<0.3), suggesting different content or different user interests
Search engines are the most similar (>0.5)
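A sketch of the comparison between two subgraphs, assuming each one has already been scored (for example with PageRank) into a `{node: score}` dictionary. It uses the simple tau-a variant, where tied pairs count as neither concordant nor discordant; the slides do not say which variant was used, and the scores below are invented.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b, min_overlap=2):
    """Kendall tau-a over the nodes present in both score dicts.

    Returns None when fewer than `min_overlap` nodes are shared.
    """
    common = [node for node in scores_a if node in scores_b]
    if len(common) < max(min_overlap, 2):
        return None
    concordant = discordant = 0
    for u, v in combinations(common, 2):
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(common) * (len(common) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for three subgraphs.
twitter = {"p1": 0.4, "p2": 0.3, "p3": 0.2, "p4": 0.1}
facebook = {"p1": 0.1, "p2": 0.2, "p3": 0.3, "p5": 0.4}
google = {"p1": 0.5, "p2": 0.3, "p3": 0.1, "p4": 0.05}

print(kendall_tau(twitter, facebook))                 # -1.0 (order reversed on p1..p3)
print(kendall_tau(twitter, google))                   # 1.0 (same order on p1..p4)
print(kendall_tau(twitter, facebook, min_overlap=4))  # None (only 3 common nodes)
```

This pairwise loop is quadratic in the number of common nodes; at the overlap sizes mentioned on the slide, a library routine such as `scipy.stats.kendalltau` would be the more practical choice.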

Growing Rings Experiment
Study of the LRP on the BrowseGraph by incrementally expanding the local graph
1. For each ReferrerGraph
2. Compare the PageRank values with the global ones (Kendall-tau)
3. Expand with the next neighborhood of nodes
4. Iterate until the Kendall-tau gets close to 1
K(local+0, global) ~0.307
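A sketch of the expansion step only (step 3), with the ranking and the Kendall-tau comparison left out. It assumes the “next neighborhood” means every node one hop away from the current local graph in either direction, which the slides do not spell out; the function names and the toy graph are hypothetical.

```python
def grow_ring(global_graph, local_nodes):
    """Return local_nodes plus every node one hop away in the global graph."""
    ring = set()
    for src, dsts in global_graph.items():
        for dst in dsts:
            if src in local_nodes and dst not in local_nodes:
                ring.add(dst)   # reached by an out-link of the local graph
            elif dst in local_nodes and src not in local_nodes:
                ring.add(src)   # links into the local graph
    return set(local_nodes) | ring


def growing_rings(global_graph, seed_nodes, max_steps=10):
    """Yield (step, node_set); step 0 is the original local graph."""
    current = set(seed_nodes)
    for step in range(max_steps + 1):
        yield step, current
        expanded = grow_ring(global_graph, current)
        if expanded == current:  # nothing left to add
            break
        current = expanded


# Hypothetical global graph as {node: [successors]}.
global_graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": [], "e": ["a"]}
rings = list(growing_rings(global_graph, {"a"}))
for step, nodes in rings:
    print(step, sorted(nodes))
```

At each step, the node set would be turned into an induced subgraph, ranked, and compared against the global ranking.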

Referrer-based (RB): the 7 ReferrerGraphs (Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing)
Same-size referrer-based (SRB): used to measure the impact of the graph size
Random (R): 7 random graphs matching the sizes of the original RB graphs

[Figure: ReferrerGraphs]

[Figure panels: ReferrerGraphs, same-size RGs, Random]

Growing Rings Experiment with Selection of Nodes
How does the expansion influence convergence if only a few of the most representative nodes are selected?
Hypothesis 1: adding all the nodes means adding more information, so it should lead to faster convergence (Boldi et al. [6] in the paper)
Hypothesis 2: the most representative nodes bring less noise and therefore quicker convergence (Cho et al. [13] in the paper)

Growing Rings Experiment with Selection of Nodes
[Figure legend: 5, 10, 30, 50, 100]
Fewer, more representative nodes lead to a better estimate of the PageRank values in the first iteration
In the long run, expansions with the highest number of nodes show the best convergence
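A sketch of an expansion step that adds only a selection of the neighboring nodes. The slides do not define “most representative”, nor whether the legend values are node counts or percentages, so this example makes two loud assumptions: `k` is a node count, and a frontier node is scored by the total transition weight linking it to the local graph. The function name and the toy graph are hypothetical.

```python
from collections import Counter


def grow_ring_top_k(global_graph, local_nodes, k):
    """Add only the k frontier nodes with the most transitions to or from
    the local graph. The scoring rule is an assumption, not the paper's."""
    weight = Counter()
    for src, dsts in global_graph.items():
        for dst, w in dsts.items():
            if src in local_nodes and dst not in local_nodes:
                weight[dst] += w
            elif dst in local_nodes and src not in local_nodes:
                weight[src] += w
    chosen = {node for node, _ in weight.most_common(k)}
    return set(local_nodes) | chosen


# Hypothetical weighted global graph as {src: {dst: transition_count}}.
global_graph = {"a": {"x": 5, "y": 1}, "z": {"a": 3}, "y": {"q": 2}}

print(sorted(grow_ring_top_k(global_graph, {"a"}, k=2)))  # ['a', 'x', 'z']
print(sorted(grow_ring_top_k(global_graph, {"a"}, k=5)))  # ['a', 'x', 'y', 'z']
```

With a large enough `k` the step adds the whole neighborhood, which matches the unrestricted expansion.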

Growing Rings Expansion
..with Selected Nodes
~1 or 2 steps can be enough to estimate the PageRank scores of the global graph
Predicting Kendall-tau Distance
Can we estimate the “distance” between the local and global PageRank using only information available in the local graph?

Can we estimate the distance?
Hypothesis: some structural properties of the graph could be good proxies for the Kendall-tau between local and global ranks.

Training Set Construction
Start from a ReferrerGraph (e.g. homepage)
Jackknife resampling (1%, 5%, 10%, 20%) to obtain reduced subgraphs
Kendall-tau distance between the ReferrerGraph and the reduced subgraphs
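A sketch of how the reduced subgraphs could be produced, assuming the percentages are the share of nodes removed uniformly at random; the slides do not say whether nodes or edges are dropped, so that reading is an assumption. The function names, the fixed seed, and the ring-shaped toy graph are hypothetical.

```python
import random


def reduced_subgraph(graph, fraction, rng):
    """Drop `fraction` of the nodes (uniformly at random) and their edges."""
    nodes = sorted(set(graph) | {dst for dsts in graph.values() for dst in dsts})
    dropped = set(rng.sample(nodes, round(len(nodes) * fraction)))
    return {
        src: {dst: w for dst, w in dsts.items() if dst not in dropped}
        for src, dsts in graph.items()
        if src not in dropped
    }


def training_subgraphs(graph, fractions=(0.01, 0.05, 0.10, 0.20), seed=0):
    """One reduced subgraph per removal fraction."""
    rng = random.Random(seed)
    return {f: reduced_subgraph(graph, f, rng) for f in fractions}


# Hypothetical ReferrerGraph: a ring of 100 pages, one transition per edge.
ring = {i: {(i + 1) % 100: 1} for i in range(100)}
subgraphs = training_subgraphs(ring)
for fraction, graph in subgraphs.items():
    print(fraction, len(graph))  # nodes left after the removal
```

Each reduced subgraph plays the role of a local graph and the full ReferrerGraph the role of the global one, so the Kendall-tau between their rankings gives one training target.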

Extract Structural Properties of Each Graph
We compute 62 structural graph metrics for each training instance
Size and Connectivity (S): basic statistics
Assortativity (A): tendency of nodes with a certain degree to be linked with nodes of similar degree
Degree (D): statistics on the degree distribution
Weighted degree (W): same as degree, but considering the weights on edges (transitions)
Local PageRank (P): statistics on the PageRank values
Closeness centralization (C): statistics on the distance (number of hops)
• A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”
• S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”
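A sketch of a handful of features from the size, degree, and weighted-degree groups, computed on a graph stored as `{src: {dst: weight}}`. It covers only a small, hypothetical subset of the 62 metrics; the feature names and the choice of mean, max, and standard deviation as summary statistics are assumptions.

```python
from statistics import mean, pstdev


def structural_features(graph):
    """Size, degree, and weighted-degree statistics of a weighted digraph."""
    nodes = set(graph) | {dst for dsts in graph.values() for dst in dsts}
    per_node = {
        name: dict.fromkeys(nodes, 0)
        for name in ("in_degree", "out_degree",
                     "in_weighted_degree", "out_weighted_degree")
    }
    n_edges = 0
    for src, dsts in graph.items():
        for dst, weight in dsts.items():
            n_edges += 1
            per_node["out_degree"][src] += 1
            per_node["in_degree"][dst] += 1
            per_node["out_weighted_degree"][src] += weight
            per_node["in_weighted_degree"][dst] += weight
    n = len(nodes)
    features = {
        "nodes": n,
        "edges": n_edges,
        "density": n_edges / (n * (n - 1)) if n > 1 else 0.0,
    }
    for name, values in per_node.items():
        values = list(values.values())
        features[name + "_mean"] = mean(values)
        features[name + "_max"] = max(values)
        features[name + "_std"] = pstdev(values)
    return features


# Hypothetical local graph with transition counts as weights.
local_graph = {"a": {"b": 3, "c": 1}, "b": {"c": 2}, "c": {"a": 4}}
features = structural_features(local_graph)
print(features["nodes"], features["edges"], features["in_degree_max"])  # 3 4 2
```

Everything here is computed from the local graph alone, which is the point of using such features as predictors.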

Regression Analysis (RF) with five-fold CV over 10 iterations
Weighted degree: the most predictive features, even better than using all the features
Assortativity: less predictive power; too many features and too little training data?
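A sketch of the evaluation protocol only, reading “five-fold CV over 10 iterations” as ten reshuffled repetitions of five-fold cross-validation (an assumption). The regressor is left pluggable: the mean-predicting baseline below is a stand-in, not the RF model from the slides, and the data are synthetic placeholders.

```python
import random
from statistics import mean


def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for k in range(n_splits):
            test = order[k::n_splits]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def cross_validate(fit_predict, X, y, **kfold_args):
    """Mean absolute error averaged over all folds and repetitions."""
    fold_errors = []
    for train, test in repeated_kfold(len(y), **kfold_args):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        fold_errors.append(mean(abs(p - y[i]) for p, i in zip(preds, test)))
    return mean(fold_errors)


def mean_baseline(X_train, y_train, X_test):
    """Placeholder model: always predicts the mean training target."""
    return [mean(y_train)] * len(X_test)


# Synthetic placeholders: 40 feature vectors with a constant target.
X = [[float(i)] for i in range(40)]
y = [0.5] * 40
error = cross_validate(mean_baseline, X, y)
print(error)  # 0.0, because the target is constant
```

Any regressor can be dropped in by wrapping its fit and predict calls in a function with the same three arguments as `mean_baseline`.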

Most important features in weighted degree:
features based on the distribution of in- and out-degree
very straightforward to compute
information always available in the local graph

Can we estimate the distance?
YES.
With just a few structural features of the local graph.

Summary
How the LRP behaves on the BrowseGraph: expanding the local graph with whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”)
It is possible to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph

Thanks.
