Prof. Pier Luca Lanzi
Graph Mining!
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
References
•  Jure Leskovec, Anand Rajaraman, Jeff Ullman. Mining of Massive
Datasets, Chapter 5 & Chapter 10
•  Book and slides are available from http://www.mmds.org
Facebook social graph
4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011]
Connections between political blogs
Polarization of the network [Adamic-Glance, 2005]
Citation networks and Maps of science
[Börner et al., 2012]
[Figure: example of the Web as a graph — a course page ("I teach a class on Networks. CS224W: Classes are in the Gates building") links to the Gates building, the Computer Science Department at Stanford, and Stanford University]
Web as a graph: pages are nodes, edges are links
How is the Web Organized?
•  Initial approaches
! Human curated Web directories
! Yahoo, DMOZ, LookSmart
•  Then, Web search
! Information Retrieval investigates how to find relevant docs in a small and trusted set
! Newspaper articles, Patents, etc.
The Web is huge, full of untrusted documents, random things, web spam, etc.
Web Search Challenges
•  Web contains many sources of information
! Who should we “trust”?
! Trick: Trustworthy pages may point to each other!
•  What is the “best” answer to query “newspaper”?
! No single right answer
! Trick: Pages that actually know about newspapers might all be pointing to many newspapers
PageRank
PageRank Algorithm
•  The underlying idea is to look at links as votes
•  A page is more important if it has more links
! In-coming links? Out-going links?
•  Intuition
! www.stanford.edu has 23,400 in-links
! www.joe-schmoe.com has one in-link
•  Are all in-links equal?
! Links from important pages count more
! Recursive question!
[Figure: example PageRank scores on a small web graph — B: 38.4, C: 34.3, E: 8.1, F: 3.9, D: 3.9, A: 3.3, remaining pages: 1.6 each]
Simple Recursive Formulation
•  Each link's vote is proportional to the importance of its source page
•  If page j with importance rj has n out-links, each link gets rj/n votes
•  Page j's own importance is the sum of the votes on its in-links
[Figure: page j is pointed to by pages i and k with 3 and 4 out-links respectively, so rj = ri/3 + rk/4; j's own 3 out-links each carry rj/3]
The “Flow” Model
•  A “vote” from an important page is worth more
•  A page is important if it is pointed to by other important pages
•  Define a "rank" rj for page j as

   rj = Σi→j ri / di

   where di is the out-degree of node i
•  "Flow" equations for the three-page example
! ry = ry /2 + ra /2
! ra = ry /2 + rm
! rm = ra /2
[Figure: graph with pages y, a, m — y links to itself and to a, a links to y and to m, m links to a]
Solving the Flow Equations
•  Three equations, three unknown variables, no constant term
! No unique solution
! All solutions are equivalent modulo a scale factor
•  An additional constraint (ry+ra+rm=1) forces uniqueness
•  Gaussian elimination works for small examples, but we
need a better method for large web-size graphs
We need a different formulation that scales up!
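For the toy example, the flow equations can be solved directly; a minimal numpy sketch, replacing the redundant third equation with the normalization constraint ry + ra + rm = 1:

```python
import numpy as np

# Flow equations for the three-page example (y, a, m):
#   ry = ry/2 + ra/2
#   ra = ry/2 + rm
#   rm = ra/2
# Two of them plus the constraint ry + ra + rm = 1 pin down a unique solution.
A = np.array([
    [0.5, -0.5,  0.0],   # ry - ry/2 - ra/2 = 0
    [-0.5, 1.0, -1.0],   # ra - ry/2 - rm  = 0
    [1.0,  1.0,  1.0],   # ry + ra + rm    = 1
])
b = np.array([0.0, 0.0, 1.0])

r = np.linalg.solve(A, b)
print(r)  # approximately [0.4, 0.4, 0.2]: ry = ra = 2/5, rm = 1/5
```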
The Matrix Formulation
•  Represent the graph as a transition matrix M
! Suppose page i has di out-links
! If page i links to page j, then Mji = 1/di, otherwise Mji = 0
! M is a "column stochastic matrix" since its columns sum up to 1
•  Given the rank vector r with one entry per page, where ri is the
importance of page i and the ri sum up to one
The flow equation can be written as r = Mr
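A numpy sketch of building the column-stochastic M for the y/a/m example above and checking that the flow solution satisfies r = Mr:

```python
import numpy as np

# The three-page example: y links to {y, a}, a links to {y, m}, m links to {a}
links = {0: [0, 1], 1: [0, 2], 2: [1]}  # 0 = y, 1 = a, 2 = m
n = 3

# Column-stochastic transition matrix: M[j, i] = 1/di if i links to j
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

print(M.sum(axis=0))  # each column sums to 1

r = np.array([0.4, 0.4, 0.2])  # the flow solution
print(np.allclose(M @ r, r))   # True: r = Mr
```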
The Eigenvector Formulation
•  Since the flow equation can be written as r = Mr, the rank vector
r is also an eigenvector of M
•  Thus, we can solve for r using a simple iterative scheme ("power iteration")
•  Power iteration
! Suppose there are N web pages
! Initialize: r(0) = [1/N, …, 1/N]T
! Iterate: r(t+1) = M r(t)
! Stop when |r(t+1) – r(t)|1 < ε
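The steps above can be sketched in numpy; the transition matrix is the y/a/m running example:

```python
import numpy as np

def power_iteration(M, eps=1e-10, max_iter=1000):
    """Iterate r(t+1) = M r(t) until the L1 change falls below eps."""
    n = M.shape[0]
    r = np.full(n, 1.0 / n)          # r(0) = [1/N, ..., 1/N]
    for _ in range(max_iter):
        r_next = M @ r
        if np.abs(r_next - r).sum() < eps:
            return r_next
        r = r_next
    return r

# Column-stochastic matrix of the y/a/m example
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
print(power_iteration(M))  # converges to [0.4, 0.4, 0.2]
```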
The Random Walk Formulation
•  Suppose that a random surfer at time t is on page i and continues
navigating by following one of the page's out-links chosen at random
•  At time t+1, the surfer ends up on some page j and from there
continues the random walk indefinitely
•  Let p(t) be the vector of probabilities pi(t) that the surfer is on
page i at time t (p(t) is the probability distribution over pages)
•  Then p(t+1) = Mp(t), so that if p(t+1) = p(t),
p(t) is the stationary distribution for the random walk
Existence and Uniqueness!
For graphs that satisfy certain conditions,!
the stationary distribution is unique and!
eventually will be reached no matter!
what the initial probability distribution is
Hubs and Authorities (HITS)
Hubs and Authorities
•  HITS (Hypertext-Induced Topic Selection)
! A measure of the importance of pages and documents, similar to PageRank
! Proposed at around the same time as PageRank (1998)
•  Goal: say we want to find good newspapers
! Don't just find newspapers; find "experts", that is,
people who link in a coordinated way to good newspapers
•  The idea is similar: links are viewed as votes
! A page is more important if it has more links
! In-coming links? Out-going links?
Hubs and Authorities
•  Each page has 2 scores
•  Quality as an expert (hub)
! Total sum of votes of authorities pointed to
•  Quality as content (authority)
! Total sum of votes coming from experts
•  Principle of repeated improvement
Hubs and Authorities
•  Authorities are pages containing useful information
! Newspaper home pages
! Course home pages
! Home pages of auto manufacturers
•  Hubs are pages that link to authorities
! List of newspapers
! Course bulletin
! List of US auto manufacturers
Counting in-links: Authority
(Note: this is an idealized example; in reality the graph is not bipartite
and each page has both a hub and an authority score)
Each page starts with hub score 1. Authorities collect their votes.
Counting in-links: Authority (continued)
[Figure: NYT's authority score is the sum of the hub scores of the nodes pointing to it]
Expert Quality: Hub
Hubs collect authority scores
[Figure: each hub's score is the sum of the authority scores of the nodes it points to]
Reweighting
Authorities again collect the hub scores
Mutually Recursive Definition
•  A good hub links to many good authorities
•  A good authority is linked from many good hubs
•  Model using two scores for each node
! Hub score and authority score
! Represented as vectors h and a
The HITS Algorithm
•  Initialize scores
•  Iterate until convergence:
! Update authority scores
! Update hub scores
! Normalize
•  Use two vectors a = (a1, …, an) and h = (h1, …, hn), and
the adjacency matrix A, with Aij = 1 if i links to j, 0 otherwise
The HITS Algorithm (vector notation)
•  Set ai = hi = 1/√n
•  Repeat until convergence
! h = Aa
! a = ATh
! Normalize a and h
•  Convergence criterion: stop when neither a nor h changes
by more than a small ε between iterations
•  Under reasonable assumptions about A, HITS converges to
vectors h* and a* where
! h* is the principal eigenvector of matrix A AT
! a* is the principal eigenvector of matrix AT A
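The iteration above can be sketched in numpy; the 4-node graph below is a made-up illustration (two hubs pointing at two authorities), not from the slides:

```python
import numpy as np

def hits(A, eps=1e-8, max_iter=1000):
    """Iterate h = A a, a = A^T h with normalization until convergence."""
    n = A.shape[0]
    a = np.full(n, 1.0 / np.sqrt(n))
    h = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(max_iter):
        h_new = A @ a
        h_new /= np.linalg.norm(h_new)
        a_new = A.T @ h_new
        a_new /= np.linalg.norm(a_new)
        if np.abs(a_new - a).sum() < eps and np.abs(h_new - h).sum() < eps:
            return h_new, a_new
        a, h = a_new, h_new
    return h, a

# Hypothetical toy graph: nodes 0 and 1 link to nodes 2 and 3
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
h, a = hits(A)
print(h.round(2))  # nodes 0, 1 get high hub scores
print(a.round(2))  # nodes 2, 3 get high authority scores
```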
PageRank vs HITS
•  PageRank and HITS are two solutions to the same problem
! What is the value of an in-link from u to v?
! In the PageRank model, the value of the link depends on the links into u
! In the HITS model, it depends on the value of the other links out of u
•  The destinies of PageRank and HITS after 1998 were very different
Community Detection
We often think of networks as being organized into modules, clusters, communities:
The goal is typically to find densely linked clusters
Micro-markets in sponsored search: find micro-markets by partitioning the
query-to-advertiser graph (Andersen, Lang: Communities from seed sets, 2006)
Clusters in the Movies-to-Actors graph
(Andersen, Lang: Communities from seed sets, 2006)
Discovering social circles, circles of trust
(McAuley, Leskovec: Discovering social circles in ego networks, 2012)
how can we identify communities?
Girvan-Newman Method
•  Define edge betweenness as the number of shortest paths
passing over the edge
•  Divisive hierarchical clustering based on the notion of edge
betweenness
•  The Algorithm
! Start with an undirected graph
! Repeat until no edges are left:
calculate the betweenness of every edge,
then remove the edge(s) with the highest betweenness
•  Connected components are communities
•  Gives a hierarchical decomposition of the network
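Assuming networkx is available, its community module implements this divisive scheme directly; a sketch on the standard karate-club graph:

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Zachary's karate club: a classic small social network
G = nx.karate_club_graph()

# girvan_newman yields successive partitions as the edges with highest
# betweenness are removed; the first yield is the 2-community split
communities = next(girvan_newman(G))
for c in communities:
    print(sorted(c))
```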
Betweenness must be re-computed at every step
[Figure: example edge betweenness values on a small graph]
[Figure: steps 1–3 of edge removal produce a hierarchical network decomposition]
Communities in physics collaborations
how to select the number of clusters?
Network Communities
•  Communities are viewed as sets of tightly connected nodes
•  We define modularity as a measure of how well a network is
partitioned into communities
•  Given a partitioning of the network into a set of groups S, we define the
modularity Q as proportional to
Σs∈S [(# edges within group s) − (expected # edges within group s)]
Need a null model!
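networkx ships a modularity function based on the configuration-model null model; a sketch comparing two partitions of the karate-club graph (the index-based split is an arbitrary baseline invented for contrast):

```python
import networkx as nx
from networkx.algorithms.community import modularity

G = nx.karate_club_graph()

# The historically observed split, read off the node attributes
by_club = [{v for v in G if G.nodes[v]["club"] == "Mr. Hi"},
           {v for v in G if G.nodes[v]["club"] != "Mr. Hi"}]
# An arbitrary split by node index, for comparison
by_index = [set(range(0, 17)), set(range(17, 34))]

# Higher Q means more edges fall within groups than the null model expects
print(modularity(G, by_club))   # around 0.36 for the real split
print(modularity(G, by_index))
```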
Modularity is useful for selecting the number of clusters
[Figure: Q plotted against the number of clusters peaks at the best partitioning]
Spectral Clustering
What Makes a Good Cluster?
•  Undirected graph G(V,E)
•  Partitioning task
! Divide the vertices into two disjoint groups A and B = V \ A
•  Questions
! How can we define a “good partition” of G?
! How can we efficiently identify such a partition?
[Figure: a 6-node example graph split into two groups A and B]
What makes a good partition?
Maximize the number of within-group connections
Minimize the number of between-group connections
Graph Cuts
•  Express partitioning objectives as a function of the "edge cut" of the partition
•  The cut is the set of edges with exactly one endpoint in each group; in general,
cut(A,B) = Σi∈A, j∈B wij
•  In the 6-node example, cut(A,B) = 2
[Figure: the 6-node example graph with groups A and B joined by two cut edges]
Graph Cut Criterion
•  Partition quality
! Minimize the weight of connections between groups, i.e., arg minA,B cut(A,B)
•  Degenerate case: the minimum cut may simply split off a single node
•  Problems
! Only considers external cluster connections
! Does not consider internal cluster connectivity
[Figure: a graph where the minimum cut isolates one node while the "optimal cut" balances the two groups]
Graph Partitioning Criteria: Normalized Cut (Conductance)
•  Connectivity of the group to the rest of the network should be
relative to the density of the group
•  Conductance: φ(A,B) = cut(A,B) / min(vol(A), vol(B))
•  where vol(A) is the total weight of the edges that have at least
one endpoint in A
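A sketch of computing conductance with networkx on the 6-node example graph used in the following slides (edge list reconstructed from its adjacency matrix):

```python
import networkx as nx

# The 6-node running example
G = nx.Graph([(1, 2), (1, 3), (1, 5), (2, 3), (3, 4),
              (4, 5), (4, 6), (5, 6)])
A = {1, 2, 3}
B = set(G) - A

cut = nx.cut_size(G, A, B)   # number of edges crossing the partition
vol_A = nx.volume(G, A)      # sum of degrees of the nodes in A
vol_B = nx.volume(G, B)
phi = cut / min(vol_A, vol_B)
print(cut, vol_A, vol_B, phi)  # 2 8 8 0.25
```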
Spectral Graph Partitioning
•  Let A be the adjacency matrix of the graph G with n nodes
! Aij is 1 if there is an edge between i and j, 0 otherwise
! Let x be a vector of n components (x1, …, xn) representing
labels/values assigned to each node of G
! Ax returns a vector in which each component j is the sum of
the labels of the neighbors of node j
•  Spectral Graph Theory
! Analyze the spectrum of G, that is, the eigenvectors xi of the
graph ordered by the magnitude of the corresponding eigenvalues
! Λ = {λ1, …, λn} sorted in increasing order: λ1 ≤ λ2 ≤ … ≤ λn
Example: d-regular Graph
•  Suppose that all the nodes in G have degree d and G is connected
•  What are the eigenvalues/eigenvectors of G? Ax = λx
! Ax returns the sum of the labels of each node's neighbors; since
each node has exactly d neighbors, x = (1, …, 1) is an
eigenvector and d is an eigenvalue
Example: d-regular Graph (not connected)
•  What if G has two separate components but is still d-regular?
•  A vector with all ones in A and all zeros in B (or vice versa)
is still an eigenvector of A with eigenvalue d
•  Underlying intuition
[Figure: if A and B are disconnected, the two top eigenvalues coincide (λ1 = λ2);
if A and B are only weakly connected, they nearly coincide (λ1 ≈ λ2)]
Spectral Graph Partitioning
•  Adjacency matrix A (n×n)
! Symmetric
! Real eigenvalues and orthogonal eigenvectors
•  Degree matrix D
! n×n diagonal matrix
! Dii = degree of node i
[Example: the 6-node graph]
Adjacency matrix A:
      1  2  3  4  5  6
  1   0  1  1  0  1  0
  2   1  0  1  0  0  0
  3   1  1  0  1  0  0
  4   0  0  1  0  1  1
  5   1  0  0  1  0  1
  6   0  0  0  1  1  0
Degree matrix D:
      1  2  3  4  5  6
  1   3  0  0  0  0  0
  2   0  2  0  0  0  0
  3   0  0  3  0  0  0
  4   0  0  0  3  0  0
  5   0  0  0  0  3  0
  6   0  0  0  0  0  2
Graph Laplacian Matrix
•  Computed as L = D − A
! n×n symmetric matrix
! x = (1, …, 1) is a trivial eigenvector since Lx = 0, so λ1 = 0
•  Important properties of L
! Eigenvalues are non-negative real numbers
! Eigenvectors are real and orthogonal
Laplacian matrix L:
      1  2  3  4  5  6
  1   3 -1 -1  0 -1  0
  2  -1  2 -1  0  0  0
  3  -1 -1  3 -1  0  0
  4   0  0 -1  3 -1 -1
  5  -1  0  0 -1  3 -1
  6   0  0  0 -1 -1  2
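These properties can be checked numerically; a numpy sketch on the 6-node example graph:

```python
import numpy as np

# Adjacency matrix of the 6-node example graph
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # graph Laplacian

# eigh handles symmetric matrices and returns eigenvalues in ascending order
vals, vecs = np.linalg.eigh(L)
print(vals.round(3))        # all non-negative, smallest is 0
print(L @ np.ones(6))       # zero vector: (1, ..., 1) is the eigenvector of λ1 = 0
```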
λ2 as an Optimization Problem
•  For a symmetric matrix M, the second smallest eigenvalue is
λ2 = min x⊥w1 (xTMx)/(xTx), where w1 is the eigenvector of λ1
•  What is the meaning of xTLx on G? We can show that
xTLx = Σ(i,j)∈E (xi − xj)²
•  So, restricting x to unit vectors orthogonal to the trivial
eigenvector (1, …, 1), that is, such that Σi xi = 0 and Σi xi² = 1,
we get λ2 = min Σ(i,j)∈E (xi − xj)²
[Figure: the components xi spread around 0 on a line — λ2 and its eigenvector x
balance positive and negative components to minimize Σ(i,j)∈E (xi − xj)²]
Finding the Optimal Cut
•  Express the partition (A,B) as a vector y where
! yi = +1 if node i belongs to A
! yi = −1 if node i belongs to B
•  Since (yi − yj)² is 4 for cut edges and 0 within groups, we can minimize
the cut of the partition by finding a non-trivial vector y that minimizes
f(y) = Σ(i,j)∈E (yi − yj)²
•  Can't solve exactly! Let's relax y and allow it to take any real value
Rayleigh Theorem
•  We know that minimizing f(y) = Σ(i,j)∈E (yi − yj)² over unit vectors
y orthogonal to (1, …, 1) is exactly the problem from the previous slide
•  The minimum value of f(y) is given by the second smallest
eigenvalue λ2 of the Laplacian matrix L
•  Thus, the optimal solution for y is given by the corresponding
eigenvector x, referred to as the Fiedler vector
Spectral Clustering Algorithms
1. Pre-processing
! Construct a matrix representation of the graph
2. Decomposition
! Compute eigenvalues and eigenvectors of the matrix
! Map each point to a lower-dimensional representation based
on one or more eigenvectors
3. Grouping
! Assign points to two or more clusters, based on the new
representation
Spectral Partitioning Algorithm
•  Pre-processing
! Build the Laplacian matrix L of the graph
•  Decomposition
! Find the eigenvalues λ and eigenvectors x of the matrix L
! Map vertices to the corresponding components of x2,
the eigenvector of the second smallest eigenvalue
[Example: for the 6-node graph, the computed eigenvalues are
λ = (0.0, 1.0, 3.0, 3.0, 4.0, 5.0) and the per-vertex components of x2
are approximately (−0.66, −0.35, −0.34, 0.33, 0.62, 0.31)]
Spectral Partitioning Algorithm
•  Grouping:
! Sort components of reduced 1-dimensional vector
! Identify clusters by splitting the sorted vector in two
•  How to choose a splitting point?
! Naïve approaches: split at 0 or median value
! More expensive approaches: Attempt to minimize normalized
cut in 1-dimension (sweep over ordering of nodes induced by
the eigenvector)
[Example: sorted components of x2 — the negative components (−0.66, −0.35, −0.34)
form cluster B and the positive components (0.31, 0.33, 0.62) form cluster A
when splitting at 0]
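Putting the pipeline together, a numpy sketch that partitions the 6-node example graph by the sign of its Fiedler vector:

```python
import numpy as np

# Adjacency matrix of the 6-node example graph
A = np.array([
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# The Fiedler vector is the eigenvector of the second smallest eigenvalue
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = vecs[:, 1]

# Naive split at 0: the sign of the Fiedler component decides the cluster
cluster_a = [i for i in range(6) if fiedler[i] >= 0]
cluster_b = [i for i in range(6) if fiedler[i] < 0]
print(cluster_a, cluster_b)  # splits the nodes into {0,1,2} and {3,4,5}
```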
Example: Spectral Partitioning
[Figure: plotting each node's value of x2 against its rank in x2 shows
a clear gap separating the two clusters]
[Figure: the components of x1 and x3, for comparison with x2]
How Do We Partition a Graph into k Clusters?
•  Two basic approaches:
•  Recursive bi-partitioning [Hagen et al., ’92]
! Recursively apply the bi-partitioning algorithm in a hierarchical divisive manner
! Disadvantages: inefficient, unstable
•  Cluster multiple eigenvectors [Shi-Malik, ’00]
! Build a reduced space from multiple eigenvectors
! Commonly used in recent papers
! A preferable approach…
Why Use Multiple Eigenvectors?
•  Approximates the optimal cut [Shi-Malik, ’00]
! Can be used to approximate optimal k-way normalized cut
•  Emphasizes cohesive clusters
! Increases the unevenness in the distribution of the data
! Associations between similar points are amplified, associations
between dissimilar points are attenuated
! The data begins to “approximate a clustering”
•  Well-separated space
! Transforms data to a new "embedded space" consisting of k orthogonal basis vectors
•  Multiple eigenvectors prevent instability due to information loss
Searching for Small Communities (Trawling)
Searching for small communities in the Web graph (Trawling) [Kumar et al. '99]
•  Trawling
! What is the signature of a community in a Web graph?
! The underlying intuition: small communities involve
many people talking about the same things
! Use this to define "topics": what do the same people on
the left talk about on the right?
•  More formally
! Enumerate complete bipartite subgraphs Ks,t
! Ks,t has s nodes on the "left" and t nodes on the "right"
! Every left node links to the same t nodes on the right,
forming a complete bipartite subgraph
[Figure: a dense 2-layer graph containing K3,4 between node sets X and Y]
Mining Bipartite Ks,t using Frequent Itemsets
•  Searching for such complete bipartite subgraphs can be viewed as a
frequent itemset mining problem [Kumar et al. '99]
•  View each node i as the set Si of nodes that i points to
•  Ks,t = a set Y of size t that occurs in s of the sets Si
•  Looking for Ks,t is equivalent to setting the frequency threshold
to s and looking at layer t (i.e., all frequent itemsets of size t)
! s = minimum support (|X| = s)
! t = itemset size (|Y| = t)
[Figure: a node i with out-links to a, b, c, d, so Si = {a,b,c,d}]
•  Say we find a frequent itemset Y = {a,b,c} with support s; then there are
s nodes that link to all of {a,b,c}, and together they form a Ks,3
Example
•  Itemsets (Si = out-neighbors of each node)
! Sa = {b,c,d}
! Sb = {d}
! Sc = {b,d,e,f}
! Sd = {e,f}
! Se = {b,d}
! Sf = {}
•  Support threshold s = 2
! {b,d}: support 3 (occurs in Sa, Sc, Se)
! {e,f}: support 2 (occurs in Sc, Sd)
•  So we just found two bipartite subgraphs:
K3,2 on ({a,c,e}, {b,d}) and K2,2 on ({c,d}, {e,f})
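The reduction can be sketched with a brute-force miner in pure Python (fine for toy sizes; real trawling uses scalable frequent-itemset algorithms such as Apriori):

```python
from itertools import combinations

# Out-link sets from the slide's example: node -> set of nodes it points to
S = {
    "a": {"b", "c", "d"},
    "b": {"d"},
    "c": {"b", "d", "e", "f"},
    "d": {"e", "f"},
    "e": {"b", "d"},
    "f": set(),
}

def mine_kst(S, s, t):
    """Enumerate K_{s,t}: every size-t itemset Y contained in >= s sets S_i."""
    items = sorted(set().union(*S.values()))
    found = {}
    for Y in combinations(items, t):
        X = [i for i, out in S.items() if set(Y) <= out]  # supporting left nodes
        if len(X) >= s:
            found[Y] = X
    return found

for Y, X in mine_kst(S, s=2, t=2).items():
    print(f"K{len(X)},2: left={X} right={list(Y)}")
```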
Predicting Communication Intention in Social MediaPredicting Communication Intention in Social Media
Predicting Communication Intention in Social MediaCharalampos Chelmis
 
4C13 J.15 Larson "Twitter based discourse community"
4C13 J.15 Larson "Twitter based discourse community"4C13 J.15 Larson "Twitter based discourse community"
4C13 J.15 Larson "Twitter based discourse community"rhetoricked
 
Power Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer PhenomenaPower Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer PhenomenaAi Sha
 
D Whitelock LAK presentation open_essayistfv
D Whitelock LAK presentation  open_essayistfvD Whitelock LAK presentation  open_essayistfv
D Whitelock LAK presentation open_essayistfvDenise Whitelock
 

Similar to DMTM 2015 - 19 Graph Mining (20)

DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
Link-Based Ranking
Link-Based RankingLink-Based Ranking
Link-Based Ranking
 
Page rank talk at NTU-EE
Page rank talk at NTU-EEPage rank talk at NTU-EE
Page rank talk at NTU-EE
 
Link analysis for web search
Link analysis for web searchLink analysis for web search
Link analysis for web search
 
Knowledge base system appl. p 1,2-ver1
Knowledge base system appl.  p 1,2-ver1Knowledge base system appl.  p 1,2-ver1
Knowledge base system appl. p 1,2-ver1
 
Social (1)
Social (1)Social (1)
Social (1)
 
Gaining, retaining and losing influence in online communities
Gaining, retaining and losing influence in online communitiesGaining, retaining and losing influence in online communities
Gaining, retaining and losing influence in online communities
 
Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...
 
How Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and BeyondHow Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and Beyond
 
Content-based link prediction
Content-based link predictionContent-based link prediction
Content-based link prediction
 
Social Network Analysis (Part 1)
Social Network Analysis (Part 1)Social Network Analysis (Part 1)
Social Network Analysis (Part 1)
 
2013 CrossRef Annual Meeting Agile Publishing Kristen Ratan
2013 CrossRef Annual Meeting Agile Publishing Kristen Ratan2013 CrossRef Annual Meeting Agile Publishing Kristen Ratan
2013 CrossRef Annual Meeting Agile Publishing Kristen Ratan
 
Markov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdfMarkov chains and page rankGraphs.pdf
Markov chains and page rankGraphs.pdf
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic Data
 
Predicting Communication Intention in Social Media
Predicting Communication Intention in Social MediaPredicting Communication Intention in Social Media
Predicting Communication Intention in Social Media
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
4C13 J.15 Larson "Twitter based discourse community"
4C13 J.15 Larson "Twitter based discourse community"4C13 J.15 Larson "Twitter based discourse community"
4C13 J.15 Larson "Twitter based discourse community"
 
Power Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer PhenomenaPower Laws and Rich-Get-Richer Phenomena
Power Laws and Rich-Get-Richer Phenomena
 
D Whitelock LAK presentation open_essayistfv
D Whitelock LAK presentation  open_essayistfvD Whitelock LAK presentation  open_essayistfv
D Whitelock LAK presentation open_essayistfv
 

More from Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationPier Luca Lanzi
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesPier Luca Lanzi
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationPier Luca Lanzi
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringPier Luca Lanzi
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesPier Luca Lanzi
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsPier Luca Lanzi
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesPier Luca Lanzi
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesPier Luca Lanzi
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationPier Luca Lanzi
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationPier Luca Lanzi
 

More from Pier Luca Lanzi (20)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rules
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 Clustering
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethods
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rules
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision trees
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representation
 

Recently uploaded

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 

Recently uploaded (20)

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 

DMTM 2015 - 19 Graph Mining

  • 1. Prof. Pier Luca Lanzi — Graph Mining. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
  • 2. References • Jure Leskovec, Anand Rajaraman, Jeff Ullman. Mining of Massive Datasets, Chapters 5 & 10 • Book and slides are available from http://www.mmds.org
  • 3. Facebook social graph: 4 degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011]
  • 4. Connections between political blogs: polarization of the network [Adamic-Glance, 2005]
  • 5. Citation networks and maps of science [Börner et al., 2012]
  • 6. Web as a graph: pages are nodes, edges are links (figure: a course page nested inside the Gates building, the Computer Science Department, and Stanford University)
  • 7. Web as a graph: pages are nodes, edges are links
  • 8. How is the Web Organized? • Initial approaches: human-curated Web directories (Yahoo, DMOZ, LookSmart) • Then, Web search: information retrieval studies how to find relevant documents in a small and trusted set (newspaper articles, patents, etc.), but the Web is huge and full of untrusted documents, random things, web spam, etc.
  • 9. Web Search Challenges • The Web contains many sources of information: who should we “trust”? Trick: trustworthy pages may point to each other • What is the “best” answer to the query “newspaper”? No single right answer. Trick: pages that actually know about newspapers might all be pointing to many newspapers
  • 10. PageRank
  • 11. PageRank Algorithm • The underlying idea is to look at links as votes • A page is more important if it has more links (in-coming links? out-going links?) • Intuition: www.stanford.edu has 23,400 in-links, www.joe-schmoe.com has one in-link • Are all in-links equal? Links from important pages count more — a recursive question!
  • 12. (figure: example PageRank scores — B 38.4, C 34.3, E 8.1, F 3.9, D 3.9, A 3.3, and 1.6 for each remaining node)
  • 13. Simple Recursive Formulation • Each link’s vote is proportional to the importance of its source page • If page j with importance rj has n out-links, each link gets rj/n votes • Page j’s own importance is the sum of the votes on its in-links; e.g., rj = ri/3 + rk/4 when page i has 3 out-links and page k has 4
  • 14. The “Flow” Model • A “vote” from an important page is worth more • A page is important if it is pointed to by other important pages • Define a “rank” rj for page j as rj = Σ_{i→j} ri/di, where di is the out-degree of node i • “Flow” equations for the example graph (nodes y, a, m): ry = ry/2 + ra/2, ra = ry/2 + rm, rm = ra/2
  • 15. Solving the Flow Equations • Three equations, three unknowns, no constants: no unique solution, and all solutions are equivalent up to a scale factor • An additional constraint (ry + ra + rm = 1) forces uniqueness • Gaussian elimination works for small examples, but we need a better method for large web-size graphs: we need a different formulation that scales up!
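The y/a/m example above can be checked numerically by simple fixed-point iteration, starting from the normalized uniform distribution. A minimal sketch (node names follow the slide; the function name is illustrative):

```python
# Solve the "flow" equations for the example graph y, a, m by
# fixed-point iteration; the sum r_y + r_a + r_m = 1 is preserved
# at every step, so the normalization constraint holds throughout.

def solve_flow(iterations=100):
    ry = ra = rm = 1.0 / 3.0          # start from the uniform distribution
    for _ in range(iterations):
        # ry = ry/2 + ra/2, ra = ry/2 + rm, rm = ra/2
        ry, ra, rm = ry / 2 + ra / 2, ry / 2 + rm, ra / 2
    return ry, ra, rm

ry, ra, rm = solve_flow()
print(round(ry, 3), round(ra, 3), round(rm, 3))   # prints 0.4 0.4 0.2
```

Substituting back confirms the fixed point: 0.4 = 0.4/2 + 0.4/2, 0.4 = 0.4/2 + 0.2, 0.2 = 0.4/2.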
  • 16. The Matrix Formulation • Represent the graph as a transition matrix M: if page i has di out-links and links to page j, then Mji = 1/di, else Mji = 0 • M is a column-stochastic matrix, since each column sums up to 1 • Given the rank vector r with one entry per page, where ri is the importance of page i and the ri sum up to one, the flow equation rj = Σ_{i→j} ri/di can be written as r = Mr
  • 17. The Eigenvector Formulation • Since the flow equation can be written as r = Mr, the rank vector r is an eigenvector of M • Thus, we can solve for r using a simple iterative scheme, “power iteration”: suppose there are N web pages; initialize r(0) = [1/N, …, 1/N]T; iterate r(t+1) = M r(t); stop when |r(t+1) − r(t)|1 < ε
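The power-iteration scheme can be sketched for any small directed graph; the dict-based graph representation and the function name below are illustrative choices, not from the slides. The transition matrix M is applied implicitly: each node i spreads r[i]/di to its successors.

```python
# Power iteration for r = Mr on a directed graph {node: [out-links]}.
# Stops when the L1 change between successive rank vectors is below eps.

def pagerank_power_iteration(graph, eps=1e-10):
    nodes = list(graph)
    r = {v: 1.0 / len(nodes) for v in nodes}       # r(0) = [1/N, ..., 1/N]
    while True:
        r_next = {v: 0.0 for v in nodes}
        for i, out_links in graph.items():
            share = r[i] / len(out_links)          # each out-link gets r_i / d_i
            for j in out_links:
                r_next[j] += share
        if sum(abs(r_next[v] - r[v]) for v in nodes) < eps:
            return r_next
        r = r_next

# The y/a/m example from the "flow" slide:
graph = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
ranks = pagerank_power_iteration(graph)
```

Note this basic version assumes every node has at least one out-link; handling dead ends and spider traps (teleportation) is covered by the full PageRank formulation.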
  • 18. The Random Walk Formulation • Suppose a random surfer at time t is on page i and continues by following one of its out-links chosen uniformly at random • At time t+1 the surfer ends up on some page j, and from there continues the random surfing indefinitely • Let p(t) be the vector whose component pi(t) is the probability that the surfer is on page i at time t (p(t) is a probability distribution over pages) • Then p(t+1) = Mp(t), so the rank vector satisfying r = Mr is the stationary distribution of the random walk
  • 19. Existence and Uniqueness: for graphs that satisfy certain conditions, the stationary distribution is unique and is eventually reached no matter what the initial probability distribution is
  • 20. Hubs and Authorities (HITS)
  • 21. Hubs and Authorities • HITS (Hypertext-Induced Topic Selection) is a measure of the importance of pages and documents, similar to PageRank and proposed at around the same time (1998) • Goal: say we want to find good newspapers; don’t just find newspapers, find “experts”, that is, people who link in a coordinated way to good newspapers • The idea is similar: links are viewed as votes, and a page is more important if it has more links (in-coming links? out-going links?)
  • 22. Hubs and Authorities • Each page has two scores • Quality as an expert (hub): total sum of the votes of the authorities it points to • Quality as content (authority): total sum of the votes coming from experts • Principle of repeated improvement
  • 23. Hubs and Authorities • Authorities are pages containing useful information: newspaper home pages, course home pages, home pages of auto manufacturers • Hubs are pages that link to authorities: lists of newspapers, course bulletins, lists of US auto manufacturers
  • 24. Counting in-links: Authority (figure: each page starts with hub score 1 and authorities collect their votes; an idealized example — in reality the graph is not bipartite and each page has both a hub and an authority score)
  • 25. Counting in-links: Authority (figure: the authority score of NYT is the sum of the hub scores of the nodes pointing to it)
  • 26. Expert Quality: Hub (figure: hubs collect authority scores — the hub score of a node is the sum of the authority scores of the nodes it points to)
  • 27. Reweighting (figure: authorities again collect the hub scores)
  • 28. Mutually Recursive Definition • A good hub links to many good authorities • A good authority is linked from many good hubs • Model using two scores for each node, a hub score and an authority score, represented as vectors h and a
  • 29. The HITS Algorithm • Initialize the scores • Iterate until convergence: update the authority scores, update the hub scores, normalize • Uses two vectors a = (a1, …, an) and h = (h1, …, hn) and the adjacency matrix A, with Aij = 1 if i links to j and 0 otherwise
  • 30. The HITS Algorithm (vector notation) • Set ai = hi = 1/√n • Repeat until convergence: h = Aa, a = ATh, normalize, until the change in the score vectors falls below a threshold • Under reasonable assumptions about A, HITS converges to vectors h* and a*, where h* is the principal eigenvector of the matrix AAT and a* is the principal eigenvector of the matrix ATA
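The vector-notation updates h = Aa, a = ATh can be sketched directly with nested lists; the three-page example graph below is hypothetical, chosen so the result is easy to check by hand.

```python
# A sketch of HITS: h = A a, then a = A^T h, normalizing both score
# vectors to unit Euclidean length after each round.

def hits(A, iterations=100):
    n = len(A)
    h = [1.0 / n ** 0.5] * n                      # h_i = 1/sqrt(n)
    a = [1.0 / n ** 0.5] * n                      # a_i = 1/sqrt(n)
    for _ in range(iterations):
        h = [sum(A[i][j] * a[j] for j in range(n)) for i in range(n)]  # h = A a
        a = [sum(A[i][j] * h[i] for i in range(n)) for j in range(n)]  # a = A^T h
        h_norm = sum(x * x for x in h) ** 0.5
        a_norm = sum(x * x for x in a) ** 0.5
        h = [x / h_norm for x in h]
        a = [x / a_norm for x in a]
    return h, a

# Hypothetical example: pages 0 and 1 both link to page 2, so page 2
# is the only authority and pages 0, 1 are equally good hubs.
A = [[0, 0, 1],
     [0, 0, 1],
     [0, 0, 0]]
h, a = hits(A)
```

On this graph the iteration stabilizes immediately: a = (0, 0, 1) and h = (1/√2, 1/√2, 0).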
  • 31. PageRank vs HITS • PageRank and HITS are two solutions to the same problem: what is the value of an in-link from u to v? • In the PageRank model, the value of the link depends on the links into u • In the HITS model, it depends on the value of the other links out of u • The destinies of PageRank and HITS after 1998 were very different
  • 32. Community Detection
  • 33. We often think of networks as being organized into modules, clusters, communities
  • 34. The goal is typically to find densely linked clusters
  • 35. Micro-markets in sponsored search: find micro-markets by partitioning the query-to-advertiser graph (Andersen, Lang: Communities from seed sets, 2006)
  • 36. Clusters in the movies-to-actors graph (Andersen, Lang: Communities from seed sets, 2006)
  • 37. Discovering social circles, circles of trust (McAuley, Leskovec: Discovering social circles in ego networks, 2012)
  • 38. How can we identify communities?
  • 39. Girvan-Newman Method • Define edge betweenness as the number of shortest paths passing over the edge • Divisive hierarchical clustering based on the notion of edge betweenness • The algorithm: start with an undirected graph; repeat until no edges are left: calculate the betweenness of the edges, then remove the edges with the highest betweenness • Connected components are communities • Gives a hierarchical decomposition of the network
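One divisive step of Girvan-Newman can be sketched as follows. The edge-betweenness computation uses Brandes-style BFS accumulation (a standard technique, not spelled out in the slides); all names and the example graph are illustrative.

```python
# One Girvan-Newman step: compute edge betweenness, remove the
# highest-betweenness edge, repeat until the graph splits.
from collections import deque, defaultdict

def edge_betweenness(adj):
    # Brandes accumulation from every source; each undirected edge is
    # counted from both endpoints, which does not affect the ranking.
    bet = defaultdict(float)
    for s in adj:
        dist, sigma, preds = {s: 0}, {s: 1.0}, defaultdict(list)
        order, q = [], deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    q.append(w)
                if dist[w] == dist[v] + 1:        # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):                 # back-propagate dependencies
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    return bet

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    q.append(w)
        comps.append(comp)
    return comps

def girvan_newman_step(adj):
    adj = {v: set(ws) for v, ws in adj.items()}   # work on a copy
    n_start = len(components(adj))
    while len(components(adj)) == n_start:
        bet = edge_betweenness(adj)
        u, w = tuple(max(bet, key=bet.get))       # highest-betweenness edge
        adj[u].discard(w)
        adj[w].discard(u)
    return components(adj)

# Two triangles joined by the bridge 2-3; the bridge is removed first.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
parts = girvan_newman_step(graph)
```

Every shortest path between the two triangles crosses the bridge, so it has the highest betweenness and its removal yields the communities {0, 1, 2} and {3, 4, 5}.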
  • 40. Need to re-compute betweenness at every step (figure: example edge betweenness values on the network)
  • 41. (figure: hierarchical network decomposition obtained in three steps of edge removal)
  • 42. Communities in physics collaborations
  • 43. How to select the number of clusters?
  • 44. Network Communities • Communities are viewed as sets of tightly connected nodes • We define modularity as a measure of how well a network is partitioned into communities • Given a partitioning of the network into a set of groups S, the modularity Q is proportional to the sum, over the groups s in S, of the number of edges within s minus the expected number of edges within s — which requires a null model!
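The modularity formula itself did not survive on the slide; the sketch below uses the standard configuration-model definition, Q = (1/2m) Σij (Aij − ki·kj/2m)·[ci = cj], which matches the "edges within a group minus expected edges" description. The example graph and community labels are illustrative.

```python
# Modularity Q of a partition under the configuration-model null model.
# adj maps each node to its neighbor set; community maps node -> label.

def modularity(adj, community):
    two_m = sum(len(ws) for ws in adj.values())    # 2m: sum of degrees
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue                            # only same-community pairs
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Two triangles joined by one edge; the natural split scores well,
# while the all-singletons partition scores poorly.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
good = modularity(adj, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"})
trivial = modularity(adj, {v: v for v in adj})
```

Here `good` evaluates to 5/14 ≈ 0.357, while the all-singletons partition is negative — modularity rewards the two-triangle split.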
  • 45. Prof. Pier Luca Lanzi Modularity is useful for selecting the number of clusters: Q
  • 46. Prof. Pier Luca Lanzi Spectral Clustering
• 47. Prof. Pier Luca Lanzi What Makes a Good Cluster? •  Undirected graph G(V,E) •  Partitioning task ! Divide the vertices into two disjoint! groups A and B = V \ A •  Questions ! How can we define a “good partition” of G? ! How can we efficiently identify such a partition? 47
• 48. Prof. Pier Luca Lanzi What makes a good partition? Maximize the number of within-group connections Minimize the number of between-group connections
• 49. Prof. Pier Luca Lanzi Graph Cuts •  Express partitioning objectives as a function of ! the “edge cut” of the partition •  Cut is defined as the set of edges with only one vertex in a group •  In the example, cut(A,B) = 2; more generally, cut(A,B) = Σi∈A, j∈B wij 49
• 50. Prof. Pier Luca Lanzi Graph Cut Criterion •  Partition quality ! Minimize weight of connections between groups,! i.e., arg minA,B cut(A,B) •  Degenerate case: the minimum cut may simply isolate a weakly connected node (figure contrasts the “optimal cut” with the minimum cut) •  Problems ! Only considers external cluster connections ! Does not consider internal cluster connectivity 50
• 51. Prof. Pier Luca Lanzi Graph Partitioning Criteria:! Normalized cut (Conductance)! •  Connectivity of the group to the rest of the network should be relative to the density of the group: ncut(A,B) = cut(A,B)/vol(A) + cut(A,B)/vol(B) •  Where vol(A) is the total weight of the edges that have at least one endpoint in A 51
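The cut, volume, and normalized-cut definitions above can be sketched directly (unweighted version; function names are mine):

```python
def cut(adj, group):
    """Number of edges with exactly one endpoint inside `group`."""
    group = set(group)
    return sum(1 for u in group for v in adj[u] if v not in group)

def vol(adj, group):
    """Total degree of the nodes in `group` (edge endpoints in the group)."""
    return sum(len(adj[u]) for u in group)

def ncut(adj, a, b):
    """Normalized cut: cut(A,B)/vol(A) + cut(A,B)/vol(B)."""
    c = cut(adj, a)
    return c / vol(adj, a) + c / vol(adj, b)

# The 6-node graph used throughout the slides (nodes 1..6)
adj = {1: {2, 3, 5}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {1, 4, 6}, 6: {4, 5}}
```

For the partition A = {1, 2, 3}, B = {4, 5, 6}, cut(A,B) = 2 and vol(A) = vol(B) = 8, so ncut = 2/8 + 2/8 = 0.5.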
• 54. Prof. Pier Luca Lanzi Spectral Graph Partitioning •  Let A be the adjacency matrix of the graph G with n nodes ! Aij is 1 if there is an edge between i and j, 0 otherwise ! x is a vector of n components (x1, …, xn) that represents labels/values assigned to each node of G ! Ax returns a vector in which each component j is the sum of the labels of the neighbors of node j •  Spectral Graph Theory ! Analyze the spectrum of G, that is, the eigenvectors xi of the graph sorted by the corresponding eigenvalues λi in increasing order ! Λ = { λ1, …, λn } such that λ1 ≤ λ2 ≤ … ≤ λn 54
• 55. Prof. Pier Luca Lanzi Example: d-regular Graph •  Suppose that all the nodes in G have degree d and G is connected •  What are the eigenvalues/eigenvectors of G? From Ax = λx: Ax returns the sum of the labels of each node’s neighbors, and since each node has exactly d neighbors, x = (1, …, 1) is an eigenvector and d is an eigenvalue •  What if G is not connected but still d-regular? A vector with ones on component A and zeros on component B (or vice versa) is still an eigenvector of A with eigenvalue d 55
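The claim that Ax sums neighbor labels, and that (1, …, 1) is an eigenvector with eigenvalue d on a d-regular graph, can be checked with a tiny matrix-vector product (the helper name and the 4-cycle example are mine):

```python
def neighbor_label_sums(A, x):
    # (A x)_i = sum over j of A[i][j] * x[j] = sum of the labels of i's neighbors
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# A 4-cycle: every node has degree d = 2
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
ones = [1, 1, 1, 1]
# A * (1,...,1) = (2,2,2,2) = d * (1,...,1): the all-ones vector is an
# eigenvector with eigenvalue d, as the slide claims
```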
• 56. Prof. Pier Luca Lanzi Example: d-regular Graph (not connected) •  What if G has two separate components! but is still d-regular? •  A vector with ones on component A and zeros on component B (or vice versa) is still an eigenvector! of A with eigenvalue d, so the eigenvalue d appears twice •  Underlying intuition: if G is disconnected, λ1 = λ2; if G is connected but has two loosely coupled clusters, λ1 ≈ λ2 56
• 57. Prof. Pier Luca Lanzi Spectral Graph Partitioning •  Adjacency matrix A (n×n) ! Symmetric ! Real and orthogonal! eigenvectors •  Degree Matrix D ! n×n diagonal matrix ! Dii = degree of node i •  Example (nodes 1–6): A = [0 1 1 0 1 0; 1 0 1 0 0 0; 1 1 0 1 0 0; 0 0 1 0 1 1; 1 0 0 1 0 1; 0 0 0 1 1 0], D = diag(3, 2, 3, 3, 3, 2) 57
• 58. Prof. Pier Luca Lanzi Graph Laplacian Matrix •  Computed as L = D − A ! n×n symmetric matrix ! x = (1, …, 1) is a trivial eigenvector since Lx = 0, so λ1 = 0 •  Important properties of L ! Eigenvalues are non-negative! real numbers ! Eigenvectors are real! and orthogonal! •  Example: L = [3 −1 −1 0 −1 0; −1 2 −1 0 0 0; −1 −1 3 −1 0 0; 0 0 −1 3 −1 −1; −1 0 0 −1 3 −1; 0 0 0 −1 −1 2] 58
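Building L = D − A and checking the trivial eigenvector is a few lines (the builder function is a sketch of mine; the graph is the 6-node example from the slides):

```python
def laplacian(adj, nodes):
    """Build L = D - A as a dense row-major matrix (list of lists)."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    L = [[0] * n for _ in range(n)]
    for v in nodes:
        L[idx[v]][idx[v]] = len(adj[v])     # degree on the diagonal (D)
        for w in adj[v]:
            L[idx[v]][idx[w]] = -1          # -1 for every edge (-A)
    return L

# The 6-node example graph from the slides
adj = {1: {2, 3, 5}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {1, 4, 6}, 6: {4, 5}}
L = laplacian(adj, [1, 2, 3, 4, 5, 6])
# every row sums to 0, i.e. L times (1,...,1) is the zero vector, so lambda_1 = 0
```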
• 59. Prof. Pier Luca Lanzi λ2 as an optimization problem •  For a symmetric matrix M, λ2 = min xTMx / xTx over vectors x orthogonal to the first eigenvector •  What is the meaning of xTLx on G? We can show that xTLx = Σ(i,j)∈E (xi − xj)2 •  So λ2 = min Σ(i,j)∈E (xi − xj)2 over unit vectors x orthogonal to the trivial eigenvector (1, …, 1) 59
• 60. Prof. Pier Luca Lanzi λ2 as an optimization problem •  So, λ2 = min Σ(i,j)∈E (xi − xj)2 subject to Σi xi = 0 (x is orthogonal to (1, …, 1)) and Σi xi2 = 1 (x is a unit vector) 60
• 61. Prof. Pier Luca Lanzi λ2 and its eigenvector x balance the values xi and xj around 0 to minimize Σ(i,j)∈E (xi − xj)2 (figure: node values placed on the real line around 0)
• 62. Prof. Pier Luca Lanzi Finding the Optimal Cut •  Express the partition (A,B) as a vector y where, ! yi = +1 if node i belongs to A ! yi = -1 if node i belongs to B •  We can minimize the cut of the partition by finding a non-trivial vector y that minimizes f(y) = Σ(i,j)∈E (yi − yj)2 62 Can’t solve exactly! Let’s relax y and allow y to take any real value.
• 63. Prof. Pier Luca Lanzi Rayleigh Theorem •  We know that the relaxed problem is to minimize f(y) = Σ(i,j)∈E (yi − yj)2 = yTLy over unit vectors y orthogonal to (1, …, 1) •  The minimum value of f(y) is given by the second smallest eigenvalue λ2 of the Laplacian matrix L •  Thus, the optimal solution for y is given by the corresponding eigenvector x, referred to as the Fiedler vector 63
  • 64. Prof. Pier Luca Lanzi Spectral Clustering Algorithms 1. Pre-processing ! Construct a matrix representation of the graph 2. Decomposition ! Compute eigenvalues and eigenvectors of the matrix ! Map each point to a lower-dimensional representation based on one or more eigenvectors 3. Grouping ! Assign points to two or more clusters, based on the new representation 64
• 65. Prof. Pier Luca Lanzi Spectral Partitioning Algorithm •  Pre-processing: ! Build Laplacian ! matrix L of the ! graph •  Decomposition: ! Find eigenvalues λ! and eigenvectors x ! of the matrix L ! Map vertices to ! corresponding ! components of λ2 •  For the example graph, the sorted eigenvalues are λ = (0, 1, 3, 3, 4, 5) 65
• 66. Prof. Pier Luca Lanzi Spectral Partitioning Algorithm •  Grouping: ! Sort components of reduced 1-dimensional vector ! Identify clusters by splitting the sorted vector in two •  How to choose a splitting point? ! Naïve approaches: split at 0 or median value ! More expensive approaches: Attempt to minimize normalized cut in 1-dimension (sweep over ordering of nodes induced by the eigenvector) •  Example: components (−0.66, −0.35, −0.34, 0.33, 0.62, 0.31); splitting at 0 puts the positive points in cluster A and the negative points in cluster B 66
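The whole pipeline can be sketched without a linear-algebra library: approximate the Fiedler vector by power iteration on M = cI − L while deflating the trivial all-ones eigenvector, then split at 0. The power-iteration shortcut is my choice here (the slides use a full eigendecomposition), and the graph is the 6-node example:

```python
import random

def fiedler_vector(L, iters=3000, seed=0):
    """Approximate the eigenvector of the 2nd smallest eigenvalue of L by
    power iteration on M = c*I - L, deflating the all-ones eigenvector."""
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))   # c is at least the largest eigenvalue of L
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]          # project out the (1,...,1) direction
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x

# Laplacian of the 6-node example graph (nodes 1..6 -> indices 0..5)
L = [[ 3, -1, -1,  0, -1,  0],
     [-1,  2, -1,  0,  0,  0],
     [-1, -1,  3, -1,  0,  0],
     [ 0,  0, -1,  3, -1, -1],
     [-1,  0,  0, -1,  3, -1],
     [ 0,  0,  0, -1, -1,  2]]
x = fiedler_vector(L)
# splitting at 0: nodes whose component shares the sign of node 1's component
group_a = {i + 1 for i in range(6) if x[i] * x[0] > 0}
```

Shifting by cI turns the smallest eigenvalues of L into the largest of M, so power iteration converges to the Fiedler direction once the constant component is deflated out.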
• 67. Prof. Pier Luca Lanzi Example: Spectral Partitioning (plot: value of x2 against rank in x2) 67
• 68. Prof. Pier Luca Lanzi Example: Spectral Partitioning (plot: components of x2, value against rank) 68
• 69. Prof. Pier Luca Lanzi Example: Spectral partitioning (plots: components of x1 and components of x3) 69
  • 70. Prof. Pier Luca Lanzi How Do We Partition a Graph into k Clusters? •  Two basic approaches: •  Recursive bi-partitioning [Hagen et al., ’92] ! Recursively apply bi-partitioning algorithm! in a hierarchical divisive manner ! Disadvantages: inefficient, unstable •  Cluster multiple eigenvectors [Shi-Malik, ’00] ! Build a reduced space from multiple eigenvectors ! Commonly used in recent papers ! A preferable approach… 70 70
  • 71. Prof. Pier Luca Lanzi Why Use Multiple Eigenvectors? •  Approximates the optimal cut [Shi-Malik, ’00] ! Can be used to approximate optimal k-way normalized cut •  Emphasizes cohesive clusters ! Increases the unevenness in the distribution of the data ! Associations between similar points are amplified, associations between dissimilar points are attenuated ! The data begins to “approximate a clustering” •  Well-separated space ! Transforms data to a new “embedded space”, ! consisting of k orthogonal basis vectors •  Multiple eigenvectors prevent instability due to information loss 71
  • 72. Prof. Pier Luca Lanzi Searching for Small Communities! (Trawling)
• 73. Prof. Pier Luca Lanzi Searching for small communities in the Web graph (Trawling) •  Trawling ! What is the signature of a community in a Web graph? ! The underlying intuition is that small communities involve many people talking about the same things ! Use this to define “topics”: what do the same people on the left talk about on the right? •  More formally ! Enumerate complete bipartite subgraphs Ks,t ! Ks,t has s nodes on the “left” and t nodes on the “right” ! Each left node links to the same t nodes on the right, forming a fully connected bipartite graph 73 [Kumar et al. ‘99] (figure: a dense 2-layer graph, e.g. K3,4, with left set X and right set Y)
• 74. Prof. Pier Luca Lanzi Mining Bipartite Ks,t using Frequent Itemsets •  Searching for such complete bipartite graphs can be viewed as a frequent itemset mining problem •  View each node i as a ! set Si of nodes i points to •  Ks,t = a set Y of size t ! that occurs in s sets Si •  Looking for Ks,t is equivalent to! setting the frequency threshold ! to s and looking at layer t ! (i.e., all frequent sets of size t) 74 [Kumar et al. ‘99] Si={a,b,c,d} X Y s = minimum support (|X|=s) t = itemset size (|Y|=t)
• 75. Prof. Pier Luca Lanzi View each node i as a set Si of nodes i points to. Find frequent itemsets with s as the minimum support and t as the itemset size: a Ks,t is a set Y of size t that occurs in s sets Si. Say we find a frequent itemset Y={a,b,c} of support s; then there are s nodes that link to all of {a,b,c}, and we have found a Ks,t (figure: left set X, right set Y, Si={a,b,c,d})
• 76. Prof. Pier Luca Lanzi Example •  Itemsets ! a = {b,c,d} ! b = {d} ! c = {b,d,e,f} ! d = {e,f} ! e = {b,d} ! f = {} •  Support threshold s=2 ! {b,d}: support 3 ! {e,f}: support 2 •  And we just found 2 bipartite subgraphs: left {a,c,e} linking to right {b,d}, and left {c,d} linking to right {e,f} 76
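The itemset view above can be sketched directly with itertools (the function name is mine; the data is the slide's example):

```python
from itertools import combinations

def find_bipartite_cores(outlinks, s, t):
    """Enumerate K_{s,t} candidates: t-sets of right nodes linked to by at
    least s left nodes (frequent itemsets of size t with support s)."""
    left_of = {}
    for i, links in outlinks.items():
        for y in combinations(sorted(links), t):    # every t-subset of S_i
            left_of.setdefault(y, set()).add(i)
    return {y: xs for y, xs in left_of.items() if len(xs) >= s}

# The example from the slide: each node mapped to the set of nodes it points to
outlinks = {'a': {'b', 'c', 'd'}, 'b': {'d'}, 'c': {'b', 'd', 'e', 'f'},
            'd': {'e', 'f'}, 'e': {'b', 'd'}, 'f': set()}
cores = find_bipartite_cores(outlinks, s=2, t=2)
```

This brute-force enumeration is exponential in t; the original trawling work relies on a-priori-style pruning, but the mapping from Ks,t to frequent itemsets is the same.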