6. Graph Partitioning
Many systems use hash partitioning
● Results in many edges being “cut”
Given a graph G and an integer k, partition the vertices into k disjoint sets such that:
● as few edges as possible are cut
● the sets are as balanced as possible
This problem is NP-hard.
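To make the two goals concrete, the sketch below counts cut edges and partition sizes for a given assignment. The function names, the integer vertex ids, and the modulo operation standing in for a hash function are assumptions made for illustration; they are not taken from any particular system.

```python
from collections import Counter

def hash_partition(vertices, k):
    """Assign each integer vertex id to partition (id % k), a stand-in for a hash."""
    return {v: v % k for v in vertices}

def edge_cut(edges, assignment):
    """Number of edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

def partition_sizes(assignment):
    """Number of vertices per partition, used to judge balance."""
    return Counter(assignment.values())

# A ring of 6 vertices split into k = 2 partitions.
vertices = range(6)
edges = [(i, (i + 1) % 6) for i in vertices]

hashed = hash_partition(vertices, 2)
contiguous = {v: 0 if v < 3 else 1 for v in vertices}

print(edge_cut(edges, hashed), partition_sizes(hashed))
print(edge_cut(edges, contiguous), partition_sizes(contiguous))
```

Both assignments are perfectly balanced, but on this ring the modulo assignment cuts every edge while the contiguous one cuts only two.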
8. To Make the Problem More Complicated
“The only constant is change.” (Heraclitus)
Social graphs: new people and friendships
Semantic Web graphs: new knowledge
Web graphs: new websites and links
10. New Framework
Repartitioning the entire graph upon every change is far too expensive.
Leopard:
● Locally reassesses partitioning as a result of changes, without a full repartitioning
● Integrates consideration of replication with partitioning
14. Compute the Partition for B
[Figure: vertices A and B, placed across Partition 1 and Partition 2]
Goals: (1) few cuts and (2) balanced
Heuristic: # neighbours * (1 - # vertices / capacity)
Partition 1: # neighbours: 1, # vertices: 5, score: 1 * (1 - 5/6) = 0.17
Partition 2: # neighbours: 3, # vertices: 3, score: 3 * (1 - 3/6) = 1.5 (higher score)
This heuristic is simple for the sake of presentation. More advanced heuristics are discussed in the paper.
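The heuristic can be written as a small function. This is only a sketch of the simplified formula shown on the slide, not of the more advanced heuristics in the paper; the names `score` and `best_partition` and the dictionary layout are assumptions for illustration.

```python
def score(neighbours, vertices, capacity):
    """Simplified heuristic: # neighbours * (1 - # vertices / capacity)."""
    return neighbours * (1 - vertices / capacity)

def best_partition(stats, capacity):
    """stats maps partition id -> (# neighbours, # vertices); return the best id."""
    return max(stats, key=lambda p: score(*stats[p], capacity))

# Numbers for vertex B from the slide, with capacity 6.
stats_b = {1: (1, 5), 2: (3, 3)}
for p, (n, v) in stats_b.items():
    print(p, round(score(n, v, 6), 2))
print("best:", best_partition(stats_b, 6))
```

The first factor rewards partitions that already hold many of the vertex's neighbours (fewer cuts), and the second factor shrinks as a partition fills up (better balance).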
15. Compute the Partition for A
[Figure: vertices A and B, placed across Partition 1 and Partition 2]
Goals: (1) few cuts and (2) balanced
Heuristic: # neighbours * (1 - # vertices / capacity)
Partition 1: # neighbours: 1, # vertices: 4, score: 1 * (1 - 4/6) = 0.33
Partition 2: # neighbours: 2, # vertices: 4, score: 2 * (1 - 4/6) = 0.67 (higher score)
16. Example: Adding an Edge
[Figure: vertices A and B, Partition 1 and Partition 2]
(1) B stays put
(2) A moves to Partition 2
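The whole edge-insertion example can be replayed in a few lines. This is a minimal sketch using the simplified heuristic from the slides; the class name `LocalPartitioner`, the helper vertices `p1`..`p4` and `q1`..`q3`, and the concrete edges are hypothetical, chosen only so that the counts match the slides. It also assumes that the vertex being placed is excluded from the per-partition vertex count, which is consistent with the slide numbers (5 and 3 for B, 4 and 4 for A) but is not stated on them.

```python
from collections import defaultdict

class LocalPartitioner:
    def __init__(self, partitions, capacity):
        self.partitions = list(partitions)
        self.capacity = capacity
        self.adj = defaultdict(set)   # vertex -> set of neighbours
        self.part = {}                # vertex -> partition id

    def scores(self, v):
        """Heuristic score of every partition for vertex v (v itself not counted)."""
        result = {}
        for p in self.partitions:
            neighbours = sum(1 for u in self.adj[v] if self.part[u] == p)
            vertices = sum(1 for u, q in self.part.items() if q == p and u != v)
            result[p] = neighbours * (1 - vertices / self.capacity)
        return result

    def reassess(self, v):
        """Move v only if another partition scores strictly higher."""
        s = self.scores(v)
        best = max(s, key=s.get)
        if s[best] > s[self.part[v]]:
            self.part[v] = best
        return self.part[v]

    def add_edge(self, u, v, reassess=True):
        self.adj[u].add(v)
        self.adj[v].add(u)
        if reassess:              # only the two endpoints are re-examined
            self.reassess(u)
            self.reassess(v)

lp = LocalPartitioner(partitions=[1, 2], capacity=6)
lp.part.update({"A": 1, "p1": 1, "p2": 1, "p3": 1, "p4": 1,
                "B": 2, "q1": 2, "q2": 2, "q3": 2})
for u, v in [("B", "q1"), ("B", "q2"), ("B", "q3"), ("A", "p1"), ("A", "q1")]:
    lp.add_edge(u, v, reassess=False)   # pre-existing edges

lp.add_edge("B", "A")                   # the new edge; B is reassessed first
print(lp.part["B"], lp.part["A"])
```

Only the two endpoints of the new edge are reassessed, which is what keeps the update local.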
18. Computation cost
For each new edge, the following must be done for both vertices involved in the edge:
● Calculate the heuristic for each partition
(may involve communication for remote vertex location lookup)
20. Computation Skipping
Basic idea: accumulate changes for a vertex; if the changes exceed a certain threshold, recompute the partition for the vertex.
For example, recompute when # accumulated changes / # neighbors > 20%.
(1) Compute the partition when V has 10 neighbors. Then 2 new edges are added for V: 2 / 12 = 17% < 20%. Don’t recompute.
(2) When 1 more new edge is added for V: 3 / 13 = 23% > 20%. Recompute the partition for V. Reset # accumulated changes to 0.
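The skipping rule amounts to a per-vertex counter. The sketch below replays the example with V; the class name `SkipTracker` and its fields are hypothetical, and it assumes every change is an edge insertion that also increases the neighbor count, as in the example.

```python
class SkipTracker:
    """Decides when a vertex's partition should be recomputed."""

    def __init__(self, neighbors, threshold=0.2):
        self.neighbors = neighbors    # current number of neighbors
        self.threshold = threshold    # e.g. 0.2 for 20%
        self.accumulated = 0          # changes since the last recomputation

    def add_edge(self):
        """Record one new edge; return True if the partition must be recomputed."""
        self.neighbors += 1
        self.accumulated += 1
        if self.accumulated / self.neighbors > self.threshold:
            self.accumulated = 0      # reset after triggering a recomputation
            return True
        return False

v = SkipTracker(neighbors=10)
print([v.add_edge() for _ in range(3)])   # 1/11, 2/12, then 3/13 > 20%
```

A higher threshold skips more recomputations at the cost of letting a vertex's placement go stale for longer.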
26. How Many Copies?
Scores of each partition for vertex A:
Partition 1: 0.1
Partition 2: 0.2
Partition 3: 0.3
Partition 4: 0.4
minimum = 2
average = 3
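27. How Many Copies?
Scores of each partition for vertex A:
Partition 1: 0.1
Partition 2: 0.2
Partition 3: 0.3
Partition 4: 0.4
minimum = 2
average = 3
The two highest-scoring partitions (4 and 3) satisfy the minimum requirement. What about the other two?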
28. Comparing against Past Scores
Always keep the last n computed scores.
Past scores, sorted high to low (others elided): 0.9, 0.87, 0.4, 0.3, 0.29, 0.22, 0.2, 0.11, 0.1
minimum = 2
average = 3
cutoff: top (avg - 1) / (k - 1) fraction of scores
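One way to read the cutoff rule is sketched below. This is an interpretation under stated assumptions, not the paper's exact procedure: the class name `ReplicaPlanner`, the choice to give the `minimum` highest-scoring partitions a copy unconditionally, the `>=` comparison against the cutoff score, and adding the new scores to the history only after the decision are all assumptions. The window size of 45 is also hypothetical, picked only so that the top (3 - 1) / (4 - 1) of scores ends at the 30th highest; the slides do not state n.

```python
from collections import deque

class ReplicaPlanner:
    def __init__(self, k, minimum, average, window):
        self.k = k
        self.minimum = minimum
        self.average = average
        self.history = deque(maxlen=window)   # the last n computed scores

    def cutoff(self):
        """Score at the boundary of the top (avg - 1) / (k - 1) of past scores."""
        if not self.history:
            return None
        ranked = sorted(self.history, reverse=True)
        rank = max(1, int(len(ranked) * (self.average - 1) / (self.k - 1)))
        return ranked[rank - 1]

    def choose(self, scores):
        """scores maps partition id -> score; return partitions that get a copy."""
        ranked = sorted(scores, key=scores.get, reverse=True)
        chosen = ranked[:self.minimum]         # minimum requirement
        cut = self.cutoff()
        for p in ranked[self.minimum:]:        # the rest must beat the cutoff
            if cut is not None and scores[p] >= cut:
                chosen.append(p)
        self.history.extend(scores.values())
        return chosen

planner = ReplicaPlanner(k=4, minimum=2, average=3, window=45)
# Hypothetical past scores whose 30th highest value is 0.15.
planner.history.extend([0.5] * 29 + [0.15] + [0.05] * 15)
print(planner.choose({1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}))
```

With this made-up history, Partition 2 (score 0.2) clears the cutoff and Partition 1 (score 0.1) does not, so the vertex ends up with three copies.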
29. Comparing against Past Scores
Past scores, sorted high to low (others elided): 0.9, 0.87, 0.4, 0.3, 0.29, 0.22, 0.2, 0.11, 0.1
minimum = 2
average = 3
cutoff: 30th highest score (boundary between the 30th and 31st scores)
# copies: 2
31. Comparing against Past Scores
Past scores, sorted high to low (others elided): 0.9, 0.87, 0.4, 0.3, 0.29, 0.22, 0.2, 0.11, 0.1
minimum = 2
average = 3
cutoff: 30th highest score (boundary between the 30th and 31st scores)
# copies: 3
32. Comparing against Past Scores
Past scores, sorted high to low (others elided): 0.9, 0.87, 0.4, 0.3, 0.29, 0.22, 0.2, 0.11, 0.1
minimum = 2
average = 3
cutoff: 30th highest score
# copies: 4