Leopard: Lightweight Partitioning and Replication for Dynamic Graphs
Jiewen Huang and Daniel Abadi
Yale University
Facebook Social Graph
Social Graphs
Web Graphs
Semantic Graphs
Graph Partitioning
Many systems use hash partitioning
● Results in many edges being “cut”
Given a graph G and an integer k, partition the vertices into k disjoint sets such that:
● as few cuts as possible
● as balanced as possible
NP-hard
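The two objectives can be made concrete with a small sketch that counts cut edges and measures balance for a given assignment. The helper names (`hash_partition`, `edge_cut`, `imbalance`) and the toy graph are illustrative assumptions, not taken from the slides, and `v % k` merely stands in for a real hash function.

```python
from collections import Counter

def hash_partition(vertices, k):
    """Assign each integer vertex id to one of k partitions (v % k as a stand-in hash)."""
    return {v: v % k for v in vertices}

def edge_cut(edges, assignment):
    """Count edges whose two endpoints live in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

def imbalance(assignment, k):
    """Largest partition size divided by the ideal size (1.0 means perfectly balanced)."""
    sizes = Counter(assignment.values())
    ideal = len(assignment) / k
    return max(sizes.values()) / ideal

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
vertices = range(6)

hashed = hash_partition(vertices, k=2)
clustered = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # one triangle per partition

hash_cut = edge_cut(edges, hashed)
clustered_cut = edge_cut(edges, clustered)
```

Both assignments are perfectly balanced on this toy graph, but the hash-based one cuts most of the edges while the structure-aware one cuts only the bridge between the triangles.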
State of the Art
Multilevel scheme
Coarsening phase
To Make the Problem more Complicated
"The only constant is change." (Heraclitus)
Social graphs: new people and friendships
Semantic Web graphs: new knowledge
Web graphs: new websites and links
Dynamic Graphs
[Figure: vertex A, Partition 1, Partition 2]
Is partition 1 still the better partition for A?
Repartitioning the entire graph upon every change is way too expensive
New Framework
Leopard:
● Locally reassess partitioning as a result of changes without a full re-partitioning
● Integrates consideration of replication with partitioning
Outline
Background and Motivation
Leopard
Overview
Computation Skipping
Replication
Experiments
Algorithm Overview
For each added/deleted edge <V1, V2>
Compute best partition for V1 using a heuristic
Re-assign V1 if needed
The same for V2
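The loop above can be sketched as a small single-process class. This is only an illustration under stated assumptions: the class name `LocalPartitioner`, its methods, the uniform `capacity`, the rule of starting unseen vertices in the emptiest partition, and the strict-improvement test for moving a vertex are all hypothetical choices, and the scoring function is the simplified heuristic from the slides rather than the more advanced ones in the paper.

```python
from collections import defaultdict

class LocalPartitioner:
    """Toy sketch of per-edge local reassessment; names are illustrative only."""

    def __init__(self, k, capacity):
        self.k = k                      # number of partitions
        self.capacity = capacity        # max vertices per partition (assumed uniform)
        self.adj = defaultdict(set)     # vertex -> set of neighbours
        self.part = {}                  # vertex -> partition index
        self.sizes = [0] * k            # number of vertices in each partition

    def place(self, v, p):
        """Put v in partition p, keeping partition sizes up to date."""
        old = self.part.get(v)
        if old is not None:
            self.sizes[old] -= 1
        self.part[v] = p
        self.sizes[p] += 1

    def score(self, v, p):
        """Simplified heuristic: neighbours in p, discounted by how full p is without v."""
        neighbours = sum(1 for u in self.adj[v] if self.part.get(u) == p)
        others = self.sizes[p] - (1 if self.part.get(v) == p else 0)
        return neighbours * (1 - others / self.capacity)

    def reassess(self, v):
        """Move v only if some partition scores strictly higher than its current one."""
        best = max(range(self.k), key=lambda p: self.score(v, p))
        if self.score(v, best) > self.score(v, self.part[v]):
            self.place(v, best)

    def _touch(self, v1, v2):
        for v in (v1, v2):
            if v not in self.part:      # assumption: unseen vertices start in the emptiest partition
                self.place(v, min(range(self.k), key=self.sizes.__getitem__))
        for v in (v1, v2):              # V1 first, then the same for V2
            self.reassess(v)

    def add_edge(self, v1, v2):
        self.adj[v1].add(v2)
        self.adj[v2].add(v1)
        self._touch(v1, v2)

    def delete_edge(self, v1, v2):
        self.adj[v1].discard(v2)
        self.adj[v2].discard(v1)
        self._touch(v1, v2)
```

Only the two endpoints of the changed edge are re-scored, so the cost of an update depends on the number of partitions and the endpoints' degrees rather than on the size of the whole graph.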
Example: Adding an Edge
[Figure: a new edge between vertex A in Partition 1 and vertex B in Partition 2]
Compute the Partition for B
Goals: (1) few cuts and (2) balanced
Heuristic: # neighbours * (1 - # vertices / capacity)
Partition 1: # neighbours: 1, # vertices: 5, score: 1 * (1 - 5/6) = 0.17
Partition 2: # neighbours: 3, # vertices: 3, score: 3 * (1 - 3/6) = 1.5 (higher score)
This heuristic is simple for the sake of presentation. More advanced heuristics are discussed in the paper.
Compute the Partition for A
Goals: (1) few cuts and (2) balanced
Heuristic: # neighbours * (1 - # vertices / capacity)
Partition 1: # neighbours: 1, # vertices: 4, score: 1 * (1 - 4/6) = 0.33
Partition 2: # neighbours: 2, # vertices: 4, score: 2 * (1 - 4/6) = 0.67 (higher score)
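The score calculations for B and A can be reproduced with a few lines of Python. The function name `partition_score` and the variable names are assumptions made for illustration; the formula is the simplified heuristic from the slides, with a capacity of 6 as in the example.

```python
def partition_score(neighbours, vertices, capacity):
    """Simplified heuristic: neighbours in the partition, discounted by how full it is."""
    return neighbours * (1 - vertices / capacity)

# Vertex B: 1 neighbour and 5 vertices in Partition 1, 3 neighbours and 3 vertices in Partition 2.
b_scores = {
    "Partition 1": partition_score(neighbours=1, vertices=5, capacity=6),
    "Partition 2": partition_score(neighbours=3, vertices=3, capacity=6),
}

# Vertex A: 1 neighbour and 4 vertices in Partition 1, 2 neighbours and 4 vertices in Partition 2.
a_scores = {
    "Partition 1": partition_score(neighbours=1, vertices=4, capacity=6),
    "Partition 2": partition_score(neighbours=2, vertices=4, capacity=6),
}

best_for_b = max(b_scores, key=b_scores.get)
best_for_a = max(a_scores, key=a_scores.get)
```

Both vertices score highest in Partition 2, so a vertex already there stays and a vertex elsewhere becomes a candidate to move.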
Example: Adding an Edge
[Figure: Partition 1 and Partition 2 after the update]
(1) B stays put
(2) A moves to partition 2
Outline
Background and Motivation
Leopard
Overview
Computation Skipping
Replication
Experiments
Computation cost
For each new edge, must:
For both vertices involved in the edge:
Calculate the heuristic for each partition
(May involve communication for remote vertex location lookup)
Computation Skipping
Observation: As the number of neighbors of a vertex increases, the influence of a new neighbor decreases.
Computation Skipping
Basic Idea: Accumulate changes for a vertex. If the accumulated changes exceed a certain threshold, recompute the partition for the vertex.
For example, compare the ratio # accumulated changes / # neighbors against a threshold of 20%.
(1) Compute the partition when V has 10 neighbors. Then 2 new edges are added for V: 2 / 12 = 17% < 20%. Don’t recompute.
(2) When 1 more new edge is added for V: 3 / 13 = 23% > 20%. Recompute the partition for V. Reset # accumulated changes to 0.
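A minimal sketch of this bookkeeping follows. The class name `SkipTracker`, its fields, and the choice to force a recomputation when a vertex is left with no neighbours are assumptions for illustration; only the ratio test and the reset come from the slide.

```python
class SkipTracker:
    """Tracks accumulated changes per vertex and decides when to recompute."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold
        self.neighbours = {}   # vertex -> current number of neighbours
        self.pending = {}      # vertex -> changes accumulated since the last recomputation

    def record_change(self, v, delta=1):
        """Register an added (+1) or deleted (-1) edge at v.

        Returns True if the partition for v should be recomputed now.
        """
        self.neighbours[v] = self.neighbours.get(v, 0) + delta
        self.pending[v] = self.pending.get(v, 0) + 1
        n = self.neighbours[v]
        # Assumption: a vertex with no neighbours left is always recomputed.
        if n == 0 or self.pending[v] / n > self.threshold:
            self.pending[v] = 0   # reset the accumulated changes
            return True
        return False
```

Starting from a vertex with 10 neighbours, the first two calls give ratios 1/11 and 2/12 and return False, while the third gives 3/13 and triggers a recomputation, matching the worked example.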
Outline
Background and Motivation
Leopard
Overview
Computation Skipping
Replication
Experiments
Replication
Goals of replication:
● fault tolerance (k copies for each data point/block)
● further cut reduction
Minimum-Average Replication
It takes two parameters:
● minimum: fault tolerance
● average: cut reduction
Example
# copies    vertices
2           A, C, D, E, H, J, K, L
3           F, I
4           B, G
min = 2
average = 2.5
[Figure legend: first copy, replica]
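The two parameters in this example can be checked directly from the table. The dictionary layout and variable names below are assumptions for illustration.

```python
# Number of copies -> vertices with that many copies, as listed in the example.
copies_table = {2: "ACDEHJKL", 3: "FI", 4: "BG"}

# Flatten to vertex -> number of copies.
copies = {v: n for n, group in copies_table.items() for v in group}

minimum = min(copies.values())                 # fewest copies of any vertex
average = sum(copies.values()) / len(copies)   # (8*2 + 2*3 + 2*4) / 12
```

Eight vertices have 2 copies, two have 3, and two have 4, giving 30 copies over 12 vertices, which is the average of 2.5 stated in the example.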
How Many Copies?
Scores of each partition for vertex A:
Partition 1: 0.1
Partition 2: 0.2
Partition 3: 0.3
Partition 4: 0.4
minimum = 2
average = 3
The minimum requirement accounts for two copies. What about the remaining partitions?
Comparing against Past Scores
Always keep the last n computed scores.
[Figure: past scores sorted from high to low: 0.9, 0.87, 0.4, 0.3, 0.29, 0.22, 0.2, 0.11, 0.1, with intermediate scores elided]
minimum = 2
average = 3
cutoff: top (avg - 1)/(k - 1) fraction of scores
Comparing against Past Scores
[Figure: the sorted list of past scores, with the 30th and 31st highest scores marked]
minimum = 2
average = 3
cutoff: 30th highest score
# copies: 2, 3, or 4 in successive builds of the slide
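One possible reading of the cutoff rule is sketched below. It is not the paper's exact algorithm. The class name `ReplicaPlanner`, the choice to record only non-primary scores in the history, the use of `>=` when comparing against the cutoff, and the fallback to the minimum when no history exists are all assumptions. The sketch also assumes k > 1 and interprets k as the number of partitions.

```python
from collections import deque

class ReplicaPlanner:
    """Decides how many copies a vertex gets by comparing its scores with past scores."""

    def __init__(self, k, minimum, average, window):
        self.k = k
        self.minimum = minimum
        # Fraction of candidate (non-primary) scores that should earn a replica.
        self.fraction = (average - 1) / (k - 1)
        self.history = deque(maxlen=window)   # the last `window` computed scores

    def cutoff(self):
        """Score at the rank that marks the top `fraction` of past scores."""
        ranked = sorted(self.history, reverse=True)
        rank = max(1, round(self.fraction * len(ranked)))
        return ranked[rank - 1]

    def copies(self, scores):
        """scores: one score per partition for a single vertex."""
        ranked = sorted(scores, reverse=True)
        rest = ranked[1:]                     # ranked[0] hosts the first copy
        if self.history:
            cut = self.cutoff()
            extra = sum(1 for s in rest if s >= cut)
        else:
            extra = 0                         # assumption: no history means minimum only
        self.history.extend(rest)             # remember the scores for later comparisons
        return max(self.minimum, 1 + extra)
```

With minimum = 2, average = 3, four partitions, and a window of 45 past scores, the cutoff falls at the 30th highest score, and a vertex receives between 2 and 4 copies depending on how many of its remaining partition scores reach that cutoff.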
Outline
Background and Motivation
Leopard
Experiments
Experiment Setup
● Comparison points
○ Leopard with FENNEL heuristics
○ One-pass FENNEL (no vertex reassignment)
○ METIS (static graphs)
○ ParMETIS (repartitioning for dynamic graphs)
○ Hash Partitioning
● Graph Datasets
○ Type: social graphs, collaboration graphs, Web graphs, email graphs, and synthetic graphs
○ Size: up to 66 million vertices and 1.8 billion edges
Edge Cut
Computation Skipping
Effect of Replication on Edge Cut
Thanks!
Q & A
