Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud

University of New South Wales

Alexander G. Connor, Panos K. Chrysanthis, Alexandros Labrinidis
Advanced Data Management Technologies Laboratory, Department of Computer Science, University of Pittsburgh

Example – graph partitioning

When many edges cross partitions:
   - Accessing a vertex’s neighbors requires accessing many partitions
   - In a social network, requesting updates from followed users requires connecting to many machines

When few edges cross partitions:
   - Accessing a vertex’s neighbors requires accessing few partitions
   - In a social network, fewer connections are made and related user data can be pre-fetched

   - Address Table: mapping store
     - Maps keys to virtual machines
   - Physical Layer: physical machines
   - Logical Layer: virtual machines
     - Can be moved between physical machines as needed
   - Application Layer: client API
     - Cached data

Splitting a Node
   - If one node becomes overloaded, it can initiate a split
   - To maintain the grid structure, nodes in the same row and column must also split
   - Once the split is complete, new physical machines can be turned on

1. Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
   Alexander G. Connor, Panos K. Chrysanthis, Alexandros Labrinidis
   Advanced Data Management Technologies Laboratory, Department of Computer Science, University of Pittsburgh

2. Data in social networks
   - A social network manages user profiles, updates and connections
   - How to manage this data in a scalable way?
     - Key-value stores offer performance under high load
   - Some observations about social networks
     - A profile view usually includes data from a user’s friends (spatial locality)
     - A friend’s profile is often visited next (temporal locality)
     - Requests might ask for updates from several users
     - Web pages might include pieces of several user profiles
     - A single request requires connecting to many machines

3. Connections in a Social Network (figure; labeled vertex: Alice)

4. Leveraging Locality
   - Can we take advantage of the connections?
   - What if we stored connected users’ profiles and data in the same place?
   - Locality can be leveraged
     - The number of connections is reduced
     - User data can be pre-fetched
   - We can think of this as a graph partitioning problem
     - Partitions = machines
     - Vertices = user profiles, including updates
     - Edges = connections
     - Objective: minimize the number of edges that cross partitions
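The partitioning objective on this slide can be made concrete with a small sketch. The function name `count_cut_edges`, the user names and both placements below are hypothetical and chosen only for illustration; the slides do not give an implementation.

```python
from typing import Dict, Hashable, Iterable, Tuple


def count_cut_edges(
    edges: Iterable[Tuple[Hashable, Hashable]],
    partition_of: Dict[Hashable, int],
) -> int:
    """Count edges whose two endpoints are stored on different partitions."""
    return sum(1 for u, v in edges if partition_of[u] != partition_of[v])


# Hypothetical social graph: each edge is a connection between two users.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("carol", "dave")]

# Placement 1: every profile on its own machine.
scattered = {"alice": 0, "bob": 1, "carol": 2, "dave": 3}
# Placement 2: connected profiles stored together where possible.
clustered = {"alice": 0, "bob": 0, "carol": 0, "dave": 1}

cut_scattered = count_cut_edges(edges, scattered)
cut_clustered = count_cut_edges(edges, clustered)
```

A lower count means fewer machines have to be contacted to read a user's neighbors, which is the quantity the objective asks to minimize.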

6. Accessing a vertex’s neighbors requires accessing many partitions

7. In a social network, requesting updates from followed users requires connecting to many machines

8. Far fewer edges cross partitions

9. Accessing a vertex’s neighbors requires accessing few partitions

10.

11. Outline
   - Introduction
     - Data in Social Networks
     - Leveraging Locality
   - Key-Key-Value Stores
     - System Model
     - Client API
     - Adding a Key-Key-Value
     - Load management
   - On-line partitioning algorithm
   - Simulation
     - Parameters
     - Results
   - Conclusion

12.

13.

14. Run the KKV store software

15. Manage replication

16.

17. cached data (figure labels: Application Sessions, Address table, Virtual hosts, Physical hosts)

18. Client API and Sessions
   - Clients use a simple API that includes the get, put and sync commands
   - Data is pulled from the logical layer in blocks (groups of related keys)
   - The client API keeps data in an in-memory cache
   - Data is pushed out asynchronously to virtual nodes in blocks
   - Push/pull can be done synchronously if requested by the client
     - Offers stronger consistency at the cost of performance
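A minimal sketch of how such a session might behave, assuming a cache that is filled one block at a time and writes that are deferred until `sync`. The class names `BlockStore` and `ClientSession`, the `synchronous` flag and the block layout are assumptions made for illustration; the slides name only the get, put and sync commands. A real client would presumably flush in the background, whereas this sketch simply defers the push.

```python
class BlockStore:
    """Hypothetical stand-in for the logical layer: keys grouped into blocks."""

    def __init__(self, blocks):
        self.blocks = [dict(block) for block in blocks]
        self.pulls = 0  # number of block pulls, to show the effect of caching

    def pull_block(self, key):
        """Return a copy of the whole block that contains `key`."""
        self.pulls += 1
        for block in self.blocks:
            if key in block:
                return dict(block)
        return {}

    def push(self, updates):
        """Apply a batch of updates; unknown keys go into a new block."""
        for key, value in updates.items():
            for block in self.blocks:
                if key in block:
                    block[key] = value
                    break
            else:
                self.blocks.append({key: value})


class ClientSession:
    """Hypothetical client session with an in-memory cache."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()  # keys written locally but not yet pushed

    def get(self, key):
        if key not in self.cache:
            # Pull the whole block so related keys are pre-fetched.
            for k, v in self.store.pull_block(key).items():
                self.cache.setdefault(k, v)  # never overwrite local writes
        return self.cache.get(key)

    def put(self, key, value, synchronous=False):
        self.cache[key] = value
        self.dirty.add(key)
        if synchronous:  # stronger consistency, at the cost of a round trip
            self.sync()

    def sync(self):
        self.store.push({k: self.cache[k] for k in self.dirty})
        self.dirty.clear()


store = BlockStore([{"alice": "profile A", "bob": "profile B"}, {"carol": "profile C"}])
session = ClientSession(store)
first = session.get("alice")      # pulls the block holding alice and bob
second = session.get("bob")       # served from the cache, no second pull
session.put("alice", "profile A2")
before_sync = store.blocks[0]["alice"]
session.sync()
after_sync = store.blocks[0]["alice"]
```

The deferred write is what makes the default path cheaper: the store still holds the old value until `sync` runs or a put is issued with `synchronous=True`.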

19. Adding a key-key-value
   - Two users: Alice and Bob
   - put(alice, bob, follows)
   - Use the address table to determine the virtual machine (node) that hosts Alice’s data
   - Write the data to that node
   - Use the address table to determine the node that hosts Bob’s data
   - Write the same data to that node
   - The on-line partitioning algorithm moves Alice’s data to Bob’s node because they are connected
   - Figure, address table: alice at 1,1; bob at 8,8 (a second 8,8 entry appears, presumably Alice’s address after the move)
   - Figure, virtual host 1,1: kv(alice, ...) ... kkv(alice, bob, follows)
   - Figure, virtual host 8,8: kv(bob, ...) ... kkv(alice, bob, follows)
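The write path above can be sketched as follows, assuming the address table is a plain mapping from key to node coordinates and each node holds a dictionary of key-key-values. The class `KKVStore` and its attribute names are hypothetical; only `put(alice, bob, follows)` and the node addresses 1,1 and 8,8 come from the slide.

```python
class KKVStore:
    """Hypothetical key-key-value store: one dict of kkv entries per node."""

    def __init__(self, address_table):
        # address_table maps a key to the node (row, column) that hosts it.
        self.address_table = dict(address_table)
        self.nodes = {node: {} for node in set(self.address_table.values())}

    def put(self, key1, key2, value):
        """Write the same key-key-value to the nodes hosting key1 and key2."""
        for key in (key1, key2):
            node = self.address_table[key]  # address table lookup
            self.nodes[node][(key1, key2)] = value

    def get(self, key1, key2):
        """Read the entry from the node that hosts key1."""
        node = self.address_table[key1]
        return self.nodes[node].get((key1, key2))


kkv = KKVStore({"alice": (1, 1), "bob": (8, 8)})
kkv.put("alice", "bob", "follows")
```

Storing the entry on both nodes means either user's node can answer questions about the connection without contacting the other, until the partitioning algorithm co-locates the two keys.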

20.

21. Outline
   - Introduction
     - Data in Social Networks
     - Leveraging Locality
   - Key-Key-Value Stores
     - System Model
     - Client API
     - Adding a Key-Key-Value
     - Load management
   - On-line Partitioning Algorithm
   - Simulation
     - Parameters
     - Results
   - Conclusion

22. On-line Partitioning Algorithm
   - Runs periodically in parallel on each virtual node
     - Also after a split or merge
   - For each key stored on a node
     - Determine the number of connections (key-key-values) with keys on other nodes
       - Can also be the sum of edge weights
     - Find the node that has the most connections
     - If that node is different from the current node
       - If the number of connections to that node is greater than the number of connections to the current node
         - If this margin is greater than some threshold
           - Move the key to the other node
           - Update the address table
   - Designed to work in a distributed, dynamic setting
   - NOT a replacement for off-line algorithms in static settings
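One pass of the steps above might look like the following sketch, assuming edges are stored as a weighted dictionary and the address table is a plain mapping. The function names `best_node_for` and `partition_pass`, the data layout and the vertex names are hypothetical; the example weights are chosen so that a vertex on node 1,1 has edge sums of 0, 2 and 1 toward nodes 1,1, 2,1 and 1,2. This is a single-process illustration, not the parallel, distributed version the slide describes.

```python
from collections import Counter


def best_node_for(key, edges, address_table, threshold=0):
    """Return the node `key` should live on after considering its edges."""
    current = address_table[key]
    counts = Counter()  # node -> sum of edge weights from `key` to that node
    for (a, b), weight in edges.items():
        if a == key:
            other = b
        elif b == key:
            other = a
        else:
            continue
        counts[address_table[other]] += weight
    if not counts:
        return current
    target, target_count = max(counts.items(), key=lambda item: item[1])
    margin = target_count - counts[current]  # Counter yields 0 if absent
    if target != current and margin > threshold:
        return target
    return current


def partition_pass(node, edges, address_table, threshold=0):
    """Consider every key on `node` in turn and move it if that helps."""
    moved = []
    for key in [k for k, n in address_table.items() if n == node]:
        target = best_node_for(key, edges, address_table, threshold)
        if target != node:
            address_table[key] = target  # move the key, update the table
            moved.append((key, target))
    return moved


# Vertex "v" on node (1, 1): no edges to (1, 1), two to (2, 1), one to (1, 2).
address_table = {"v": (1, 1), "a": (2, 1), "b": (2, 1), "c": (1, 2)}
edges = {("v", "a"): 1, ("v", "b"): 1, ("v", "c"): 1}

target = best_node_for("v", edges, address_table)
moved = partition_pass((1, 1), edges, address_table)
```

The threshold acts as a damper: with a margin of 2, a threshold of 2 or more would leave the vertex where it is, which limits churn when the graph changes frequently.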

23. Partitioning Example (figure: nodes 2,1; 1,1; 1,2)

    Node   Sum(Edges)
    1,1    0
    2,1    2
    1,2    1

24. Partitioning Example (figure: nodes 2,1; 1,1; 1,2)

25. Experimental Parameters

26. Partitioning Quality Results (plot: % edges in partition vs. vertices in graph)
   - On-line partitions as well as Kernighan-Lin

27. Partitioning Performance Results (plot: vertices moved vs. vertices in graph)
   - On-line partitions 2x faster than Kernighan-Lin!

28. Conclusions
   - Contributions:
     - A novel model for scalable graph data stores that extends the key-value model: the key-key-value store
     - A high-level system design
     - A novel on-line partitioning algorithm
     - Preliminary experimental results
   - Our proposed algorithm shows promise in the distributed, dynamic setting

29. What’s Ahead?
   - Prototype system implementation (Java, PostgreSQL)
   - Performance analysis against MongoDB, Cassandra
   - Sensitivity analysis
   - Cloud deployment

30. Thank You!
   Acknowledgments
   - Daniel Cole, Nick Farnan, Thao Pham, Sean Snyder
   - ADMT Lab, CS Department, Pitt GPSA, Pitt A&S GSO, Pitt A&S PBC

Editor's Notes

- Two users: Alice and Bob
- Put command: store “Alice Follows Bob”
- Use the address table to determine the virtual machine (node) that hosts Alice’s data
- Write the data to that node
- Use the address table to determine the node that hosts Bob’s data
- Write the same data to that node
- The on-line partitioning algorithm moves Alice’s data to Bob’s node because they are connected

- Nodes in the logical layer have to handle varying demands
- If one node becomes overloaded, it can initiate a split
- To maintain the grid structure, nodes in the same row and column must also split
- The grid is used for replication
- It is used for efficient locking and messaging
- Once the split is complete, new physical machines can be turned on
- Virtual nodes can be transferred to these new machines
- Similarly, as load decreases, virtual nodes can be transferred off of physical machines
- Some physical machines can then be shut down to save power
- Virtual nodes can be merged back together

- Works by improving partitions; it doesn’t create them from scratch
- On-line means that it works with a changing graph whose structure changes frequently

- The algorithm runs in parallel on each node
  - When a split or merge occurs
  - When load is below a threshold
- Each vertex is considered in turn
  - Find the number of edges to each node (edges can be weighted)
  - Find the node with the greatest number of edges
  - If different, and the gain is > threshold, move the vertex
