Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Alexander G. Connor, Panos K. Chrysanthis, Alexandros Labrinidis
Advanced Data Management Technologies Laboratory
Department of Computer Science, University of Pittsburgh

Data in social networks
  • A social network manages user profiles, updates and connections
  • How to manage this data in a scalable way?
      • Key-value stores offer performance under high load
  • Some observations about social networks
      • A profile view usually includes data from a user’s friends (spatial locality)
      • A friend’s profile is often visited next (temporal locality)
  • Requests might ask for updates from several users
  • Web pages might include pieces of several user profiles
  • A single request requires connecting to many machines

Connections in a Social Network
  (figure: graph of user connections; one vertex is labeled Alice)

Leveraging Locality
  • Can we take advantage of the connections?
  • What if we stored connected users’ profiles and data in the same place?
  • Locality can be leveraged
      • The number of connections is reduced
      • User data can be pre-fetched
  • We can think of this as a graph partitioning problem…
      • Partitions = machines
      • Vertices = user profiles, including updates
      • Edges = connections
      • Objective: minimize the number of edges that cross partitions

Example: graph partitioning
  • Accessing a vertex’s neighbors requires accessing many partitions
  • In a social network, requesting updates from followed users requires connecting to many machines

  • Far fewer edges cross partitions
  • Accessing a vertex’s neighbors requires accessing few partitions
  • In a social network, fewer connections are made and related user data can be pre-fetched

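The partitioning objective, minimizing the number of edges that cross partitions, can be illustrated with a small sketch. The graph, the two assignments and the function name `edge_cut` are illustrative assumptions and do not come from the talk.

```python
def edge_cut(edges, assignment):
    """Count edges whose endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Hypothetical follower graph: two tightly connected groups joined by one edge.
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
         ("dave", "erin"), ("erin", "frank"), ("dave", "frank"),
         ("carol", "dave")]

# Assignment that ignores connections (users alternate between two machines).
scattered = {"alice": 0, "bob": 1, "carol": 0, "dave": 1, "erin": 0, "frank": 1}
# Assignment that keeps each group on one machine.
clustered = {"alice": 0, "bob": 0, "carol": 0, "dave": 1, "erin": 1, "frank": 1}

print(edge_cut(edges, scattered))  # 5 edges cross partitions
print(edge_cut(edges, clustered))  # 1 edge crosses partitions
```

With the clustered assignment, reading the neighbors of most users touches a single machine, which is the locality the slides aim for.
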
Outline
  • Introduction
      • Data in Social Networks
      • Leveraging Locality
  • Key-Key-Value Stores
      • System Model
      • Client API
      • Adding a Key-Key-Value
      • Load management
  • On-line partitioning algorithm
  • Simulation
      • Parameters
      • Results
  • Conclusion

Address Table: Mapping Store
  • Maps keys to virtual machines
Physical Layer: Physical machines
Logical Layer: Virtual machines
  • Run the KKV store software
  • Manage replication
  • Can be moved between physical machines as needed
Application Layer: Client API
  • Cached data
  (figure labels: Application Sessions, Address table, Virtual hosts, Physical hosts)

Client API and Sessions
  • Clients use a simple API that includes the get, put and sync commands
  • Data is pulled from the logical layer in blocks
      • Groups of related keys
  • The client API keeps data in an in-memory cache
  • Data is pushed out asynchronously to virtual nodes in blocks
  • Push/pull can be done synchronously if requested by the client
      • Offers stronger consistency at the cost of performance

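A minimal sketch of such a session is shown here. The slides name only the get, put and sync commands; the classes `LogicalLayer` and `Session`, the methods `fetch_block` and `write_block`, and the `synchronous` flag are hypothetical. Asynchronous pushing is simplified to writes that stay in the cache until `sync()` is called.

```python
class LogicalLayer:
    """Stand-in for the virtual nodes: values plus groups of related keys."""

    def __init__(self):
        self.data = {}      # key -> value
        self.related = {}   # key -> set of keys stored in the same block

    def fetch_block(self, key):
        # Return the requested key together with its related keys.
        keys = {key} | self.related.get(key, set())
        return {k: self.data[k] for k in keys if k in self.data}

    def write_block(self, block):
        self.data.update(block)


class Session:
    """Client-side session with an in-memory cache and deferred writes."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()   # keys written locally but not yet pushed

    def get(self, key):
        if key not in self.cache:
            # Pull a whole block so that related keys are pre-fetched,
            # without overwriting values already held in the cache.
            for k, v in self.store.fetch_block(key).items():
                self.cache.setdefault(k, v)
        return self.cache.get(key)

    def put(self, key, value, synchronous=False):
        self.cache[key] = value
        self.dirty.add(key)
        if synchronous:
            self.sync()   # stronger consistency, extra round trip

    def sync(self):
        # Push all pending writes to the logical layer as one block.
        self.store.write_block({k: self.cache[k] for k in self.dirty})
        self.dirty.clear()


store = LogicalLayer()
store.data = {"alice": "profile A", "bob": "profile B"}
store.related = {"alice": {"bob"}}

session = Session(store)
session.get("alice")                  # pulls alice and bob in one block
session.put("alice", "profile A2")    # cached, not yet pushed
pending_before = store.data["alice"]  # the store still holds the old value
session.sync()                        # now the store holds the new value
```

The trade-off named on the slide appears in `put`: a synchronous put pays for a round trip on every write, while the default leaves the store briefly stale.
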
Adding a key-key-value
put(alice, bob, follows)
  • Two users: Alice and Bob
  • Use the address table to determine the virtual machine (node) that hosts Alice’s data
  • Write the data to that node
  • Use the address table to determine the node that hosts Bob’s data
  • Write the same data to that node
  • The on-line partitioning algorithm moves Alice’s data to Bob’s node because they are connected
  (figure: address table with entries alice → 1,1 and bob → 8,8; the virtual hosts holding kv(alice, ...) and kv(bob, ...) each also store kkv(alice, bob, follows))

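The write path described above can be sketched as follows. The data layout (`address_table`, `nodes`) and the function name `put_kkv` are assumptions made for illustration; only the node identifiers 1,1 and 8,8 and the triple (alice, bob, follows) come from the slide.

```python
# Address table: key -> identifier of the virtual node hosting that key.
address_table = {"alice": (1, 1), "bob": (8, 8)}

# Each virtual node holds plain key-values and key-key-values.
nodes = {
    (1, 1): {"kv": {"alice": "..."}, "kkv": set()},
    (8, 8): {"kv": {"bob": "..."}, "kkv": set()},
}


def put_kkv(key1, key2, value):
    """Store the key-key-value on the node of each key it connects."""
    for key in (key1, key2):
        node_id = address_table[key]                    # address-table lookup
        nodes[node_id]["kkv"].add((key1, key2, value))  # write to that node


put_kkv("alice", "bob", "follows")
```

Because the same triple is written to both nodes, either endpoint can answer questions about the connection locally. If the two keys already share a node, the set holds a single copy.
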
Splitting a Node
  • If one node becomes overloaded, it can initiate a split
  • To maintain the grid structure, nodes in the same row and column must also split
  • Once the split is complete, new physical machines can be turned on
  (figure label: Virtual hosts)


On-line Partitioning Algorithm
  • Runs periodically in parallel on each virtual node
      • Also after a split or merge
  • For each key stored on a node
      • Determine the number of connections (key-key-values) with keys on other nodes
          • Can also be the sum of edge weights
      • Find the node that has the most connections
      • If that node is different from the current node
          • If the number of connections to that node is greater than the number of connections to the current node
              • If this margin is greater than some threshold
                  • Move the key to the other node
                  • Update the address table
  • Designed to work in a distributed, dynamic setting
  • NOT a replacement for off-line algorithms in static settings

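One pass of this procedure on a single node might look like the sketch below. It is an interpretation of the steps listed on the slide, not the authors' implementation: the function name `partition_pass`, the dictionary-based address table and adjacency structure, and the example graph are all assumptions.

```python
from collections import defaultdict


def partition_pass(node_id, address_table, edges, threshold=0):
    """Run one pass of the on-line partitioning heuristic for one node.

    address_table: key -> node id (updated in place when a key moves)
    edges: key -> {neighbor key: edge weight}
    Returns the list of (key, old node, new node) moves.
    """
    moves = []
    local_keys = [k for k, n in address_table.items() if n == node_id]
    for key in local_keys:
        # Sum the edge weights from this key to each node.
        weight_to = defaultdict(int)
        for neighbor, weight in edges.get(key, {}).items():
            weight_to[address_table[neighbor]] += weight
        if not weight_to:
            continue  # a key without connections stays where it is
        best = max(weight_to, key=weight_to.get)
        margin = weight_to[best] - weight_to[node_id]
        if best != node_id and margin > threshold:
            address_table[key] = best  # move the key, update the table
            moves.append((key, node_id, best))
    return moves


# Hypothetical example: alice has more connections on node (1, 2)
# than on her current node (1, 1); dave does not.
address_table = {"alice": (1, 1), "bob": (1, 2), "carol": (1, 2),
                 "dave": (1, 1), "erin": (1, 1), "frank": (1, 1)}
edges = {
    "alice": {"bob": 1, "carol": 1, "dave": 1},
    "dave": {"alice": 1, "erin": 1, "frank": 1},
}
moves = partition_pass((1, 1), address_table, edges, threshold=0)
print(moves)  # alice moves to (1, 2); dave stays on (1, 1)
```

The threshold keeps keys from bouncing between nodes on small differences. A larger value trades partition quality for fewer moves.
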
Partitioning Example
  (figure: nodes 2,1, 1,1 and 1,2)

  Node   Sum(Edges)
  1,1    0
  2,1    2
  1,2    1

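Applying the move rule to the edge sums in this example gives the following worked decision. The slide text does not say which node currently hosts the vertex or what threshold is used, so `current` and `threshold` below are assumptions chosen for illustration.

```python
# Edge sums per node, taken from the Partitioning Example slide.
edge_sums = {(1, 1): 0, (2, 1): 2, (1, 2): 1}

current = (1, 1)  # assumption: the vertex currently lives on node 1,1
threshold = 1     # assumption: illustrative value

best = max(edge_sums, key=edge_sums.get)       # node with the most connections
margin = edge_sums[best] - edge_sums[current]  # gain from moving
should_move = best != current and margin > threshold

print(best, margin, should_move)  # (2, 1) 2 True
```

Under these assumptions the vertex moves to node 2,1, where most of its edges lead.
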
Partitioning Quality Results
  (figure: % edges in partition plotted against vertices in graph)
  • The on-line algorithm partitions as well as Kernighan-Lin

Partitioning Performance Results
  (figure: vertices moved plotted against vertices in graph)
  • The on-line algorithm partitions 2x faster than Kernighan-Lin

Conclusions
  • Contributions:
      • A novel model for scalable graph data stores that extends the key-value model
          • Key-key-value store
      • A high-level system design
      • A novel on-line partitioning algorithm
      • Preliminary experimental results
  • Our proposed algorithm shows promise in the distributed, dynamic setting

What’s Ahead?
  • Prototype system implementation
      • Java, PostgreSQL
  • Performance analysis against MongoDB, Cassandra
  • Sensitivity analysis
  • Cloud deployment

Thank You!
Acknowledgments
  • Daniel Cole, Nick Farnan, Thao Pham, Sean Snyder
  • ADMT Lab, CS Department, Pitt
  • GPSA, Pitt A&S GSO, Pitt A&S PBC

Editor's Notes

  1. Two users: Alice and Bob. Put command: store “Alice Follows Bob”. Use the address table to determine the virtual machine (node) that hosts Alice’s data. Write the data to that node. Use the address table to determine the node that hosts Bob’s data. Write the same data to that node. The on-line partitioning algorithm moves Alice’s data to Bob’s node because they are connected.
  2. Nodes in the logical layer have to handle varying demands. If one node becomes overloaded, it can initiate a split. To maintain the grid structure, nodes in the same row and column must also split. The grid is used for replication. It is used for efficient locking and messaging. Once the split is complete, new physical machines can be turned on. Virtual nodes can be transferred to these new machines. Similarly, as load decreases, virtual nodes can be transferred off of physical machines. Some physical machines can then be shut down to save power. Virtual nodes can be merged back together.
  3. Works by improving partitions; it doesn’t create them from scratch. On-line means that it works with a changing graph, one whose structure frequently changes.
  4. The algorithm runs in parallel on each node: when a split or merge occurs; when load is below a threshold. Each vertex is considered in turn. Find the number of edges to each node (edges can be weighted). Find the node with the greatest number of edges. If that node is different and the gain is greater than the threshold, move the vertex.