SlideShare a Scribd company logo
1 of 19
Giraph: Large-scale graph processing infrastructure on Hadoop Avery Ching, Yahoo! Christian Kunz, Jybe  6/29/2011
Why scalable graph processing? 2 Real world web and social graphs are at immense scale and continuing to grow In 2008, Google estimated the number of web pages at 1 trillion In 2011, Goldman Sachs disclosed Facebook has over 600 million active users In March 2011, LinkedIn said it had over 100 million users In October 2010, Twitter claimed to have over 175 million users Relevant and personalized information for users relies strongly on iterative graph ranking algorithms (search results, social news, ads, etc.) In web graphs, page rank and its variants In social graphs, popularity rank, personalized rankings, shared connections, shortest paths, etc.
Existing solutions Sequence of map-reduce jobs in Hadoop Classic map-reduce overheads (job startup/shutdown, reloading data from HDFS, shuffling) Map-reduce programming model not a good fit for graph algorithms Google’s Pregel Requires its own computing infrastructure Not available (unless you work at Google) Master is a SPOF Message passing interface (MPI) Not fault-tolerant Too generic 3
Giraph Leverage Hadoop installations around the world for iterative graph processing Big data today is processed on Hadoop with the Map-Reduce computing model Map-Reduce with Hadoop is widely deployed outside of Yahoo! as well (i.e. EC2, Cloudera, etc.) Bulk synchronous processing (BSP) computing model Input data loaded once during the application, all messaging in memory Fault-tolerant/dynamic graph processing infrastructure Automatically adjust to available resources on the Hadoop grid No single point of failure except Hadoop namenode and jobtracker Relies on ZooKeeper as a fault-tolerant coordination service Vertex-centric API to do graph processing (similar to the map() method in Hadoop) in a bulk synchronous parallel computing model Inspired by Pregel Open-source https://github.com/aching/Giraph 4
Bulk synchronous parallel model Computation consists of a series of “supersteps” Supersteps are an atomic unit of computation where operations can happen in parallel During a superstep, components are assigned tasks and receive unordered messages from the previous superstep Components communicate through point-to-point messaging All (or a subset of) components can be synchronized through the superstep concept 5 Superstep i Superstep i+1 Superstep i+2 Component 1 Component 1 Component 1 Component 2 Component 2 Component 2 Component 3 Component 3 Component 3 Application progress
Writing a Giraph application Every vertex will call compute() method once during a superstep Analogous to map() method in Hadoop for a <key, value> tuple Users chooses 4 types for their implementation of Vertex (I VertexId, V VertexValue, E EdgeValue, M MsgValue) 6
Basic Giraph API Local vertex mutations happen immediately Graph mutations are processed just prior to the next superstep Sent messages received at the next superstep during compute Vertices “vote” to end the computation Once all vertices have voted to end the computation, the application is finished 7
Page rank example public class SimplePageRankVertex extends HadoopVertex<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {     public void compute(Iterator<DoubleWritable> msgIterator) {         double sum = 0;         while (msgIterator.hasNext()) {             sum += msgIterator.next().get();         }         setVertexValue(new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);         if (getSuperstep() < 30) {             long edges = getOutEdgeIterator().size(); sentMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges));         } else { voteToHalt();         }     } 8
Thread architecture 9 Map-only job in Hadoop Map 0 Map 1 Map 2 Map 3 Map 4 Thread assignment in Giraph Worker Worker Worker Worker Worker Worker Worker Master ZooKeeper
Thread responsibilities Master Only one active master at a time Runs the VertexInputFormat getSplits() to get the appropriate number of VertexSplit objects for the application and writes it to ZooKeeper Coordinates application Synchronizes supersteps, end of application Handles changes that can occur within supersteps (i.e. vertex movement, change in number of workers, etc.) Worker Reads vertices from one or more VertexSplit objects, splitting them into VertexRanges Executes the compute() method for every Vertex it is assigned once per superstep Buffers the incoming messages to every Vertex it is assigned for the next superstep ZooKeeper Manages a server that Is a part of the ZooKeeper quorum (maintains global application state) 10
Vertex distribution 11 Worker 0 V Worker 0 VertexRange 0 V Master VertexSplit 0 V V Workers process the VertexSplit objects and divide them further to create VertexRange objects. V V VertexRange 0,1 Prior to every superstep, the master assigns every worker 0 or more VertexRange objects.  They are responsible for the vertices that fall in those VertexRange objects. V V VertexRange 1 V V V V V V Master V Worker 1 VertexRange 2 V VertexSplit 1 VertexRange 2,3 V V V V Master uses the  VertexInputFormat to divide the input data into VertexSplit objects and serializes them to ZooKeeper. V V V VertexRange 3 V V V V V Worker 1 V V
Worker phases in a superstep Master selects one or more of the available workers to use for the superstep Users can set the checkpoint frequency ,[object Object],Users can determine how to distribute vertex ranges on the set of available workers BSP model allows for dynamic resource usage Every superstep is an atomic unit of computation Resources can change between supersteps and used accordingly (shrink or grow) 12 Prepare by registering worker health If desired, checkpoint vertices If desired, exchange vertex ranges Execute compute() on each vertex and  exchange messages
Fault tolerance No single point of failure from BSP threads With multiple master threads, if the current master dies, a new one will automatically take over. If a worker thread dies, the application is rolled back to a previously checkpointed superstep.  The next superstep will begin with the new amount of workers If a zookeeper server dies, as long as a quorum remains, the application can proceed Hadoop single points of failure still exist Namenode, jobtracker Restarting manually from a checkpoint is always possible 13
Master thread fault tolerance 14 Before failure of active master 0 After failure of active master 0 Active Master State Active Master State “Active” Master 0 “Active” Master 0 “Spare” Master 1 “Active” Master 1 “Spare” Master 2 “Spare” Master 2 One active master, with spare masters taking over in the event of an active master failure All active master state is stored in ZooKeeper so that a spare master can immediately step in when an active master fails “Active” master implemented as a queue in ZooKeeper
Worker thread fault tolerance 15 Superstep i (no checkpoint) Superstep i+1 (checkpoint) Superstep i+2 (no checkpoint) Worker failure! Superstep i+3 (checkpoint) Superstep i+1 (checkpoint) Superstep i+2 (no checkpoint) Worker failure after checkpoint complete! Superstep i+3 (no checkpoint) Application Complete A single worker failure causes the superstep to fail In order to disrupt the application, a worker must be registered and chosen for the current superstep Application reverts to the last committed superstep automatically Master detects worker failure during any superstep through the loss of a  ZooKeeper “health” znode Master chooses the last committed superstep and sends a command through ZooKeeper for all workers to restart from that superstep
Optional features Combiners Similar to Map-Reduce combiners Users implement a combine() method that can reduce the amount of messages sent and received Run on both the client side and server side Client side saves memory and message traffic Server side save memory Aggregators Similar to MPI aggregation routines (i.e. max, min, sum, etc.) Users can write their own aggregators Commutative and associate operations that are performed globally Examples include global communication, monitoring, and statistics 16
Early Yahoo! customers Web of Objects Currently used for the movie database (10’s of millions of records, run with 400 workers) Popularity rank, shared connections, personalized page rank Web map Next generation page-rank related algorithms will use this framework (250 billion web pages) Current graph processing solution uses MPI (no fault-tolerance, customized code) 17
Future work Improved fault tolerance testing Unit tests exist, but test a limited subset of failures Performance testing Tested with 400 workers, but thousands desired Importing some of the currently used graph algorithms into an included library Storing messages that surpass available memory on disk Leverage the programming model, maybe even convert to a native Map-Reduce application 18
Conclusion Giraph is a graph processing infrastructure that runs on existing Hadoop infrastructure Already being used at Yahoo!  Open source – available on GitHub https://github.com/aching/Giraph In the process of submitting an Apache Incubator proposal Questions/Comments? aching@yahoo-inc.com 19

More Related Content

Viewers also liked

Freebase and the semantic web
Freebase and the semantic webFreebase and the semantic web
Freebase and the semantic webspencermountain
 
Bekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesBekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesdiannepatricia
 
TAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphTAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphAdrian-Tudor Panescu
 
#Espc15: Build a knowledge social network with o365, yammer and office graph
#Espc15: Build a knowledge social network with o365, yammer and office graph#Espc15: Build a knowledge social network with o365, yammer and office graph
#Espc15: Build a knowledge social network with o365, yammer and office graphNicolas Georgeault
 
Applications and Analytics players and positioning
Applications and Analytics players and positioningApplications and Analytics players and positioning
Applications and Analytics players and positioningEinat Shimoni
 
Introduction of apache giraph project
Introduction of apache giraph projectIntroduction of apache giraph project
Introduction of apache giraph projectChun Cheng Lin
 
Apache Giraph: Large-scale graph processing done better
Apache Giraph: Large-scale graph processing done betterApache Giraph: Large-scale graph processing done better
Apache Giraph: Large-scale graph processing done better🧑‍💻 Manuel Coppotelli
 
Enterprise Applications, Analytics and Knowledge Products Positionings in Isr...
Enterprise Applications, Analytics and Knowledge Products Positionings in Isr...Enterprise Applications, Analytics and Knowledge Products Positionings in Isr...
Enterprise Applications, Analytics and Knowledge Products Positionings in Isr...Einat Shimoni
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphsSören Auer
 
The Algorithm of Magical Customer Experiences
The Algorithm of Magical Customer ExperiencesThe Algorithm of Magical Customer Experiences
The Algorithm of Magical Customer ExperiencesEinat Shimoni
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge GraphTrey Grainger
 
Initiation à Neo4j
Initiation à Neo4jInitiation à Neo4j
Initiation à Neo4jNeo4j
 
Relational to Big Graph
Relational to Big GraphRelational to Big Graph
Relational to Big GraphNeo4j
 
Enterprise Knowledge Graph
Enterprise Knowledge GraphEnterprise Knowledge Graph
Enterprise Knowledge GraphLukas Masuch
 
Enterprise Knowledge - Taxonomy Design Best Practices and Methodology
Enterprise Knowledge - Taxonomy Design Best Practices and MethodologyEnterprise Knowledge - Taxonomy Design Best Practices and Methodology
Enterprise Knowledge - Taxonomy Design Best Practices and MethodologyEnterprise Knowledge
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Benjamin Nussbaum
 

Viewers also liked (20)

Freebase and the semantic web
Freebase and the semantic webFreebase and the semantic web
Freebase and the semantic web
 
Bekas for cognitive_speaker_series
Bekas for cognitive_speaker_seriesBekas for cognitive_speaker_series
Bekas for cognitive_speaker_series
 
Aleph
AlephAleph
Aleph
 
TAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphTAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social Graph
 
Overview of an Efficient Knowledge Management Model
Overview of an Efficient Knowledge Management ModelOverview of an Efficient Knowledge Management Model
Overview of an Efficient Knowledge Management Model
 
#Espc15: Build a knowledge social network with o365, yammer and office graph
#Espc15: Build a knowledge social network with o365, yammer and office graph#Espc15: Build a knowledge social network with o365, yammer and office graph
#Espc15: Build a knowledge social network with o365, yammer and office graph
 
Apache giraph
Apache giraphApache giraph
Apache giraph
 
Applications and Analytics players and positioning
Applications and Analytics players and positioningApplications and Analytics players and positioning
Applications and Analytics players and positioning
 
Introduction of apache giraph project
Introduction of apache giraph projectIntroduction of apache giraph project
Introduction of apache giraph project
 
Enterprise Knowledge Graph
Enterprise Knowledge GraphEnterprise Knowledge Graph
Enterprise Knowledge Graph
 
Apache Giraph: Large-scale graph processing done better
Apache Giraph: Large-scale graph processing done betterApache Giraph: Large-scale graph processing done better
Apache Giraph: Large-scale graph processing done better
 
Enterprise Applications, Analytics and Knowledge Products Positionings in Isr...
Enterprise Applications, Analytics and Knowledge Products Positionings in Isr...Enterprise Applications, Analytics and Knowledge Products Positionings in Isr...
Enterprise Applications, Analytics and Knowledge Products Positionings in Isr...
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
The Algorithm of Magical Customer Experiences
The Algorithm of Magical Customer ExperiencesThe Algorithm of Magical Customer Experiences
The Algorithm of Magical Customer Experiences
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
Initiation à Neo4j
Initiation à Neo4jInitiation à Neo4j
Initiation à Neo4j
 
Relational to Big Graph
Relational to Big GraphRelational to Big Graph
Relational to Big Graph
 
Enterprise Knowledge Graph
Enterprise Knowledge GraphEnterprise Knowledge Graph
Enterprise Knowledge Graph
 
Enterprise Knowledge - Taxonomy Design Best Practices and Methodology
Enterprise Knowledge - Taxonomy Design Best Practices and MethodologyEnterprise Knowledge - Taxonomy Design Best Practices and Methodology
Enterprise Knowledge - Taxonomy Design Best Practices and Methodology
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

2011.06.29. Giraph - Hadoop Summit 2011

  • 1. Giraph: Large-scale graph processing infrastructure on Hadoop Avery Ching, Yahoo! Christian Kunz, Jybe 6/29/2011
  • 2. Why scalable graph processing? 2 Real world web and social graphs are at immense scale and continuing to grow In 2008, Google estimated the number of web pages at 1 trillion In 2011, Goldman Sachs disclosed Facebook has over 600 million active users In March 2011, LinkedIn said it had over 100 million users In October 2010, Twitter claimed to have over 175 million users Relevant and personalized information for users relies strongly on iterative graph ranking algorithms (search results, social news, ads, etc.) In web graphs, page rank and its variants In social graphs, popularity rank, personalized rankings, shared connections, shortest paths, etc.
  • 3. Existing solutions Sequence of map-reduce jobs in Hadoop Classic map-reduce overheads (job startup/shutdown, reloading data from HDFS, shuffling) Map-reduce programming model not a good fit for graph algorithms Google’s Pregel Requires its own computing infrastructure Not available (unless you work at Google) Master is a SPOF Message passing interface (MPI) Not fault-tolerant Too generic 3
  • 4. Giraph Leverage Hadoop installations around the world for iterative graph processing Big data today is processed on Hadoop with the Map-Reduce computing model Map-Reduce with Hadoop is widely deployed outside of Yahoo! as well (i.e. EC2, Cloudera, etc.) Bulk synchronous processing (BSP) computing model Input data loaded once during the application, all messaging in memory Fault-tolerant/dynamic graph processing infrastructure Automatically adjust to available resources on the Hadoop grid No single point of failure except Hadoop namenode and jobtracker Relies on ZooKeeper as a fault-tolerant coordination service Vertex-centric API to do graph processing (similar to the map() method in Hadoop) in a bulk synchronous parallel computing model Inspired by Pregel Open-source https://github.com/aching/Giraph 4
  • 5. Bulk synchronous parallel model Computation consists of a series of “supersteps” Supersteps are an atomic unit of computation where operations can happen in parallel During a superstep, components are assigned tasks and receive unordered messages from the previous superstep Components communicate through point-to-point messaging All (or a subset of) components can be synchronized through the superstep concept 5 Superstep i Superstep i+1 Superstep i+2 Component 1 Component 1 Component 1 Component 2 Component 2 Component 2 Component 3 Component 3 Component 3 Application progress
  • 6. Writing a Giraph application Every vertex will call compute() method once during a superstep Analogous to map() method in Hadoop for a <key, value> tuple Users chooses 4 types for their implementation of Vertex (I VertexId, V VertexValue, E EdgeValue, M MsgValue) 6
  • 7. Basic Giraph API Local vertex mutations happen immediately Graph mutations are processed just prior to the next superstep Sent messages received at the next superstep during compute Vertices “vote” to end the computation Once all vertices have voted to end the computation, the application is finished 7
  • 8. Page rank example public class SimplePageRankVertex extends HadoopVertex<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> { public void compute(Iterator<DoubleWritable> msgIterator) { double sum = 0; while (msgIterator.hasNext()) { sum += msgIterator.next().get(); } setVertexValue(new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum); if (getSuperstep() < 30) { long edges = getOutEdgeIterator().size(); sentMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges)); } else { voteToHalt(); } } 8
  • 9. Thread architecture 9 Map-only job in Hadoop Map 0 Map 1 Map 2 Map 3 Map 4 Thread assignment in Giraph Worker Worker Worker Worker Worker Worker Worker Master ZooKeeper
  • 10. Thread responsibilities Master Only one active master at a time Runs the VertexInputFormat getSplits() to get the appropriate number of VertexSplit objects for the application and writes it to ZooKeeper Coordinates application Synchronizes supersteps, end of application Handles changes that can occur within supersteps (i.e. vertex movement, change in number of workers, etc.) Worker Reads vertices from one or more VertexSplit objects, splitting them into VertexRanges Executes the compute() method for every Vertex it is assigned once per superstep Buffers the incoming messages to every Vertex it is assigned for the next superstep ZooKeeper Manages a server that Is a part of the ZooKeeper quorum (maintains global application state) 10
  • 11. Vertex distribution 11 Worker 0 V Worker 0 VertexRange 0 V Master VertexSplit 0 V V Workers process the VertexSplit objects and divide them further to create VertexRange objects. V V VertexRange 0,1 Prior to every superstep, the master assigns every worker 0 or more VertexRange objects. They are responsible for the vertices that fall in those VertexRange objects. V V VertexRange 1 V V V V V V Master V Worker 1 VertexRange 2 V VertexSplit 1 VertexRange 2,3 V V V V Master uses the VertexInputFormat to divide the input data into VertexSplit objects and serializes them to ZooKeeper. V V V VertexRange 3 V V V V V Worker 1 V V
  • 12.
  • 13. Fault tolerance No single point of failure from BSP threads With multiple master threads, if the current master dies, a new one will automatically take over. If a worker thread dies, the application is rolled back to a previously checkpointed superstep. The next superstep will begin with the new amount of workers If a zookeeper server dies, as long as a quorum remains, the application can proceed Hadoop single points of failure still exist Namenode, jobtracker Restarting manually from a checkpoint is always possible 13
  • 14. Master thread fault tolerance 14 Before failure of active master 0 After failure of active master 0 Active Master State Active Master State “Active” Master 0 “Active” Master 0 “Spare” Master 1 “Active” Master 1 “Spare” Master 2 “Spare” Master 2 One active master, with spare masters taking over in the event of an active master failure All active master state is stored in ZooKeeper so that a spare master can immediately step in when an active master fails “Active” master implemented as a queue in ZooKeeper
  • 15. Worker thread fault tolerance 15 Superstep i (no checkpoint) Superstep i+1 (checkpoint) Superstep i+2 (no checkpoint) Worker failure! Superstep i+3 (checkpoint) Superstep i+1 (checkpoint) Superstep i+2 (no checkpoint) Worker failure after checkpoint complete! Superstep i+3 (no checkpoint) Application Complete A single worker failure causes the superstep to fail In order to disrupt the application, a worker must be registered and chosen for the current superstep Application reverts to the last committed superstep automatically Master detects worker failure during any superstep through the loss of a ZooKeeper “health” znode Master chooses the last committed superstep and sends a command through ZooKeeper for all workers to restart from that superstep
  • 16. Optional features Combiners Similar to Map-Reduce combiners Users implement a combine() method that can reduce the amount of messages sent and received Run on both the client side and server side Client side saves memory and message traffic Server side save memory Aggregators Similar to MPI aggregation routines (i.e. max, min, sum, etc.) Users can write their own aggregators Commutative and associate operations that are performed globally Examples include global communication, monitoring, and statistics 16
  • 17. Early Yahoo! customers Web of Objects Currently used for the movie database (10’s of millions of records, run with 400 workers) Popularity rank, shared connections, personalized page rank Web map Next generation page-rank related algorithms will use this framework (250 billion web pages) Current graph processing solution uses MPI (no fault-tolerance, customized code) 17
  • 18. Future work Improved fault tolerance testing Unit tests exist, but test a limited subset of failures Performance testing Tested with 400 workers, but thousands desired Importing some of the currently used graph algorithms into an included library Storing messages that surpass available memory on disk Leverage the programming model, maybe even convert to a native Map-Reduce application 18
  • 19. Conclusion Giraph is a graph processing infrastructure that runs on existing Hadoop infrastructure Already being used at Yahoo! Open source – available on GitHub https://github.com/aching/Giraph In the process of submitting an Apache Incubator proposal Questions/Comments? aching@yahoo-inc.com 19