3. Project Team
15 Members, by affiliation (pie chart):
• Facebook
• LinkedIn
• Twitter
• Database Lab, Korea University
• Jybe
• Hortonworks
• LSDS group, VU Amsterdam
• Pivotal Inc.
• Database Systems and Information Management group (DIMA), TU Berlin
• Trend Micro
4. Project Introduction
“Apache Giraph is an iterative graph processing system
built for high scalability. For example, it is currently
used at Facebook to analyze the social graph formed by
users and their connections. Giraph originated as the
open-source counterpart to Pregel, the graph processing
architecture developed at Google and described in a 2010
paper. Both systems are inspired by the Bulk Synchronous
Parallel model of distributed computation introduced by
Leslie Valiant. Giraph adds several features beyond the
basic Pregel model, including master computation, sharded
aggregators, edge-oriented input, out-of-core
computation, and more. With a steady development cycle
and a growing community of users worldwide, Giraph is a
natural choice for unleashing the potential of structured
datasets at a massive scale. To learn more, consult the
User Docs section above.” – Source: Apache Giraph
Project Website
Diagram: Pregel is inspired by Bulk Synchronous Parallel; Giraph is the open-source counterpart to Pregel.
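The Bulk Synchronous Parallel model that Pregel and Giraph build on can be illustrated with a minimal single-process sketch. It is not Giraph's API: every name in it (`run_bsp`, `master_compute`, the `delta` aggregator, `DAMPING`) is hypothetical, and PageRank is used only as a convenient workload. Vertices compute in supersteps, messages sent in one superstep become visible only after the barrier, a sum aggregator collects one global value per superstep, and a master step that runs between supersteps decides whether to halt, loosely mirroring Giraph's master computation.

```python
from collections import defaultdict

DAMPING = 0.85  # illustrative damping factor; an assumption, not taken from the slides


def master_compute(superstep, delta, epsilon, max_supersteps):
    """Hypothetical master step, run once between supersteps.

    `superstep` counts completed supersteps. Superstep 0 only sends the
    initial ranks, so the aggregated change is checked from the second one on.
    """
    converged = superstep > 1 and delta < epsilon
    return converged or superstep >= max_supersteps


def run_bsp(out_edges, epsilon=1e-6, max_supersteps=100):
    """Hypothetical single-process BSP loop computing PageRank.

    out_edges maps each vertex id to the list of vertices it links to.
    Assumes every vertex has at least one outgoing edge (no dangling vertices).
    Returns (ranks, number_of_supersteps_executed).
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    inbox = defaultdict(list)  # messages sent during the previous superstep
    superstep = 0
    while True:
        outbox = defaultdict(list)  # messages sent during this superstep
        delta = 0.0  # sum aggregator: total absolute rank change in this superstep
        for v, targets in out_edges.items():  # compute phase: one call per vertex
            if superstep > 0:
                new_rank = (1 - DAMPING) / n + DAMPING * sum(inbox[v])
                delta += abs(new_rank - rank[v])
                rank[v] = new_rank
            share = rank[v] / len(targets)
            for t in targets:
                outbox[t].append(share)
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        superstep += 1
        # Master computation reads the aggregated value and may halt the whole job.
        if master_compute(superstep, delta, epsilon, max_supersteps):
            return rank, superstep


ranks, supersteps = run_bsp({0: [1, 2], 1: [2], 2: [0]})
```

In this three-vertex graph, vertex 2 receives links from both other vertices and ends with the highest rank. Keeping the halting decision in a separate master step lets one global condition, here the aggregated rank change, stop every vertex at once instead of each vertex judging convergence on its own.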
5. Development Information
•Build Tool: Maven
–GroupId = org.apache.giraph
–ArtifactId = giraph-parent
–Version = 1.1.0-hadoop2
–Dependencies: listed on the project website
•Source Repository: Git
https://git-wip-us.apache.org/repos/asf/giraph.git
•Issue Tracking: Jira
https://issues.apache.org/jira/browse/GIRAPH
•Continuous Integration: Jenkins
http://builds.apache.org/job/Giraph-trunk-Commit
6. Documentation
User Docs
• Introduction
• Literature
• Quick Start
• Building and Testing
• Giraph Options
• FAQ
• Presentations
• Wiki
Developer Docs
• API JavaDoc
• Test API JavaDoc
• JDepend Report Metrics
• Source Xref
• Test Source Xref
• Modules
• How to generate patches
• How to build this site
11. Pregel
A System for Large-Scale Graph Processing
Abstract
“Many practical computing problems concern large graphs. Standard examples
include the Web graph and various social networks. The scale of these graphs
- in some cases billions of vertices, trillions of edges - poses challenges
to their efficient processing. In this paper we present a computational
model suitable for this task. Programs are expressed as a sequence of
iterations, in each of which a vertex can receive messages sent in the
previous iteration, send messages to other vertices, and modify its own
state and that of its outgoing edges or mutate graph topology. This vertex-centric
approach is flexible enough to express a broad set of algorithms.
The model has been designed for efficient, scalable and fault-tolerant
implementation on clusters of thousands of commodity computers, and its
implied synchronicity makes reasoning about programs easier. Distribution-related
details are hidden behind an abstract API. The result is a framework
for processing large graphs that is expressive and easy to program.” –
Source: http://dl.acm.org/citation.cfm?id=1807184
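The iteration scheme in the abstract (a vertex receives the messages sent in the previous iteration, updates its state, and sends messages) can be illustrated with the maximum-value example from the Pregel paper. The sketch below is a single-process simulation, not Pregel's or Giraph's API: `Vertex`, its `compute` method, and the `run` driver are hypothetical names. It also uses the paper's "vote to halt" rule, which the abstract does not mention: a vertex deactivates itself when it has nothing new to send, an incoming message reactivates it, and the computation ends once every vertex is inactive and no messages are in transit.

```python
from collections import defaultdict


class Vertex:
    """A vertex in a hypothetical Pregel-style, vertex-centric program."""

    def __init__(self, vid, value, out_edges):
        self.id = vid
        self.value = value
        self.out_edges = out_edges  # ids of the vertices this vertex sends messages to
        self.active = True

    def compute(self, superstep, messages, send):
        """Maximum-value propagation: keep the largest value seen, forward it if it grew."""
        new_value = max([self.value, *messages])
        if superstep == 0 or new_value > self.value:
            self.value = new_value
            for target in self.out_edges:
                send(target, self.value)
        else:
            self.active = False  # vote to halt; a later message reactivates this vertex


def run(vertices):
    """Run supersteps until all vertices have halted and no messages are in transit."""
    by_id = {v.id: v for v in vertices}
    inbox = defaultdict(list)  # messages sent in the previous superstep
    superstep = 0
    while True:
        outbox = defaultdict(list)  # messages sent in this superstep

        def send(target, msg):
            outbox[target].append(msg)

        for v in by_id.values():
            messages = inbox.get(v.id, [])
            if messages:
                v.active = True  # an incoming message wakes a halted vertex
            if v.active:
                v.compute(superstep, messages, send)
        inbox = outbox  # barrier: messages are delivered in the next superstep
        superstep += 1
        if not inbox and not any(v.active for v in by_id.values()):
            return {vid: v.value for vid, v in by_id.items()}, superstep


# Usage: an undirected path 0-1-2-3, with each edge listed in both directions.
graph = [Vertex(0, 3, [1]), Vertex(1, 6, [0, 2]), Vertex(2, 2, [1, 3]), Vertex(3, 1, [2])]
values, supersteps = run(graph)
print(values, supersteps)  # {0: 6, 1: 6, 2: 6, 3: 6} 4
```

The program never refers to machines or partitions, only to one vertex and its messages; that is the sense in which the abstract says distribution-related details are hidden behind an abstract API.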