Introduction to Apache Giraph Project and Large-Scale Graph Processing

Chun Cheng Lin

Introduction to the Apache Giraph project

Project Summary
•Large-scale graph processing on Hadoop
•Website: http://giraph.apache.org/
•Mailing Lists: http://giraph.apache.org/mail-lists.html
•Current Version: 1.1.0-hadoop2
•Last Published: 2014-10-25
•License: Apache License, V2
[Diagram: Giraph running on Hadoop]

Project Team
15 members, by affiliation (pie chart; shares range from 6% to 27%):
•Facebook
•LinkedIn
•Twitter
•Database Lab, Korea University
•Jybe
•HortonWorks
•LSDS group, VU Amsterdam
•Pivotal Inc.
•Database Systems and Information Management group (DIMA), TU Berlin
•Trend Micro

Project Introduction
“Apache Giraph is an iterative graph processing system
built for high scalability. For example, it is currently
used at Facebook to analyze the social graph formed by
users and their connections. Giraph originated as the
open-source counterpart to Pregel, the graph processing
architecture developed at Google and described in a 2010
paper. Both systems are inspired by the Bulk Synchronous
Parallel model of distributed computation introduced by
Leslie Valiant. Giraph adds several features beyond the
basic Pregel model, including master computation, sharded
aggregators, edge-oriented input, out-of-core
computation, and more. With a steady development cycle
and a growing community of users worldwide, Giraph is a
natural choice for unleashing the potential of structured
datasets at a massive scale. To learn more, consult the
User Docs section above.” – Source: Apache Giraph
Project Website
[Diagram: Giraph is the open-source counterpart to Pregel; both are inspired by Bulk Synchronous Parallel]

Development Information
•Build Tool: Maven
–GroupId = org.apache.giraph
–ArtifactId = giraph-parent
–Version = 1.1.0-hadoop2
–Dependencies: see the project website
•Source Repository: Git
https://git-wip-us.apache.org/repos/asf/giraph.git
•Issue Tracking: Jira
https://issues.apache.org/jira/browse/GIRAPH
•Continuous Integration: Jenkins
http://builds.apache.org/job/Giraph-trunk-Commit

Documentation
User Docs
•Introduction
•Literature
•Quick Start
•Building and Testing
•Giraph Options
•FAQ
•Presentations
•Wiki
Developer Docs
•API JavaDoc
•Test API JavaDoc
•JDepend Report Metrics
•Source Xref
•Test Source Xref
•Modules
•How to generate patches
•How to build this site

An example
Source: http://giraph.apache.org/intro.html

An example (cont.)
Source: http://giraph.apache.org/intro.html
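The figures on these two slides are not preserved in the transcript. As a stand-in, here is a minimal plain-Java sketch of a vertex-centric computation, assuming the cited introduction page illustrates the model with single-source shortest paths. It simulates supersteps in a single process and does not use the Giraph API; the class, method, and sample-graph names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a vertex-centric shortest-paths run.
// All names here are hypothetical; this is NOT the Giraph API.
final class ShortestPathsSketch {

    // A directed, weighted edge to a target vertex id.
    static final class Edge {
        final int target;
        final double weight;

        Edge(int target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    // graph maps every vertex id to its outgoing edges (possibly an empty list).
    static Map<Integer, Double> shortestPaths(Map<Integer, List<Edge>> graph, int source) {
        Map<Integer, Double> distance = new HashMap<>();
        for (int id : graph.keySet()) {
            distance.put(id, Double.POSITIVE_INFINITY);
        }

        // Messages delivered at the start of the current superstep.
        Map<Integer, List<Double>> inbox = new HashMap<>();
        inbox.put(source, Collections.singletonList(0.0)); // seed: source is at distance 0

        // One loop iteration is one superstep; stop when no messages are in flight.
        while (!inbox.isEmpty()) {
            Map<Integer, List<Double>> outbox = new HashMap<>();
            for (Map.Entry<Integer, List<Double>> entry : inbox.entrySet()) {
                int id = entry.getKey();
                double best = distance.get(id);
                for (double candidate : entry.getValue()) {
                    best = Math.min(best, candidate);
                }
                // Only a vertex whose distance improved sends messages to its neighbours.
                if (best < distance.get(id)) {
                    distance.put(id, best);
                    for (Edge edge : graph.get(id)) {
                        outbox.computeIfAbsent(edge.target, k -> new ArrayList<>())
                              .add(best + edge.weight);
                    }
                }
            }
            inbox = outbox; // messages become visible only in the next superstep
        }
        return distance;
    }

    // Small hypothetical graph: 0->1 (1), 0->2 (4), 1->2 (2), 2->3 (1).
    static Map<Integer, List<Edge>> sampleGraph() {
        Map<Integer, List<Edge>> graph = new HashMap<>();
        graph.put(0, Arrays.asList(new Edge(1, 1.0), new Edge(2, 4.0)));
        graph.put(1, Collections.singletonList(new Edge(2, 2.0)));
        graph.put(2, Collections.singletonList(new Edge(3, 1.0)));
        graph.put(3, Collections.<Edge>emptyList());
        return graph;
    }

    public static void main(String[] args) {
        Map<Integer, Double> distance = shortestPaths(sampleGraph(), 0);
        for (int id = 0; id < 4; id++) {
            System.out.println("vertex " + id + " -> " + distance.get(id));
        }
    }
}
```

On the sample graph, vertex 2 first receives distance 4 directly from the source and is corrected to 3 one superstep later, once the message routed through vertex 1 arrives. In a real deployment the per-vertex logic would run in parallel across workers, with a barrier between supersteps.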

Pregel: A System for Large-Scale Graph Processing
Abstract
“Many practical computing problems concern large graphs. Standard examples
include the Web graph and various social networks. The scale of these graphs
- in some cases billions of vertices, trillions of edges - poses challenges
to their efficient processing. In this paper we present a computational
model suitable for this task. Programs are expressed as a sequence of
iterations, in each of which a vertex can receive messages sent in the
previous iteration, send messages to other vertices, and modify its own
state and that of its outgoing edges or mutate graph topology. This vertex-centric
approach is flexible enough to express a broad set of algorithms.
The model has been designed for efficient, scalable and fault-tolerant
implementation on clusters of thousands of commodity computers, and its
implied synchronicity makes reasoning about programs easier. Distribution-related
details are hidden behind an abstract API. The result is a framework
for processing large graphs that is expressive and easy to program.” –
Source: http://dl.acm.org/citation.cfm?id=1807184
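The iteration scheme the abstract describes can be sketched in a few lines. The following is a single-process, plain-Java illustration, not Pregel or Giraph code: each vertex reads the messages sent to it in the previous superstep, updates its value, sends messages along its outgoing edges, and votes to halt when it has nothing new to report. The task (spreading the maximum value through the graph) and all names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the vertex-centric model: values spread along directed
// edges until every vertex holds the largest value that can reach it.
// All names are hypothetical; this is NOT the Pregel or Giraph API.
final class MaxValueSketch {

    // initial[v] is the starting value of vertex v; outEdges[v] lists its targets.
    static int[] propagateMax(int[] initial, int[][] outEdges) {
        int n = initial.length;
        int[] value = initial.clone();
        boolean[] active = new boolean[n];
        Arrays.fill(active, true);
        List<List<Integer>> inbox = emptyBoxes(n);
        int superstep = 0;

        while (true) {
            List<List<Integer>> outbox = emptyBoxes(n);
            boolean anyWork = false;
            for (int v = 0; v < n; v++) {
                // A halted vertex is woken up only by an incoming message.
                if (!active[v] && inbox.get(v).isEmpty()) {
                    continue;
                }
                anyWork = true;
                active[v] = true;

                boolean changed = false;
                for (int message : inbox.get(v)) {
                    if (message > value[v]) {
                        value[v] = message;
                        changed = true;
                    }
                }
                if (superstep == 0 || changed) {
                    // Announce the (new) value to all out-neighbours.
                    for (int target : outEdges[v]) {
                        outbox.get(target).add(value[v]);
                    }
                } else {
                    active[v] = false; // vote to halt
                }
            }
            // Done when every vertex has halted and no messages are in transit.
            if (!anyWork) {
                break;
            }
            inbox = outbox; // delivered at the start of the next superstep
            superstep++;
        }
        return value;
    }

    private static List<List<Integer>> emptyBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Directed ring 0 -> 1 -> 2 -> 3 -> 0 with hypothetical starting values.
        int[] result = propagateMax(new int[] {3, 6, 2, 1},
                                    new int[][] {{1}, {2}, {3}, {0}});
        System.out.println(Arrays.toString(result));
    }
}
```

Because messages written in one superstep are read only in the next, no vertex ever sees a half-updated neighbour. This is the "implied synchronicity" the abstract credits with making programs easier to reason about.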

1. Introduction to the Apache Giraph Project, Chun Cheng Lin, 2014/11/26

9. Contact: jimlintw922@gmail.com

10. Appendix
