http://www.catehuston.com/blog/2009/11/02/touchgraph/
Hadoop MapReduce Design Patterns
Large-Scale Text Data Processing with MapReduce

1 By Jimmy Lin and Chris Dyer; supervised by 神林 飛志 and 野村 直之; translated by 玉川 竜司
2 Scheduled for release on October 1, 2011
3 210 pages
4 List price: 2,940 yen
[Figure: MapReduce iterations i and i+1, with the labels "Shuffle & barrier" and "job start/shutdown".]
[Figure: example weighted graph with nodes A to G; edge weights 1, 5, 1, 4, 3, 3, 2, 4, 5.]
[Figure: the same graph at iteration i and at iteration i+1, annotated "5→4" and "min(6,4)".]
[Figure: a sequence of supersteps, each labelled "a super step", in the Bulk Synchronous Parallel model.]
http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel
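As a rough illustration of what one superstep involves (local computation, communication, then a barrier), here is a minimal sketch using Python threads. It is not taken from the slides: the ring layout, the "find the global maximum" task, and the names `bsp_max`, `state`, and `inbox` are all assumptions made for the example.

```python
import threading


def bsp_max(values):
    """Every worker ends up holding the global maximum (hypothetical task)."""
    n = len(values)
    state = list(values)
    # Two inbox buffers: one is read in the current superstep while the
    # other collects the messages that will be read in the next one.
    inbox = [[None] * n, [None] * n]
    barrier = threading.Barrier(n)

    def worker(rank):
        for step in range(n):  # n supersteps are enough on a ring of n workers
            # 1. Local computation on messages delivered by the last superstep.
            received = inbox[step % 2][rank]
            if received is not None:
                state[rank] = max(state[rank], received)
            # 2. Communication: send the current value to the right neighbour.
            inbox[(step + 1) % 2][(rank + 1) % n] = state[rank]
            # 3. Barrier synchronisation closes the superstep.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


result = bsp_max([3, 9, 1, 7])
print(result)
```

Messages written during a superstep only become visible to their recipient after the barrier, which is the property the graph algorithms later in the deck rely on.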
[Figures: shortest paths from node A on the example graph, shown iteration by iteration. Node values after each iteration:]

                   A    B    C    D    E    F    G
initialize         0   +∞   +∞   +∞   +∞   +∞   +∞
after iteration 1  0    5   +∞    3   +∞   +∞   +∞
after iteration 2  0    4    6    3    6    5   +∞
after iteration 3  0    4    6    3    5    5    9
end                0    4    6    3    5    5    9
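The iteration-by-iteration values above can be replayed with a small local simulation in which each loop pass stands in for one MapReduce job (map, shuffle, reduce). This is only a sketch: the edge list is partly inferred from the distance values on the slides, so the direction and endpoints of some edges are assumptions, and the dictionary node layout and helper names (`map_node`, `reduce_node`, `run_iteration`) are hypothetical.

```python
from collections import defaultdict

INF = float("inf")

# Assumed edge list: weights come from the slides, some directions are guesses.
GRAPH = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}


def map_node(node_id, node):
    """Emit the node itself, then one tentative distance per out-edge."""
    yield node_id, node
    for nbr, weight in node["adj"].items():
        yield nbr, node["dist"] + weight


def reduce_node(node_id, values):
    """Recover the node and keep the smallest distance seen for it."""
    node, min_dist = None, INF
    for value in values:
        if isinstance(value, dict):  # the graph structure
            node = value
        elif value < min_dist:
            min_dist = value
    return node_id, {"adj": node["adj"], "dist": min(node["dist"], min_dist)}


def run_iteration(nodes):
    """One job: map every node, group values by key, reduce every key."""
    shuffled = defaultdict(list)
    for node_id, node in nodes.items():
        for key, value in map_node(node_id, node):
            shuffled[key].append(value)
    return dict(reduce_node(key, values) for key, values in shuffled.items())


def shortest_paths(graph, source):
    nodes = {n: {"adj": adj, "dist": 0 if n == source else INF}
             for n, adj in graph.items()}
    iterations = 0
    while True:
        new_nodes = run_iteration(nodes)
        iterations += 1
        if all(new_nodes[n]["dist"] == nodes[n]["dist"] for n in nodes):
            break  # no distance changed, so the result has converged
        nodes = new_nodes
    return {n: nodes[n]["dist"] for n in sorted(nodes)}, iterations


distances, iterations = shortest_paths(GRAPH, "A")
print(distances, iterations)
```

Under these assumptions the distances settle after the third iteration; the simulation runs one further iteration only to confirm that nothing changes.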
class ShortestPathMapper(Mapper):
  def map(self, node_id, Node):
    # send graph structure so the reducer can recover the node
    # (emit is assumed to be provided by the framework base class)
    self.emit(node_id, Node)
    # get node value and add it to edge distance
    dist = Node.get_value()
    for neighbour_node_id in Node.get_adjacency_list():
      dist_to_nbr = Node.get_distance(node_id, neighbour_node_id)
      self.emit(neighbour_node_id, dist + dist_to_nbr)

class ShortestPathReducer(Reducer):
  def reduce(self, node_id, dist_list):
    min_dist = float("inf")
    Node = None
    for dist in dist_list:
      # dist_list mixes candidate distances with the Node itself
      if is_node(dist):
        Node = dist
      elif dist < min_dist:
        min_dist = dist
    # keep the current value when no shorter path arrived (e.g. the source)
    if min_dist < Node.get_value():
      Node.set_value(min_dist)
    self.emit(node_id, Node)
# In-Mapper Combiner
class ShortestPathMapper(Mapper):
  def __init__(self):
    # smallest tentative distance seen so far for each destination node
    self.buffer = {}

  def check_and_put(self, key, value):
    if key not in self.buffer or value < self.buffer[key]:
      self.buffer[key] = value

  def check_and_emit(self):
    # flush once the buffer grows past its size limit
    if is_exceed_limit_buffer_size(self.buffer):
      for key, value in self.buffer.items():
        self.emit(key, value)
      self.buffer = {}

  def close(self):
    # flush whatever is left at the end of the map task
    for key, value in self.buffer.items():
      self.emit(key, value)

  def map(self, node_id, Node):
    # send graph structure
    self.emit(node_id, Node)
    # get node value and add it to edge distance
    dist = Node.get_value()
    for nbr_node_id in Node.get_adjacency_list():
      dist_to_nbr = Node.get_distance(node_id, nbr_node_id)
      dist_nbr = dist + dist_to_nbr
      self.check_and_put(nbr_node_id, dist_nbr)
      self.check_and_emit()
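To make the effect of the buffer concrete, the following sketch counts how many pairs leave a single map task when only the smallest distance per destination is kept. The `BufferedEmitter` class, its `limit` parameter, and the candidate values are made up for illustration; `emitted` simply stands in for the framework's emit.

```python
class BufferedEmitter:
    """Keeps only the smallest distance per destination until flushed."""

    def __init__(self, limit):
        self.limit = limit   # hypothetical cap on the number of buffered keys
        self.buffer = {}
        self.emitted = []    # stands in for the framework's emit()

    def put(self, key, value):
        if key not in self.buffer or value < self.buffer[key]:
            self.buffer[key] = value
        if len(self.buffer) >= self.limit:
            self.flush()

    def flush(self):
        self.emitted.extend(self.buffer.items())
        self.buffer = {}


# Tentative distances one map task might produce (made-up values).
candidates = [("B", 5), ("D", 3), ("B", 4), ("E", 6), ("B", 9), ("E", 7)]

emitter = BufferedEmitter(limit=100)
for key, value in candidates:
    emitter.put(key, value)
emitter.flush()  # what close() does at the end of the map task

print(len(candidates), "->", len(emitter.emitted))
print(sorted(emitter.emitted))
```

With a small `limit` the buffer flushes early and may emit several values for the same key. The result stays correct because the reducer still takes the minimum; only the saving in shuffled data shrinks.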
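# Schimmy trick
class ShortestPathReducer(Reducer):
  def __init__(self):
    # the graph partition is sorted by node id, in the same order as the reduce keys
    P.open_graph_partition()

  def emit_precede_node(self, node_id):
    # emit unchanged every node that precedes node_id, then return the match
    # (P.read() is assumed to resume where the previous call stopped)
    for pre_node_id, Node in P.read():
      if node_id == pre_node_id:
        return Node
      else:
        self.emit(pre_node_id, Node)

  def reduce(self, node_id, dist_list):
    Node = self.emit_precede_node(node_id)
    min_dist = float("inf")
    for dist in dist_list:
      if dist < min_dist:
        min_dist = dist
    # keep the current value when no shorter path arrived
    if min_dist < Node.get_value():
      Node.set_value(min_dist)
    self.emit(node_id, Node)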
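The reducer above amounts to a merge join between two streams sorted by node id: the graph partition read from storage and the distance groups arriving from the shuffle. A minimal sketch of that merge follows, assuming plain Python lists in place of the partition file and the shuffle output; `schimmy_reduce` and the dictionary node layout are hypothetical names for this example.

```python
INF = float("inf")


def schimmy_reduce(partition, shuffled):
    """Merge a sorted graph partition with sorted (node_id, distances) groups.

    partition: list of (node_id, node) sorted by node_id (the stored graph).
    shuffled:  list of (node_id, [distances]) sorted by node_id (mapper output).
    """
    output = []
    nodes = iter(partition)  # a single forward pass, never rewound
    for node_id, dist_list in shuffled:
        # Pass through every node that sorts before the current key.
        for pre_id, node in nodes:
            if pre_id == node_id:
                break
            output.append((pre_id, node))
        else:
            raise KeyError(f"{node_id} missing from partition")
        new_dist = min(min(dist_list), node["dist"])
        output.append((node_id, {**node, "dist": new_dist}))
    # Nodes after the last key received no messages; pass them through too.
    output.extend(nodes)
    return output


partition = [
    ("A", {"dist": 0, "adj": {"B": 5, "D": 3}}),
    ("B", {"dist": INF, "adj": {"E": 1}}),
    ("C", {"dist": INF, "adj": {"F": 5}}),
    ("D", {"dist": INF, "adj": {"B": 1}}),
    ("E", {"dist": INF, "adj": {}}),
]
shuffled = [("B", [5]), ("D", [3])]

result = schimmy_reduce(partition, shuffled)
print([(node_id, node["dist"]) for node_id, node in result])
```

Because the graph structure is read locally, the mapper no longer needs to emit each node, which is where the saving in shuffled data comes from. The sketch also flushes the nodes that follow the last key, a step a real reducer would perform when the task closes.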
[Figures: the shortest-path computation on the example graph, replayed step by step. Node values shown under each step label:]

          A    B    C    D    E    F    G
step 1    0   +∞   +∞   +∞   +∞   +∞   +∞
step 2    0    5   +∞    3   +∞   +∞   +∞
step 3    0    4    6    3    6    5   +∞
step 4    0    4    6    3    5    5    9
step 5    0    4    6    3    5    5    9
end       0    4    6    3    5    5    9
class ShortestPathVertex:
  def compute(self, msgs):
    min_dist = 0 if self.is_source() else float("inf")
    # get values from all incoming edges.
    for msg in msgs:
      min_dist = min(min_dist, msg.get_value())
    if min_dist < self.get_value():
      # update current value (state).
      self.set_current_value(min_dist)
      # send new value to outgoing edges.
      out_edge_iterator = self.get_out_edge_iterator()
      for out_edge in out_edge_iterator:
        recipient = out_edge.get_other_element(self.get_id())
        self.send_message(recipient.get_id(),
                          min_dist + out_edge.get_distance())
    # go inactive until a new message arrives.
    self.vote_to_halt()
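To see how this compute() logic plays out across supersteps, here is a minimal single-process sketch of a vertex-centric loop. It is not the Pregel API: `run_pregel`, the `inbox`/`outbox` dictionaries, and the edge list (weights from the slides, some directions assumed) are illustrative only.

```python
INF = float("inf")

# Assumed edge list: weights come from the slides, some directions are guesses.
GRAPH = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}


def run_pregel(graph, source):
    value = {v: INF for v in graph}
    inbox = {v: [] for v in graph}
    active = set(graph)  # every vertex is active in the first superstep
    superstep = 0
    while active:
        superstep += 1
        outbox = {v: [] for v in graph}
        for v in active:
            # Same logic as compute(): smallest offer over incoming messages.
            min_dist = 0 if v == source else INF
            min_dist = min([min_dist] + inbox[v])
            if min_dist < value[v]:
                value[v] = min_dist
                for nbr, weight in graph[v].items():
                    outbox[nbr].append(min_dist + weight)
            # The vertex then votes to halt (it is dropped from `active` below).
        # Barrier: messages sent now are delivered in the next superstep,
        # and an incoming message reactivates its recipient.
        inbox = outbox
        active = {v for v in graph if inbox[v]}
    return value, superstep


values, supersteps = run_pregel(GRAPH, "A")
print(values, supersteps)
```

The loop stops once every vertex has voted to halt and no messages are in flight. Unlike the MapReduce version, vertices whose value did not change send nothing, and the graph structure is never shuffled.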
Pregel
[Slide: first page of a paper on HAMA.]

... Science and Technology), South Korea, swseo@calab.kaist.ac.kr
... edwardyoon@apache.org
... Science and Technology), South Korea, jaehong@calab.kaist.ac.kr
Seongwook Jin, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, swjin@calab.kaist.ac.kr
Jin-Soo Kim, School of Information and Communication, Sungkyunkwan University, South Korea, jinsookim@skku.edu
Seungryoul Maeng, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, maeng@calab.kaist.ac.kr

Abstract—APPLICATION. Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.

I. INTRODUCTION

As cloud computing environment emerges, Google has introduced the MapReduce framework to accelerate parallel and distributed computing on more than a thousand of inexpensive machines. Google has shown that the MapReduce framework is easy to use and provides massive scalability with extensive fault tolerance [2]. Especially, MapReduce fits well with complex data-intensive computations such as high-dimensional scientific simulation, machine learning, and data mining. Google and Yahoo! are known to operate dedicated clusters for MapReduce applications, each cluster consisting of several thousands of nodes. One of typical MapReduce

HAMA is a distributed framework on Hadoop for massive matrix and graph computations. HAMA aims at a powerful tool for various scientific applications, providing basic primitives for developers and researchers with simple APIs. HAMA is currently being incubated as one of the subprojects of Hadoop by the Apache Software Foundation [10].
Figure 1 illustrates the overall architecture of HAMA.

Fig. 1. The overall architecture of HAMA. [Layers: HAMA API; HAMA Core and HAMA Shell; Computation Engine (Plugged In/Out): MapReduce, BSP, Dryad; Zookeeper (Distributed Locking); Storage Systems: HBase, HDFS, RDBMS, File.]
http://wiki.apache.org/hama/Articles
Large-Scale Graph Processing 〜Introduction〜 (Complete Edition)
Large-Scale Graph Processing: Introduction (Complete Version)

  • 1.
  • 2.
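  • 4. Hadoop MapReduce デザインパターン: MapReduceによる大規模テキストデータ処理 (Hadoop MapReduce Design Patterns: Large-Scale Text Data Processing with MapReduce). 1. By Jimmy Lin and Chris Dyer; supervised by 神林 飛志 and 野村 直之; translated by 玉川 竜司. 2. Scheduled for release on October 1, 2011. 3. 210 pages. 4. List price: 2,940 yen.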
  • 5.
  • 6.
  • 7.
  • 8. [Figure: MapReduce iterations i and i+1; labels: "Shuffle & barrier", "job start/shutdown"]
  • 9.
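  • 10. [Figure: example weighted graph with vertices A, B, C, D, E, F, G and edge weights between 1 and 5]
  • 11. [Figure: the example graph at iteration i and at iteration i+1; annotations show one distance changing from 5 to 4 and another computed as min(6,4)]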
  • 12. [Figure: a superstep. Source: http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel]
  • 13. . . .
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 22.
  • 23.
  • 24-35. [Figure sequence: shortest-path computation on the example graph (vertices A to G). After "initialize", the source vertex has distance 0 and every other vertex +∞; the frames then step through iterations 1, 2 and 3, replacing +∞ labels with finite distances, until "end".]
  • 36. class ShortestPathMapper(Mapper):
            def map(self, node_id, node):
                # emit(key, value) is assumed to be supplied by the framework
                # send graph structure
                emit(node_id, node)
                # get node value and add it to edge distance
                dist = node.get_value()
                for neighbour_node_id in node.get_adjacency_list():
                    dist_to_nbr = node.get_distance(node_id, neighbour_node_id)
                    emit(neighbour_node_id, dist + dist_to_nbr)
  • 37. class ShortestPathReducer(Reducer):
            def reduce(self, node_id, dist_list):
                min_dist = float("inf")
                for dist in dist_list:
                    # dist_list contains a Node
                    if is_node(dist):
                        node = dist
                    elif dist < min_dist:
                        min_dist = dist
                # keep the current value when no shorter distance arrived
                # (otherwise the source would be reset to infinity)
                node.set_value(min(min_dist, node.get_value()))
                emit(node_id, node)
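The mapper and reducer above can be tried end to end without a cluster. The following is a minimal single-process sketch, not the slides' implementation: the function names (`map_node`, `reduce_node`, `run_iteration`, `shortest_paths`), the dictionary-based node record, and the four-vertex `toy_graph` are all assumptions made for illustration. It assumes every vertex appears as a key in the graph, and its reducer keeps a node's current distance when no shorter one arrives.

```python
from collections import defaultdict

INF = float("inf")

def map_node(node_id, node):
    """Emit the node record (graph structure) plus one tentative distance per out-edge."""
    yield node_id, node
    for nbr, weight in node["edges"].items():
        yield nbr, node["dist"] + weight

def reduce_node(node_id, values):
    """Recover the node record and keep the smallest distance seen for it."""
    node = next(v for v in values if isinstance(v, dict))
    candidates = [v for v in values if not isinstance(v, dict)]
    best = min(candidates + [node["dist"]])
    return node_id, {"dist": best, "edges": node["edges"]}

def run_iteration(graph):
    """One simulated MapReduce job: map all nodes, group by key, reduce each key."""
    shuffled = defaultdict(list)
    for node_id, node in graph.items():
        for key, value in map_node(node_id, node):
            shuffled[key].append(value)
    return dict(reduce_node(k, vs) for k, vs in shuffled.items())

def shortest_paths(edges, source):
    """Repeat jobs until no distance changes; return (distances, number of jobs run)."""
    graph = {n: {"dist": 0 if n == source else INF, "edges": dict(e)}
             for n, e in edges.items()}
    iterations = 0
    while True:
        new_graph = run_iteration(graph)
        iterations += 1
        if new_graph == graph:
            return {n: v["dist"] for n, v in new_graph.items()}, iterations
        graph = new_graph

# Hypothetical directed graph: {vertex: {neighbour: edge weight}}
toy_graph = {"A": {"B": 5, "D": 3}, "B": {"E": 1}, "D": {"B": 1, "E": 4}, "E": {}}
distances, iterations = shortest_paths(toy_graph, "A")
print(distances, iterations)
```

Note that the whole graph structure passes through the shuffle on every iteration, and that one extra job is needed just to detect that nothing changed.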
  • 38.
  • 39.
  • 40.
  • 41.
  • 42. # In-Mapper Combiner
        class ShortestPathMapper(Mapper):
            def __init__(self):
                # smallest distance seen so far for each key
                self.buffer = {}

            def check_and_put(self, key, value):
                if key not in self.buffer or value < self.buffer[key]:
                    self.buffer[key] = value

            def check_and_emit(self):
                # flush once the buffer exceeds its size limit
                if is_exceed_limit_buffer_size(self.buffer):
                    for key, value in self.buffer.items():
                        emit(key, value)
                    self.buffer = {}

            def close(self):
                # flush whatever is left at the end of the map task
                for key, value in self.buffer.items():
                    emit(key, value)
  • 43.     # ...continued
            def map(self, node_id, node):
                # send graph structure
                emit(node_id, node)
                # get node value and add it to edge distance
                dist = node.get_value()
                for nbr_node_id in node.get_adjacency_list():
                    dist_to_nbr = node.get_distance(node_id, nbr_node_id)
                    dist_nbr = dist + dist_to_nbr
                    self.check_and_put(nbr_node_id, dist_nbr)
                self.check_and_emit()
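The buffering idea behind the in-mapper combiner can be shown in isolation. This is a small sketch under stated assumptions: the class name `InMapperCombiner`, the `emit` callback, and the `max_keys` threshold are hypothetical stand-ins for the framework's emit call and for `is_exceed_limit_buffer_size` on the slide.

```python
class InMapperCombiner:
    """Keeps only the minimum value per key and flushes when the buffer gets large."""

    def __init__(self, emit, max_keys=1000):
        self.emit = emit          # callback that receives (key, value)
        self.max_keys = max_keys  # hypothetical flush threshold (number of keys)
        self.buffer = {}

    def put(self, key, value):
        # Combine locally: remember the smallest value seen for this key.
        if key not in self.buffer or value < self.buffer[key]:
            self.buffer[key] = value
        if len(self.buffer) >= self.max_keys:
            self.flush()

    def flush(self):
        # Hand the combined pairs to the framework and start a fresh buffer.
        for key, value in self.buffer.items():
            self.emit(key, value)
        self.buffer = {}

emitted = []
combiner = InMapperCombiner(lambda k, v: emitted.append((k, v)), max_keys=3)
for key, dist in [("B", 5), ("D", 3), ("B", 4), ("B", 9)]:
    combiner.put(key, dist)
combiner.flush()  # corresponds to close() at the end of the map task
print(emitted)
```

Four tentative distances go in and two pairs come out, so less data reaches the shuffle. The cost is that the mapper holds state in memory, which is why a size limit is needed.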
  • 44.
  • 45.
  • 46.
  • 47.
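  • 48. # Shimmy trick
        class ShortestPathReducer(Reducer):
            def __init__(self):
                # P is the graph partition that matches this reducer's keys
                P.open_graph_partition()

            def emit_precede_node(self, node_id):
                # P.read() is assumed to resume where the previous call stopped
                for pre_node_id, node in P.read():
                    if node_id == pre_node_id:
                        return node
                    else:
                        # pass through nodes that received no messages
                        emit(pre_node_id, node)
  • 49.     # ...continued
            def reduce(self, node_id, dist_list):
                node = self.emit_precede_node(node_id)
                min_dist = float("inf")
                for dist in dist_list:
                    if dist < min_dist:
                        min_dist = dist
                # keep the current value when no shorter distance arrived
                node.set_value(min(min_dist, node.get_value()))
                emit(node_id, node)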
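The reducer above amounts to a merge join between the sorted reducer input and a graph partition read from storage, so the graph structure no longer has to travel through the shuffle. The sketch below illustrates that join in plain Python. The function name `shimmy_reduce`, the list-based partition, and the sample data are assumptions for illustration. It presumes the partition and the messages are sorted by the same key order and cover the same key range.

```python
INF = float("inf")

def shimmy_reduce(partition, messages):
    """Merge sorted (node_id, node) records with sorted (node_id, [distances]) messages."""
    output = []
    nodes = iter(partition)  # one shared cursor over the partition
    for node_id, dists in messages:
        # Advance the cursor, passing through nodes that received no messages.
        for pre_id, node in nodes:
            if pre_id == node_id:
                break
            output.append((pre_id, node))
        else:
            raise KeyError(f"{node_id!r} not found in partition")
        # Update the matching node with the best distance seen.
        new_dist = min(dists + [node["dist"]])
        output.append((node_id, {**node, "dist": new_dist}))
    # Nodes after the last key that received messages are passed through as well.
    output.extend(nodes)
    return output

partition = [("A", {"dist": 0}), ("B", {"dist": 5}), ("D", {"dist": 3}), ("E", {"dist": INF})]
messages = [("B", [4]), ("E", [6, 7])]
result = shimmy_reduce(partition, messages)
print(result)
```

The final `extend` covers records that come after the last reducer key; in a real job that step would presumably live in the reducer's cleanup hook.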
  • 50.
  • 51-62. [Figure sequence: the same shortest-path example stepped through iterations 1 to 5. The source vertex starts at distance 0 and every other vertex at +∞; finite distances spread across the graph frame by frame until "end".]
  • 63. class ShortestPathVertex:
            def compute(self, msgs):
                min_dist = 0 if self.is_source() else float("inf")
                # get values from all incoming edges.
                for msg in msgs:
                    min_dist = min(min_dist, msg.get_value())
                if min_dist < self.get_value():
                    # update current value (state).
                    self.set_current_value(min_dist)
                    # send new value to outgoing edges.
                    out_edge_iterator = self.get_out_edge_iterator()
                    for out_edge in out_edge_iterator:
                        recipient = out_edge.get_other_element(self.get_id())
                        self.send_message(recipient.get_id(),
                                          min_dist + out_edge.get_distance())
                self.vote_to_halt()
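The vertex program above can be simulated in one process to see how supersteps and message passing interact. This is a rough sketch, not a real Pregel or Hama API: the function name `pregel_shortest_paths`, the inbox/outbox dictionaries, and the four-vertex `toy_graph` are assumptions. Unlike the slide, which checks `is_source()`, it seeds the source with a zero-distance message.

```python
INF = float("inf")

def pregel_shortest_paths(edges, source):
    """edges: {vertex: {neighbour: weight}}; every vertex must appear as a key."""
    value = {v: INF for v in edges}
    inbox = {v: [] for v in edges}
    inbox[source].append(0)  # assumption: seed the source instead of calling is_source()
    supersteps = 0
    while any(inbox.values()):  # a vertex without messages stays halted
        outbox = {v: [] for v in edges}
        for v, msgs in inbox.items():
            if not msgs:
                continue
            min_dist = min(msgs)
            if min_dist < value[v]:
                # Update the local state and notify neighbours of the improvement.
                value[v] = min_dist
                for nbr, weight in edges[v].items():
                    outbox[nbr].append(min_dist + weight)
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1
    return value, supersteps

# Hypothetical directed graph: {vertex: {neighbour: edge weight}}
toy_graph = {"A": {"B": 5, "D": 3}, "B": {"E": 1}, "D": {"B": 1, "E": 4}, "E": {}}
result = pregel_shortest_paths(toy_graph, "A")
print(result)
```

Only distances move between supersteps. The adjacency lists stay with their vertices, and the run stops by itself once no messages are in flight.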
  • 64.
  • 65.
  • 66.
  • 67.
  • 69.
  • 70.
  • 71.
  • 72. [First page of a paper on HAMA; the top of the author block is cut off. Source: http://wiki.apache.org/hama/Articles]
        Contact addresses visible for the cut-off authors: edwardyoon@apache.org, swseo@calab.kaist.ac.kr, jaehong@calab.kaist.ac.kr
        Seongwook Jin, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, swjin@calab.kaist.ac.kr
        Jin-Soo Kim, School of Information and Communication, Sungkyunkwan University, South Korea, jinsookim@skku.edu
        Seungryoul Maeng, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, maeng@calab.kaist.ac.kr
        Abstract: APPLICATION. Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.
        I. INTRODUCTION
        As cloud computing environment emerges, Google has introduced the MapReduce framework to accelerate parallel and distributed computing on more than a thousand of inexpensive machines. Google has shown that the MapReduce framework is easy to use and provides massive scalability with extensive fault tolerance [2]. Especially, MapReduce fits well with complex data-intensive computations such as high-dimensional scientific simulation, machine learning, and data mining. Google and Yahoo! are known to operate dedicated clusters for MapReduce applications, each cluster consisting of several thousands of nodes. One of typical MapReduce
        HAMA is a distributed framework on Hadoop for massive matrix and graph computations. HAMA aims at a powerful tool for various scientific applications, providing basic primitives for developers and researchers with simple APIs. HAMA is currently being incubated as one of the subprojects of Hadoop by the Apache Software Foundation [10]. Figure 1 illustrates the overall architecture of HAMA.
        [Fig. 1. The overall architecture of HAMA. Labels: HAMA API; HAMA Core; HAMA Shell; Computation Engine (Plugged In/Out): MapReduce, BSP, Dryad; Zookeeper: Distributed Locking; Storage Systems: HBase, HDFS, RDBMS, File]