The Case for Spark GraphFrames

Nagato Kasaki, Big Data Department
Dogenzaka LT Festival, March 23, 2016

About Me
• Nagato Kasaki
• At DMM.com Labo since April 2014
  • Building Hadoop infrastructure
  • Developing recommendations with Spark MLlib, GraphX, spark.ml, and GraphFrames
• Favorite languages
  • SQL
  • Cypher

What Is GraphFrames?
• GraphFrames
  • http://graphframes.github.io/
  • An Apache Spark package for distributed graph processing
  • Integrates Spark GraphX with DataFrames (Spark SQL)
  • Released by Databricks on March 3, 2016

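Why GraphFrames?
[Figure: GraphFrames compared with graph databases and graph processing products on two axes, productivity (how easy processing is to write) and scalability]
Note: this is the presenter's personal opinion.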

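Advantages of GraphFrames
• High-level API
  • Distributed graph processing takes only a few lines of code
• Easy graph construction
  • Graph-structured data can be built easily from tabular data such as RDB tables and DataFrames
• A blue ocean!
https://www.google.co.jp/search?q=graphframes&ie=utf-8&oe=utf-8&hl=ja (as of March 23, 2016)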

Trying GraphFrames
• APIs are available for Scala, Java, and Python
• Try it interactively in the Spark shell
• Supports Spark 1.4 and later
• The latest Spark release is recommended to take full advantage of DataFrames
# Download Spark
$ wget http://ftp.jaist.ac.jp/pub/apache/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
$ tar xzvf spark-1.6.0-bin-hadoop2.6.tgz
# Start spark-shell with the graphframes package specified
$ spark-1.6.0-bin-hadoop2.6/bin/spark-shell --packages graphframes:graphframes:0.1.0-spark1.6

GraphFrames – Creating a Graph
// Import the graphframes package
scala> import org.graphframes._
import org.graphframes._
// Create the DataFrame of vertices
scala> val v = sqlContext.createDataFrame(List(
| (0L, "user", "u1"),
| (1L, "user", "u2"),
| (2L, "item", "i1"),
| (3L, "item", "i2"),
| (4L, "item", "i3"),
| (5L, "item", "i4")
| )).toDF("id", "type", "name")
v: org.apache.spark.sql.DataFrame = [id: bigint, type: string, name: string]
[Figure: the vertices, consisting of users u1 and u2 and items i1, i2, i3, and i4]

GraphFrames – Creating a Graph
// Create the DataFrame of edges
scala> val e = sqlContext.createDataFrame(List(
| (0L, 2L, "purchase"),
| (0L, 3L, "purchase"),
| (0L, 4L, "purchase"),
| (1L, 3L, "purchase"),
| (1L, 4L, "purchase"),
| (1L, 5L, "purchase")
| )).toDF("src", "dst", "type")
e: org.apache.spark.sql.DataFrame = [src: bigint, dst: bigint, type: string]
// Create the GraphFrame
scala> val g = GraphFrame(v, e)
g: org.graphframes.GraphFrame = GraphFrame(v:[id: bigint, type: string, name: string],
e:[src: bigint, dst: bigint, type: string])
[Figure: the purchase log drawn as edges from users u1 and u2 to items i1 to i4]
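
The point that graph data is easy to derive from tabular data can be illustrated without Spark. The following is a minimal plain-Scala sketch that turns a purchase log of (user, item) rows into vertex rows and edge rows with the same columns as `v` and `e`. The object name, the helper names, and the contents of the log are assumptions made for illustration only; none of them come from GraphFrames.

```scala
object GraphFromTableSketch {
  // Hypothetical purchase log rows (user name, item name), e.g. read from an RDB table.
  val log: Seq[(String, String)] = Seq(
    "u1" -> "i1", "u1" -> "i2", "u1" -> "i3",
    "u2" -> "i2", "u2" -> "i3", "u2" -> "i4"
  )

  // Vertex rows (id, type, name): users first, then items,
  // numbered in order of first appearance in the log.
  def vertices(log: Seq[(String, String)]): Seq[(Long, String, String)] = {
    val users = log.map(_._1).distinct.map(name => ("user", name))
    val items = log.map(_._2).distinct.map(name => ("item", name))
    (users ++ items).zipWithIndex.map { case ((tpe, name), i) => (i.toLong, tpe, name) }
  }

  // Edge rows (src, dst, type): one "purchase" edge per log row,
  // with names resolved to the vertex ids assigned above.
  def edges(log: Seq[(String, String)]): Seq[(Long, Long, String)] = {
    val idOf = vertices(log).map { case (id, tpe, name) => (tpe, name) -> id }.toMap
    log.map { case (user, item) => (idOf("user" -> user), idOf("item" -> item), "purchase") }
  }
}
```

The two resulting sequences have the shape expected by `sqlContext.createDataFrame(...).toDF(...)` in the session above.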

GraphFrames – Item Recommendation Example
// Example query for items to recommend
scala> g.find(
| " (a)-[]->(x); (b)-[]->(x);" +
| " (b)-[]->(y); !(a)-[]->(y)"
| ).groupBy(
| "a.name", "y.name"
| ).count().show()
+----+----+-----+
|name|name|count|
+----+----+-----+
|  u1|  i4|    2|
|  u2|  i1|    2|
+----+----+-----+
[Figure: users (a) and (b) have both purchased a common item (x); an item (y) that (b) has purchased but (a) has not yet purchased is recommended to (a)]
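
To make the meaning of the motif pattern concrete, here is a plain-Scala sketch that evaluates the same four conditions over an in-memory set of (user, item) pairs. It is not how GraphFrames executes `find`. The object name, the function name, and the purchase data are assumptions; the data is chosen so that u1 bought i1 to i3 and u2 bought i2 to i4, which is consistent with the result table above.

```scala
object MotifRecommendSketch {
  // Assumed purchase log as (user, item) pairs.
  val purchases: Set[(String, String)] = Set(
    "u1" -> "i1", "u1" -> "i2", "u1" -> "i3",
    "u2" -> "i2", "u2" -> "i3", "u2" -> "i4"
  )

  // Counts matches of (a)->(x); (b)->(x); (b)->(y); !(a)->(y), grouped by (a, y).
  // Each item x shared by a and b contributes one match per candidate item y.
  def recommend(edges: Set[(String, String)]): Map[(String, String), Int] = {
    val all = edges.toSeq
    val matches = for {
      (a, x)  <- all
      (b, x2) <- all if x2 == x      // b purchased the same item x
      (b2, y) <- all if b2 == b      // y is something b purchased
      if !edges.contains(a -> y)     // a has not purchased y yet
    } yield (a, y)
    matches.groupBy(identity).map { case (pair, hits) => pair -> hits.size }
  }
}
```

With the assumed data, `MotifRecommendSketch.recommend(MotifRecommendSketch.purchases)` yields a count of 2 for (u1, i4) and a count of 2 for (u2, i1), because u1 and u2 share the two items i2 and i3.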

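GraphFrames – PageRank Example
// Compute PageRank (judging from the output, g here is a star graph with a root and five other vertices, not the purchase graph)
scala> val pr = g.pageRank.resetProbability(0.1).tol(0.01).run()
// Show the PageRank scores
scala> pr.vertices.show()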
+---+-------+--------+
| id|v_attr1|pagerank|
+---+-------+--------+
|  0|   root|    0.55|
|  1| node-1|     0.1|
|  2| node-2|     0.1|
|  3| node-3|     0.1|
|  4| node-4|     0.1|
|  5| node-5|     0.1|
+---+-------+--------+
[Figure: star graph with vertex 0 at the center (PageRank 0.55) and vertices 1 to 5 around it (PageRank 0.1 each)]
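
The scores above can be reproduced by hand with a common unnormalized PageRank formulation, rank(v) = reset + (1 - reset) * sum of rank(u) / outDegree(u) over all edges u -> v. The plain-Scala sketch below iterates that formula until no score changes by more than `tol`. It is an illustration under stated assumptions, not the GraphFrames or GraphX implementation: the object and function names are hypothetical, and the edge direction (each of vertices 1 to 5 points at vertex 0) is assumed because the slides do not list the edges.

```scala
object PageRankSketch {
  // Assumed star graph: vertices 1..5 each have a single edge pointing at vertex 0.
  val vertices: Seq[Long] = 0L to 5L
  val edges: Seq[(Long, Long)] = (1L to 5L).map(leaf => leaf -> 0L)

  // Repeats rank(v) = reset + (1 - reset) * sum(rank(u) / outDegree(u)) over edges u -> v
  // until the largest change of any rank between two rounds is at most tol.
  def pageRank(vertices: Seq[Long], edges: Seq[(Long, Long)],
               reset: Double, tol: Double): Map[Long, Double] = {
    val outDegree = edges.groupBy(_._1).map { case (u, es) => u -> es.size }
    var ranks = vertices.map(_ -> 1.0).toMap
    var delta = Double.MaxValue
    while (delta > tol) {
      // Rank mass arriving at each vertex along its incoming edges.
      val incoming = edges
        .map { case (u, v) => v -> ranks(u) / outDegree(u) }
        .groupBy(_._1)
        .map { case (v, contribs) => v -> contribs.map(_._2).sum }
      val next = vertices.map { v =>
        v -> (reset + (1 - reset) * incoming.getOrElse(v, 0.0))
      }.toMap
      delta = vertices.map(v => math.abs(next(v) - ranks(v))).max
      ranks = next
    }
    ranks
  }
}
```

Under these assumptions, vertices without incoming edges settle at the reset probability 0.1, and vertex 0 settles at 0.1 + 0.9 * (5 * 0.1) = 0.55.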

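GraphFrames – Computing Shortest Distances
// friends appears to be the sample graph that GraphFrames provides as org.graphframes.examples.Graphs.friends
// Compute the shortest distance from every user to user "a"
scala> val d1 = friends.shortestPaths.landmarks(Seq("a")).run()
// Show the result
scala> d1.show()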
+---+-------+---+-----------+
| id|   name|age|  distances|
+---+-------+---+-----------+
|  f|  Fanny| 36|      Map()|
|  g|  Gabby| 60|      Map()|
|  a|  Alice| 34|Map(a -> 0)|
|  b|    Bob| 36|      Map()|
|  c|Charlie| 30|      Map()|
|  d|  David| 29|Map(a -> 1)|
|  e| Esther| 32|Map(a -> 2)|
+---+-------+---+-----------+
[Figure: the friends graph (vertices a to g), with each vertex that can reach a labeled with its distance to a]

GraphFrames – Computing Shortest Distances
// Compute the shortest distance from every user to users "a" and "c"
scala> val d2 = friends.shortestPaths.landmarks(Seq("a", "c")).run()
// Show the result
scala> d2.show()
+---+-------+---+-------------------+
| id|   name|age|          distances|
+---+-------+---+-------------------+
|  f|  Fanny| 36|        Map(c -> 1)|
|  g|  Gabby| 60|              Map()|
|  a|  Alice| 34|Map(a -> 0, c -> 2)|
|  b|    Bob| 36|        Map(c -> 1)|
|  c|Charlie| 30|        Map(c -> 0)|
|  d|  David| 29|Map(a -> 1, c -> 3)|
|  e| Esther| 32|Map(a -> 2, c -> 2)|
+---+-------+---+-------------------+
[Figure: the friends graph (vertices a to g), with each vertex labeled with its distances to a and to c where a path exists]
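
The `distances` column can be understood as the result of one breadth-first search per landmark over the reversed edges. The plain-Scala sketch below shows that idea; it is not the GraphFrames implementation. The object and function names are hypothetical, and the edge list is an assumption: it is meant to mirror the friends sample graph, whose edges are not all listed in the slides.

```scala
object ShortestDistanceSketch {
  // Assumed vertices and directed edges (src -> dst); vertex g has no edges.
  val vertices: Seq[String] = Seq("a", "b", "c", "d", "e", "f", "g")
  val edges: Seq[(String, String)] = Seq(
    "a" -> "b", "b" -> "c", "c" -> "b", "f" -> "c",
    "e" -> "f", "e" -> "d", "d" -> "a", "a" -> "e"
  )

  // Hop counts to `landmark` for every vertex that can reach it, found by
  // breadth-first search over the reversed edges starting at the landmark.
  def distancesTo(landmark: String, edges: Seq[(String, String)]): Map[String, Int] = {
    val predecessors = edges.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._1) }
    var dist = Map(landmark -> 0)
    var frontier = Seq(landmark)
    while (frontier.nonEmpty) {
      // All frontier vertices share the same distance, so one increment suffices.
      val hops = dist(frontier.head) + 1
      val next = frontier
        .flatMap(v => predecessors.getOrElse(v, Seq.empty))
        .distinct
        .filterNot(dist.contains)
      dist = dist ++ next.map(_ -> hops)
      frontier = next
    }
    dist
  }

  // One map per vertex, keyed by landmark, shaped like the `distances` column.
  def shortestPaths(landmarks: Seq[String]): Map[String, Map[String, Int]] = {
    val perLandmark = landmarks.map(l => l -> distancesTo(l, edges))
    vertices.map { v =>
      v -> perLandmark.collect { case (l, d) if d.contains(v) => l -> d(v) }.toMap
    }.toMap
  }
}
```

With the assumed edges, `ShortestDistanceSketch.shortestPaths(Seq("a", "c"))` gives `Map(a -> 1, c -> 3)` for d and an empty map for g, matching the rows shown above.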

GraphFrames – Finding the Shortest Path
// Find the shortest path from user "d" to user "c"
scala> val path = friends.bfs.fromExpr("id = 'd'").toExpr("id = 'c'").run()
// Show the result (the single result row is shown here in two parts)
scala> path.show()
+------------+------------+------------+
|        from|          e0|          v1|
+------------+------------+------------+
|[d,David,29]|[d,a,friend]|[a,Alice,34]|
+------------+------------+------------+
+------------+----------+------------+--------------+
|          e1|        v2|          e2|            to|
+------------+----------+------------+--------------+
|[a,b,friend]|[b,Bob,36]|[b,c,follow]|[c,Charlie,30]|
+------------+----------+------------+--------------+
[Figure: the friends graph with the path d (David, 29) -friend-> a (Alice, 34) -friend-> b (Bob, 36) -follow-> c (Charlie, 30) highlighted]
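
The path search above can be sketched in plain Scala as a breadth-first search that extends every partial path by one edge per round and stops as soon as the target is reached. This is only an illustration, not the GraphFrames implementation, and unlike the `bfs` result table it returns a single shortest path as a list of edges. The object name, the function name, and the edge list are assumptions; the edges are meant to mirror the friends sample graph.

```scala
object BfsPathSketch {
  type Edge = (String, String, String) // (src, dst, relationship)

  // Assumed labeled edges of the friends sample graph.
  val edges: Seq[Edge] = Seq(
    ("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow"),
    ("f", "c", "follow"), ("e", "f", "follow"), ("e", "d", "friend"),
    ("d", "a", "friend"), ("a", "e", "friend")
  )

  // Returns the edges of one shortest path from `from` to `to`, if any exists.
  def shortestPath(from: String, to: String): Option[List[Edge]] = {
    // frontier holds (last vertex, path so far in reverse order).
    @annotation.tailrec
    def search(frontier: Seq[(String, List[Edge])], visited: Set[String]): Option[List[Edge]] =
      frontier.find(_._1 == to) match {
        case Some((_, path)) => Some(path.reverse)
        case None if frontier.isEmpty => None
        case None =>
          val extended = for {
            (end, path) <- frontier
            edge <- edges if edge._1 == end && !visited(edge._2)
          } yield edge._2 -> (edge :: path)
          // Keep a single path per newly reached vertex.
          val next = extended.groupBy(_._1).values.map(_.head).toSeq
          search(next, visited ++ next.map(_._1))
      }
    search(Seq((from, List.empty[Edge])), Set(from))
  }
}
```

With the assumed edges, `BfsPathSketch.shortestPath("d", "c")` returns the three edges d -> a (friend), a -> b (friend), and b -> c (follow), the same path as in the result above.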

GraphFrames – Other Features
• GraphFrames User Guide
http://graphframes.github.io/user-guide.html

GraphFrames – Use Cases
• On-Time Flight Performance with GraphFrames for Apache Spark
https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-spark-graphframes.html

GraphFrames vs. Neo4j
Source: http://www.slideshare.net/SparkSummit/graphframes-graph-queries-in-spark-sql-by-ankur-dave

GraphFrames × Spark 2.0
Source: http://www.slideshare.net/databricks/2016-spark-summit-east-keynote-matei-zaharia

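GraphFrames Summary
• A high-level API for distributed graph processing
  • High productivity
  • Fast distributed graph processing
• Released only this month
  • Features and available information are still limited
  • Further development and adoption are something to look forward to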
