2017.06
Spark for Beginners (給初學者的Spark教學)
Popcorny (陸振恩, Chen-en Lu)
Who am I
• Chen-en Lu (popcorny)
• Director of Engineering @TenMax
• Background
– M.S. in Computer Science, NCTU (交大資科所)
– Champion of the 4th Trend Micro Programming Contest (趨勢百萬程式競賽)
– MediaTek (2005–2010)
– SmartQ (2011–2014)
– cacaFly/TenMax (2014–present)
• FB: https://fb.me/popcornylu
Target Audience
• Basic Java programming skills
• Ideally familiar with Java 8 Streams or the basics of functional programming in another language (map, flatMap, filter, reduce, …)
• Has not written Spark yet, or has read about Spark without trying it hands-on
Outline
• Spark basics
• Spark DataFrame/SQL
• Writing a Spark application
• Spark basics
• Spark DataFrame/SQL
• Writing a Spark application
Introduction to Spark
• Spark is a distributed computation engine
• A MapReduce-style framework
• Built on RDDs (Resilient Distributed Datasets)
What Spark Is Good For
• Good fit
– Large-scale batch data processing
– Stream processing
– ETL and data analysis at any scale
• Poor fit
– When an RDBMS already solves your problem
Big Data Architecture
• Layers (bottom-up)
– Distributed File System
– Resource Manager
– Computation Framework
– Application Framework
– Application
Hadoop Architecture
• The same layers, as instantiated by Hadoop (bottom-up)
– HDFS
– YARN
– Hadoop MapReduce V2
– Pig / Hive
– Hadoop Application
Spark Architecture
• The same layers, as instantiated by Spark (bottom-up)
– DFS
– YARN
– Spark
– Spark DataFrame/SQL, Streaming, MLlib, GraphX
– Spark Application
Spark Application
• (Diagram) spark-submit ships application.jar to the cluster; the Driver, which holds the Spark context, coordinates multiple Executors, and the Driver and each Executor run as separate JVM processes on the cluster's nodes.
Spark RDD
• Resilient Distributed Dataset
• Think of it as a distributed version of a Java Stream
• Characteristics
– Lazy evaluation: nothing is computed until an action is triggered; until then, transformations only build the lineage.
– Partitioned: data is split into partitions that can be processed in parallel.
– Cachable: computed data can be cached in the executors.
– Reusable: an RDD can be consumed multiple times, whereas a Java Stream can be consumed only once.
Spark RDD
• Data processing boils down to Input, Transformation, Output
• Also known as ETL (Extract, Transform, Load)
• In Spark:
– Input is an RDD created from the Spark context
– An RDD goes through a series of transformations
– Finally an action kicks off the whole pipeline and produces output at the action's destination
Input
• Input RDDs are always obtained from the Spark context
• sc.parallelize(list): ship a local list to the Spark cluster
• sc.textFile(path): read a text file from path
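A minimal Java sketch of these two input methods (assumes spark-core on the classpath; `input.txt` is a placeholder path):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class InputExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("input").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // sc.parallelize: ship a local list to the cluster as an RDD
            JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3));
            // sc.textFile: read a text file (local path, HDFS, ...) as an RDD of lines
            JavaRDD<String> lines = sc.textFile("input.txt");
            // count() is an action, so only here does a job actually run
            System.out.println(nums.count());
        }
    }
}
```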
Simple Operations
• map(func): one-to-one transformation
T → U
• flatMap(func): one-to-many transformation
T → 0..* U
• mapPartitions(func): many-to-many transformation
0..* T → 0..* U
• filter(func): predicate filter
T → 0..1 T
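Since the deck assumes Java 8 Stream familiarity, the same shapes can be sketched with plain Streams (mapPartitions has no direct Stream analogue, so it is omitted here):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SimpleOps {
    // map: one-to-one, T -> U
    static List<Integer> lengths(List<String> lines) {
        return lines.stream().map(String::length).collect(Collectors.toList());
    }

    // flatMap: one-to-many, T -> 0..* U
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // filter: T -> 0..1 T (keep only items matching the predicate)
    static List<String> longWords(List<String> words) {
        return words.stream().filter(w -> w.length() > 3).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");
        System.out.println(words(lines)); // [to, be, or, not, to, be]
    }
}
```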
Shuffle Operations (Single Source)
• groupByKey([numTasks]): collect values with the same key into one list
(K, V) → (K, Iterable<V>)
• reduceByKey(func, [numTasks]): reduce values with the same key
(K, V) → (K, V),
reducer: (V, V) → V
• aggregateByKey(zeroValue, seqOp, combOp, [numTasks]): reduce values with the same key, but through an accumulator
(K, V) → (K, U),
seqOp: (U, V) → U,
combOp: (U, U) → U
• sortByKey([ascending], [numTasks]): sort by key
(K, V) → (K, V)
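The per-key semantics of groupByKey and reduceByKey can be sketched locally with Java 8 collectors; what Spark adds on top is performing this grouping across partitions via the shuffle:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyedOps {
    // Local analogue of groupByKey: (K, V) -> (K, list of V)
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    // Local analogue of reduceByKey with a (V, V) -> V reducer: (K, V) -> (K, V)
    static Map<String, Integer> reduceByKey(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.reducing(0, Map.Entry::getValue, Integer::sum)));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                new SimpleEntry<>("a", 1), new SimpleEntry<>("b", 2), new SimpleEntry<>("a", 3));
        System.out.println(reduceByKey(pairs)); // a -> 4, b -> 2
    }
}
```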
Shuffle Operations (Two Sources)
• cartesian(otherDataset, [numTasks]): all n × m pairings of the two datasets. For example, 4 suits × 13 ranks yields a full deck of cards.
T, U → (T, U)
• join(otherDataset, [numTasks]): join records with the same key; supports inner join and left/right/full outer join
(K, V), (K, W) → (K, (V, W))
• cogroup(otherDataset, [numTasks]): like groupByKey, but over two sources
(K, V), (K, W) → (K, (Iterable<V>, Iterable<W>))
Repartition Operations
• repartition(numPartitions): a pure shuffle
• coalesce(numPartitions): no shuffle; only merges partitions to reduce their number
Actions
• Write files
– saveAsTextFile(path)
• Return results to the driver
– first(): the first element
– take(n): the first n elements
– collect(): all results
– count(): the number of results
– reduce(func): fold all results with a reducer
• Execute directly inside the executors
– foreach(func): callback on each item, item by item, in the executors
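A short sketch of the driver-side actions in Java (assumes spark-core on the classpath, local mode):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ActionsExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("actions").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(3, 1, 2));
            Integer first = nums.first();              // first element
            List<Integer> firstTwo = nums.take(2);     // first n elements
            List<Integer> all = nums.collect();        // everything back to the driver
            long n = nums.count();                     // number of elements
            Integer total = nums.reduce(Integer::sum); // fold with a reducer
            System.out.println(first + " " + firstTwo + " " + all + " " + n + " " + total);
        }
    }
}
```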
Word Count
• (Code example on the slide)
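The word-count pipeline the slide shows, sketched in Java (assumes spark-core on the classpath; input and output paths come from the command line):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("wordcount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(args[0]);                       // input
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split(" ")).iterator()) // one line -> many words
                    .mapToPair(word -> new Tuple2<>(word, 1))                   // (word, 1)
                    .reduceByKey(Integer::sum);                                 // shuffle + per-key sum
            counts.saveAsTextFile(args[1]);                                     // action triggers the job
        }
    }
}
```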
RDD Graph
• (Diagram) The RDD lineage graph: a Job is divided into Stages, and each Stage into Tasks.
Shuffle
• The data-exchange step
• Data must first be in (key, value) form
• Records are grouped by key
• Records with the same key always land in the same partition
• This is exactly what MapReduce does
Shuffle
• (Diagram; source: "MapReduce Shuffle原理 与 Spark Shuffle原理")
Job, Stage, Task
• An Application is created by spark-submit
• A Job is created by an action operation
• A Stage is created by a shuffle operation; different stages can have different numbers of tasks
• The number of Tasks is determined by the shuffle operation's task count or by the number of input partitions; a task is the smallest indivisible unit of parallel work
• Cardinality: Cluster 1 → * Application, Application 1 → * Job, Job 1 → * Stage, Stage 1 → * Task
Operations (overview)
• Input (from the driver program / distributed file system)
– sc.parallelize
– sc.textFile, sc.xxxFile
• Simple transformations
– map, flatMap, mapPartitions, filter
• Shuffle transformations
– groupByKey, reduceByKey, aggregateByKey, repartition
– cartesian, join, cogroup
• Actions (to the driver program)
– collect, first, take, count, reduce
• Actions (to the distributed file system)
– saveAsTextFile, saveAsXxxFile
• Actions (in the executors)
– foreach, foreachPartitions
• Spark basics
• Common Spark DataFrame/SQL operations
• Writing a Spark application
Spark DataFrame & Dataset
• DataFrame
– Like a table in an RDBMS
– Has a schema, which can be nested
– Is a Dataset<Row>
– Each record is made up of columns
• Dataset
– Dataset<T>
– A typed dataset
Reader and Writer
• Input/output sources
– RDD
– File
• Supported formats
– CSV
– JSON
– Parquet (recommended)
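A sketch of reading and writing DataFrames (assumes spark-sql on the classpath; the file names are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadWriteExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("readwrite").master("local[*]").getOrCreate();
        // Read a CSV file, treating the first line as a header
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .csv("population.csv");
        // Write it back out as Parquet, the recommended format
        df.write().mode("overwrite").parquet("population.parquet");
        spark.stop();
    }
}
```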
DataFrame Operations
• select(column…)
• distinct()
• join(right, column)
• where(column)
• groupBy(columns…)
• agg(column…)
• orderBy(column…)
DataFrame Functions
• import static org.apache.spark.sql.functions.*
• Normal functions
– col(name)
• Aggregation functions
– min(column)
– max(column)
– count(column)
– sum(column)
– avg(column)
DataFrame Operations
• Data: a sample `population` table with year, region, and people_total columns (table shown on the slide)
DataFrame Operations
• SQL
SELECT year, region, SUM(people_total) AS people_total
FROM population GROUP BY year, region ORDER BY people_total DESC
• Spark DataFrame (equivalent code shown on the slide)
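The SQL above expressed as DataFrame operations might look like this (a sketch; assumes spark-sql on the classpath and a `population` DataFrame with year, region, and people_total columns as in the sample data):

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GroupByExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("groupby").master("local[*]").getOrCreate();
        Dataset<Row> population = spark.read().parquet("population.parquet");
        Dataset<Row> result = population
                .groupBy(col("year"), col("region"))              // GROUP BY year, region
                .agg(sum(col("people_total")).as("people_total")) // SUM(people_total)
                .orderBy(col("people_total").desc());             // ORDER BY ... DESC
        result.show();
        spark.stop();
    }
}
```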
DataFrame Schema
• Defining a schema
– JavaBean + Encoder
– Programmatically
– Metastore (Hive)
– Inferred from the file contents
• Inspecting a schema
– df.printSchema()
Spark SQL
• Query DataFrames with SQL syntax
• SQL is a declarative language, so a built-in optimizer translates the query into physical DataFrame operations
• The output is another DataFrame
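Registering a temp view and running the earlier query through Spark SQL, as a sketch under the same assumptions as the DataFrame example:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("sql").master("local[*]").getOrCreate();
        Dataset<Row> population = spark.read().parquet("population.parquet");
        // Expose the DataFrame to SQL under the name "population"
        population.createOrReplaceTempView("population");
        // The optimizer turns this declarative query into physical DataFrame operations
        Dataset<Row> result = spark.sql(
                "SELECT year, region, SUM(people_total) AS people_total "
                + "FROM population GROUP BY year, region "
                + "ORDER BY people_total DESC");
        result.show(); // show() is an action; it triggers execution
        spark.stop();
    }
}
```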
1. Spark basics
2. Common Spark DataFrame/SQL operations
3. Writing a Spark application
Spark Application
• Packaged as an application jar
• Run via spark-submit
• The submit command must specify a master
• The master identifies a resource manager (also called a cluster manager); upon submit, the application acquires the resources it needs from the resource manager
• The Spark application interacts with those resources through the Spark context
Uber jar
• The application jar is shipped to every executor, so how do you ship the libraries it depends on as well?
• Unpack every dependency jar and bundle the classes directly into the application jar; this is called an uber jar
• Also known as a fat jar or shadow jar
Spark Template Project
• https://github.com/popcornylu/spark-wordcount
• Commands
– Application jar:
./gradlew jar
spark-submit --master local[*] build/libs/spark-wordcount.jar
– Application uber jar:
./gradlew shadowJar
spark-submit --master local[*] build/libs/spark-wordcount-all.jar
Resource Manager
• Local
• Standalone cluster
• YARN cluster
• Mesos cluster
Spark Web UI
• By default, a running Spark application serves a Web UI (ports 4040, 4041, …)
• Shows the progress of Jobs, Stages, and Tasks
• A great debugging tool
History Server
• The Web UI only shows Spark applications that are currently running
• The history server lets you browse the records of applications that have already finished
Configurations
• conf/log4j.properties: log configuration; for example, change the default log level from INFO to WARN
• conf/core-site.xml: file system configuration; set this up if you use a DFS
• conf/spark-defaults.conf: default application configuration, such as the default master or enabling history logging
• conf/spark-env.sh: default environment variables, mainly for the various daemons
Recap
• Spark is a distributed computation engine
• Built around RDDs, with Input, Transformations, and Actions
• Executing an Action produces a Job; a Job may have many Stages, and each Stage has its own number of tasks
• How shuffle works
• Spark DataFrame and Spark SQL
• How to write a Spark application