1. DataCon.TW 2019 Opening
Data Engineering in Taiwan:
PAST, NOW and FUTURE
Jazz Yao-Tsung Wang
Initiator and Chair of the Taiwan Data Engineering Association
Co-Founder of the Taiwan Hadoop User Group
Presented at DataCon.TW 2019, 2019-09-06
@ NTUH International Convention Center
4.
2018-10-03: What the Cloudera and Hortonworks merger means
2019-08-06: HPE acquires the business assets of MapR
2018-08-02: Arm acquires Treasure Data to set the stage for IoT transformation
Will you still believe after DataCon.TW 2019?
5. Hit the bottom yet?
https://www.facebook.com/photo.php?fbid=10205989049499660&set=a.3351956552057&type=3&theater
6. Data talent is moving to cloud providers
Source: “Interpreting New Trends in Cloud Big Data” (解讀雲端大數據新趨勢), Jazz Yao-Tsung Wang, 2018-05-16 @ iThome Cloud Summit 2018
https://www.slideshare.net/jazzwang/ss-97231624/19
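8. Without your support, today's great event would not have been possible!
Thanks for your support! DataCon.TW 2019 Committee
Secretary-General of the Association: Angie Chang
Standing Director of the Association: Anna Yen
Volunteer editor who stepped up out of friendship: Kai-Ting Kao
ASF member: 葉祐欣 Evans Ye
Super-popular speaker: 郭二文 ErhWen Kuo
Secretary of the Association: 徐薇妮 Winnie
Standing Director of the Association: Bryan Yang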
12. The “Gold Mine” Model of Data-Driven Adoption Projects
Source: “Big Data Project Management the Body of Knowledge (BDPMBOK)”, Jazz Wang, 2015-12-09 Big Data Conference
https://www.slideshare.net/jazzwang/big-data-projet-management-the-body-of-knowledge-bdpmbok/12
Gold Mine (Data)
Royalty (Access Right)
Fineness of Gold (Value of Data)
Refinery (Data Platform)
TCO (Total Cost of Ownership)
Global Gold Price (Value to Customer)
13. The “Six Thinking Hats” of Data-Driven Adoption Projects
Source: “Big Data Project Management the Body of Knowledge (BDPMBOK)”, Jazz Wang, 2015-12-09 Big Data Conference
https://www.slideshare.net/jazzwang/big-data-projet-management-the-body-of-knowledge-bdpmbok/12
15.
Business Problem
Yield Rate Improvement → Tool Matching → Health Diagnosis
Reduce Operating Costs (OPEX ↓)
Data-Driven? Yes
Data Source
Internal Data
Variety: Sensor, Image, Log
Legality
Ownership: Yes
Process Right: Yes
Access Right: Yes
Fineness
Veracity: High (6 sigma)
Answer contained in the data: Yes (inferred from past experience with the tools)
Platform
Architecture: Lambda Architecture (data is landed and organized first, then analyzed)
Technical challenges: (data generation) High Data Write Throughput
(analysis) Multivariate: too many columns
TCO
People: Dev | Ops | Analysis | Decision-making (Expert)
Process: Collect Data → Preprocess (Clean Up) → Analyze → Model/Predict → Feedback
Technology: Hadoop/HBase → SPSS/SAS/R
Sustainability condition: TCO << Diff of Loss (Yield Rate), i.e., the reduction in losses from a better yield rate
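The case-study slide walks one project through six dimensions (problem, data source, legality, fineness, platform, TCO) and ends with a sustainability condition. As a minimal sketch, assuming each dimension can be reduced to a yes/no flag and that TCO and the avoided yield loss are comparable yearly amounts, the go/no-go check might look like this. The class `ProjectAssessment`, its fields, the factor `MARGIN = 10` used to make "<<" concrete, and the monetary figures in the usage lines are all hypothetical, not taken from the deck.

```python
from dataclasses import dataclass

# Hypothetical margin that turns the slide's "TCO << loss" into a number:
# the avoided loss must be at least MARGIN times the TCO. Not from the deck.
MARGIN = 10.0


@dataclass
class ProjectAssessment:
    """Hypothetical go/no-go record mirroring the case-study slide."""
    data_driven: bool          # Business Problem: answerable with data?
    ownership: bool            # Legality: ownership of the data
    process_right: bool        # Legality: right to process the data
    access_right: bool         # Legality: right to use the data
    answer_in_data: bool       # Fineness: does the data contain the answer?
    annual_tco: float          # TCO: people + process + technology per year
    avoided_yield_loss: float  # yearly loss avoided through yield improvement

    def legal(self) -> bool:
        # All three rights are required before the data can be used.
        return self.ownership and self.process_right and self.access_right

    def sustainable(self, margin: float = MARGIN) -> bool:
        # Sustainability condition: TCO much smaller than the avoided loss.
        return self.annual_tco * margin <= self.avoided_yield_loss

    def go(self) -> bool:
        # Every dimension must pass for the project to be worth running.
        return (self.data_driven and self.legal()
                and self.answer_in_data and self.sustainable())


# Usage with made-up figures (currency units are arbitrary).
fab_case = ProjectAssessment(
    data_driven=True, ownership=True, process_right=True, access_right=True,
    answer_in_data=True, annual_tco=0.5e6, avoided_yield_loss=8e6,
)
print(fab_case.go())  # True: 0.5e6 * 10 = 5e6 <= 8e6
```

A fixed ratio is only one possible reading of "<<"; a real assessment might instead compare payback periods or risk-adjusted savings.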
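16. High-Level Data Pipeline in the Semiconductor Industry
Source: “Planning Principles for Comprehensive, Integrated Query and Mining of Manufacturing Production History” (製造業生產歷程全方位整合查詢與探勘的規劃心法), Jazz Wang, 2015-05-20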
https://www.slideshare.net/jazzwang/20150520-final
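The pipeline diagram itself does not survive in this text. As a generic illustration of the Lambda Architecture named in the case study (data is landed first, then analyzed), the sketch below keeps an immutable master dataset for the batch layer, a small incremental view for the speed layer, and merges both at query time. It is not the pipeline from the deck: the `(tool, value)` sensor tuples, tool names such as `tool-A`, and the per-tool sum query are assumptions chosen for brevity.

```python
from collections import defaultdict
from typing import Iterable

# Generic Lambda Architecture sketch (hypothetical data and query):
# events are (tool_id, sensor_value) pairs; the query is "sum per tool".


def batch_view(master_dataset: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Batch layer: recompute the view from the full, landed master dataset."""
    view: dict[str, float] = defaultdict(float)
    for tool, value in master_dataset:
        view[tool] += value
    return dict(view)


class SpeedLayer:
    """Speed layer: incrementally aggregates events not yet in the batch view."""

    def __init__(self) -> None:
        self.view: dict[str, float] = defaultdict(float)

    def ingest(self, tool: str, value: float) -> None:
        self.view[tool] += value

    def reset(self) -> None:
        # Called once a new batch run has absorbed these events.
        self.view.clear()


def serve(batch: dict[str, float], speed: SpeedLayer, tool: str) -> float:
    """Serving layer: merge batch and real-time views at query time."""
    return batch.get(tool, 0.0) + speed.view.get(tool, 0.0)


# Usage with made-up readings.
landed = [("tool-A", 1.0), ("tool-B", 2.0), ("tool-A", 3.0)]
batch = batch_view(landed)
speed = SpeedLayer()
speed.ingest("tool-A", 0.5)  # arrived after the last batch run
print(serve(batch, speed, "tool-A"))  # 4.5
```

Recomputing the batch view from landed data keeps it simple to correct, while the speed layer only has to cover the gap until the next batch run.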