2. Sheng-Wei Chen / From Big Data to Artificial Intelligence
Academia Sinica
"Academia Sinica" is Latin for "Chinese Academy"
32 research institutes in 3 major divisions:
1) mathematics, physics, and applied sciences;
2) life sciences;
3) humanities and social sciences.
1,000 tenure-track research fellows
6,000 assistants, students, and technicians
15. Why Big Data?
Collecting & storing data is much more economical now
New types of widely deployed sensors
Advances in machine learning (esp. for analyzing unstructured data)
16. See through walls with WiFi!
Applies to 8" concrete walls, 6" hollow walls, and 1.75" solid wooden doors.
31. Machine Learning
A class of algorithms that gives computers the ability to learn rules from experience, rather than being hard-coded.
It seems impossible to write a program for speech recognition by hand: try to find the common patterns in the waveforms (each labeled "你好", "Hello"), and you quickly get lost in the exceptions and special cases.
(Slide Credit: Hung-Yi Lee)
33. Let the machine learn by itself
From a large amount of audio data (e.g. "你好" "Hello", "大家好" "Hello everyone", "人帥真好" "It's great to be handsome"), the machine derives rules from the datasets and can output: You said "你好".
You only have to write the learning algorithm ONCE.
(Slide Credit: Hung-Yi Lee)
55. Machine Reading
Machines learn the meaning of words by reading a large number of documents, without supervision.
Example: a machine learns to understand netizens by reading the posts on PTT.
(Slide Credit: Hung-Yi Lee)
62. Example Application
Input: a 16 x 16 image has 16 x 16 = 256 pixels, flattened into inputs x1, x2, ..., x256 (ink → 1, no ink → 0).
Output: y1, y2, ..., y10, where each dimension represents the confidence that the image is the digit "1", "2", ..., "0".
Example: y1 = 0.1, y2 = 0.7, ..., y10 = 0.2, so the image is "2".
(Slide Credit: Hung-Yi Lee)
63. Example Application
• Handwriting Digit Recognition: the machine takes the image and outputs "2".
What is needed is a function whose input is a 256-dim vector (x1, x2, ..., x256) and whose output is a 10-dim vector (y1, y2, ..., y10): that function is the Neural Network.
(Slide Credit: Hung-Yi Lee)
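As a concrete sketch of such a function, here is a tiny untrained network mapping a 256-dim pixel vector to 10 digit confidences. The layer sizes and random weights are illustrative assumptions, not the slides' model:

```python
import numpy as np

# A minimal sketch (not the slides' trained model): one hidden layer
# mapping a flattened 16x16 image (256-dim) to 10 digit confidences.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (256, 30)), np.zeros(30)  # hidden layer (30 units: arbitrary)
W2, b2 = rng.normal(0, 0.1, (30, 10)), np.zeros(10)   # output layer

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(pixels):
    """pixels: 256-dim vector of 0/1 (no ink / ink)."""
    h = np.maximum(0.0, pixels @ W1 + b1)  # ReLU hidden layer
    return softmax(h @ W2 + b2)            # 10 confidences, summing to 1

y = predict(rng.integers(0, 2, 256).astype(float))
print(y.shape, round(float(y.sum()), 6))  # (10,) 1.0
```

Training would then adjust W1, b1, W2, b2 so that the correct digit gets the highest confidence.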
64. Learning to do XOR
A linear model y = aX + b cannot represent XOR; adding a ReLU (Rectified Linear Unit) nonlinearity can.
(Credit: Deep Learning Book)
65. The network computes XW + c, then ReLU(XW + c), then w (ReLU(XW + c)), with ReLU the Rectified Linear Unit.
(Credit: Deep Learning Book)
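The construction can be made concrete with the Deep Learning book's hand-chosen XOR solution: a linear readout over a ReLU hidden layer computes XOR exactly, with no training needed:

```python
import numpy as np

# The XOR network from the Deep Learning book (Goodfellow et al., ch. 6):
# f(x) = w . ReLU(x W + c) with hand-chosen weights.
W = np.array([[1., 1.],
              [1., 1.]])
c = np.array([0., -1.])
w = np.array([1., -2.])

def xor_net(x):
    h = np.maximum(0.0, x @ W + c)  # ReLU hidden layer
    return h @ w                    # linear output layer

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
print(xor_net(X))  # [0. 1. 1. 0.]
```

No single linear layer can produce this output, which is the slide's point about needing the nonlinearity.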
69. Modularization
• Deep → Modularization
The most basic classifiers form the 1st layer; use the 1st layer as a module to build classifiers in the 2nd layer; use the 2nd layer as a module, and so on.
The modularization is automatically learned from data → less training data needed?
(Slide Credit: Hung-Yi Lee)
70. Modularization - Image
• Deep → Modularization: the same layered structure, learned on images. The most basic classifiers form the 1st layer; the 1st layer is used as a module to build classifiers in the 2nd layer; the 2nd layer is used as a module, and so on.
Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833)
(Slide Credit: Hung-Yi Lee)
73. Fat + Short vs. Thin + Tall
Which one is better: a shallow, wide network or a deep, narrow one, given the same number of parameters?
(Slide Credit: Hung-Yi Lee)
74. Fat + Short vs. Thin + Tall
Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech, 2011.

Deep (layers x size)   WER (%)    Shallow (layers x size)   WER (%)
1 x 2k                 24.2
2 x 2k                 20.4
3 x 2k                 18.4
4 x 2k                 17.8
5 x 2k                 17.2       1 x 3772                  22.5
7 x 2k                 17.1       1 x 4634                  22.6
                                  1 x 16k                   22.1

(Slide Credit: Hung-Yi Lee)
76. A Straightforward Answer
• Do Deep Nets Really Need To Be Deep? (by Rich Caruana)
• http://research.microsoft.com/apps/video/default.aspx?id=232373&r=1 (keynote of Rich Caruana at ASRU 2015)
(Slide Credit: Hung-Yi Lee)
77. Deep Neural Networks
1. Deep = many layers
2. Deep = a hierarchy of concepts
78. Hand-crafted kernel vs. learnable kernel
SVM: apply a hand-crafted kernel function φ(x), then a simple classifier.
Deep Learning: the stacked layers x1, x2, ..., xN → ... → y1, y2, ..., yM act as a learnable kernel, followed by a simple classifier.
Source of image: http://www.gipsa-lab.grenoble-inp.fr/transfert/seminaire/455_Kadri2013Gipsa-lab.pdf
(Slide Credit: Hung-Yi Lee)
80. What do DNNs learn?
A DNN does not "memorize" the millions of viewed samples; it extracts a greatly reduced number of features that are vital for distinguishing different classes of data. Classifying data becomes a simple task when the measured features are "good".
(Slide Credit: Albert Y. C. Chen)
83. AI Winter (1970-1980, 1990-2000)
https://www.mynewsdesk.com/toshiba-global/blog_posts/bringing-the-new-ai-era-to-life-the-researchers-creating-toshibas-technologies-55589
84. Ups and downs of Deep Learning
1958: Perceptron (linear model)
1969: Perceptron shown to have limitations
1980s: Multi-layer perceptron; not significantly different from today's DNNs
1986: Backpropagation; usually more than 3 hidden layers did not help
1989: 1 hidden layer is "good enough", so why go deep?
2006: RBM initialization
2009: GPUs
2011: Starts to become popular in speech recognition
2012: Wins the ILSVRC image-recognition competition
2015.2: Image recognition surpasses human-level performance
2016.3: AlphaGo beats Lee Sedol
2016.10: Speech recognition systems as good as humans
(Slide Credit: Hung-Yi Lee)
85. What was actually wrong with backprop in 1986?
We all drew the wrong conclusions about why it failed. The real reasons were:
Our labeled datasets were thousands of times too small.
Our computers were millions of times too slow.
We initialized the weights in a stupid way.
We used the wrong type of non-linearity.
(Credit: Geoff Hinton, "What Was Actually Wrong With Backpropagation in 1986?")
90. NN breakthroughs since the 1970s
Network structure
Optimization methods
Datasets
Computation power!
More modeling variants
91. 1. Better Network Structure
Convolutional Neural Networks greatly reduce the number of variables in NNs designed for images and videos → improved convergence speed and reduced data requirements.
An "upper-left-corner bird-beak detector" and a "center bird-beak detector" are almost identical, so the detector can be shared across regions.
(Slide Credit: Albert Y. C. Chen)
93. Convolutional Neural Networks
A CNN stacks Convolution → Max Pooling → Convolution → Max Pooling → ... (can repeat many times) → Flatten.
Property 1: some patterns are much smaller than the whole image.
Property 2: the same patterns appear in different regions.
Property 3: subsampling the pixels does not change the object.
(Slide Credit: Hung-Yi Lee)
98. CNN – Max Pooling
A 6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Convolution (each filter produces one channel) yields feature maps; max pooling then shrinks each map to a new, smaller 2 x 2 image.
(Slide Credit: Hung-Yi Lee)
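The conv-then-pool pipeline can be sketched in a few lines on the slide's 6 x 6 image. The 3 x 3 diagonal-detector filter below is an assumption for illustration; the slide does not show its filter values:

```python
import numpy as np

# Convolution followed by 2x2 max pooling on the slide's 6x6 binary image.
img = np.array([[1,0,0,0,0,1],
                [0,1,0,0,1,0],
                [0,0,1,1,0,0],
                [1,0,0,0,1,0],
                [0,1,0,0,1,0],
                [0,0,1,0,1,0]], dtype=float)
filt = np.array([[ 1,-1,-1],
                 [-1, 1,-1],
                 [-1,-1, 1]], dtype=float)  # an assumed diagonal-pattern detector

def conv2d(x, k):
    """Valid cross-correlation with stride 1."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i+kh, j:j+kw] * k) for j in range(w)]
                     for i in range(h)])

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return np.array([[x[i*s:(i+1)*s, j*s:(j+1)*s].max() for j in range(w)]
                     for i in range(h)])

fmap = conv2d(img, filt)   # 4x4 feature map ("each filter is a channel")
pooled = max_pool(fmap)    # new image, but smaller
print(pooled)              # [[3. 0.] [3. 1.]]
```

The pooled map keeps the strongest filter response in each 2 x 2 region, which is why subsampling does not destroy the detected pattern.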
99. 1. Network Structure
Each filter of C1 has a 5 x 5 receptive field in the input layer, so the 1st layer has (5*5+1)*6 = 156 parameters to learn.
If it were fully connected, there would be (32*32+1)*(28*28)*6 = 4,821,600 parameters.
(Slide Credit: Albert Y. C. Chen)
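The slide's parameter arithmetic checks out:

```python
# Parameter counts from the LeNet-style slide: 6 filters of size 5x5 (+1 bias
# each) in C1, versus a fully connected layer from a 32x32 input (+1 bias)
# to six 28x28 output maps.
conv_params = (5 * 5 + 1) * 6
fc_params = (32 * 32 + 1) * (28 * 28) * 6
print(conv_params)               # 156
print(fc_params)                 # 4821600
print(fc_params // conv_params)  # 30907: ~30,000x fewer parameters via weight sharing
```

This ratio is the quantitative version of "greatly reduces the number of variables" from the previous slide.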
100. 2. Improved Activation Functions
[Figure: with sigmoid activations stacked over many layers, a large change at the input produces only a small change at the outputs y1, y2, ..., yM.]
(Slide Credit: Albert Y. C. Chen)
101. ReLU
Rectified Linear Unit (ReLU): a = z for z > 0, a = 0 otherwise (compare with the sigmoid σ(z)).
Reasons:
1. Fast to compute
2. Biological reason
3. Equivalent to infinitely many sigmoids with different biases
4. Alleviates the vanishing-gradient problem
[Xavier Glorot, AISTATS'11]
[Andrew L. Maas, ICML'13]
[Kaiming He, arXiv'15]
(Slide Credit: Hung-Yi Lee)
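Reason 4 can be illustrated numerically. This toy comparison (not from the slides) contrasts the sigmoid's maximum slope of 0.25 with ReLU's slope of 1 on its active side:

```python
import numpy as np

# Gradients shrink multiplicatively across layers. The sigmoid's derivative
# peaks at 0.25 (at z = 0), so even in the best case a 10-layer chain of
# sigmoids scales the gradient by at most 0.25**10; an active ReLU unit
# passes the gradient through unchanged.
def dsigmoid(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

layers = 10
sig_grad = 0.25 ** layers        # best-case sigmoid gradient factor
relu_grad = 1.0 ** layers        # active-ReLU gradient factor
print(float(dsigmoid(0.0)))      # 0.25
print(sig_grad)                  # ~9.5e-07
print(relu_grad)                 # 1.0
```

This is why deep sigmoid networks of the 1980s struggled to train, while deep ReLU networks do not.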
106. Dropout
Training:
Each time before updating the parameters, each neuron has a p% chance of being dropped out, and the resulting thinner network is used for training. The structure of the network is changed.
For each mini-batch, we resample the dropped-out neurons.
(Slide Credit: Hung-Yi Lee)
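A minimal sketch of the training-time procedure, using the common "inverted dropout" variant (the slide does not fix a variant); the drop rate and the layer of ones are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    """Inverted dropout: during training each activation is kept with
    probability 1-p and rescaled by 1/(1-p), so nothing needs rescaling
    at test time. The mask is resampled for every mini-batch."""
    if not training:
        return h
    mask = (rng.random(h.shape) >= p).astype(h.dtype)
    return h * mask / (1.0 - p)

h = np.ones(10_000)       # a layer of activations (illustrative)
out = dropout(h, p=0.5)   # roughly half zeroed, survivors scaled to 2.0
print(round(float(out.mean()), 2))  # close to 1.0 in expectation
```

The rescaling keeps the expected activation unchanged, which is what lets the full (un-thinned) network be used directly at test time.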
107. Dropout can be considered a form of bagging
Bagging stands for bootstrap aggregating, a model-averaging / ensemble technique for reducing generalization error.
108. Deep Neural Networks (DNNs) are far more complex and capable!
(Slide Credit: Albert Y. C. Chen)
111. What do CNNs learn?
Neurons act like "custom-trained filters"; they react to very different visual cues, depending on the data.
(Slide Credit: Albert Y. C. Chen)
118. If you want to study Deep Learning in depth
"Deep Learning", written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
http://www.iro.umontreal.ca/~bengioy/dlbook/
"Neural Networks and Deep Learning", written by Michael Nielsen
http://neuralnetworksanddeeplearning.com/
Course: Machine Learning and Having It Deep and Structured
http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLSD15_2.html
(Slide Credit: Hung-Yi Lee)
154. Types of Machine Learning Methods
Machine Learning splits into:
Supervised: task driven (regression / classification)
Unsupervised: data driven (clustering, dimensionality reduction)
Reinforcement: learning by reacting to feedback
155. Why Supervised Learning is Not Enough
https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/
"The brain has about 10^14 synapses and we only live for about 10^9 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 10^5 dimensions of constraint per second."
-- Geoffrey Hinton
157. Approaches To Reinforcement Learning
Policy-based RL: search directly for the optimal policy, the policy achieving maximum future reward.
Value-based RL: estimate the optimal value function, the maximum value achievable under any policy.
Model-based RL: build a transition model of the environment and plan (e.g. by lookahead) using the model.
Of course, you can combine any of the above.
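As a worked example of the value-based approach, here is value iteration on a toy 5-state chain; the environment is an illustrative assumption, not from the slides:

```python
import numpy as np

# Toy MDP: states 0..4 on a chain, actions 0 = left / 1 = right,
# reward 1 for entering the rightmost (terminal) state.
n, gamma = 5, 0.9

def step(s, a):
    """Deterministic toy transition model: next state and reward."""
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n - 1 else 0.0)

# Value iteration: repeated Bellman optimality backups estimate the
# optimal value function V*(s), the maximum value under any policy.
V = np.zeros(n)
for _ in range(100):
    V = np.array([max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in (0, 1))
                  if s < n - 1 else 0.0
                  for s in range(n)])

# The greedy policy with respect to V* is the optimal policy.
policy = [int(np.argmax([step(s, a)[1] + gamma * V[step(s, a)[0]]
                         for a in (0, 1)]))
          for s in range(n - 1)]
print(policy)          # [1, 1, 1, 1]: always move right
print(np.round(V, 3))  # [0.729 0.81  0.9   1.    0.   ]
```

Note this sketch plans with a known transition model, so it also illustrates the model-based idea; pure value-based RL (e.g. Q-learning) would estimate the same values from sampled experience instead.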
158. Typical Applications of RL
Play games: Atari, poker, Go, ...
Explore worlds: 3D worlds, Labyrinth, ...
Control physical systems: manipulate, walk, swim, ...
Interact with users: recommend, optimize, personalize, ...
(Slide Credit: David Silver)
161. More RL Applications
Flying helicopters
Driving
Google cuts its giant electricity bill with DeepMind-powered AI
Parameter tuning in manufacturing lines
Text generation:
Hongyu Guo, "Generating Text with Deep Reinforcement Learning", NIPS, 2015
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba, "Sequence Level Training with Recurrent Neural Networks", ICLR, 2016
(Slide Credit: Hung-Yi Lee)
164. Data Science vs. Big Data
Data Science is a superset of Big Data. However, the rise of Big Data drew people's attention to Data Science.
[Venn diagram: Data Science contains Big Data, Machine Learning, Data Mining, and Deep Learning.]
166. Data vs. Machine Learning vs. AI
Data: records of experience
Machine learning: "a class of algorithms that gives computers the ability to learn from data, rather than being explicitly programmed"
Artificial intelligence: the Turing test
169. Data Visualization vs. Data Analysis
Visualization is the act or process of interpreting in visual terms or of putting into visible form.
Analysis indicates a careful study of something to learn about its parts, what they do, and how they are related to each other.
-- Merriam-Webster's Dictionary
176. What AI can do
Project: smart R&D and forecasting
Produce: optimized production with lower cost and higher efficiency
Promote: products and services at the right price, time, and targets
Provide: an enriched and tailored user experience
187. Recommendations for an AI Development Strategy
An Example of an Industry-wide Problem: Human Operators for Optical Inspection
A sad story that AI-assisted AOI (automated optical inspection) can help avoid:
"Assembly-line work is the job most readily available to the migrant population. Factories like … hire only workers aged 29 or younger; like cutting chives, the factory harvests the youth of this migrant workforce, crop after crop." (workers aged 29 or younger wanted)
"Her job on the assembly line was to check products for scratches, mechanically repeating the same simple, tedious motion 2,880 times over a whole day." (that is 4 times per minute, assuming a 12-hour working day)
https://theinitium.com/article/20170802-mainland-Foxconn-factorygirl/
199. Deep Learning for Detection of Diabetic Eye Disease (2016)
https://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html
128,000 images; each image was evaluated by 3-7 ophthalmologists from a panel of 54.
A separate validation set of 12,000 images.
200. Deep Learning for Detection of Diabetic Eye Disease (2016)
Algorithm's F1-score: 0.95
Median F1-score of 8 ophthalmologists: 0.91
201. 1,007 posteroanterior chest radiographs
AUC = 0.99 when AlexNet and GoogLeNet are combined in an ensemble.
204. Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks
Goal: diagnose irregular heart rhythms, also known as arrhythmias, from single-lead ECG signals better than a cardiologist.
205. Input and Output
Sequence-to-sequence:
Input: a time series of raw ECG signal; the 30-second ECG signal is sampled at 200 Hz.
Output: a sequence of rhythm classes; the model outputs a new prediction once every second, over 14 rhythm classes in total.
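The shape arithmetic implied by the slide:

```python
# Sequence-to-sequence shapes for the arrhythmia model: a 30 s single-lead
# ECG sampled at 200 Hz, one rhythm-class prediction per second.
duration_s, sample_rate_hz, n_classes = 30, 200, 14

n_input_samples = duration_s * sample_rate_hz
n_predictions = duration_s  # one prediction per second

print(n_input_samples)  # 6000 input samples
print(n_predictions)    # 30 output predictions, each over 14 classes
```

So each training example maps a 6,000-sample signal to a length-30 label sequence.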
207. Dataset
Training dataset: 64,121 ECG records collected from 29,163 patients; each ECG record is 30 seconds long and sampled at 200 Hz. Annotations were done by a group of Certified Cardiographic Technicians.
Testing dataset: 336 records from 328 unique patients; annotations were obtained by a committee of three board-certified cardiologists.
208. Model
A 34-layer NN: 16 residual blocks with 2 conv layers per block.
Filter length = 16 samples.
Number of filters = 64*k, where k starts from 1 and is incremented at every 4th residual block.
Every residual block subsamples its input by a factor of 2.
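One plausible reading of the filter schedule (k = 1 for the first four residual blocks, then incremented every 4th block; this interpretation is mine, not spelled out on the slide):

```python
# Filter counts per residual block under the slide's rule:
# number of filters = 64 * k, with k incremented every 4th block.
n_blocks, base = 16, 64
filters = [base * (1 + block // 4) for block in range(n_blocks)]
print(filters)
# [64, 64, 64, 64, 128, 128, 128, 128, 192, 192, 192, 192, 256, 256, 256, 256]
```

So the network widens in steps from 64 to 256 filters as depth increases.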
211. Recurrent Neural Network (RNN)
[Figure: inputs x1, x2; hidden activations a1, a2; outputs y1, y2, with a "store" arrow from the hidden layer into memory.]
The outputs of the hidden layer are stored in the memory, and the memory can be considered as another input.
(Slide Credit: Hung-Yi Lee)
212. Long Short-term Memory (LSTM)
An LSTM is built around a memory cell guarded by three gates, each controlled by a signal from the other parts of the network:
Input gate: a signal controls the input gate
Output gate: a signal controls the output gate
Forget gate: a signal controls the forget gate
The LSTM is a special neuron with 4 inputs and 1 output.
(Slide Credit: Hung-Yi Lee)
213. The LSTM cell takes four inputs: the cell input z and the gate signals z_i, z_f, z_o. The gate activation function f is usually a sigmoid, whose output lies between 0 and 1 to mimic an open/close gate. With the current cell state c, the new state c' and the output a are:
c' = g(z) f(z_i) + c f(z_f)
a = h(c') f(z_o)
(Slide Credit: Hung-Yi Lee)
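The equations transcribe directly into code; taking g = h = tanh is a common choice that the slide leaves open:

```python
import numpy as np

# One step of the slide's LSTM cell:
#   c' = g(z) * f(z_i) + c * f(z_f)
#   a  = h(c') * f(z_o)
# with f = sigmoid for the gates and (a common choice) g = h = tanh.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(z, z_i, z_f, z_o, c):
    """Returns (output a, new cell state c')."""
    c_new = np.tanh(z) * sigmoid(z_i) + c * sigmoid(z_f)
    a = np.tanh(c_new) * sigmoid(z_o)
    return a, c_new

# With the input gate wide open (z_i >> 0) and the forget gate shut
# (z_f << 0), the old state c is erased and the new state is just g(z):
a, c_new = lstm_cell(z=1.0, z_i=20.0, z_f=-20.0, z_o=20.0, c=5.0)
print(round(float(c_new), 4))  # 0.7616, i.e. ~tanh(1.0)
```

Flipping the gate signs instead would preserve c and block the new input, which is how the cell carries information across many time steps.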
214. Multiple-layer LSTM
This is quite standard now.
Don't worry if you cannot understand it all: Keras can handle it. Keras supports "LSTM", "GRU", and "SimpleRNN" layers.
(Slide Credit: Hung-Yi Lee)
217. AI vs. BI
AI systems make decisions for users.
BI systems help users make the right decisions, based on the available data.
Key difference: AI systems are based on generalized models, whereas BI systems require humans to generalize.
Best practice: AI + BI working together.
218. Google Search Frequency on "Big Data" and "Data Science"
[Figure: Google Trends curves for "Data Science" and "Big Data".]
219. Google Search Frequency on "Machine Learning" and "Deep Learning"
[Figure: Google Trends curves for "Machine Learning" and "Deep Learning".]