Deep Learning (딥러닝) 
Its History, and Application to Public Health 
Jinseob Kim 
September 10, 2014 
What is Deep Learning? 
Contents 
1 What is Deep Learning? 
2 History 
Perceptron 
Multilayer Perceptron 
1st Breakthrough: Unsupervised Learning 
2nd Breakthrough: Supervised Learning 
3 Apply to Public Health 
Epidemiology vs Machine Learning 
Deep Learning vs Other ML 
Hypothesis Testing vs Hypothesis Generating 
4 Conclusion 
What is Deep Learning? 
Machine Learning 
A field of artificial intelligence in which computers learn from data so that they can make predictions. 
Computer science + Statistics ?? 
Amazon, Google, Facebook.. 
What is Deep Learning? 
Neural Network 
Human brain VS Computer 
3431 × 3324 = ?? 
Computers excel at arithmetic; the human brain excels at recognition tasks (telling cats from dogs, speech recognition, character recognition). 
Sequential VS Parallel 
What is Deep Learning? 
Neuron & Artificial Neural Network (ANN)[19] 
Figure. (A) Human neuron; (B) artificial neuron or hidden unit; (C) biological synapse; (D) ANN synapses. 
What is Deep Learning? 
http://www.nd.com/welcome/whatisnn.htm 
What is Deep Learning? 
Deep Neural Network (DNN) ≈ Deep Learning 
What is Deep Learning? 
Recent Korean news coverage of deep learning: 
http://www.dt.co.kr/contents.html?article_no=2014062002010960718002 
http://vip.mk.co.kr/news/view/21/20/1178659.html 
http://www.bloter.net/archives/196341 
http://www.wikitree.co.kr/main/news_view.php?id=157174 
http://weekly.chosun.com/client/news/viw.asp?nNewsNumb=002311100009ctcd=C02 
History 
Contents 
1 What is Deep Learning? 
2 History 
Perceptron 
Multilayer Perceptron 
1st Breakthrough: Unsupervised Learning 
2nd Breakthrough: Supervised Learning 
3 Apply to Public Health 
Epidemiology vs Machine Learning 
Deep Learning vs Other ML 
Hypothesis Testing vs Hypothesis Generating 
4 Conclusion 
History Perceptron 
Perceptron 
1958, Rosenblatt[23]. 

y = \varphi\left( \sum_{i=1}^{n} w_i x_i + b \right) \qquad (1) 

(b: bias, \varphi: activation function (e.g., logistic or tanh)) 
Figure. Concept of Perceptron[Honkela] 
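As a concrete illustration of Eq. (1), here is a minimal NumPy sketch of a single perceptron-style unit; the weights, bias, and inputs below are made-up example values, not anything from the slides:

```python
import numpy as np

def perceptron(x, w, b, phi=np.tanh):
    """Single unit: y = phi(w . x + b)."""
    return phi(np.dot(w, x) + b)

# Hypothetical example values
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.1, 0.4, -0.3])   # weights
b = 0.2                          # bias
print(perceptron(x, w, b))       # activation of the unit
```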
History Perceptron 
Low Performance 
It cannot even solve the XOR problem[Hinton]. 
History Multilayer Perceptron 
Multilayer Perceptron 
Adding a hidden layer solves it!! 
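A quick sketch of why a hidden layer fixes XOR: with hand-picked weights (chosen here purely for illustration, not taken from the slides), two hidden threshold units plus one output unit reproduce the XOR truth table exactly:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

def xor_mlp(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: h1 fires for OR, h2 fires for AND (hand-chosen weights)
    h = step(np.array([x @ np.array([1, 1]) - 0.5,     # OR
                       x @ np.array([1, 1]) - 1.5]))   # AND
    # Output: OR AND NOT(AND)  ->  XOR
    return int(step(h @ np.array([1, -1]) - 0.5))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))   # prints the XOR truth table
```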
History Multilayer Perceptron 
Learning Problem 
Hidden layers → the number of weights to estimate grows.. 
1985: Error Backpropagation Algorithm[24] 
Gradient Descent Methods 
This is where the difficulties begin.. 
History Multilayer Perceptron 
Gradient Descent Methods 
There are very many weights.. 
Linear regression: least squares, maximum likelihood: exact calculation. 
MLP: no exact method. 
History Multilayer Perceptron 
Gradient Descent Algorithm[Han-Hsing] 
(a) Large Gradient (b) Small Gradient 
(c) Small Learning Rate (d) Large Learning Rate 
History Multilayer Perceptron 
Example[Hinton] 
A toy example to illustrate the iterative method 
• Each day you get lunch at the cafeteria. 
– Your diet consists of fish, chips, and ketchup. 
– You get several portions of each. 
• The cashier only tells you the total price of the meal 
– After several days, you should be able to figure out the price of 
each portion. 
• The iterative approach: Start with random guesses for the prices and 
then adjust them to get a better fit to the observed prices of whole 
meals. 
History Multilayer Perceptron 
Solving the equations iteratively 
• Each meal price gives a linear constraint on the prices of the portions: 

price = x_{\text{fish}} w_{\text{fish}} + x_{\text{chips}} w_{\text{chips}} + x_{\text{ketchup}} w_{\text{ketchup}} 

• The prices of the portions are like the weights of a linear neuron: w = (w_{\text{fish}}, w_{\text{chips}}, w_{\text{ketchup}}) 
• We will start with guesses for the weights and then adjust the guesses slightly to give a better fit to the prices given by the cashier. 
History Multilayer Perceptron 
The true weights used by the cashier 
Figure. A linear neuron: portions of fish, chips, and ketchup (2, 5, 3) feed into the neuron with the cashier's true weights (150, 50, 100), giving price of meal = 850 = target. 
History Multilayer Perceptron 
A model of the cashier with arbitrary initial weights 
• Start all weights at 50; with portions (2, 5, 3) the predicted price of the meal = 500, so the residual error = 850 − 500 = 350. 
• The "delta-rule" for learning is: \Delta w_i = \varepsilon \, x_i (t - y) 
• With a learning rate \varepsilon = 1/35, the weight changes are +20, +50, +30. 
• This gives new weights of 70, 100, 80. 
– Notice that the weight for chips got worse! 
History Multilayer Perceptron 
Deriving the delta rule 
• Define the error as the squared residuals summed over all training cases: 

E = \frac{1}{2} \sum_{n \in \text{training}} (t^n - y^n)^2 

• Now differentiate to get error derivatives for weights: 

\frac{\partial E}{\partial w_i} = \frac{1}{2} \sum_n \frac{\partial y^n}{\partial w_i} \frac{dE^n}{dy^n} = -\sum_n x_i^n (t^n - y^n) 

• The batch delta rule changes the weights in proportion to their error derivatives summed over all training cases: 

\Delta w_i = -\varepsilon \frac{\partial E}{\partial w_i} = \varepsilon \sum_n x_i^n (t^n - y^n) 
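A minimal sketch of this iterative delta rule applied to the cashier example above; the learning rate matches the slide, while the number of iterations is an arbitrary choice for illustration:

```python
import numpy as np

true_w = np.array([150.0, 50.0, 100.0])   # true prices: fish, chips, ketchup
x = np.array([2.0, 5.0, 3.0])             # portions in the meal
t = x @ true_w                            # observed total price = 850

w = np.array([50.0, 50.0, 50.0])          # arbitrary initial guesses
eps = 1.0 / 35.0                          # learning rate from the slide

for _ in range(200):                      # repeat the delta-rule update
    y = x @ w                             # predicted price of the meal
    w += eps * x * (t - y)                # delta rule: dw_i = eps * x_i * (t - y)

print(w)   # weights consistent with the observed meal price
```

Note that with only this single meal the individual portion prices are not identifiable; the weights converge to values that reproduce the total of 850, which is exactly the slide's point about iteratively fitting the observed prices.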
History Multilayer Perceptron 
Backpropagation Algorithm[Kim] 
(e) Forward Propagation (f) Back Propagation 
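To make the forward and backward passes concrete, here is a small sketch of one backpropagation step for a network with one sigmoid hidden layer, a linear output, and squared error; all sizes and data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))      # 4 toy examples, 3 inputs
t = rng.normal(size=(4, 1))      # toy targets

W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)   # input -> hidden
W2 = rng.normal(size=(5, 1)); b2 = np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.1

# Forward propagation
h = sigmoid(X @ W1 + b1)          # hidden activations
y = h @ W2 + b2                   # linear output
E = 0.5 * np.sum((y - t) ** 2)    # squared error

# Back propagation (chain rule, layer by layer)
dy = y - t                               # dE/dy
dW2 = h.T @ dy;  db2 = dy.sum(0)
dh = dy @ W2.T                           # error signal sent back to the hidden layer
dz1 = dh * h * (1 - h)                   # through the sigmoid derivative
dW1 = X.T @ dz1; db1 = dz1.sum(0)

# Gradient-descent update
W2 -= lr * dW2;  b2 -= lr * db2
W1 -= lr * dW1;  b1 -= lr * db1
```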
History Multilayer Perceptron 
Limitations of MLP[Kim] 
1 Vanishing gradient problem 
2 Typically requires lots of labeled data 
3 Overfitting problem: Given limited amounts of labeled data, training via back-propagation does not work well 
4 Get stuck in local minima (?) 
History Multilayer Perceptron 
Vanishing Gradient[2] 
Figure. Sigmoid functions 
History Multilayer Perceptron 
Local Minima[Kim] 
Figure. Global and Local Minima 
History 1st Breakthrough: Unsupervised Learning 
1st Breakthrough: Unsupervised Learning 
2006: Restricted Boltzmann Machine, Deep Belief Network, Deep Boltzmann Machine[25, 13].. 
Figure. Description of Unsupervised Learning[Kim] 
History 1st Breakthrough: Unsupervised Learning 
Limitations of MLP[Kim] 
1 Vanishing gradient problem 
Solved by bottom-up layerwise unsupervised pre-training 
2 Typically requires lots of labeled data 
3 Overfitting problem: Given limited amounts of labeled data, training via back-propagation does not work well 
Solved by using lots of unlabeled data 
4 Get stuck in local minima (?) 
Unsupervised pre-training may help the network initialize with good 
parameters 
History 1st Breakthrough: Unsupervised Learning 
Restricted Boltzmann Machine(RBM) 
The lower the energy, the higher the probability: 

P(v, h) = \frac{1}{Z} e^{-E(v, h)} 

(Z: normalizing constant) 
Figure. Diagram of a Restricted Boltzmann Machine[Wikipedia] 
History 1st Breakthrough: Unsupervised Learning 
Energy Function 

E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j h_j w_{i,j} v_i = -a^T v - b^T h - h^T W v 

(a_i: offset of visible variable, b_j: offset of hidden variable, w_{i,j}: weight between v_i and h_j) 
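A small sketch of this energy function and the corresponding unnormalized probability for binary v and h; the parameter values below are arbitrary placeholders, not trained values:

```python
import numpy as np

def energy(v, h, a, b, W):
    """E(v, h) = -a.v - b.h - h.W.v for a restricted Boltzmann machine."""
    return -(a @ v) - (b @ h) - (h @ W @ v)

def unnormalized_prob(v, h, a, b, W):
    """exp(-E(v, h)); dividing by Z summed over all (v, h) would give P(v, h)."""
    return np.exp(-energy(v, h, a, b, W))

# Hypothetical tiny RBM: 3 visible units, 2 hidden units
a = np.array([0.1, -0.2, 0.0])          # visible offsets
b = np.array([0.3, -0.1])               # hidden offsets
W = np.array([[0.5, -0.3, 0.2],
              [0.1,  0.4, -0.6]])       # hidden x visible weights
v = np.array([1, 0, 1])
h = np.array([1, 1])
print(energy(v, h, a, b, W), unnormalized_prob(v, h, a, b, W))
```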
History 1st Breakthrough: Unsupervised Learning 
Objective 
Find the weights that maximize P(v) = \sum_h P(v, h), the marginal probability of the observed data. 

E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j h_j w_{i,j} v_i = -a^T v - b^T h - h^T W v 

That is, the weights are increased where h and v tend to take large values at the same time — synapses that are activated together become connected. 
History 1st Breakthrough: Unsupervised Learning 
Hebb's Law (Hebbian Learning Rule) 
http://www.skewsme.com/behavior.htm 
http://lesswrong.com/lw/71x/a_crash_course_in_the_neuroscience_of_human/l 
History 1st Breakthrough: Unsupervised Learning 
Training RBM 
Find the weights that maximize P(v) = \sum_h P(v, h). 
Gradient Ascent 

\log P(v) = \log \left( \frac{\sum_h e^{-E(v,h)}}{Z} \right) 
          = \log \sum_h e^{-E(v,h)} - \log Z 
          = \log \sum_h e^{-E(v,h)} - \log \sum_{v,h} e^{-E(v,h)} 
History 1st Breakthrough: Unsupervised Learning 
\frac{\partial \log P(v)}{\partial \theta} 
= -\frac{1}{\sum_h e^{-E(v,h)}} \sum_h e^{-E(v,h)} \frac{\partial E(v,h)}{\partial \theta} 
  + \frac{1}{\sum_{v,h} e^{-E(v,h)}} \sum_{v,h} e^{-E(v,h)} \frac{\partial E(v,h)}{\partial \theta} 
= -\sum_h p(h \mid v) \frac{\partial E(v,h)}{\partial \theta} + \sum_{v,h} p(h, v) \frac{\partial E(v,h)}{\partial \theta} 
History 1st Breakthrough: Unsupervised Learning 
P(v \mid h) = \prod_{i=1}^{m} P(v_i \mid h), \qquad P(h \mid v) = \prod_{j=1}^{n} P(h_j \mid v) 

p(h_j = 1 \mid v) = \sigma\left( b_j + \sum_{i=1}^{m} w_{i,j} v_i \right), \qquad 
p(v_i = 1 \mid h) = \sigma\left( a_i + \sum_{j=1}^{n} w_{i,j} h_j \right) 

(\sigma: activation function) 
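A sketch of these conditionals and one Gibbs sampling step for a binary RBM; the parameters are placeholders in the spirit of the earlier energy-function sketch:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)

def sample_h_given_v(v, b, W):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij v_i); then sample binary h."""
    p = sigmoid(b + W @ v)
    return (rng.random(p.shape) < p).astype(float), p

def sample_v_given_h(h, a, W):
    """p(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij h_j); then sample binary v."""
    p = sigmoid(a + W.T @ h)
    return (rng.random(p.shape) < p).astype(float), p

# One Gibbs step v -> h -> v' with placeholder parameters
a = np.zeros(3); b = np.zeros(2)
W = rng.normal(scale=0.1, size=(2, 3))   # hidden x visible
v0 = np.array([1.0, 0.0, 1.0])
h0, _ = sample_h_given_v(v0, b, W)
v1, _ = sample_v_given_h(h0, a, W)
print(h0, v1)
```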
History 1st Breakthrough: Unsupervised Learning 
\frac{\partial \log P(v)}{\partial \theta} = -\sum_h p(h \mid v) \frac{\partial E(v,h)}{\partial \theta} + \sum_{v,h} p(h, v) \frac{\partial E(v,h)}{\partial \theta} 

Both terms are approximated by sampling with a Gibbs sampler. 
History 1st Breakthrough: Unsupervised Learning 
Figure. Contrastive Divergence(CD-k)[7] 
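Below is a minimal CD-1 (one-step contrastive divergence) weight update built on the sampling idea sketched above; the learning rate, network size, and data are placeholders chosen only for illustration:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(1)

def cd1_update(v0, a, b, W, lr=0.1):
    """One CD-1 step: approximate the log-likelihood gradient with a single Gibbs step."""
    ph0 = sigmoid(b + W @ v0)                      # p(h = 1 | v0), positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(a + W.T @ h0)                    # reconstruction probabilities
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(b + W @ v1)                      # negative phase
    # Positive phase uses the data, negative phase uses the reconstruction
    W += lr * (np.outer(ph0, v0) - np.outer(ph1, v1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
    return W, a, b

# Toy usage with a single 6-dimensional binary "data" vector
W = rng.normal(scale=0.01, size=(3, 6)); a = np.zeros(6); b = np.zeros(3)
v = np.array([1, 1, 0, 0, 1, 0], dtype=float)
for _ in range(100):
    W, a, b = cd1_update(v, a, b, W)
```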
History 1st Breakthrough: Unsupervised Learning 
Deep Belief Network[11, 12, 1] 
1 Multiple stacked RBMs 
2 Phoneme → Word → Grammar, Sentence 
3 Generation is also possible!!! 
http://www.cs.toronto.edu/~hinton/adi/index.htm 
History 2nd Breakthrough: Supervised Learning 
2nd Breakthrough: Supervised Learning 
1 Vanishing gradient problem 
Solved by a new non-linear activation: rectified linear unit (ReLU) 
2 Typically requires lots of labeled data 
Solved by big data & crowd sourcing 
3 Overfitting problem: Given limited amounts of labeled data, training via back-propagation does not work well 
Solved by new regularization methods: dropout, dropconnect, etc 
4 Get stuck in local minima (?) 
History 2nd Breakthrough: Supervised Learning 
Rectified Linear Unit (ReLU) 
Figure. The proposed non-linearity, ReLU, and the standard neural network non-linearity, logistic[30] 
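A small sketch contrasting the two non-linearities: the sigmoid derivative shrinks toward zero for large |z| (the vanishing-gradient problem illustrated earlier), while the ReLU derivative stays at 1 for any positive input. The inputs below are arbitrary example values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25, and tiny for large |z|

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)  # exactly 1 for all positive inputs

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print("sigmoid'(z):", d_sigmoid(z))   # ~4.5e-05 at z = 10
print("relu'(z):   ", d_relu(z))      # 0 or 1
```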
History 2nd Breakthrough: Supervised Learning 
Advantages 
1 As long as the input is greater than 0, the slope is always 1, so the gradient never shrinks. 
2 Learning is easy. 
3 It removes the need for pre-training[20, 8]. 
History 2nd Breakthrough: Supervised Learning 
DropOut & DropConnect 
Ensemble Model 
DropOut: randomly sets a fraction of the hidden units to zero[14]. 
DropConnect: randomly sets a fraction of the connections between hidden units to zero[28]. 
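A sketch of both ideas at the level of a single layer, using drop probability p = 0.5 as in the figure below; the weights and inputs are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)

def layer_dropout(x, W, p=0.5):
    """DropOut: zero a random subset of the hidden units."""
    h = relu(x @ W)
    mask = (rng.random(h.shape) > p).astype(float)
    return h * mask

def layer_dropconnect(x, W, p=0.5):
    """DropConnect: zero a random subset of the weights (connections)."""
    mask = (rng.random(W.shape) > p).astype(float)
    return relu(x @ (W * mask))

x = rng.normal(size=(4, 8))          # 4 examples, 8 inputs (toy data)
W = rng.normal(size=(8, 16))         # 8 -> 16 hidden units
print(layer_dropout(x, W).shape, layer_dropconnect(x, W).shape)
```

At test time the full network is typically used with activations rescaled by the keep probability, which is what gives dropout its interpretation as averaging an ensemble of thinned networks.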
History 2nd Breakthrough: Supervised Learning 
Figure. Description of DropOut & DropConnect[Wan] 
History 2nd Breakthrough: Supervised Learning 
Figure. Using the MNIST dataset, in a) ability of DropOut and DropConnect to prevent overfitting as the size of the 2 fully connected layers increases; b) varying the drop-rate in a 400-400 network shows near-optimal performance around p = 0.5[28] 
History 2nd Breakthrough: Supervised Learning 
Local Minima Issue 
High dimension and non-convex optimization 
1 Local minima all tend to perform similarly. 
2 Local minima ≈ global minima. 
3 With many dimensions, it is hard for a point to be a local minimum in every dimension at once. 
History 2nd Breakthrough: Supervised Learning 
Local minima are all similar, there are long plateaus, and it can take long to break symmetries. 
Optimization is not the real problem when: 
– the dataset is large 
– units do not saturate too much 
– a normalization layer is used 
Figure. Loss as a function of the parameters: local minima in high-dimensional, non-convex optimization [Ranzato] 
History 2nd Breakthrough: Supervised Learning 
Others: Convolutional Neural Network 
Sparse Connectivity & Shared Weights: well suited to 2-dimensional data[documentation] 
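A minimal sketch of what sparse connectivity and weight sharing mean in practice: each output value depends only on a local patch of the input, and the same small kernel is reused at every position. The image and kernel below are toy values:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation (the 'convolution' used in CNNs): one shared kernel slides over the image."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)  # local patch only
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # simple vertical-edge filter
print(conv2d(image, kernel).shape)                  # (4, 4) feature map
```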
History 2nd Breakthrough: Supervised Learning 
http://parse.ele.tue.nl/education/cluster0 
History 2nd Breakthrough: Supervised Learning 
http://eblearn.sourceforge.net/old/demos/mnist/index.shtml 
http://yann.lecun.com/exdb/lenet/ 
History 2nd Breakthrough: Supervised Learning 
Deep Learning Summary!!! 
1 Artificial neural network research began with the perceptron in the 1950s, and in the 1980s the Error Backpropagation Algorithm made it possible to train the multilayer perceptron. 
2 Because vanishing gradients, the shortage of labeled data, overfitting, and the local minima issue were not well resolved, neural network research stagnated until the early 2000s. 
3 From 2006, unsupervised-learning models such as the Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Deep Boltzmann Machine (DBM), and Convolutional Deep Belief Network appeared. 
4 Pre-training with unlabeled data became possible, largely overcoming the limitations of the multilayer perceptron described earlier. 
5 Since around 2010, big data has supplied large amounts of labeled data, and the rectified linear unit (ReLU), DropOut, DropConnect, etc. have addressed the vanishing-gradient and overfitting issues, making purely supervised learning feasible. 
6 The local minima issue is concluded not to be a major problem in high-dimensional, non-convex optimization. 
Apply to Public Health 
Contents 
1 What is Deep Learning? 
2 History 
Perceptron 
Multilayer Perceptron 
1st Breakthrough: Unsupervised Learning 
2nd Breakthrough: Supervised Learning 
3 Apply to Public Health 
Epidemiology vs Machine Learning 
Deep Learning vs Other ML 
Hypothesis Testing vs Hypothesis Generating 
4 Conclusion 
Apply to Public Health Epidemiology vs Machine Learning 
Objective of statistics 
1 Expanding knowledge, causal inference 
Statistician Pearson: through the quantification of data.. 
2 Decision making 
Statistician R. A. Fisher: choosing the best-performing fertilizer 
Apply to Public Health Epidemiology vs Machine Learning 
Statistics in Epidemiology 
Causal inference: what is the cause? 
An interpretable model is what matters; causal reasoning. 
Simple models are preferred. 
The unit of a variable matters (Kilometer VS meter, centering issue). 
Odds Ratio (OR), Hazard Ratio (HR), p-value, AIC 
Apply to Public Health Epidemiology vs Machine Learning 
Statistics in Machine Learning 
Prediction: what will happen next? 
Good predictive performance is what matters. 
Complicated models are fine — all that matters is predicting well. 
Variables can be rescaled freely as needed (scale change). 
ŷ (predicted values), Cross-validation, Accuracy, ROC curve 
Apply to Public Health Epidemiology vs Machine Learning 
Example: Logistic regression 
The standard statistical method for binomial data, used especially in epidemiologic studies. 
→ Odds Ratio (OR): easy to interpret. 
But.. 
The logit function... is the reason computation becomes difficult. 
Heritability issue of a binomial trait?? The logit link is the culprit.. 
The probit model can be an alternative: 
easy to compute, 
but harder to interpret.. 
Apply to Public Health Epidemiology vs Machine Learning 
Logit VS Probit 
Figure. Logit VS Probit 

Logit: \Pr(Y = 1 \mid X) = \left[ 1 + e^{-X'\beta} \right]^{-1} 
Probit: \Pr(Y = 1 \mid X) = \Phi(X'\beta) 
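A sketch of the two link functions side by side; the coefficient vector and design matrix below are made-up examples, not estimates from any data set:

```python
import numpy as np
from scipy.stats import norm

def p_logit(X, beta):
    """Pr(Y = 1 | X) under the logit link."""
    return 1.0 / (1.0 + np.exp(-X @ beta))

def p_probit(X, beta):
    """Pr(Y = 1 | X) under the probit link (standard normal CDF)."""
    return norm.cdf(X @ beta)

beta = np.array([0.5, -1.0, 0.25])          # hypothetical coefficients
X = np.array([[1.0, 0.2, 3.0],
              [1.0, 1.5, -0.5]])            # intercept + 2 covariates
print(p_logit(X, beta))
print(p_probit(X, beta))
```

The two curves are very close once the coefficients are rescaled (roughly by a factor of 1.6), which is why the choice between them is usually driven by interpretability versus computational convenience.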
Apply to Public Health Epidemiology vs Machine Learning 
Example 2: Cox proportional hazards model 
The standard analysis for censored data. 
http://www.theriac.org/DeskReference/viewDocument.php?id=188 
Apply to Public Health Epidemiology vs Machine Learning 
http://www.uni-kiel.de/psychologie/rexrepos/posts/survivalCoxPH.html 
Apply to Public Health Epidemiology vs Machine Learning 
Assumptions 

\ln \lambda(t) = \ln \lambda_0(t) + \beta_1 X_1 + \cdots + \beta_p X_p = \ln \lambda_0(t) + X\beta 
\lambda(t) = \lambda_0(t) \, e^{X\beta} 

e^{\beta}: Hazard Ratio (HR) 

Apply to Public Health Epidemiology vs Machine Learning 
Hazard Ratio 
Convenient to interpret — much like the Odds Ratio. 
But it carries many assumptions, and the formula is complicated, so computation is hard. 
Conditional Logistic Regression.. 
For prediction there is no need to insist on the Cox model. 
Apply to Public Health Epidemiology vs Machine Learning 
Alternatives 
Y_i: time of event 

Not censored: 
p(y_i \mid \mu_i, \sigma^2) = (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left\{ -\frac{(y_i - \mu_i)^2}{2\sigma^2} \right\} 

Censored: 
p(y_i \ge t_i \mid \mu_i, \sigma^2) = \int_{t_i}^{\infty} (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left\{ -\frac{(y_i - \mu_i)^2}{2\sigma^2} \right\} dy_i = \Phi\left( \frac{\mu_i - t_i}{\sigma} \right) 

Just a single normal CDF → easy to compute!! 
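A sketch of this censored-normal (Tobit-style) log-likelihood, with observed events contributing the density and right-censored times contributing the upper tail; the data below are invented purely to show the two terms:

```python
import numpy as np
from scipy.stats import norm

def censored_normal_loglik(y, censored, mu, sigma):
    """Log-likelihood: normal density for observed events, normal upper tail for censored times.

    y:        observed event time, or censoring time if censored
    censored: 1 if the observation is right-censored, else 0
    """
    ll_obs = norm.logpdf(y, loc=mu, scale=sigma)   # event observed exactly at y
    ll_cen = norm.logsf(y, loc=mu, scale=sigma)    # event occurs somewhere after y
    return np.where(censored == 1, ll_cen, ll_obs).sum()

# Hypothetical data: 5 subjects, 2 of them right-censored
y = np.array([3.2, 5.1, 4.0, 6.5, 2.8])
censored = np.array([0, 1, 0, 1, 0])
print(censored_normal_loglik(y, censored, mu=4.0, sigma=1.5))
```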
Apply to Public Health Epidemiology vs Machine Learning 
Example 3: Correlation Structure 
Should the correlation structure be modeled? 
1 Epidemiology: Important 
The standard error of β changes → the p-value changes. 
2 Prediction model: Not important 
The predictions (ŷ) hardly change anyway. 
The correlation structure reflects unmeasured effects → what was not measured cannot be used when making predictions on new data. 
Apply to Public Health Epidemiology vs Machine Learning 
Figure. A representation of the tradeoff between flexibility and interpretability, using different statistical learning methods. In general, as the flexibility of a method increases, its interpretability decreases[16] 
Apply to Public Health Epidemiology vs Machine Learning 
Catching crumbs from the table 
"In the face of metahuman science, humans have become metascientists." — Ted Chiang, Nature 405, 517 (1 June 2000). 
[The slide reproduces the full text of the article.] 
Apply to Public Health Epidemiology vs Machine Learning 
Human VS metahuman[4] 
Ted Chiang: science-fiction writer. 
The metahumans (artificial intelligences) have an overwhelming capacity for producing knowledge. 
Human science: reduced to understanding what the metahumans have made — interpreting and translating the metahumans' papers becomes human science.. 
Apply to Public Health Deep Learning vs Other ML 
Deep Learning vs Other ML 
Multiple Hidden Layers: High flexibility 
Massive Parallel Computing 
Programming languages for GPU/parallel computing: CUDA (Compute Unified Device Architecture), OpenCL[21, 26] 
Apply to Public Health Deep Learning vs Other ML 
Examples: Cat recognition 
16,000 CPU cores; the network learned to recognize cats just by looking at images (Unsupervised Learning). 
Using GPUs, the computing time was cut sharply. 
http://www.asiae.co.kr/news/view.htm?idxno=2012062708351993171 
http://googleblog.blogspot.kr/2012/06/using-large-scale-brain-simulations-for.html 
• 97. Apply to Public Health Deep Learning vs Other ML Paper[18, 5]
(The slide shows the first pages of two papers.)
Le, Q. V., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G. S., Dean, J., and Ng, A. Y. Building High-level Features Using Large Scale Unsupervised Learning. ICML 2012. A 9-layer locally connected sparse autoencoder with pooling and local contrast normalization (about 1 billion connections) is trained on 10 million unlabeled 200x200-pixel images using model parallelism and asynchronous SGD on a cluster of 1,000 machines (16,000 cores) for three days. Without any labeled images it learns detectors for faces, cat faces, and human bodies that are robust to translation, scaling, and out-of-plane rotation, and the learned features reach 15.8% accuracy on 22,000 ImageNet categories, a 70% relative improvement over the previous state of the art.
Coates, A., Huval, B., Wang, T., Wu, D., Ng, A. Y., and Catanzaro, B. Deep Learning with COTS HPC Systems. ICML 2013. A Commodity Off-The-Shelf HPC cluster of GPU servers with Infiniband interconnects and MPI trains 1-billion-parameter networks on just 3 machines in a couple of days and scales to networks with over 11 billion parameters on 16 machines, using about 2% as many machines as the 1,000-machine DistBelief system.
@Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 66 / 74
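The core building block in the Le et al. paper is the sparse autoencoder: a network trained to reconstruct its input while keeping hidden activations mostly near zero. The snippet below is a minimal single-layer sketch in plain NumPy, not the papers' implementation; the real system stacks nine locally connected layers with pooling and local contrast normalization and trains them with asynchronous SGD across a cluster. All sizes, the learning rate, the random data, and the KL-style sparsity penalty (standing in for the paper's sparsity term) are illustration-only assumptions.

```python
# Minimal sparse-autoencoder sketch (toy illustration, not Le et al.'s system).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 64, 25            # toy sizes, nothing like 200x200-pixel inputs
W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_in, n_hidden))   # linear decoder weights
b2 = np.zeros(n_in)
rho, beta, lr = 0.05, 3.0, 0.1     # target sparsity, sparsity weight, step size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(1000, n_in))  # stand-in for unlabeled image patches

for epoch in range(20):
    for x in X:
        h = sigmoid(W1 @ x + b1)           # encoder activation
        x_hat = W2 @ h + b2                # reconstruction
        err = x_hat - x                    # reconstruction error
        # Per-example approximation of a KL sparsity penalty that pushes
        # hidden activations toward the small target value rho.
        sparse_grad = beta * (-rho / (h + 1e-8) + (1 - rho) / (1 - h + 1e-8))
        dh = (W2.T @ err + sparse_grad) * h * (1 - h)
        W2 -= lr * np.outer(err, h); b2 -= lr * err
        W1 -= lr * np.outer(dh, x);  b1 -= lr * dh
```

The sparsity gradient here is a per-example stand-in for the usual penalty on the batch-average hidden activation; it keeps the sketch short while still showing why individual hidden units end up selective for only a few input patterns, which is the property the paper scales up to "grandmother-neuron"-like feature detectors.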
  • 98. Apply to Public Health Hypothesis Testing vs Hypothesis Generating Hypothesis Testing vs Hypothesis Generating Figure. Hypothesis-testing and Hypothesis-generating paradigms[3] @Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 67 / 74
  • 99. Apply to Public Health Hypothesis Testing vs Hypothesis Generating @Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 68 / 74
  • 100. Apply to Public Health Hypothesis Testing vs Hypothesis Generating @Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 69 / 74
  • 101. Apply to Public Health Hypothesis Testing vs Hypothesis Generating @Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 70 / 74
  • 102. Conclusion Contents 1 What is Deep Learning? 2 History Perceptron Multilayer Perceptron 1st Breakthrough: Unsupervised Learning 2nd Breakthrough: Supervised Learning 3 Apply to Public Health Epidemiology vs Machine Learning Deep Learning vs Other ML Hypothesis Testing vs Hypothesis Generating 4 Conclusion @Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 71 / 74
• 103. Conclusion Conclusion
Deep Learning is a key technology for Mobile Health.
Mobile data: unstructured data such as images, voice, and text.
A parallel computing system needs to be built.
Prediction vs Inference
Understanding the concept of Machine Learning
Hypothesis Generating
Paradigm shift: Causal inference → Big data Prediction
@Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 72 / 74
• 104. Conclusion Reference I
[1] Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127.
[2] Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. Neural Networks, IEEE Transactions on, 5(2):157–166.
[3] Biesecker, L. G. (2013). Hypothesis-generating research and predictive medicine. Genome Research, 23(7):1051–1053.
[4] Chiang, T. (2000). Catching crumbs from the table. Nature, 405(6786):517.
[5] Coates, A., Huval, B., Wang, T., Wu, D., Catanzaro, B., and Andrew, N. (2013). Deep learning with COTS HPC systems. In Proceedings of The 30th International Conference on Machine Learning, pages 1337–1345.
[documentation] DeepLearning.net documentation. Convolutional neural networks (LeNet). http://deeplearning.net/tutorial/lenet.html.
[7] Fischer, A. and Igel, C. (2012). An introduction to restricted Boltzmann machines. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 14–36. Springer.
[8] Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, JMLR W&CP Volume 15, pages 315–323.
[Han-Hsing] Han-Hsing, T. [ML, Python] Gradient descent algorithm (revision 2). http://hhtucode.blogspot.kr/2013/04/ml-gradient-descent-algorithm.html.
[Hinton] Hinton, G. Coursera: Neural networks for machine learning. https://class.coursera.org/neuralnets-2012-001.
[11] Hinton, G., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554.
[12] Hinton, G. E. (2009). Deep belief networks. Scholarpedia, 4(5):5947.
[13] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[14] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
@Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 73 / 74
• 107. Conclusion Reference II
[Honkela] Honkela, A. Multilayer perceptrons. https://www.hiit.fi/u/ahonkela/dippa/node41.html.
[16] James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer.
[Kim] Kim, J. (2014). 2014 Pattern Recognition and Machine Learning Summer School, Yonsei University. http://prml.yonsei.ac.kr/.
[18] Le, Q. V. (2013). Building high-level features using large scale unsupervised learning. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8595–8598. IEEE.
[19] Maltarollo, V. G., Honorio, K. M., and da Silva, A. B. F. (2013). Applications of artificial neural networks in chemical problems.
[20] Nair, V. and Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814.
[21] Nvidia, C. (2007). Compute unified device architecture programming guide.
[Ranzato] Ranzato, M. Deep learning for vision: Tricks of the trade. www.cs.toronto.edu/~ranzato.
[23] Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386.
[24] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1985). Learning internal representations by error propagation. Technical report, DTIC Document.
[25] Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory.
[26] Stone, J. E., Gohara, D., and Shi, G. (2010). OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering, 12(3):66.
[Wan] Wan, L. Regularization of neural networks using DropConnect. http://cs.nyu.edu/~wanli/dropc/.
[28] Wan, L., Zeiler, M., Zhang, S., Cun, Y. L., and Fergus, R. (2013). Regularization of neural networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1058–1066.
[Wikipedia] Wikipedia. Restricted Boltzmann machine. http://en.wikipedia.org/wiki/Restricted_Boltzmann_machine.
[30] Zeiler, M. D., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q. V., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., et al. (2013). On rectified linear units for speech processing. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 3517–3521. IEEE.
@Ä- ( ´íY) %ìÝ(Deep Learning) September 10, 2014 74 / 74