
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations

Honglak Lee et al. (ICML 2009)

Notes from reading and analyzing the paper for a master's seminar presentation. CDBNs combine the strengths of CNNs and DBNs to obtain translation invariance and computational scalability, and probabilistic max-pooling makes it possible to build an undirected DBM that can perform image restoration.


  1. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Jaehyun Ahn (jaehyunahn@sogang.ac.kr), Data Mining Lab, Computer Science Department, Sogang University. Paper by Honglak Lee et al. (ICML 2009, 744 citations).
  2. Key References • Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations (Honglak Lee et al., ICML 2009) • Lecture at ICML 2009 (http://videolectures.net/icml09_lee_cdb/) • Learning Multiple Layers of Features from Tiny Images (Alex Krizhevsky, 2009) • To be Bernoulli or to be Gaussian, for a Restricted Boltzmann Machine (Takayoshi Yamashita et al., ICPR 2014)
  3. Ideas
  4. What this paper wants to say • Taking advantage of Deep Belief Networks through convolutional networks – Translation invariance (via max-pooling) – Scalability to realistic image sizes (via max-pooling) – Hierarchical probabilistic inference combining bottom-up and top-down information (via probabilistic max-pooling)
  5. Convolutional Neural Networks
  6. Basic Design of Convolutional Networks: alternate between "Detection" and "Pooling" layers. (Image: http://www.slideshare.net/sogo1127/101-convolutional-neural-networks)
  7. Advantages of Convolutional Networks, 1: Translation Invariance. In LeNet (LeCun et al., 1998), max-pooling provides a form of translation invariance: if max-pooling is done over a 2x2 region, 4 possible configurations of the input produce exactly the same output at the convolutional layer. (Image: IEEE 2013, http://www.computer.org/csdl/trans/tp/2013/08/ttp2013081930-abs.html)
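To make the 2x2 case concrete, here is a minimal NumPy sketch (not from the slides; the 4x4 map and the helper name are illustrative): any of the 4 positions of a peak inside one pooling block produces exactly the same pooled output.

```python
# Minimal sketch: 2x2 max-pooling is invariant to where the peak
# lands inside a pooling block (illustrative 4x4 activation map).
import numpy as np

def max_pool_2x2(a):
    """Non-overlapping 2x2 max-pooling of a 2D activation map."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

for i, j in [(0, 0), (0, 1), (1, 0), (1, 1)]:  # the 4 shifts within a block
    a = np.zeros((4, 4))
    a[i, j] = 1.0                               # peak moves inside the block
    print(max_pool_2x2(a))                      # same pooled output every time
```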
  8. Advantages of Convolutional Networks, 2: Scalable to realistic image sizes. This paper starts from the basic question: "How can we scale to realistic image sizes (e.g., 200x200 pixels)?" Max-pooling shrinks the representation in higher layers, which keeps the model tractable. (Image: IEEE 2013, http://www.computer.org/csdl/trans/tp/2013/08/ttp2013081930-abs.html)
  9. Key idea, 3: Hierarchical probabilistic inference. Max-pooling is deterministic and feed-forward only. In contrast, this paper gives max-pooling a probabilistic semantics that makes it possible to combine bottom-up and top-down information.
  10. Deep Belief Networks
  11. General Deep Belief Network. Z is the partition function (not a "state probability"). If the visible units are binary-valued, the standard Bernoulli RBM energy applies; if the visible units are real-valued, they are modeled as Gaussian with diagonal covariance.
  12-13. General Deep Belief Network. The visible units are Gaussian with diagonal covariance. [Learning Multiple Layers of Features from Tiny Images, Krizhevsky, 2009, p. 13, §1.4.3 Gaussian-Bernoulli RBMs]
  14. General Deep Belief Network. Given the hidden units, the visible units follow a V-dimensional Gaussian distribution with diagonal covariance; the covariance and the mean in each dimension i are given below. [Learning Multiple Layers of Features from Tiny Images, Krizhevsky, 2009, p. 13, §1.4.3 Gaussian-Bernoulli RBMs]
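The equations on this slide were embedded as images and did not survive extraction. Based on the cited reference (Krizhevsky 2009, §1.4.3), they are presumably the Gaussian-Bernoulli RBM energy and its visible conditional:

$$E(\mathbf{v},\mathbf{h}) = \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{i,j} \frac{v_i}{\sigma_i} w_{ij} h_j - \sum_j c_j h_j$$

$$P(\mathbf{v} \mid \mathbf{h}) = \mathcal{N}\big(\mathbf{v};\ \boldsymbol{\mu},\ \mathrm{diag}(\sigma_1^2, \dots, \sigma_V^2)\big), \qquad \mu_i = b_i + \sigma_i \sum_j w_{ij} h_j$$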
  15. General Deep Belief Network. Because units within a layer are conditionally independent given the other layer, we can perform efficient block Gibbs sampling by alternately sampling each layer's units.
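As a minimal sketch of what block Gibbs sampling means here (plain NumPy, binary RBM; the parameter names W, b, c are illustrative, not taken from the paper):

```python
# Minimal sketch of block Gibbs sampling in a binary RBM. Units within
# a layer are conditionally independent given the other layer, so each
# whole layer is sampled in a single vectorized step.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c):
    """One block Gibbs update: sample all of h given v, then all of v given h."""
    p_h = sigmoid(v @ W + c)                         # P(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)  # sample the hidden layer
    p_v = sigmoid(h @ W.T + b)                       # P(v_i = 1 | h)
    v = (rng.random(p_v.shape) < p_v).astype(float)  # sample the visible layer
    return v, h
```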
  16. General Deep Belief Network. [Learning Multiple Layers of Features from Tiny Images, Krizhevsky, 2009, p. 13, §1.4.3 Gaussian-Bernoulli RBMs]
  17. General Deep Belief Network. The partition function is $Z = \sum_{v,h} \exp(-E(v,h))$. To find the maximizing model weights W, we ascend the gradient of the log-likelihood. Carreira-Perpinan and Hinton (2005) derived the log-likelihood gradient of the data using the chain rule. Since computing the average over the true model distribution is intractable, Hinton et al. (2006) use an approximation of that derivative called contrastive divergence: the infinite Gibbs chain is truncated to a small number of steps k. Quoted from: [Representational Power of Restricted Boltzmann Machines and Deep Belief Networks, Nicolas Le Roux et al., p. 3]
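A minimal sketch of contrastive divergence with k = 1 (binary RBM, NumPy; names are illustrative): the intractable model expectation in the gradient is replaced by statistics collected after a single reconstruction step.

```python
# Minimal sketch of CD-1: approximate the log-likelihood gradient by
# truncating the Gibbs chain to one step (Hinton et al., 2006).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01):
    p_h0 = sigmoid(v0 @ W + c)                            # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    v1 = sigmoid(h0 @ W.T + b)                            # one reconstruction step
    p_h1 = sigmoid(v1 @ W + c)                            # negative phase
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))   # <vh>_data - <vh>_recon
    b += lr * (v0 - v1)
    c += lr * (p_h0 - p_h1)
    return W, b, c
```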
  18. Convolutional DBN
  19. Basic structure of the Convolutional RBM (CRBM)
  20. Convolutional RBM. $B_\alpha := \{(i,j) : h_{ij} \text{ belongs to block } \alpha\}$
  21. Convolutional RBM's energy function. K indexes the groups of hidden units; $N_V$, $N_H$, $N_P$ are the visible, hidden, and pooling layer sizes and $N_W$ the filter size; $B_\alpha := \{(i,j) : h_{ij} \text{ belongs to block } \alpha\}$, where each pooling block $\alpha$ covers a $C \times C$ region. $$E(\mathbf{v},\mathbf{h}) = -\sum_{k=1}^{K} \sum_{i,j=1}^{N_H} \sum_{r,s=1}^{N_W} h_{ij}^{k} W_{rs}^{k} v_{i+r-1,\,j+s-1} - \sum_{k=1}^{K} b_k \sum_{i,j=1}^{N_H} h_{ij}^{k} - c \sum_{i,j=1}^{N_V} v_{ij}$$
  22. Conditional probabilities for Gibbs sampling. $P(\mathbf{v},\mathbf{h})_{\text{model}}$ is estimated by Gibbs sampling with $$P(h_{ij}^{k} = 1 \mid \mathbf{v}) = \sigma\big((\tilde{W}^{k} * \mathbf{v})_{ij} + b_k\big), \qquad P(v_{ij} = 1 \mid \mathbf{h}) = \sigma\Big(\big(\sum_k W^{k} * h^{k}\big)_{ij} + c\Big)$$
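These conditionals translate almost directly into code; a minimal sketch with SciPy (function and variable names are mine): a 'valid' cross-correlation implements $(\tilde{W}^{k} * \mathbf{v})_{ij}$, and a 'full' convolution implements the reconstruction term.

```python
# Minimal sketch of the CRBM conditionals for binary visible units.
# v: N_V x N_V image; h: K x N_H x N_H hidden groups; W: K x N_W x N_W filters.
import numpy as np
from scipy.signal import correlate2d, convolve2d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crbm_conditionals(v, h, W, b, c):
    K = W.shape[0]
    # P(h_ij^k = 1 | v): 'valid' cross-correlation = convolution with the
    # flipped filter (the tilde in W~), one map per filter group k
    p_h = np.stack([sigmoid(correlate2d(v, W[k], mode='valid') + b[k])
                    for k in range(K)])
    # P(v_ij = 1 | h): 'full' convolution, summed over all K groups
    p_v = sigmoid(sum(convolve2d(h[k], W[k], mode='full') for k in range(K)) + c)
    return p_h, p_v
```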
  23. Convolutional RBM. $B_\alpha := \{(i,j) : h_{ij} \text{ belongs to block } \alpha\}$
  24. Probabilistic Max-Pooling
  25. Probabilistic Max-Pooling. What is $P(Y = 0)$? Define the bottom-up (convolutional summation) signal $I(h_{ij}^{k}) := b_k + (\tilde{W}^{k} * \mathbf{v})_{ij}$.
  26. Meaning of probabilistic max-pooling. Max-pooling was intended only for feed-forward architectures. In contrast, we are interested in a generative model of images which supports both top-down and bottom-up inference. Therefore, we designed our generative model so that inference involves max-pooling-like behavior. [Paper, §3.3 Probabilistic max-pooling]
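A minimal sketch of the block-wise sampling this implies (NumPy; names are mine): the C x C detection units in a block plus an "all off" state form a single multinomial, so at most one unit in the block fires and the pooling unit signals whether any did.

```python
# Minimal sketch of probabilistic max-pooling for one C x C block.
import numpy as np

rng = np.random.default_rng(0)

def sample_pooling_block(I_block):
    """I_block: C x C bottom-up signals I(h_ij) = b_k + (W~ * v)_ij.
    Returns (sampled h for the block, sampled pooling unit p)."""
    m = max(0.0, I_block.max())            # shift for numerical stability
    e = np.exp(I_block.ravel() - m)        # one weight per detection unit
    off = np.exp(-m)                       # weight of the "all units off" state
    probs = np.append(e, off) / (e.sum() + off)
    idx = rng.choice(len(probs), p=probs)  # pick one unit, or "off"
    h = np.zeros(I_block.size)
    if idx < I_block.size:
        h[idx] = 1.0                       # exactly one detection unit on
    return h.reshape(I_block.shape), float(idx < I_block.size)
```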
  27. Hierarchical Probabilistic Inference
  28-29. Objective: image restoration, done probabilistically.
  30. Difference Between DBN and DBM
  31. Difference Between DBN and DBM: model construction with statistical dependency.
  32-33. Training a Deep Boltzmann Machine. [Deep Learning, Russ Salakhutdinov, University of Toronto (http://bit.ly/1Mg9mAi)]
  34-35. Let's reconstruct our energy function. With visible layer $\mathbf{v}$, detection layer $\mathbf{h}$ (weights $W$), pooling layer $\mathbf{p}$, and second hidden layer $\mathbf{h}'$ (weights $\Gamma$): $$E(\mathbf{v},\mathbf{h},\mathbf{p},\mathbf{h}') = -\sum_k \mathbf{v} \bullet (W^{k} * h^{k}) - \sum_k b_k \sum_{ij} h_{ij}^{k} - \sum_{k,l} p^{k} \bullet (\Gamma^{kl} * h'^{l}) - \sum_l b'_l \sum_{ij} h'^{l}_{ij}$$ [Deep Learning, Russ Salakhutdinov, University of Toronto (http://bit.ly/1Mg9mAi)]
  36. Compare with the DBM learning method: the same energy function, with $\mathbf{v}$ as the visible layer and $\mathbf{h}, \mathbf{p}, \mathbf{h}'$ as hidden layers. [Deep Learning, Russ Salakhutdinov, University of Toronto (http://bit.ly/1Mg9mAi)]
  37. The conditional probabilities are then given by $$P(h_{i,j}^{k} = 1 \mid \mathbf{v}, \mathbf{h}') = \frac{\exp\big(I(h_{i,j}^{k}) + I(p_\alpha^{k})\big)}{1 + \sum_{(i',j') \in B_\alpha} \exp\big(I(h_{i',j'}^{k}) + I(p_\alpha^{k})\big)}, \qquad P(p_\alpha^{k} = 1 \mid \mathbf{v}, \mathbf{h}') = \frac{\sum_{(i',j') \in B_\alpha} \exp\big(I(h_{i',j'}^{k}) + I(p_\alpha^{k})\big)}{1 + \sum_{(i',j') \in B_\alpha} \exp\big(I(h_{i',j'}^{k}) + I(p_\alpha^{k})\big)}$$ where $I(h_{i,j}^{k})$ is the bottom-up signal and $I(p_\alpha^{k})$ the top-down signal.
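In code, the only change from the purely bottom-up case is adding the shared top-down term to every unit's energy before normalizing; a minimal sketch (NumPy, illustrative names):

```python
# Minimal sketch: block probabilities when bottom-up and top-down
# signals are combined, matching the conditionals on slide 37.
import numpy as np

def block_probs(I_bottom_up, I_top_down):
    """I_bottom_up: C x C signals I(h_ij) for one pooling block;
    I_top_down: scalar I(p_alpha) shared by every unit in the block.
    Returns (P(h_ij = 1 | v, h') per unit, P(p_alpha = 1 | v, h'))."""
    energies = I_bottom_up + I_top_down
    m = max(0.0, energies.max())       # shift for numerical stability
    e = np.exp(energies - m)
    off = np.exp(-m)                   # the "all units off" state
    z = e.sum() + off
    return e / z, e.sum() / z          # P(p = 1) = 1 - P(all off)
```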
  38. Experimental Results
  39. Supervised Learning from MNIST
  40. Unsupervised Learning from Natural Images
  41. Unsupervised Learning of Object Parts
  42. Hierarchical Probabilistic Inference
