This is a presentation on Deep Learning that was shared with my team.


- 1. Deep Learning Conceptual Understanding & Application Variants ben.hur@daumkakao.com This deck aims at a conceptual understanding of deep learning rather than implementing it for practical applications. That is, it is more for paper readers, less for system developers. Note that, for convenience, some variables are neither uniquely defined nor consistently used. Also, variables/constants and vectors/matrices are not clearly distinguished. More importantly, some descriptions may be incorrect due to the limits of my understanding.
- 2. Deep Learning (DL) is, in concise, a deep and wide neural network. That is half-true.
- 3. Deep Learning is difficult because of a lack of understanding about — the artificial neural network (ANN) — unfamiliar applications.
- 4. DL — ANN — Signal transmission between neurons — Graphical model (Belief network) — Linear regression / logistic regression — Weight, bias & activation — (Feed-forward) Multi-layer perceptron (MLP) — Optimization — Back-propagation — (Stochastic) Gradient descent
- 5. DL — Applications — Speech recognition (Audio) — Natural language processing (Text) — Information retrieval (Text) — Image recognition (Vision) — & yours (+)
- 6. Artificial Neural Network (ANN)
- 7. ANN is a nested ensemble of [logistic] regressions.
- 8. Perceptron & Message Passing
- 9. http://en.wikipedia.org/wiki/Neuron
- 10. Message passing: y = x
- 11. Linear regression: y = Σx → y = Σwx → y = Σwx + b
- 12. Logistic regression: y = sigm(Σwx + b)
- 13. Activation function — Linear g(a) = a — Sigmoid g(a) = sigm(a) = 1 / (1 + exp(-a)) — Tanh g(a) = tanh(a) = (exp(a) - exp(-a)) / (exp(a) + exp(-a)) — Rectified linear g(a) = reclin(a) = max(0, a) — Step — Gaussian — Softmax, usually for the output layer
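These activations can be written directly in NumPy; a minimal sketch (function names follow the slide's g(a) notation, not any particular library):

```python
import numpy as np

def sigmoid(a):
    # sigm(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def reclin(a):
    # rectified linear: max(0, a), applied element-wise
    return np.maximum(0.0, a)

def softmax(a):
    # subtract the max for numerical stability; outputs sum to 1
    e = np.exp(a - np.max(a))
    return e / e.sum()
```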
- 14. Bias controls activation: y = ax vs. y = ax + b
- 15. Multi-Layer Perceptron (MLP)
- 16. A single perceptron cannot handle linearly non-separable problems (e.g., XOR).
- 17. Input layer Hidden layer Output layer http://info.usherbrooke.ca/hlarochelle/ neural_networks/content.html Interconnecting weight Bias Activation function
- 18. http://info.usherbrooke.ca/hlarochelle/ neural_networks/content.html
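A single-hidden-layer MLP as drawn above is just two nested regressions; a minimal NumPy sketch, assuming a sigmoid hidden layer and a linear output (the activation choices are illustrative):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # hidden layer: sigmoid activation of a linear transform of the input
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))
    # output layer: linear (a softmax would be typical for classification)
    return W2 @ h + b2
```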
- 19. Back-Propagation & Optimization
- 20. Back-propagation. Given a training pair {x, y}, the feed-forward pass computes h = a(Wx + b), then y' = a2(W2h + b2), and the loss l = y - y'. Back-propagation carries the error in the opposite direction, updating W2 & b2 first, then W & b. {x, y} —> feed-forward / <— back-propagation
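Using the shapes above (sigmoid hidden layer, linear output, and a squared-error loss — the loss choice is an assumption, since the slide leaves it abstract), one backward pass might be sketched as:

```python
import numpy as np

def backprop(x, y, W1, b1, W2, b2):
    # feed-forward
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))   # hidden activation
    y_hat = W2 @ h + b2                         # linear output
    # loss l = 0.5 * ||y - y_hat||^2
    delta2 = y_hat - y                          # error at the output
    gW2 = np.outer(delta2, h)                   # gradient for W2
    gb2 = delta2                                # gradient for b2
    # propagate the error back through W2 and the sigmoid derivative
    delta1 = (W2.T @ delta2) * h * (1.0 - h)
    gW1 = np.outer(delta1, x)                   # gradient for W
    gb1 = delta1                                # gradient for b
    return gW1, gb1, gW2, gb2
```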
- 21. Weights are reinforced through epochs; in that loose sense, ANN training resembles reinforcement learning.
- 22. Local minimum Plateaus http://info.usherbrooke.ca/hlarochelle/ neural_networks/content.html
- 23. Stochastic Gradient Descent The slide excludes details. See reference tutorial for that.
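The slide excludes the details, but the core of SGD is a per-example update over shuffled data; a generic sketch (the `grad` callback and the toy objective in the usage test are illustrative, not from the slides):

```python
import numpy as np

def sgd(grad, w0, data, lr=0.1, epochs=50, seed=0):
    # stochastic gradient descent: update on one training example at a
    # time, visiting the set in a freshly shuffled order each epoch
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            w -= lr * grad(w, data[i])
    return w
```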
- 24. ANN is a good approximator of any function, if well designed and trained.
- 25. Deep Learning
- 26. DL (DNN) is a deep and wide neural network: many (2+) hidden layers, many input/hidden nodes.
- 27. DEEP = many matrices; WIDE = large matrices. (Image from googling)
- 28. Brain image from Simon Thorpe Other from googling
- 29. Many large connection matrices (W) — Computational burden (GPU) — Insufficient labeled data (Big Data) — Under-fitting (Optimization) — Over-fitting (Regularization)
- 30. Unsupervised learning makes DL successful.
- 31. Unsupervised pre-training — Initialize W matrices in an unsupervised manner — Restricted Boltzmann Machine (RBM) — Auto-encoder — Pre-train in a layer-wise (stacking) fashion — Fine-tune in a supervised manner
- 32. Unsupervised representation learning — Construct a feature space — DBM / DBN — Auto-encoder — Sparse coding — Feed new features to a shallow ANN as input cf, PCA + regression
- 33. Restricted Boltzmann Machine (RBM): a visible layer and a hidden layer.
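RBM training is usually done with contrastive divergence; a CD-1 sketch for binary units (the update rule is the standard one, not spelled out in the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(v0, W, b_v, b_h, rng, lr=0.1):
    # contrastive divergence with one Gibbs step (CD-1)
    # up: sample the hidden units given the visible data
    p_h0 = sigmoid(W @ v0 + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # down: reconstruct the visible layer, then re-infer the hidden
    p_v1 = sigmoid(W.T @ h0 + b_v)
    p_h1 = sigmoid(W @ p_v1 + b_h)
    # move toward the data statistics, away from the model's
    W += lr * (np.outer(p_h0, v0) - np.outer(p_h1, p_v1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)
    return W, b_v, b_h
```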
- 34. Boltzmann Machine Example
- 35. (Figure: visible and hidden layers.)
- 36. Auto-encoder: x —encoder (W)—> code —decoder (WT)—> x'
- 37. If k <= n (single & linear), then AE becomes PCA; otherwise, AE can be arbitrary (over-complete). The hidden & output layers generally play the role of a semantic feature space: compressed & latent features (under-complete), or expanded & primitive features (over-complete).
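A tied-weight auto-encoder (decoder = WT, as in slide 36) trained by gradient descent on the reconstruction error can be sketched as follows; the tanh code and squared-error loss are assumptions for illustration:

```python
import numpy as np

def autoencoder_step(x, W, lr=0.01):
    # tied weights: encoder uses W, decoder uses W.T
    h = np.tanh(W @ x)              # code (k-dimensional)
    x_rec = W.T @ h                 # linear reconstruction
    err = x_rec - x                 # reconstruction error
    # gradient of 0.5 * ||x_rec - x||^2 w.r.t. W (tied-weight case):
    # one term through the decoder, one through the encoder
    dh = (W @ err) * (1.0 - h ** 2)
    gW = np.outer(dh, x) + np.outer(h, err)
    W -= lr * gW
    return W, float(0.5 * err @ err)
```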
- 38. Denoising auto-encoder: encode noise(x), reconstruct x' ≈ x.
- 39. Contractive auto-encoder Regularized auto-encoder Same as Ridge (or similar to Lasso) Regression
- 40. Hinton and Salakhutdinov, Science, 2006 Deep auto-encoder
- 41. Sparse coding (k > n, over-complete)
- 43. Stacking: RBM1 → RBM2 → RBM3, trained in the order RBM1, RBM2, RBM3, …
- 44. Representation Learning + Artificial Neural Network - Sparse coding - Auto-encoder - DBN / DBM
- 45. Stochastic Dropout Training for preventing over-fitting
- 46. http://info.usherbrooke.ca/hlarochelle/ neural_networks/content.html
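A minimal sketch of dropout applied to a hidden activation vector; the "inverted" scaling shown here is one common variant (so test time needs no rescaling), not necessarily the exact scheme in the figure:

```python
import numpy as np

def dropout(h, p_drop, rng, train=True):
    # during training, zero each hidden unit with probability p_drop
    # and scale the survivors so the expected activation matches
    # test time, when no units are dropped
    if not train:
        return h
    mask = (rng.random(h.shape) >= p_drop).astype(float)
    return h * mask / (1.0 - p_drop)
```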
- 47. Deep Architecture
- 48. Deep architecture is an ensemble of shallow networks.
- 49. Key Deep Architectures — Deep Neural Network (DNN) — Deep Belief Network (DBN) — Deep Boltzmann Machine (DBM) — Recurrent Neural Network (RNN) — Convolutional Neural Network (CNN) — Multi-modal/multi-tasking — Deep Stacking Network (DSN)
- 50. Applications
- 51. Applications — Speech recognition — Natural language processing — Image recognition — Information retrieval Application Characteristic - Non-numeric data - Large dimension (curse of dimensionality)
- 52. Natural language processing — Word embedding — Language modeling — Machine translation — Part-of-speech (POS) tagging — Named entity recognition — Sentiment analysis — Paraphrase detection — …
- 53. Deep-structured semantic modeling (DSSM) BM25(Q, D) vs Corr(sim(Q, D), CTR)
- 54. DBN & DBM
- 55. Recurrent Neural Network (RNN)
- 56. Recurrent Neural Network (RNN)
- 57. http://nlp.stanford.edu/courses/NAACL2013/ NAACL2013-Socher-Manning-DeepLearning.pdf Recursive Neural Network
- 58. Convolutional Neural Network (CNN)
- 59. C-DSSM
- 60. CBOW & Skip-gram (Word2Vec)
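The Skip-gram objective predicts context words from the center word (CBOW instead aggregates the context to predict the center); generating Skip-gram training pairs can be sketched as:

```python
def skipgram_pairs(tokens, window=2):
    # for each position, pair the center word with every word
    # within `window` positions on either side
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```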
- 61. Deep Stacking Network (DSN)
- 62. Tensor Deep Stacking Network (TDSN)
- 63. Multi-modal / Multi-tasking
- 64. Google Picture to Text
- 65. Content-based Spotify Recommendation
- 66. In various applications, Deep Learning provides either — better performance than state-of-the-art — similar performance with less labor
- 67. Remarks
- 68. Deep Personalization with Rich Profile in a Vector Representation - Versatile / flexible input - Dimension reduction - Same feature space ==> Similarity computation
- 69. RNN for CF http://erikbern.com/?p=589
- 70. Hard work — Application-specific configuration — Network structure — Objective and loss function — Parameter optimization
- 71. Yann LeCun —> Facebook Geoffrey Hinton —> Google Andrew Ng —> Baidu Yoshua Bengio —> U of Montreal
- 72. References Tutorial — http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html — http://deeplearning.stanford.edu/tutorial Paper — Deep learning: Methods and Applications (Deng & Yu, 2013) — Representation learning: A review and new perspectives (Bengio et al., 2014) — Learning deep architectures for AI (Bengio, 2009) Slide & Video — http://www.cs.toronto.edu/~fleet/courses/cifarSchool09/slidesBengio.pdf — http://nlp.stanford.edu/courses/NAACL2013 Book — Deep learning (Bengio et al., 2015) http://www.iro.umontreal.ca/~bengioy/dlbook/
- 73. Open Sources (from wikipedia) — Torch (Lua) https://github.com/torch/torch7 — Theano (Python) https://github.com/Theano/Theano/ — Deeplearning4j (word2vec for Java) https://github.com/SkymindIO/deeplearning4j — ND4J (Java) http://nd4j.org https://github.com/SkymindIO/nd4j — DeepLearn Toolbox (matlab) https://github.com/rasmusbergpalm/DeepLearnToolbox/ graphs/contributors — convnetjs (javascript) https://github.com/karpathy/convnetjs — Gensim (word2vec for Python) https://github.com/piskvorky/gensim — Caffe (image) http://caffe.berkeleyvision.org — More+++
- 74. FBI WARNING Some reference citations are missing. Many of the images come from the first tutorial and the first paper, and others from googling. All rights to the captured images, albeit not explicitly cited, are reserved to the original authors. Feel free to share, but be cautious of that.
- 75. Left to Blank
