3. The concept of learning in an ML system
• Learning = Improving with experience at some task
• Improve over task T,
• With respect to performance measure, P
• Based on experience, E.
(Diagram: nested scopes: Deep learning (CNN, RNN, LSTM, ...) is a subset of Machine learning (NN, SVM, DT, ...), which is a subset of A.I.)
12. Supervised learning
• We have a training data set and already know the correct output.
• The regression problem:
Predicting results within a continuous output
• The classification problem:
Predicting results in a discrete output
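As a sketch of the two problem types, here is a minimal Python example with scikit-learn (the library and the toy data are assumptions for illustration, not from the slides):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # inputs with known correct outputs
y_cont = np.array([1.1, 2.0, 2.9, 4.2])      # continuous target -> regression
y_disc = np.array([0, 0, 1, 1])              # discrete target -> classification

print(LinearRegression().fit(X, y_cont).predict([[2.5]]))   # a real-valued estimate
print(LogisticRegression().fit(X, y_disc).predict([[2.5]])) # a class label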
13. Applications
Input (X)          | Output (Y)           | Application         | Model
House size         | Price                | Real estate         | Standard NN
Ad types, user info| Click on ad          | Online advertising  | Standard NN
Image              | Object (1, …, 1000)  | Photo tagging       | CNN
Audio              | Text transcript      | Speech recognition  | RNN
English            | Chinese              | Machine translation | RNN
Image, radar info  | Position of the cars | Autonomous driving  | Customized hybrid
14. Unsupervised learning
• The data have no target attribute.
• Analyze the data and look for patterns, e.g., by clustering.
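A minimal clustering sketch in Python with scikit-learn's KMeans (the algorithm choice and the random data are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                               # unlabeled data: no target attribute
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)  # group points by similarity
print(labels[:10])                                       # cluster assignments found from patterns alone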
15. Reinforcement learning
• The agent takes actions in an environment so as to maximize some notion of cumulative reward.
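A minimal sketch of the agent-environment loop in Python (the Gymnasium library and the CartPole environment are assumptions; the slides do not name a framework):

import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()   # a real agent would learn a policy instead
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # the cumulative reward to maximize
    if terminated or truncated:
        break
print(total_reward)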
16. The workflow for supervised learning
• Training phase: Data + Labels → Feature Extraction → Train the model → Evaluate the model → Model
• Predicting phase: New data → Feature Extraction → Model → Predict → Label
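The two phases can be sketched in Python with scikit-learn (the text data, vectorizer, and classifier are assumed toy choices, not the slides' method):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs, labels = ["good movie", "bad movie"], [1, 0]

# Training phase: feature extraction, then train the model on data + labels
extractor = CountVectorizer().fit(docs)
model = MultinomialNB().fit(extractor.transform(docs), labels)

# Predicting phase: the same feature extraction on new data, then predict a label
print(model.predict(extractor.transform(["good film"])))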
17. How to train a model
• Training data set.
• The layers and neurons
• Hypothesis / Activation function
• Cost / Loss Function
• Optimization algorithm
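A minimal Keras sketch tying these ingredients together (assumes TensorFlow 2.x; the layer sizes and loss are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([                # the layers and neurons
    layers.Dense(16, activation="relu"),     # activation function
    layers.Dense(1, activation="sigmoid"),   # hypothesis: output in (0, 1)
])
model.compile(loss="binary_crossentropy",    # cost / loss function
              optimizer="sgd")               # optimization algorithm
# model.fit(X_train, y_train, epochs=10)     # training data set (hypothetical names)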
30. Find the best weights to minimize the loss
31. Optimization algorithm
Gradient Descent:
An iterative optimization algorithm for finding the minimum of a function.
* one epoch = one pass over all the training examples
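A minimal Python sketch of gradient descent on the 1-D quadratic loss L(w) = (w - 3)^2, whose minimum is at w = 3 (a toy example, assumed for illustration):

w, lr = 0.0, 0.1
for epoch in range(100):    # one epoch = one pass over the (here trivial) training data
    grad = 2.0 * (w - 3.0)  # dL/dw
    w -= lr * grad          # step against the gradient
print(w)                    # converges toward 3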
38. Mini-batch optimization
• Mini-batch optimization has the following advantages:
• Reduces memory usage.
• Helps avoid being trapped in local minima, thanks to the randomness of mini-batch sampling.
* Batch size = the number of training examples in one pass
* Iterations = number of passes, each pass using [batch size] examples
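A minimal Python sketch of mini-batch iteration (numpy, with assumed toy data; the gradient step itself is elided):

import numpy as np

X = np.random.rand(1000, 10)                 # 1000 training examples
batch_size = 32                              # examples per pass
for epoch in range(3):
    idx = np.random.permutation(len(X))      # shuffling supplies the randomness
    for start in range(0, len(X), batch_size):
        batch = X[idx[start:start + batch_size]]
        # ... compute gradients on `batch` only, then update the weights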
67. AlexNet
• A large, deep convolutional neural network (8 layers) that classifies the images in the training set into 1000 different classes.
• On the test data, it achieved top-1 and top-5 error rates of 39.7% and 18.9%.
• CONV layers: 5
• Fully connected layers: 3
• Weights: 61M
• MACs: 724M
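A minimal Keras sketch of an AlexNet-style network (assumes TensorFlow 2.x; layer sizes follow the classic 5-conv + 3-FC layout, but details such as local response normalization and the original two-GPU split are omitted):

import tensorflow as tf
from tensorflow.keras import layers, models

def alexnet_sketch(num_classes=1000):
    return models.Sequential([
        layers.Input(shape=(227, 227, 3)),
        layers.Conv2D(96, 11, strides=4, activation="relu"),       # CONV 1
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(256, 5, padding="same", activation="relu"),  # CONV 2
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding="same", activation="relu"),  # CONV 3
        layers.Conv2D(384, 3, padding="same", activation="relu"),  # CONV 4
        layers.Conv2D(256, 3, padding="same", activation="relu"),  # CONV 5
        layers.MaxPooling2D(3, strides=2),
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),                     # FC 1
        layers.Dropout(0.5),
        layers.Dense(4096, activation="relu"),                     # FC 2
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),           # FC 3
    ])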
68. AlexNet
• Trained the network with 2 GPUs on ImageNet data, which contained over 1.2 million annotated images from 1,000 categories.
• Used ReLU for the nonlinearity functions (Found to decrease training
time as ReLUs are several times faster than the conventional tanh
function).
• Used data augmentation techniques that consisted of image
translations, horizontal reflections, and patch extractions.
• Implemented dropout layers in order to combat the problem of
overfitting to the training data.
• Trained the model using batch stochastic gradient descent, with specific
values for momentum and weight decay.
69. GPU & Big data
• Trained on two GTX 580 GPUs for five to six days.
70. Data augmentation
• It consisted of image translations, horizontal reflections,
and patch extractions.
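A minimal sketch of similar augmentations with Keras preprocessing layers (assumes TensorFlow 2.x; RandomCrop stands in for patch extraction):

import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomCrop(224, 224),          # patch extraction
    layers.RandomFlip("horizontal"),      # horizontal reflection
    layers.RandomTranslation(0.1, 0.1),   # image translation
])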
72. ReLU function
• A nonlinearity that was found to decrease training time, as ReLUs are several times faster than the conventional tanh function.
(Figure: ReLU vs. tanh activation curves)
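The two activations side by side as a small numpy sketch (assumed for illustration):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)   # linear for x > 0, exactly zero otherwise

def tanh(x):
    return np.tanh(x)           # saturates toward ±1, so gradients shrink for large |x|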
73. Pooling
• Reduces the resolution of each channel independently
• Increases translation invariance and noise resilience
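A minimal numpy sketch of 2x2 max pooling with stride 2 on a single channel (an assumed toy example):

import numpy as np

def max_pool_2x2(x):
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]   # trim to even dimensions
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(x))   # each output entry is the max of one 2x2 window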
79. Resources
• Deep Learning on Coursera, Andrew Ng, Stanford University
https://www.coursera.org/specializations/deep-learning
• Deep Learning MOOC on Udacity
https://www.udacity.com/course/deep-learning--ud730
• Machine Learning Foundations, HT Lin, National Taiwan University
https://www.coursera.org/learn/ntumlone-mathematicalfoundations/
• TensorFlow
https://www.tensorflow.org/
• cnn-benchmarks
https://github.com/jcjohnson/cnn-benchmarks