2. Index
• Layer
– Data
– ImageData
– Convolution
– Pooling
– ReLU
– InnerProduct
– LRN
• Net
– Mnist
– CIFAR-10
– ImageNet (Ilsvrc12)
• Net Change Tests
– Mnist
– CIFAR-10
• 64x64x3 Image Folder
• 64x64x3 Image Resize To 32x32x3
3. Data Layer
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/ilsvrc12_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
• Required
  – source: the name of the directory containing the database
  – batch_size: the number of inputs to process at one time
• Optional
  – backend [default LEVELDB]: choose whether to use LEVELDB or LMDB (both are efficient key-value databases)
• Example
  – Data.mdb (41 MB)
  – Lock.mdb (8.2 kB)
4. ImageData Layer
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "examples/_temp/file_list.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
  }
}
• Required
– source: name of a text file, with each
line giving an image filename and label
– batch_size: number of images to batch
together
5. Convolution Layer
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 } # learning rate for the filters
  param { lr_mult: 2 } # learning rate for the biases
  convolution_param {
    num_output: 32   # learn 32 filters
    kernel_size: 5   # each filter is 5x5
    pad: 2
    stride: 1        # step 1 pixel between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.0001      # default mean: 0
    }
    bias_filler { type: "constant" } # initialize the biases to zero (0)
  }
}
• Required
  – num_output (c_o): the number of filters
  – kernel_size (or kernel_h and kernel_w): specifies the height and width of each filter
• Strongly Recommended
  – weight_filler [default type: 'constant' value: 0]
• Optional
  – pad [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
  – stride [default 1]: specifies the intervals at which to apply the filters to the input
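The interplay of kernel_size, pad, and stride fixes the output size. A minimal sketch, assuming the standard output-size formula (not Caffe's own code):

```python
def conv_output_size(in_size, kernel, pad=0, stride=1):
    """Spatial output size of a convolution: floor((in + 2*pad - kernel) / stride) + 1."""
    return (in_size + 2 * pad - kernel) // stride + 1

# conv1 above on a 32x32 CIFAR-10 input: 5x5 kernel, pad 2, stride 1
print(conv_output_size(32, 5, pad=2, stride=1))   # 32 (size preserved)
# AlexNet's first layer: 227 input, 11x11 kernel, stride 4
print(conv_output_size(227, 11, pad=0, stride=4)) # 55
```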
6. Pooling Layer
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
• Required
  – kernel_size: specifies the height and width of each pooling window
• Optional
  – pool [default MAX]: the pooling method. Currently MAX, AVE, or STOCHASTIC
  – pad [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
  – stride [default 1]: specifies the intervals at which to apply the pooling windows to the input
• e.g. stride 2: step two pixels (in the bottom blob) between pooling regions
7. ReLU Layer (Rectified-Linear and Leaky-ReLU)
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
• Optional
  – negative_slope [default 0]: specifies whether to leak the negative part by multiplying it by the slope value rather than setting it to 0
• Given input x, the output y is:
  – y = x (if x > 0)
  – y = x * negative_slope (if x <= 0)
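The two cases above can be written directly as a function (a sketch, not Caffe's implementation):

```python
def relu(x, negative_slope=0.0):
    """ReLU: y = x if x > 0, else x * negative_slope (leaky when slope > 0)."""
    return x if x > 0 else x * negative_slope

print(relu(3.0))         # 3.0  (positive values pass through)
print(relu(-2.0, 0.1))   # -0.2 (leaky ReLU scales negatives instead of zeroing)
```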
8. Why ReLU and Drop-Out?
• Drop-Out
  – 2014, Toronto; paper title:
    • Dropout: A Simple Way to Prevent Neural Networks from Overfitting
    • The key idea is to randomly drop units (along with their connections) from the neural network during training
  – A kind of regularizer
  – Instead of training every hidden node, nodes are randomly dropped out during training
    • The weights connected to dropped nodes are not trained
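The key idea can be sketched as follows, using the common "inverted dropout" variant (survivors are scaled by 1/(1-p) at training time; the original paper instead scales weights at test time):

```python
import random

def dropout(activations, p=0.5, seed=0):
    """Training-time dropout sketch: zero each unit with probability p,
    scale the survivors by 1/(1-p) so the expected activation is unchanged."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]

out = dropout([1.0] * 8, p=0.5, seed=1)
print(out)  # roughly half the units are 0.0, the rest are 2.0
```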
9. Drop-Out
• "…using ReLUs trained with dropout during frame-level training provide a 4.2% relative improvement over a DNN trained with sigmoid units…"
10. Rectified-Linear Unit (ReLU)
• Drop-Out makes learning slow
  – Dropped-out weights receive no training updates
• Replace the non-linear activation function
  – Use ReLU instead of the commonly used logistic sigmoid or tanh
• Advantages of ReLU
  – Reduced likelihood of the gradient vanishing
  – Sparsity
12. InnerProduct Layer
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param { lr_mult: 1 } # learning rate for the weights
  param { lr_mult: 2 } # learning rate for the biases
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler { type: "constant" } # initialize the biases to zero (0)
  }
}
• Required
– num_output (c_o): the number
of filters
• Input
– n * channels * height * width
• Output
– n * c_o
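The layer flattens each input to a vector of length d = channels * height * width and computes y = xWᵀ + b. A pure-Python sketch (a hypothetical helper, not pycaffe):

```python
def inner_product(x, weights, bias):
    """Fully connected layer: y = x . W^T + b.
    x: n rows, each a flattened input of length d = c*h*w;
    weights: c_o rows of length d; bias: length c_o."""
    return [[sum(xi * wi for xi, wi in zip(row, w)) + b
             for w, b in zip(weights, bias)]
            for row in x]

# one sample with d = 4, projected to c_o = 2 outputs
y = inner_product([[1.0, 2.0, 3.0, 4.0]],
                  [[1, 0, 0, 0], [0, 1, 0, 0]],
                  [0.5, 0.5])
print(y)  # [[1.5, 2.5]]
```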
13. LRN Layer (Local Response Normalization)
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
• performs a kind of "lateral inhibition" by normalizing over local input regions
• Each input value is divided by (1 + (α/n)·Σᵢ xᵢ²)^β
  – where n is the size of each local region, and the sum is taken over the region centered at that value
• Optional
– local_size [default 5]: the number of
channels to sum over (for cross channel
LRN) or the side length of the square
region to sum over (for within channel
LRN)
– alpha [default 1]: the scaling parameter
– beta [default 5]: the exponent
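The formula above can be sketched directly (a simplified 1-D cross-channel version; Caffe's exact edge handling may differ):

```python
def lrn(values, local_size=5, alpha=1.0, beta=5.0):
    """Local Response Normalization sketch: each value x_i is divided by
    (1 + (alpha / n) * sum of squares over its local region) ** beta."""
    half = local_size // 2
    out = []
    for i, x in enumerate(values):
        region = values[max(0, i - half):i + half + 1]  # window centered at i
        scale = (1.0 + (alpha / local_size) * sum(v * v for v in region)) ** beta
        out.append(x / scale)
    return out

# a single value with n=1, alpha=1, beta=1: 1 / (1 + 1)^1 = 0.5
print(lrn([1.0], local_size=1, alpha=1.0, beta=1.0))  # [0.5]
```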
14. CIFAR-10 (2010, Hinton/Krizhevsky)
• 10 classes
• 32x32 color image
• 6,000 images per class
• 5,000 training / 1,000
test images per class
• Total: 60,000 images = (5,000 + 1,000) images * 10 classes
15. Ilsvrc12
(ImageNet large Scale Visual Recognition Challenge 2012)
• AlexNet
• 1.2 million high-resolution training images
  – Resized/cropped to 224x224x3
• 1,000 classes
• 150,000 test images
• Net
  – five convolutional layers
  – two globally connected layers
  – final 1000-way softmax
16. • Uses the Caffe/examples/cifar10 model
• Prof. Ng's MATLAB homework data
  – 64x64x3 images
  – 4 classes (airplane, car, cat, dog)
  – Train (500 per class), Test (500 per class)
• Preparation
  – Resize 64x64x3 -> 32x32x3
  – Mean data
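The 64x64x3 -> 32x32x3 resize is done by convert_imageset on the later slides; as a standalone illustration of a 2x downsample, averaging each 2x2 block (a hypothetical helper, not the tool's actual interpolation):

```python
def downsample_2x(img):
    """Halve height and width by averaging each 2x2 block.
    img: H x W grid (list of lists) of pixel values; H and W must be even."""
    h, w = len(img), len(img[0])
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4.0
             for c in range(w // 2)]
            for r in range(h // 2)]

print(downsample_2x([[0, 2], [4, 6]]))  # [[3.0]]
```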
24. How to Use ImageNet
• Download the auxiliary data
  – $ ./data/ilsvrc12/get_ilsvrc_aux.sh
• Backup folder: caffe/examples/imagenet
• Edit create_imagenet.sh
  – RESIZE=false -> RESIZE=true
• Create the lmdbs with
  – $ ./examples/imagenet/create_imagenet.sh
• Create the mean data
  – $ ./examples/imagenet/make_imagenet_mean.sh
  – Generates: data/ilsvrc12/imagenet_mean.binaryproto
• Start training
  – $ ./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt
25. ImageNet Data Prepare
• Requirement
– Save Image Files
• /path/to/imagenet/train/n01440764/n01440764_10026.JPEG
• /path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG
– Describe Folder
• train.txt
– n01440764/n01440764_10026.JPEG 0
– n01440764/n01440764_10027.JPEG 1
• val.txt
– ILSVRC2012_val_00004614.JPEG 954
– ILSVRC2012_val_00004615.JPEG 211
• Let’s Make lmdb
– $ ./examples/imagenet/create_imagenet.sh
• ilsvrc12_train_lmdb for Train
• ilsvrc12_val_lmdb for validate
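The train.txt format above (relative image path plus integer label per line) can be generated from a folder-per-class layout; a sketch assuming hypothetical class subfolders, with the label taken from the subfolder's index:

```python
import os

def write_file_list(root, class_dirs, out_path):
    """Write '<relative path> <label>' lines in the train.txt format above.
    class_dirs: list of subfolder names under root; their index becomes the label."""
    with open(out_path, "w") as f:
        for label, d in enumerate(class_dirs):
            for name in sorted(os.listdir(os.path.join(root, d))):
                f.write("%s/%s %d\n" % (d, name, label))
```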
26. MNIST Model Net Modification Experiments
• Iteration 100
• Basic model loss: 0.34
• Remove Convolution1, Pooling1
  – loss: 0.411
• Remove ReLU1
  – loss: 0.426
• Remove InnerProduct1
  – loss: 0.450
• Remove Pooling2
  – loss: 0.522
• Remove Convolution2
  – loss: 0.753
(Net diagram: Data, Convolution1, Pooling1, Convolution2, Pooling2, ReLU1, InnerProduct1, InnerProduct2, SoftmaxWithLoss, Accuracy)
27. Test
• Classify (airplane, car, cat, dog)
  – Image size 64x64x3
• Models
  – Matlab cnn net
    • Trained only the softmax layer, Iteration = 1,000
    • No image resizing
  – cifar10 net
    • Iteration 5,000
    • DB created with images resized from 64x64x3 to 32x32x3
  – cifar10 net
    • Iteration 5,000
    • DB created without resizing
  – Labellio
    • deep learning web service
28. Prepare 1/2. Make .mdb
• create_imagenet.sh
RESIZE=true
if $RESIZE; then
  RESIZE_HEIGHT=32
  RESIZE_WIDTH=32
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi
GLOG_logtostderr=1 $TOOLS/convert_imageset \
  --resize_height=$RESIZE_HEIGHT \
  --resize_width=$RESIZE_WIDTH \
  --shuffle \
  $TRAIN_DATA_ROOT \
  $DATA/train.txt \
  $EXAMPLE/ilsvrc12_train_lmdb
• 1. Save the 64x64x3 images into folders
• 2. List every image path and label in train.txt
• 3. Build the lmdb database
• The images can be resized while building the lmdb
  – To resize: RESIZE=true
  – No resize: RESIZE=false
  – With resizing: Data.mdb (32.9 MB)
  – Without resizing: Data.mdb (8.3 MB)
  – Lock.mdb is always a fixed 8.2 kB
29. Prepare 2/2. Make mean data
• make_imagenet_mean.sh
EXAMPLE=examples/imagenet
DATA=data/ilsvrc12
TOOLS=build/tools
$TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_train_lmdb \
  $DATA/mean.binaryproto
30. Result
• Matlab cnn net
  – Trained only the softmax layer, Iteration = 1,000
  – No image resizing
  – Testing accuracy = 0.8
• cifar10 net
  – DB created with images resized from 64x64x3 to 32x32x3
  – Iteration 5,000, loss = 0.00059
  – Testing accuracy = 0.755, loss = 1.58
  – Overfitting!
• cifar10 net
  – Iteration 5,000, training loss = 0.00026
  – Test accuracy = 0.724, loss = 1.94
  – Worse overfitting!
• Labellio
  – Training accuracy = 0.66
  – Test accuracy = ?
35. References
• Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE Thesis
  – http://www.slideshare.net/sogo1127/face-feature-recognition-system-with-deep-belief-networks-for-korean
• Labellio
  – http://devblogs.nvidia.com/parallelforall/labellio-scalable-cloud-architecture-efficient-multi-gpu-deep-learning/