
[PyCon 2015] 오늘 당장 딥러닝 실험하기 (Start Experimenting with Deep Learning Today) - submission version


Recently, deep learning's remarkable achievements have become a hot topic in the IT industry.

Keeping up with this trend, however, takes a great deal of time and effort: you have to start by acquiring mathematical background such as linear algebra, then understand the principles and key concepts of deep learning before you can even attempt an experiment.

With existing, well-built deep learning open-source projects, though, you can get a taste of deep learning without much difficulty.

This talk avoids mathematical explanations as much as possible and walks through examples that use the open-source tools Theano and pylearn2, along with the additional code you will need.

It also covers an example of applying deep learning to natural language processing using word2vec.

Because the subject is academic and theoretical the talk may feel daunting, but I will keep the explanations as conceptual as possible so that you can easily follow the experiments yourself.

The open-source tools are well documented, but I also struggled when I first encountered them, so this should be useful to anyone who wants to get started with deep learning.

I hope you can squeeze in some theory study while your computer is busy deep learning.



  1. 오늘 당장 딥러닝 실험하기 (Start Experimenting with Deep Learning Today), 2015. 08. 30., 김현호
  2. About the speaker: 김현호 - Computer Science major, UST - Automatic Speech Translation Research Lab, ETRI (한국전자통신연구원) - In charge of mobile at Team Popong - Interests: artificial intelligence, machine learning, natural language processing - stray.leone@gmail.com
  3. Agenda: 1. Understanding Neural Networks 2. Deep Neural Networks a. Pretraining b. Rectified Linear Unit c. Dropout 3. Theano library 4. Deep learning code using Theano 5. Deep learning for natural language processing a. Gensim library b. Automatic word spacing with a Recurrent Neural Network
  4.–7. (The agenda slide is repeated, highlighting each section in turn.)
  8. The recent surge of interest in deep learning
  9. Numerous deep learning talks
  10.–14. (image-only slides)
  15. Artificial Neural Network
  16. Artificial Neural Network: a machine learning system inspired by how neurons, the building blocks of the biological nervous system, operate.
  17.–20. Real neuron vs. artificial neuron
  21.–22. Real neuron vs. artificial neuron: direction of signal propagation
  23. Real neuron vs. artificial neuron: direction of signal propagation, weights
  24.–26. Artificial Neural Network
  27. Artificial Neural Network learning
  28. Artificial Neural Network learning: weights
  29. Forward Propagation
  30. Backward Propagation
  31. Deep Neural Network
  32. What is a Deep Neural Network?
  33. What is a Deep Neural Network? An Artificial Neural Network with three or more hidden layers.
  34. Why deep learning used to be hard
  35. Why deep learning used to be hard: networks deeper than two or three levels yielded poorer results.
  36. Why deep learning is hard: - Overfitting: deep nets have lots of parameters - Underfitting: vanishing gradients during gradient descent
  37. The breakthroughs behind deep learning's rapid progress: - Pretraining - Dropout - Rectified Linear Unit
  38. Pretraining performance: "Why Does Unsupervised Pre-training Help Deep Learning?" (Erhan, Bengio, et al., 2010) - pretraining initialization starts from a better local minimum than random initialization.
  39. Pretraining performance: without pretraining vs. with pretraining
  40. Pretraining methods: 1) Contrastive Divergence a) http://www.quora.com/What-is-contrastive-divergence b) https://www.youtube.com/watch?v=p4Vh_zMw-HQ&index=36&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH 2) AutoEncoder
  41. Dropout
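A minimal sketch (my own, not from the slides) of how dropout is often written in Theano, in the style of the Newmu Theano-Tutorials code used later in this talk: randomly zero out units during training and rescale the survivors.

      import theano
      from theano import tensor as T
      from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

      srng = RandomStreams(seed=1234)

      def dropout(X, p=0.):
          # keep each unit with probability (1 - p), then rescale so the
          # expected activation stays the same
          if p > 0:
              retain_prob = 1 - p
              X = X * srng.binomial(X.shape, p=retain_prob, dtype=theano.config.floatX)
              X = X / retain_prob
          return X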
  42. Rectified Linear Unit (ReLU): activation function
  43. Rectified Linear Unit (ReLU): activation functions - sigmoid function vs. rectified linear unit
  44. Rectified Linear Unit (ReLU)
  45. ReLU experiment results (conditions - code: https://github.com/Newmu/Theano-Tutorials, data: MNIST)

      Epoch   sigmoid   ReLU
      1       0.7053    0.9433
      2       0.8302    0.9647
      3       0.8684    0.9723
      3       0.8837    0.9737
      4       0.89      0.9763
      5       0.895     0.9792
      ...     ...       ...
      11      0.9116    0.9829
      12      0.9127    0.9838
      13      0.9142    0.9821
      14      0.9152    0.9838
      15      0.9159    0.9832
  46. Data Sets
  47. MNIST
  48. Cifar-10
  49. Data Sets - MNIST: the MNIST database of handwritten digits, 28x28 grayscale images, 10 classes - Cifar-10: the CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class - word2vec
  50. Deep learning experiments
  51. Starting a deep learning experiment
  52. Origin of the name Theano: a female mathematician, the wife of Pythagoras
  53. Comparison of deep learning libraries (source: http://t-robotics.blogspot.kr/2015/06/hw-sw.html#.Vd59KPntlBe)
  54. Theano - Q) Does it build a DNN for me automatically? - A) No, you have to build the Deep Neural Network yourself...
  55. Theano - a DNN model-training library (x) - a library useful for matrix operations and the like (o)
  56. Why Theano - Definition - "Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently." (http://deeplearning.net/software/theano/) - "Optimizing GPU-meta-programming code generating array oriented optimizing math compiler in Python" (https://github.com/josephmisiti/awesome-machine-learning)
  57. Why Theano - run computations on the GPU from Python code without writing CUDA code - grad(), updates, function() - symbolic functions
  58. Why Theano - grad(), updates, function(): T.grad() computes the gradient for you, e.g. x = T.scalar(); gx = T.grad(x**2, x) takes the gradient of x**2 with respect to x (= 2x)
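A small, runnable sketch (my own, assuming Theano is installed) that puts grad() and function() together; the names are illustrative:

      import theano
      import theano.tensor as T

      x = T.dscalar('x')            # symbolic scalar
      gx = T.grad(x ** 2, x)        # symbolic gradient of x**2 w.r.t. x, i.e. 2*x
      f = theano.function([x], gx)  # compile the graph into a callable
      print f(3.0)                  # prints 6.0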
  59. Why Theano - grad(), updates, function()

      300 updates = [
      301     (param, param - learning_rate * gparam)
      302     for param, gparam in zip(classifier.params, gparams)
      303 ]
      ........
      308 train_model = theano.function(
      309     inputs=[index],
      310     outputs=cost,
      311     updates=updates,
      312     givens={
      313         x: train_set_x[index * batch_size: (index + 1) * batch_size],
      314         y: train_set_y[index * batch_size: (index + 1) * batch_size]
      315     }
      316 )

  60.–62. Why Theano - grad(), updates, function(): "This module provides function(), commonly accessed as theano.function, the interface for compiling graphs into callable objects. You've already seen example usage in the basic tutorial... something like this:" (http://deeplearning.net/software/theano/library/compile/function.html; the slides annotate the input and output)

      >>> x = theano.tensor.dscalar()
      >>> f = theano.function([x], 2*x)
      >>> print f(4)  # prints 8.0
  63.–64. Why Theano - grad(), updates, function() - Theano represents symbolic mathematical computations as graphs (the slides show the computation graph with its scalar nodes)

      x = dmatrix('x')
      y = dmatrix('y')
      z = x + y
      f = theano.function([x, y], z)

  65. Why Theano - grad(), updates, function()

      x = theano.tensor.dscalar('x')
      y = theano.tensor.dscalar('y')
      z = x + y
      f = theano.function([x, y], z)
      print f(4, 3)  # array(7.0)
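Putting grad(), updates, and function() together in one small runnable sketch (my own illustration; the toy cost (w - 3)**2 and the names are not from the slides):

      import theano
      import theano.tensor as T

      # a shared variable holds state that function() can update in place
      w = theano.shared(0.0, name='w')
      cost = (w - 3.0) ** 2                 # toy cost, minimised at w = 3
      gw = T.grad(cost, w)                  # symbolic gradient d(cost)/dw

      # one gradient-descent step per call: w <- w - 0.1 * gw
      step = theano.function(inputs=[], outputs=cost, updates=[(w, w - 0.1 * gw)])

      for i in range(50):
          step()
      print w.get_value()                   # close to 3.0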
  66. Install Theano - Environment: Ubuntu 14.04 64-bit - Install document: http://deeplearning.net/software/theano/install_ubuntu.html#install-ubuntu

      $ sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
      $ sudo pip install Theano
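After installing, a quick sanity check (my own addition, not from the slides) that Theano imports and shows which device it will use:

      $ python -c "import theano; print theano.config.device"
      cpu

(It reports gpu once the THEANO_FLAGS shown on slide 90 are set.)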
  67. Download the tutorial code

      $ git clone https://github.com/lisa-lab/DeepLearningTutorials.git
      Cloning into 'DeepLearningTutorials'...
      remote: Counting objects: 3652, done.
      remote: Total 3652 (delta 0), reused 0 (delta 0), pack-reused 3652
      Receiving objects: 100% (3652/3652), 7.79 MiB | 2.32 MiB/s, done.
      Resolving deltas: 100% (2161/2161), done.
      Checking connectivity... done.
      $ ls DeepLearningTutorials

  68. Run DBN

      DeepLearningTutorials$ cd code
      DeepLearningTutorials/code$ python DBN.py
      Using gpu device 0: GeForce GTX 770
      Downloading data from http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
      ... loading data
      ... building the model
      ... getting the pretraining functions
      ... pre-training the model
      Pre-training layer 0, epoch 0, cost -98.5296
      Pre-training layer 0, epoch 1, cost -83.842
      Pre-training layer 0, epoch 2, cost -80.688
      Pre-training layer 0, epoch 3, cost -79.0362
      Pre-training layer 0, epoch 4, cost -77.9295
  69. DBN.py

      13 from logistic_sgd import LogisticRegression, load_data

      303 datasets = load_data(dataset)
      304
      305 train_set_x, train_set_y = datasets[0]
      306 valid_set_x, valid_set_y = datasets[1]
      307 test_set_x, test_set_y = datasets[2]

  70. DBN.py (diagram: a network with 28 * 28 inputs and 10 outputs)

      18 # start-snippet-1
      19 class DBN(object):
      .......
      314 print '... building the model'
      315 # construct the Deep Belief Network
      316 dbn = DBN(numpy_rng=numpy_rng, n_ins=28 * 28,
      317           hidden_layers_sizes=[1000, 1000, 1000],
      318           n_outs=10)

  71. DBN.py

      325 pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_x,
      326                                             batch_size=batch_size,
      327                                             k=k)
      ..............
      353 print '... getting the finetuning functions'
      354 train_fn, validate_model, test_model = dbn.build_finetune_functions(
      355     datasets=datasets,
      356     batch_size=batch_size,
      357     learning_rate=finetune_lr
      358 )

  72. DBN.py

      228 train_fn = theano.function(
      229     inputs=[index],
      230     outputs=self.finetune_cost,
      231     updates=updates,
      232     givens={
      233         self.x: train_set_x[
      234             index * batch_size: (index + 1) * batch_size
      235         ],
      236         self.y: train_set_y[
      237             index * batch_size: (index + 1) * batch_size
      238         ]
      239     }
      240 )

  73. DBN.py

      380 while (epoch < training_epochs) and (not done_looping):
      381     epoch = epoch + 1
      382     for minibatch_index in xrange(n_train_batches):
      383
      384         minibatch_avg_cost = train_fn(minibatch_index)
      385         iter = (epoch - 1) * n_train_batches + minibatch_index
      386
      387         if (iter + 1) % validation_frequency == 0:
      388
      389             validation_losses = validate_model()
      390             this_validation_loss = numpy.mean(validation_losses)
  74. DNN using ReLU

      import theano
      from theano import tensor as T
      from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
      import numpy as np
      from load import mnist

  75. DNN using ReLU

      def floatX(X):
          return np.asarray(X, dtype=theano.config.floatX)

      def init_weights(shape):
          return theano.shared(floatX(np.random.randn(*shape) * 0.01))

      def rectify(X):
          return T.maximum(X, 0.)

      def softmax(X):
          e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
          return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')

  76. DNN using ReLU

      def model(X, w_h, w_h2, w_o):
          h = rectify(T.dot(X, w_h))
          h2 = rectify(T.dot(h, w_h2))
          py_x = softmax(T.dot(h2, w_o))
          return h, h2, py_x

      def prop(cost, params, lr=0.001):
          grads = T.grad(cost=cost, wrt=params)
          updates = []
          for p, g in zip(params, grads):
              updates.append((p, p - lr * g))
          return updates

  77.–80. (built up incrementally across these slides; the final version:)

      trX, teX, trY, teY = mnist(onehot=True)

      X = T.fmatrix()
      Y = T.fmatrix()

      w_h = init_weights((784, 625))
      w_h2 = init_weights((625, 625))
      w_o = init_weights((625, 10))

      h, h2, py_x = model(X, w_h, w_h2, w_o)
      y_x = T.argmax(py_x, axis=1)

      cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
      params = [w_h, w_h2, w_o]
      updates = prop(cost, params, lr=0.001)

      train = theano.function(inputs=[X, Y], outputs=cost, updates=updates)
      predict = theano.function(inputs=[X], outputs=y_x)

  81.

      for i in range(100):
          for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
              cost = train(trX[start:end], trY[start:end])
          print np.mean(np.argmax(teY, axis=1) == predict(teX))
  82. Play with data
  83. load_data()

      172 def load_data(dataset):
      ......
      193     if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
      194         dataset = new_path
      ......
      204     print '... loading data'
      205
      206     # Load the dataset
      207     f = gzip.open(dataset, 'rb')
      208     train_set, valid_set, test_set = cPickle.load(f)
      209     f.close()
  84. Making your own data (1): train_set.x.txt (figure annotations: input vector length, input vector size)
  85. Making your own data (1): train_set.y.txt (figure annotation: input vector size)
  86. Making your own data (1)

      from numpy import genfromtxt
      import gzip, cPickle
      ...............
      train_set_x = genfromtxt(dir_path+"train_set.x.txt", delimiter=",")
      ..................
      train_set = train_set_x, train_set_x
      valid_set = valid_set_x, valid_set_x
      test_set = test_set_x, test_set_x
      print "writing to pkl.gz..."
      data_set = [train_set, valid_set, test_set]
      print "zip data into a file"
      f = gzip.open(output_dir+str(i)+"_"+pkl_filename+".pkl.gz", 'wb')
      print "zip data file name is " + str(i)+"_"+pkl_filename+".pkl.gz"
      cPickle.dump(data_set, f, protocol=2)
      f.close()
  87. Making your own data (2)

      for n, sentence in enumerate(file_lines):
          ..............................
          data_batch_fpath = vector_dir+"data_batch_"+str(n)+".npz"
          ..........................
          # save vector list
          numpy.savez(data_batch_fpath,
                      data=numpy.asarray(sentence_vector_list),
                      labels=label_vector,
                      length=max_length,
                      dim=dimension)
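Loading one of these .npz batches back looks like this (a sketch; the file name is illustrative):

      import numpy

      batch = numpy.load("data_batch_0.npz")
      print batch['data'].shape, batch['labels'].shape
      print batch['length'], batch['dim']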
  88. Save and load a model (the slide shows code screenshots: save model, load model)
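One common way to do this in Theano, shown as a minimal sketch (my own; the slide's exact code is not reproduced here): pickle the values of the shared variables and restore them with set_value().

      import cPickle

      def save_params(params, path):
          # params is a list of Theano shared variables, e.g. the weights above
          with open(path, 'wb') as f:
              cPickle.dump([p.get_value() for p in params], f, protocol=2)

      def load_params(params, path):
          with open(path, 'rb') as f:
              values = cPickle.load(f)
          for p, v in zip(params, values):
              p.set_value(v)

      # usage with the ReLU network above (illustrative):
      # save_params([w_h, w_h2, w_o], 'model.pkl')
      # load_params([w_h, w_h2, w_o], 'model.pkl')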
  89. Theano modes
  90. Theano modes - .bashrc

      226 # Theano Settings
      227 export THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32,exception_verbosity=high
  91.–93. Deep Learning for Natural Language Processing
  94. Deep Learning for Natural Language Processing - 나는 밥을 먹는다 ("I eat a meal") - one-hot (1-of-K) representation
  95. Deep Learning for Natural Language Processing - 나는 밥을 먹는다 → 나 는 밥 을 먹 는 다 (split into morphemes) - one-hot (1-of-K) representation
  96. Deep Learning for Natural Language Processing - split into morphemes, each represented as a vector - one-hot (1-of-K) representation - 밥 = [0,0,0,0,0,0,0,.........,0,0,0,0,1,0,0,0,0,0,0]

      index   0(나)  1(가)  2(는)  ...   ...   ...   ...   999(.)
      나       1      0      0      0     0     0     0     0
      는       0      0      1      0     0     0     0     0
      ..       0      0      0      0     0     1     0     0
      ..       0      0      0      0     1     0     0     0
      다       0      0      0      0     0     0     1     0

  97. Deep Learning for Natural Language Processing - 나는 밥을 먹는다 → 나 는 밥 을 먹 는 다 (split into morphemes, each represented as a vector) - word2vec representation
  98. Deep Learning for Natural Language Processing - 나는 밥을 먹는다 → 나 는 밥 을 먹 는 다 - Word2Vec model - 밥 = [0.323112, -0.021232, ........, 0.82123123]
  99. Deep Learning for Natural Language Processing - one-hot (1-of-K) representation: 밥 = [0,0,0,0,0,0,0,.........,0,0,0,0,1,0,0,0,0,0,0] - word2vec representation: 밥 = [0.323112, -0.021232, ........, 0.82123123]
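A small sketch (my own, not from the slides) of building a one-hot (1-of-K) vector for a morpheme, given a vocabulary that maps each morpheme to an index; the toy vocabulary is illustrative:

      # -*- coding: utf-8 -*-
      import numpy as np

      vocab = {u'나': 0, u'가': 1, u'는': 2, u'밥': 3, u'을': 4, u'먹': 5, u'다': 6}

      def one_hot(morpheme, vocab):
          v = np.zeros(len(vocab))
          v[vocab[morpheme]] = 1.0
          return v

      print one_hot(u'밥', vocab)   # [ 0.  0.  0.  1.  0.  0.  0.]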
  100. Gensim - definition - "Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora" - word2vec class - word vector representation - multi-threading - Skip-gram - Continuous Bag of Words
  101. Gensim - imports, settings

      9  from gensim.models.word2vec import LineSentence
      10 from gensim.models import word2vec

      32 # settings
      33 THEADS = 8  # progress with multi threading
      34 DIMENSION = 50
      35 SKIPGRAM = 1  # 1 is skip gram, 0 is cbow
      36 WINDOW_SIZE = 8
      37 NTimes = 10  # repeat number of sentences
      38 min_count_of_word = 5
      ...................
      65 from gensim import utils

  102. Gensim - training, saving the model

      97  # load raw sentence
      98  sentences = LineSentence(input_train_file_path)
      99  # model settings
      100 model = word2vec.Word2Vec(size=dimension, workers=THEADS, min_count=min_count_of_word, sg=SKIPGRAM, window=WINDOW_SIZE)
      101
      102 # build voca and train
      103 number_iter = NTimes  # number of iterations (epochs) over the corpus
      104 model.build_vocab(sentences)
      105
      106 ss = utils.RepeatCorpusNTimes(sentences, number_iter)
      107 model.train(ss)
      108 # save model
      109 model.save(model_file_name)
      110 model.save_word2vec_format(model_file_name + '.bin', binary=True)

  103. Gensim - loading the model, testing

      83 try:
      84     model = utils.SaveLoad.load(fname=model_file_name)
      85 except:
      86     print "failed to load. Retrying by load_word2vec_format() !!"
      87     model = word2vec.load_word2vec_format(fname=model_file_name+".bin")

      297 x = model[w.decode('utf-8')]

      314 mw, score = model.most_similar(positive=[x])[0]
      315 print "most similar : ", mw
      316 print "target vector :", x
  104. Most similar words to '서울'

      most similar word   similarity
      대구                 0.4282917082309723
      광주                 0.4046330451965332
      부산                 0.40132588148117065
      울산                 0.3863871693611145
      수원                 0.38555505871772766
      청주                 0.35919708013534546
      안양                 0.35622960329055786
      주왕산               0.3543151617050171
      평택                 0.3505415618419647
      cebu                 0.34598737955093384
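You can also query the trained model directly by word with gensim's most_similar() (a usage sketch; the model file name is illustrative):

      # -*- coding: utf-8 -*-
      from gensim.models import word2vec

      model = word2vec.Word2Vec.load("word2vec_ko.model")
      for w, score in model.most_similar(u'서울', topn=10):
          print w, score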
  105. Automatic word spacing with a Recurrent Neural Network - spacing labels: 0 0 1 0 1 0 0 - sentence: 나는 밥을 먹는다 - each character fed to the network as a word2vec vector, e.g. [0.323112, -0.021232, ........, 0.82123123]
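One way to turn a correctly spaced sentence into per-character training labels (my own sketch, not from the slides; it uses the "1 = a space comes before this character" convention, which matches the label sequence shown above):

      # -*- coding: utf-8 -*-

      def spacing_labels(sentence):
          chars, labels = [], []
          prev_was_space = False
          for ch in sentence.strip():
              if ch == u' ':
                  prev_was_space = True
                  continue
              chars.append(ch)
              labels.append(1 if prev_was_space else 0)
              prev_was_space = False
          return chars, labels

      chars, labels = spacing_labels(u'나는 밥을 먹는다')
      print labels   # [0, 0, 1, 0, 1, 0, 0]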
  106. Difficulties encountered while experimenting with deep learning - There are many parameters to choose: number of layers, nodes per layer, learning rate, number of epochs, number of batches, choice of activation function, and so on. - Changing a parameter and checking the result takes a long time. - GPU memory problems, because the data is big.
  107. Thank you - Setting up the GPU - Building lmdb for caffe - Softmax function - Bias - Negative Log Likelihood - http://goo.gl/forms/IR45liXoQ3
