3. Simple tasks: Classification and Detection
http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
The detection task is harder than classification, but both are close to solved, and with better-than-human quality.
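To give a feel for how routine classification has become, here is a minimal sketch (assuming Keras and its bundled pretrained ResNet50, not the Caffe pipeline from the linked tutorial) that labels an image with an off-the-shelf ImageNet network; cat.jpg is a placeholder path.

```python
# A minimal sketch of off-the-shelf image classification with a pretrained CNN.
# Assumes Keras with bundled ImageNet weights; "cat.jpg" is a placeholder path.
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from keras.preprocessing import image

model = ResNet50(weights='imagenet')  # downloads the ImageNet weights on first use

img = image.load_img('cat.jpg', target_size=(224, 224))       # placeholder image
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # e.g. [('n02123045', 'tabby', 0.72), ...]
```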
4. Case #1: IJCNN 2011
The German Traffic Sign Recognition Benchmark
● Classification, >40 classes
● >50,000 real-life images
● First Superhuman Visual Pattern Recognition
○ 2x better than humans
○ 3x better than the closest artificial competitor
○ 6x better than the best non-neural method
#  Method              Correct (Error)
1  Committee of CNNs   99.46 % (0.54 %)
2  Human Performance   98.84 % (1.16 %)
3  Multi-Scale CNNs    98.31 % (1.69 %)
4  Random Forests      96.14 % (3.86 %)
http://people.idsia.ch/~juergen/superhumanpatternrecognition.html
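The winning “Committee of CNNs” is an ensemble: several independently trained CNNs vote by averaging their softmax outputs. A minimal sketch, assuming a list of already-trained Keras-style classifiers with the same output shape:

```python
# A minimal sketch of the "committee" idea: average the class probabilities of
# several independently trained CNNs and take the argmax. The models are assumed
# to be already-trained Keras-style classifiers with identical output shapes.
import numpy as np

def committee_predict(models, x):
    """Return the class index chosen by averaging the committee's softmax outputs."""
    probs = np.mean([m.predict(x) for m in models], axis=0)
    return np.argmax(probs, axis=1)

# usage (hypothetical): labels = committee_predict([cnn_a, cnn_b, cnn_c], batch_of_signs)
```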
22. Example: NeuralTalk and Walk
Ingredients:
● https://github.com/karpathy/neuraltalk2
Project for learning Multimodal Recurrent Neural Networks that describe
images with sentences
● Webcam/notebook
Result:
● https://vimeo.com/146492001
24. Product of the near future: DenseCap and ?
http://arxiv.org/abs/1511.07571 DenseCap: Fully Convolutional Localization Networks for Dense Captioning
27. Reinforcement Learning
Controlling a simulated car from a video signal (2013)
http://people.idsia.ch/~juergen/gecco2013torcs.pdf
http://people.idsia.ch/~juergen/compressednetworksearch.html
29. Reinforcement Learning
Human-level control through deep reinforcement learning (2015)
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
Playing Atari with Deep Reinforcement Learning (2013)
http://arxiv.org/abs/1312.5602
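At the core of both papers is the Q-learning target: the network is trained to predict r + gamma * max_a' Q(s', a') for each transition. A minimal numpy sketch of that target (the published systems add experience replay, a target network and more):

```python
# A minimal numpy sketch of the Q-learning target behind DQN: the network is
# regressed toward r + gamma * max_a' Q(s', a'). The real systems add experience
# replay, a separate target network, frame preprocessing, etc.
import numpy as np

def q_learning_targets(q_next, rewards, dones, gamma=0.99):
    """q_next: (batch, n_actions) Q-values of the next states; dones: 1.0 if episode ended."""
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)

# toy check
q_next = np.array([[0.2, 1.5], [0.0, 0.3]])
print(q_learning_targets(q_next, rewards=np.array([1.0, 0.0]),
                         dones=np.array([0.0, 1.0])))  # -> [2.485, 0.0]
```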
33. More Fun: Neural Style
http://www.dailymail.co.uk/sciencetech/article-3214634/The-algorithm-learn-copy-artist-Neural-network-recreate-snaps-style-Van-Gogh-Picasso.html
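Under the hood, this kind of style transfer (Gatys et al.) represents “style” as correlations between CNN feature channels, captured by Gram matrices that are matched between the generated image and the style image. A minimal numpy sketch of that representation, not the full optimization loop:

```python
# A minimal numpy sketch of the Gram-matrix "style" representation used by neural
# style transfer (Gatys et al.): style is captured by correlations between CNN
# feature channels, and the style loss matches these Gram matrices between the
# generated image and the style image. Illustrative only; the real method
# optimizes the generated image against several VGG layers.
import numpy as np

def gram_matrix(features):
    """features: (height, width, channels) activations of one CNN layer."""
    h, w, c = features.shape
    f = features.reshape(h * w, c)
    return f.T @ f / (h * w)            # (channels, channels) channel correlations

def style_loss(gen_features, style_features):
    """Squared difference between the two Gram matrices."""
    return np.sum((gram_matrix(gen_features) - gram_matrix(style_features)) ** 2)
```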
34. More Fun: Neural Style
http://www.boredpanda.com/inceptionism-neural-network-deep-dream-art/
35. More Fun: Photo-realistic Synthesis
http://arxiv.org/abs/1601.04589 Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis
36. More Fun: Neural Doodle
http://arxiv.org/abs/1603.01768 Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks
(a) Original painting by Renoir, (b) semantic annotations,
(c) desired layout, (d) generated output.
38. Deep Learning and NLP
Variety of tasks:
● Finding synonyms
● Fact extraction: people and company names, geography, prices, dates,
product names, …
● Classification: genre and topic detection, positive/negative sentiment
analysis, authorship detection, …
● Machine translation
● Search (written and spoken)
● Question answering
● Dialog systems
● Language modeling, part-of-speech tagging
41. Encoding semantics
Using word2vec vectors instead of word indexes lets you deal with word meanings better (e.g. there is no need to enumerate all synonyms, because their vectors are already close to each other).
But the naive way of working with word2vec vectors still gives you a “bag of words” model, in which the phrases “The man killed the tiger” and “The tiger killed the man” are identical.
You need models that pay attention to word order: paragraph2vec, sentence embeddings (using RNN/LSTM), even World2Vec (LeCun @ CVPR2015).
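A minimal sketch of the word-order point, assuming gensim's Word2Vec (the two-sentence “corpus” is only for illustration; showing that synonyms end up close together would need a model trained on real text):

```python
# A minimal sketch of the order-blindness point, assuming gensim's Word2Vec.
# The two-sentence "corpus" is only for illustration; showing synonyms landing
# close together would need a model trained on a real corpus.
import numpy as np
from gensim.models import Word2Vec

sentences = [['the', 'man', 'killed', 'the', 'tiger'],
             ['the', 'tiger', 'killed', 'the', 'man']]
model = Word2Vec(sentences, min_count=1)

def bag_of_vectors(tokens):
    """Average the word vectors -- a "bag of words" in embedding space."""
    return np.mean([model.wv[t] for t in tokens], axis=0)

# Both phrases contain the same words, so their averaged vectors are identical:
print(np.allclose(bag_of_vectors(sentences[0]), bag_of_vectors(sentences[1])))  # True
```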
47. Case: Automated Speech Translation
Translating voice calls and video calls in 7 languages and instant messages in over 50.
https://www.skype.com/en/features/skype-translator/
55. Why is Deep Learning helpful? Or even a game-changer?
● It works on raw data (pixels, sound, text, or characters), so no feature engineering is needed
○ Some features are really hard to develop (they take years of work by a group of experts)
○ Some features are patented (e.g. SIFT and SURF for images)
● It allows end-to-end learning (pixels to category, sound to sentence, English sentence to Chinese sentence, etc.; see the sketch below)
○ No need to do segmentation, etc. (a lot of manual labor)
⇒ You can iterate faster (and get superior quality at the same time!)
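A minimal sketch of such end-to-end learning, assuming the Keras 2 API, with MNIST standing in for any labelled image dataset:

```python
# A minimal sketch of end-to-end "pixels to category" learning, assuming the
# Keras 2 API; MNIST stands in for any labelled image dataset. Raw pixels go in,
# class probabilities come out, with no hand-crafted features in between.
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[..., np.newaxis].astype('float32') / 255.0   # raw pixels, scaled
x_test = x_test[..., np.newaxis].astype('float32') / 255.0

model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),     # 10 output categories
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, to_categorical(y_train), epochs=1, batch_size=128,
          validation_data=(x_test, to_categorical(y_test)))
```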
56. Still some issues exist
● No dataset -- no deep learning
There is a lot of data available (and deep learning needs it; otherwise simpler models may well do better)
○ But sometimes you have no dataset…
■ Nonetheless, some hacks are available: transfer learning, data augmentation, Mechanical Turk, … (see the sketch below)
● It requires a lot of computation.
Without a cluster or GPU machines, training takes much longer
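A minimal sketch of the transfer-learning hack, assuming the Keras 2 API: reuse a CNN pretrained on ImageNet as a frozen feature extractor and train only a small new head on the small dataset at hand (num_classes and the data are placeholders):

```python
# A minimal sketch of the transfer-learning hack mentioned above, assuming the
# Keras 2 API: take a CNN pretrained on ImageNet, freeze it as a feature
# extractor, and train only a small new head on your small dataset.
# num_classes and the training data below are placeholders.
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

num_classes = 5                                   # hypothetical target task

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False                       # keep the pretrained features fixed

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(x_small, y_small, ...)                # train only the new head
```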
59. Libraries & Frameworks for image/video processing
● OpenCV (http://opencv.org/)
● Caffe (http://caffe.berkeleyvision.org/)
● Torch7 (http://torch.ch/)
● clarifai (http://clarif.ai/)
● Google Vision API (https://cloud.google.com/vision/)
● …
● + all universal libraries
60. Libraries & Frameworks for speech
● CNTK (http://www.cntk.ai/)
● KALDI (http://kaldi-asr.org/)
● Google Speech API (https://cloud.google.com/)
● Yandex SpeechKit (https://tech.yandex.ru/speechkit/)
● Baidu Speech API (http://www.baidu.com/)
● wit.ai (https://wit.ai/)
● …
61. Libraries & Frameworks for text processing
● Torch7 (http://torch.ch/)
● Theano/Keras/…
● TensorFlow (https://www.tensorflow.org/)
● MetaMind (https://www.metamind.io/)
● Google Translate API (https://cloud.google.com/translate/)
● …
● + all universal libraries
62. What to read and where to study?
- CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Stanford (http://vision.stanford.edu/teaching/cs231n/index.html)
- CS224d: Deep Learning for Natural Language Processing, Richard Socher, Stanford (http://cs224d.stanford.edu/index.html)
- Neural Networks for Machine Learning, Geoffrey Hinton (https://www.coursera.org/course/neuralnets)
- Computer Vision course collection (http://eclass.cc/courselists/111_computer_vision_and_navigation)
- Deep learning course collection (http://eclass.cc/courselists/117_deep_learning)
- Book “Deep Learning”, Ian Goodfellow, Yoshua Bengio and Aaron Courville (http://www.deeplearningbook.org/)
63. What to read and where to study?
- Google+ Deep Learning community (https://plus.google.com/communities/112866381580457264725)
- VK Deep Learning community (http://vk.com/deeplearning)
- Quora (https://www.quora.com/topic/Deep-Learning)
- FB Deep Learning Moscow (https://www.facebook.com/groups/1505369016451458/)
- Twitter Deep Learning Hub (https://twitter.com/DeepLearningHub)
- NVidia blog (https://devblogs.nvidia.com/parallelforall/tag/deep-learning/)
- IEEE Spectrum blog (http://spectrum.ieee.org/blog/cars-that-think)
- http://deeplearning.net/
- Arxiv Sanity Preserver (http://www.arxiv-sanity.com/)
- ...
64. Whom to follow?
- Jürgen Schmidhuber (http://people.idsia.ch/~juergen/)
- Geoffrey E. Hinton (http://www.cs.toronto.edu/~hinton/)
- Google DeepMind (http://deepmind.com/)
- Yann LeCun (http://yann.lecun.com, https://www.facebook.com/yann.lecun)
- Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy, https://www.quora.com/profile/Yoshua-Bengio)
- Andrej Karpathy (http://karpathy.github.io/)
- Andrew Ng (http://www.andrewng.org/)
- ...