Deep Learning has taken the world of Computer Science by storm, yet for many of us it remains an elusive, sci-fi-like buzzword. After years of feature engineering in Computer Vision and Natural Language Processing, we have finally reached the point where we can feed raw data to a Neural Network, similar to how our brains work, and get results of surprisingly high accuracy.
This talk is about demystifying Deep Learning for developers, many of whom could benefit from understanding and using Deep Learning in their day-to-day jobs. It covers the background and a brief theoretical grounding in the first third, then shows actual working code and examples in the rest. We will give an overview of Convolutional Neural Networks and then cover network design techniques such as pooling, dropout and local connections.
The examples in this talk are in Keras and aim to build real-world models in the field of Natural Language Processing.
5. @aliostad
/// Agenda
> Building blocks of DL throughout history
> State of the art
> Programming Language detection with DL, built
from scratch using Keras/TensorFlow (achieving
98.7% accuracy)
6. @aliostad
/// Not covering
Deep Dream (reverse input)
GANs (Generative Adversarial Networks)
Monte Carlo Tree Search (MCTS)
LSTM
RNN
12. @aliostad
/// Pitts
“In no uncertain sense [he
was] the genius of the
group … when you asked
him a question, you would
get back a whole textbook”
Jerome Lettvin on Pitts
27. @aliostad
/// 2nd Winter: Curse of Dimensionality
from Christopher Bishop’s
“Neural Networks for Pattern Recognition” (1995)
28. @aliostad
/// 2nd Winter: Feature extraction
from Christopher Bishop’s
“Neural Networks for Pattern Recognition” (1995)
29. @aliostad
/// 2nd Winter: Generalisation/Overfitting
from Christopher Bishop’s
“Neural Networks for Pattern Recognition” (1995)
30. @aliostad
/// Re-Cap problems
> Overfitting (deep networks would overfit)
> Training never converged (computation, weight decay)
> Human interaction still needed in feature engineering
> Curse of dimensionality (complexity, sparse dimensions)
> Support Vector Machines (SVMs) were much more practical
31. @aliostad
/// the new rebels
Yoshua Bengio
Yann LeCun (Facebook)
Geoff Hinton (Google)
Andrew Ng (Baidu, formerly Google)
the "Canadian Mafia" (CIFAR)
32. @aliostad
/// outcasts
Fifteen years ago, Yann LeCun was an outcast… remembers how
LeCun was relegated to the sidelines. “It was clear that he was an
outsider,” said Fergus. “He was talking about these methods.
Everyone was all, ‘Yann, yeah, we felt we had to invite him. These
models he’s talking about he’s been working on for years and
they’ve never really showed anything.’”
“Smart scientists go there to see their careers end.” Hinton’s lab
was seen as a renegade project, more the stuff of science fiction
than vocation.
from Welcome to the AI Conspiracy: The 'Canadian Mafia' Behind Tech's Latest Craze
https://www.recode.net/2015/7/15/11614684/ai-conspiracy-the-scientists-behind-deep-learning
33. @aliostad
/// outcasts
“In the late 90s and early 2000s, it was very very difficult to do
research in Neural Nets. In my own lab, I had to twist my students’
arm to do work on Neural Nets. They were afraid of seeing their
papers rejected because they were working on the subject, and
actually it did happen quite a bit for all the wrong reasons like ‘oh,
this is [neural nets]… we don’t do Neural Nets anymore’”.
Yoshua Bengio
from The History of Machine Learning from the Inside Out - Talking Machines Podcast, 26 Feb 2015
“In the 90s, other ML methods which were easier for a novice to
apply did as well or better than NN on many problems and interest
in them died. Three of us knew they would ultimately be the
answer. When we had better hardware, more data and slightly
better techniques, they took off again”.
Geoff Hinton
34. @aliostad
/// Revolution
> Gradient-Based Learning Applied to Document Recognition (1998)
LeCun, Bengio, et al. - Convolutional networks, Gradient Descent
> A fast learning algorithm for deep belief nets (2006)
Hinton, et al. - Restricted Boltzmann Machines (unsupervised) for initialising weights
> Scaling Learning Algorithms towards AI (2007)
Bengio, LeCun
> Large-scale Deep Unsupervised Learning using Graphics
Processors (2009) Raina, Madhavan, Ng - importance of GPUs in training
> ReLU: Rectified Linear Units aka Rectifier (2010)
Hinton, et al. - LeCun, et al. - Bengio, et al. - Ng, et al.
35. @aliostad
/// solutions to 2nd winter
> Overfitting: Dropout layers
> Training never converged: GPUs, initialisation of weights,
ReLU, Stochastic Gradient Descent with batching, pooling
> Human feature engineering: deep networks learn features from raw data
> Curse of dimensionality: MOAR data!
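To make these fixes concrete, here is a minimal Keras sketch, with illustrative sizes (none of the numbers below come from the talk), of where ReLU, dropout, pooling and mini-batched SGD appear in a model definition:

```python
# A minimal sketch (sizes are illustrative, not the talk's model) showing
# where the "2nd winter" fixes appear in a Keras model definition.
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(1024, 70)),           # e.g. 1024 one-hot characters
    layers.Conv1D(64, 3, activation="relu"),  # ReLU eases vanishing gradients
    layers.MaxPooling1D(2),                   # pooling shrinks the representation
    layers.Flatten(),
    layers.Dropout(0.5),                      # dropout fights overfitting
    layers.Dense(128, activation="relu"),
    layers.Dense(16, activation="softmax"),
])

model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
# Mini-batched Stochastic Gradient Descent: set batch_size in fit(), e.g.
# model.fit(x_train, y_train, batch_size=128, epochs=10)
```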
36. @aliostad
/// State of the Art
> ImageNet 2012 winner (aka AlexNet):
Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton. error 15.4% (vs 26.2% for the runner-up)
> ImageNet 2013 winner (aka ZF Net):
Matthew Zeiler and Rob Fergus from NYU. error 11.2%
> VGG Net (2014 runner-up):
Karen Simonyan and Andrew Zisserman from Oxford. error 7.3%
> ImageNet 2014 winner (aka GoogLeNet):
Google. error 6.7%
> ImageNet 2015 winner (aka ResNet):
Microsoft Research Asia. error 3.6%
https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
37. @aliostad
/// Future of DL research
> “Start from the beginning…” - Geoff Hinton
> Generally, Unsupervised Learning will be the focus:
GANs and auto-encoders
https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
38. @aliostad
/// Programming Language Detection
> Sample files collected from GitHub (2K per language)
> Keras on top of TensorFlow
> Trained on a GPU machine in Azure (8 hours on NC12)
> 16 programming languages
> Tested on a separate dataset (1K per language)
> Python (https://github.com/aliostad/deep-learning-lang-detection)
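As a rough sketch of how such a dataset might be read, assuming one sub-folder per language under `train/` and `test/` directories (the layout, paths and the helper `load_samples` are hypothetical illustrations, not details from the talk):

```python
# Hypothetical loader: one sub-folder per language, each holding raw
# source files. The directory layout here is an assumption for illustration.
import os

def load_samples(root):
    texts, labels = [], []
    for language in sorted(os.listdir(root)):
        lang_dir = os.path.join(root, language)
        if not os.path.isdir(lang_dir):
            continue
        for name in os.listdir(lang_dir):
            path = os.path.join(lang_dir, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                texts.append(f.read())
            labels.append(language)
    return texts, labels

train_texts, train_labels = load_samples("train")  # ~2K files per language
test_texts, test_labels = load_samples("test")     # ~1K files per language
```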
39. @aliostad
/// Programming Language Detection - approach
“Text Understanding from Scratch” April 2016
Xiang Zhang, Yann LeCun
Using quantised characters instead of words:
70 characters (the paper's alphabet, including newline):
abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'"/\|_@#$%^&*~`+-=<>()[]{}
Each character becomes a one-hot vector of length 70, e.g.
i => [0 0 0 0 0 0 0 0 1 0 0 … 0]
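A minimal sketch of this quantisation in Python/NumPy; the alphabet follows the Zhang & LeCun paper, while the maximum input length (1024 here) is an assumed parameter:

```python
# One-hot character quantisation after Zhang & LeCun: each character
# becomes a one-hot vector over a fixed 70-character alphabet
# (note '-' appears twice in the paper's alphabet; the dict keeps one index).
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{}\n"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def quantise(text, max_len=1024):
    """Encode text as a (max_len, 70) one-hot matrix; max_len is an assumption."""
    encoded = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, char in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(char)
        if idx is not None:          # out-of-alphabet characters stay all-zero
            encoded[pos, idx] = 1.0
    return encoded

x = quantise("if i == 0:")           # 'i' sets a 1 at index 8 of its row
```

Characters outside the alphabet (spaces included) become all-zero rows, matching the paper's handling of unknown characters.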
41. @aliostad
/// Programming Language Detection - Network Architecture
> INPUT: [n_chars, char_dim] one-hot character matrix
> "Inception" block: four parallel CONV 1D branches with
kernel sizes 3, 5, 9 and 19, each followed by POOLING
> CONCATENATE the four branches, then DROPOUT
> DENSE, DROPOUT, DENSE, SOFTMAX
> OUTPUT: [n_classes]
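A hedged Keras sketch of this inception-style model using the functional API: the kernel sizes (3, 5, 9, 19) and the layer order follow the slide, while the filter counts, dense sizes, dropout rates and the choice of global max pooling are illustrative assumptions (the repo linked earlier has the real values):

```python
# Sketch of the inception-style network above (Keras functional API).
# Kernel sizes 3/5/9/19 follow the slide; all other hyperparameters
# are illustrative assumptions.
from tensorflow.keras import layers, models

n_chars, char_dim, n_classes = 1024, 70, 16   # assumed sizes

inputs = layers.Input(shape=(n_chars, char_dim))

# Four parallel CONV 1D -> POOLING branches with different kernel widths
branches = []
for kernel_size in (3, 5, 9, 19):
    x = layers.Conv1D(64, kernel_size, padding="same", activation="relu")(inputs)
    x = layers.GlobalMaxPooling1D()(x)        # "POOLING" in the diagram
    branches.append(x)

x = layers.Concatenate()(branches)            # merge the inception branches
x = layers.Dropout(0.5)(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(n_classes, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Running convolutions of several widths in parallel lets the network pick up both short tokens (e.g. `def`) and longer constructs in one pass before the dense layers classify the file.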