Deep Learning has taken the world of Computer Science by storm, yet for many of us it remains an elusive, sci-fi-like buzzword. After years of feature engineering in Computer Vision and Natural Language Processing, we have finally reached the point where we can feed raw data to a Neural Network, similar to how our brains work, and get results of surprisingly high accuracy.
This talk is about demystifying Deep Learning for developers, many of whom could benefit from understanding and using Deep Learning in their day-to-day jobs. It covers the background and a brief theoretical grounding in the first third, then shows actual working code and examples in the rest. We will give an overview of Convolutional Neural Networks and then cover network design techniques such as pooling, dropout and local connections.
The examples in this talk are in Keras and aim to build real-world models in the field of Natural Language Processing.
5. @aliostad
/// Agenda
> Building blocks of DL throughout history
> State of the art
> Programming Language detection with DL, built
from scratch using Keras/TensorFlow (achieving
98.7% accuracy)
6. @aliostad
/// Not covering
Deep Dream (reverse input)
GANs (Generative Adversarial Networks)
Monte Carlo Tree Search (MCTS)
LSTM
RNN
12. @aliostad
/// Pitts
“In no uncertain sense [he
was] the genius of the
group … when you asked
him a question, you would
get back a whole textbook”
Jerome Lettvin on Pitts
27. @aliostad
/// 2nd Winter: Curse of Dimensionality
from Christopher Bishop’s
“Neural Networks for Pattern Recognition” (1995)
28. @aliostad
/// 2nd Winter: Feature extraction
from Christopher Bishop’s
“Neural Networks for Pattern Recognition” (1995)
29. @aliostad
/// 2nd Winter: Generalisation/Overfitting
from Christopher Bishop’s
“Neural Networks for Pattern Recognition” (1995)
30. @aliostad
/// Re-Cap problems
> Overfitting (deep networks would overfit)
> Training never converged (computation, weight decay)
> Human interaction still needed in feature engineering
> Curse of dimensionality (complexity, sparse dimensions)
> Support Vector Machines (SVMs) were much more practical
31. @aliostad
/// the new rebels
Yoshua Bengio
Yann LeCun (Facebook)
Geoff Hinton (Google)
Andrew Ng (Baidu, formerly Google)
the "Canadian Mafia" (CIFAR)
32. @aliostad
/// outcasts
Fifteen years ago, Yann LeCun was an outcast… remembers how
LeCun was relegated to the sidelines. “It was clear that he was an
outsider,” said Fergus. “He was talking about these methods.
Everyone was all, ‘Yann, yeah, we felt we had to invite him. These
models he’s talking about he’s been working on for years and
they’ve never really showed anything.’”
“Smart scientists go there to see their careers end.” Hinton’s lab
was seen as a renegade project, more the stuff of science fiction
than vocation.
from Welcome to the AI Conspiracy: The 'Canadian Mafia' Behind Tech's Latest Craze
https://www.recode.net/2015/7/15/11614684/ai-conspiracy-the-scientists-behind-deep-learning
33. @aliostad
/// outcasts
“In the late 90s and early 2000s, it was very very difficult to do
research in Neural Nets. In my own lab, I had to twist my students’
arm to do work on Neural Nets. They were afraid of seeing their
papers rejected because they were working on the subject, and
actually it did happen quite a bit for all the wrong reasons like ‘oh,
this is [neural nets]… we don’t do Neural Nets anymore’”.
Yoshua Bengio
from The History of Machine Learning from the Inside Out - Talking Machines Podcast, 26 Feb 2015
“In the 90s, other ML methods which were easier for a novice to
apply did as well or better than NN on many problems and interest
in them died. Three of us knew they would ultimately be the
answer. When we had better hardware, more data and slightly
better techniques, they took off again”.
Geoff Hinton
34. @aliostad
/// Revolution
> Gradient-Based Learning Applied to Document Recognition (1998)
LeCun, Bengio, et al. - Convolutional networks, Gradient Descent
> A fast learning algorithm for deep belief nets (2006)
Hinton, et al. - Restricted Boltzmann Machines (unsupervised) for initialising weights
> Scaling Learning Algorithms towards AI (2007)
Bengio, LeCun
> Large-scale Deep Unsupervised Learning using Graphics
Processors (2009) Raina, Madhavan, Ng - importance of GPUs in training
> ReLU: Rectified Linear Units aka Rectifier (2010)
Hinton, et al. - LeCun, et al. - Bengio, et al. - Ng, et al.
35. @aliostad
/// solutions to 2nd winter
> Overfitting: Dropout layers
> Training never converged: GPUs, initialisation of weights,
ReLU, Stochastic Gradient Descent with batching, pooling
> Human feature engineering: deep networks learn features from raw data
> Curse of dimensionality: MOAR data!
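To make these fixes concrete, here is a minimal Keras sketch, with illustrative sizes (none of the numbers below come from the talk), of where ReLU, dropout, pooling and mini-batched SGD appear in a model definition:

```python
# A minimal sketch (sizes are illustrative, not the talk's model) showing
# where the "2nd winter" fixes appear in a Keras model definition.
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(1024, 70)),           # e.g. 1024 one-hot characters
    layers.Conv1D(64, 3, activation="relu"),  # ReLU eases vanishing gradients
    layers.MaxPooling1D(2),                   # pooling shrinks the representation
    layers.Flatten(),
    layers.Dropout(0.5),                      # dropout fights overfitting
    layers.Dense(128, activation="relu"),
    layers.Dense(16, activation="softmax"),
])

model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
# Mini-batched Stochastic Gradient Descent: set batch_size in fit(), e.g.
# model.fit(x_train, y_train, batch_size=128, epochs=10)
```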
36. @aliostad
/// State of the Art
> ImageNet 2012 winner (aka AlexNet):
Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton. error 15.4% (vs 26.2% for the runner-up)
> ImageNet 2013 winner (aka ZF Net):
Matthew Zeiler and Rob Fergus from NYU. error 11.2%
> VGG Net (2014 runner-up):
Karen Simonyan and Andrew Zisserman from Oxford. error 7.3%
> ImageNet 2014 winner (aka GoogLeNet):
Google. error 6.7%
> ImageNet 2015 winner (aka ResNet):
Microsoft Research Asia. error 3.6%
https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
37. @aliostad
/// Future of DL research
> “Start from the beginning…” - Geoff Hinton
> Generally, Unsupervised Learning will be the focus:
GANs and auto-encoders
https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
38. @aliostad
/// Programming Language Detection
> Sample files collected from GitHub (2K per language)
> Keras on top of TensorFlow
> Trained on a GPU machine in Azure (8 hours on NC12)
> 16 programming languages
> Tested on a separate dataset (1K per language)
> Python (https://github.com/aliostad/deep-learning-lang-detection)
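As a rough sketch of how such a dataset might be read, assuming one sub-folder per language under `train/` and `test/` directories (the layout, paths and the helper `load_samples` are hypothetical illustrations, not details from the talk):

```python
# Hypothetical loader: one sub-folder per language, each holding raw
# source files. The directory layout here is an assumption for illustration.
import os

def load_samples(root):
    texts, labels = [], []
    for language in sorted(os.listdir(root)):
        lang_dir = os.path.join(root, language)
        if not os.path.isdir(lang_dir):
            continue
        for name in os.listdir(lang_dir):
            path = os.path.join(lang_dir, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                texts.append(f.read())
            labels.append(language)
    return texts, labels

train_texts, train_labels = load_samples("train")  # ~2K files per language
test_texts, test_labels = load_samples("test")     # ~1K files per language
```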
39. @aliostad
/// Programming Language Detection - approach
“Text Understanding from Scratch” April 2016
Xiang Zhang, Yann LeCun
Using quantised characters instead of words:
70 characters (the paper's alphabet, including newline):
abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'"/\|_@#$%^&*~`+-=<>()[]{}
Each character becomes a one-hot vector of length 70, e.g.
i => [0 0 0 0 0 0 0 0 1 0 0 … 0]
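A minimal sketch of this quantisation in Python/NumPy; the alphabet follows the Zhang & LeCun paper, while the maximum input length (1024 here) is an assumed parameter:

```python
# One-hot character quantisation after Zhang & LeCun: each character
# becomes a one-hot vector over a fixed 70-character alphabet
# (note '-' appears twice in the paper's alphabet; the dict keeps one index).
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{}\n"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def quantise(text, max_len=1024):
    """Encode text as a (max_len, 70) one-hot matrix; max_len is an assumption."""
    encoded = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, char in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(char)
        if idx is not None:          # out-of-alphabet characters stay all-zero
            encoded[pos, idx] = 1.0
    return encoded

x = quantise("if i == 0:")           # 'i' sets a 1 at index 8 of its row
```

Characters outside the alphabet (spaces included) become all-zero rows, matching the paper's handling of unknown characters.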
41. @aliostad
/// Programming Language Detection - Network Architecture
> INPUT: [n_chars, char_dim] one-hot character matrix
> "Inception" block: four parallel CONV 1D branches with
kernel sizes 3, 5, 9 and 19, each followed by POOLING
> CONCATENATE the four branches, then DROPOUT
> DENSE, DROPOUT, DENSE, SOFTMAX
> OUTPUT: [n_classes]
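A hedged Keras sketch of this inception-style model using the functional API: the kernel sizes (3, 5, 9, 19) and the layer order follow the slide, while the filter counts, dense sizes, dropout rates and the choice of global max pooling are illustrative assumptions (the repo linked earlier has the real values):

```python
# Sketch of the inception-style network above (Keras functional API).
# Kernel sizes 3/5/9/19 follow the slide; all other hyperparameters
# are illustrative assumptions.
from tensorflow.keras import layers, models

n_chars, char_dim, n_classes = 1024, 70, 16   # assumed sizes

inputs = layers.Input(shape=(n_chars, char_dim))

# Four parallel CONV 1D -> POOLING branches with different kernel widths
branches = []
for kernel_size in (3, 5, 9, 19):
    x = layers.Conv1D(64, kernel_size, padding="same", activation="relu")(inputs)
    x = layers.GlobalMaxPooling1D()(x)        # "POOLING" in the diagram
    branches.append(x)

x = layers.Concatenate()(branches)            # merge the inception branches
x = layers.Dropout(0.5)(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(n_classes, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Running convolutions of several widths in parallel lets the network pick up both short tokens (e.g. `def`) and longer constructs in one pass before the dense layers classify the file.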