An introduction to deep learning: what you can achieve with it, its relationship to machine learning, the technical basics of how it works, an introduction to LSTMs, how LSTMs can be used for text classification, the results obtained, and practical recommendations.
2. Table Of Contents
Brief Introduction
Part 1 - Deep Learning Background
Part 2 - Deep Learning Technical Understanding
Part 3 - Deep Learning Algorithms Introduction
Part 4 - Miscellaneous Topics
3. Agenda - Part 1 – Introduction to Deep Learning
Why do you want to know about Deep Learning?
What is Deep Learning?
What can we do with Deep Learning?
Why Deep Learning now?
Deep Learning Technical Applications
Deep Learning Industrial Application Areas
6. Deep Learning Success Stories
A Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image-recognition effort at identifying objects such as cats
Google also used the technology to cut the error rate significantly on speech recognition in its latest Android mobile software
7. Deep Learning Success Stories
A team of three graduate students and two professors won a contest held by Merck to identify molecules that could lead to new drugs. The group used deep learning to zero in on the molecules most likely to bind to their targets.
IBM's Watson computer uses some deep-learning techniques and is now being trained to help doctors make better decisions.
Microsoft has deployed deep learning in its Windows Phone and Bing voice search.
9. What is Deep Learning?
As per Wikipedia:
Deep learning (deep structured learning, hierarchical learning or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers, with complex structures or otherwise, composed of multiple non-linear transformations.
As per LISA:
Deep Learning is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound, and text.
10. What is Deep Learning?
From Chris Nicholson, Co-Founder of Skymind:
Deep learning is basically machine perception. What is perception? It is the power to interpret sensory data. Two main ways we interpret things are by:
− Naming what we sense
• See and recognize your mother's picture
− If we do not know a name, finding similarities and dissimilarities
• See different photos of faces
• Bucket similar faces together
Deep-learning software attempts to mimic the activity in layers of neurons in the neocortex, the wrinkly 80 percent of the brain where thinking occurs. The software learns, in a very real sense, to recognize patterns in digital representations of sounds, images, and other data.
13. McCulloch and Pitts' Work (1943)
The brain can produce highly complex patterns by using many basic cells that are connected together.
These basic brain cells are called neurons, and McCulloch and Pitts gave a highly simplified model of a neuron in their paper ("threshold logic units").
A group of MCP neurons connected together is called an artificial neural network. In a sense, the brain is a very large neural network: it has billions of neurons, and each neuron is connected to thousands of other neurons.
McCulloch and Pitts showed how to encode any logical proposition with an appropriate network of MCP neurons.
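As a minimal sketch (not from the original deck): an MCP neuron is a threshold logic unit that fires when the weighted sum of its inputs reaches a threshold. The weights and threshold below are illustrative choices that implement logical AND.

# Minimal sketch of a McCulloch-Pitts (MCP) neuron as a threshold logic unit.
def mcp_neuron(inputs, weights, threshold):
    # Fire (output 1) if the weighted sum of the inputs reaches the threshold.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With weights [1, 1] and threshold 2, the neuron computes logical AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mcp_neuron([a, b], weights=[1, 1], threshold=2))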
14. Perceptron – Rosenblatt (1957)
In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers: functions that can decide whether an input (represented by a vector of numbers) belongs to one class or another.
The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt.
It was first implemented in software and later in hardware as the Mark I Perceptron.
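A minimal sketch of the perceptron learning rule (not from the deck; the dataset, learning rate and epoch count are illustrative choices). It learns a separating line for logical OR, a linearly separable problem:

# Sketch of Rosenblatt's perceptron learning rule on logical OR.
def train_perceptron(samples, lr=0.1, epochs=10):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - pred          # 0 if correct, +/-1 if wrong
            w[0] += lr * err * x[0]      # nudge the weights toward the target
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

or_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(or_data))  # converges to a separating line for OR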
15. ADALINE (1960)
ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element) is an early single-layer artificial neural network, and also the name of the physical device that implemented this network.
It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch–Pitts neuron and consists of weights, a bias and a summation function.
16. Lull till 1986 - XOR Issue - AI Winter
In 1969, Minsky co-authored with Seymour Papert Perceptrons: An Introduction to Computational Geometry. In this work they attacked the limitations of the perceptron, showing that it could only solve linearly separable functions.
Of particular interest was the fact that the perceptron could not solve the XOR and XNOR functions.
Minsky and Papert also stated that the style of research being done on the perceptron was doomed to failure because of these limitations; this pessimism proved ill-timed.
As a result, very little research was done in the area until about the 1980s.
17. Multi-Layer Perceptron (1986)
Solved non-linearly separable problems such as XOR (a small sketch contrasting the two models follows below)
Applied to areas like speech recognition, image recognition and machine translation
However, it faced competition from SVMs (1996), which were:
− Simple
− Lightweight
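A small sketch (assuming scikit-learn is available; not part of the original deck) showing the contrast: a single-layer perceptron cannot fit XOR, while an MLP with one hidden layer can.

# Sketch: single perceptron vs. one-hidden-layer MLP on XOR.
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR: not linearly separable

# The perceptron stays stuck at 0.75 accuracy or below.
print(Perceptron(max_iter=1000).fit(X, y).score(X, y))

# One hidden layer of non-linear units typically reaches 1.0.
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                    solver='lbfgs', max_iter=5000, random_state=0)
print(mlp.fit(X, y).score(X, y))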
32. Deep Learning Industrial Application Areas
Medical
− Voice-controlled robotic surgery
Automotive
− Self-driving cars
Military
− Drones
Security
− Surveillance
33. Deep Learning Industrial Application Areas
Drug discovery and toxicology
− Multi-task deep neural networks to predict the biomolecular target of a compound
Customer relationship management
− Deep reinforcement learning to approximate the value of possible direct marketing actions
Recommendation systems
− Deep learning to extract meaningful features for a latent factor model for content-based music recommendation
Bioinformatics
− Predict gene ontology annotations
35. Agenda - Part 2 - Technical Deep Dive
What is Machine Learning?
What is Artificial Neural Network?
Definition of Machine Learning
General Concepts of Machine Learning
Machine Learning Recipe
Deep Dive into Multi-Layer Perceptron
37. What is Machine Learning?
Arthur Samuel defined machine learning as a
"Field of study that gives computers the ability to learn without being explicitly programmed".
Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.
38. Why Are We Talking of Machine Learning?
The multi-layer perceptron is the basic building block of deep learning algorithms.
Deep learning builds on the multi-layer perceptron; the differences are:
− An increased number of layers
− The connectivity between layers
39. Definition of Machine Learning
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" (Mitchell, 1997).
40. Common Tasks (T)
Common Tasks
Classification - The computer program is asked to specify which of k categories some input belongs to.
Regression - The computer program is asked to predict a numerical value given some input.
41. Common Tasks (T)
Common Tasks
Transcription - The machine learning system is asked to observe a relatively unstructured representation of some kind of data and transcribe it into discrete, textual form. For example, in optical character recognition, the computer program is shown a photograph containing an image of text and is asked to return this text in the form of a sequence of characters.
Translation - In a translation task, the input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language.
42. Common Tasks (T)
Other Common Tasks
Structured Output Analysis - The output is a vector containing important relationships between different elements
Anomaly Detection - Flag atypical or anomalous values
Synthesis and Sampling - Generate new examples similar to those in the training data
Missing Value Imputation - Fill in values that are missing from an input
43. Performance Measure (P)
P measures how well an ML algorithm performs on a task
Different tasks will have different performance measures
The use case or scenario itself will also influence which performance measure we use and what the acceptable threshold for that measure is
Accuracy for a classifier, for example, can be measured as the number of correct classifications divided by the total number of classifications (a one-line sketch follows)
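An illustrative one-liner (not from the deck) for the accuracy measure just described:

# Accuracy = number of correct classifications / total classifications.
def accuracy(predicted, actual):
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 3 of 4 correct -> 0.75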
44. Experience (E)
Experience is the training dataset from which learning is done
This learning can happen in two main ways:
− Unsupervised learning involves observing several examples of a random vector x, and attempting to implicitly or explicitly learn the probability distribution p(x), or some interesting properties of that distribution
− Supervised learning involves observing several examples of a random vector x and an associated value or vector y, and learning to predict y from x, e.g. by estimating p(y | x)
45. General Concepts in Machine Learning
Training / Fitting a Model
− The process of creating a function f which can then be used to perform the intended task. Some of the different categories of such functions are:
• Map any input to one of a set of k categories
• Place the inputs in different buckets
• Learn a hierarchical representation of the input data
Validating the Model
− Gives a performance measure of how well the model is doing
46. General Concepts in Machine Learning
Overfitting
− Fitting the function weights to achieve a very high accuracy on the training data at the cost of performing poorly on unseen data
Underfitting
− Not fitting the model well enough to the training data, leaving a high error rate even on that data
Generalization
− Learning the weights and model in a way that the model does well even on new, unseen data
Regularization
− Techniques used to prevent overfitting (an illustrative L2 sketch follows)
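As an illustrative sketch (the function name and the hyperparameter lam are our own, not from the deck), one common regularizer, the L2 penalty, discourages overfitting by adding the squared size of the weights to the training loss:

# Sketch: L2 ("weight decay") regularization adds a penalty on large
# weights to the data loss; lam is an illustrative hyperparameter.
def regularized_loss(data_loss, weights, lam=0.01):
    l2_penalty = lam * sum(w * w for w in weights)
    return data_loss + l2_penalty

print(regularized_loss(0.30, [0.5, -1.2, 2.0]))  # 0.30 plus a small penalty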
48. What is an Artificial Neural Network?
Artificial neural networks (ANNs) are a family of models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected "neurons" which exchange messages with each other. The connections have numeric weights that can be tuned based on experience, making neural nets adaptive to inputs and capable of learning.
49. ANN Basics
A (feedforward) ANN is a finite directed acyclic graph
It learns a non-linear function from the data
Three main kinds of nodes:
− Source (input)
− Target (output)
− Hidden
56. ANN Recipe (Similar to the ML Recipe)
Combine:
− A specification of a dataset
− A cost function
− An optimization procedure
− A model
All algorithms follow the above recipe
The variation comes in the cost function, the optimization procedure and the network architecture itself
59. Cost Function
The cost function calculates the error between the predicted output and the actual output (a minimal sketch follows)
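A minimal sketch of one common cost function, mean squared error (illustrative; the deck does not commit to a specific cost function):

# Mean squared error: the average squared difference between the
# predicted and actual outputs.
def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

print(mse([0.9, 0.2, 0.8], [1.0, 0.0, 1.0]))  # small error -> small cost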
60. Backward Propagation Using Optimization
Backward Pass - Attempts to minimize the error from the previous step through mathematical optimization of the different learnt weight terms
The objective is to find the global minimum of the error function – stochastic gradient descent uses the standard update rule
w ← w − η · ∂E/∂w (where η is the learning rate)
Calculate the slope (partial derivative) at every layer
Based on the value and sign of the slope, decide:
− Whether to increase or decrease the weights in a layer
− The magnitude of the increase or decrease (scaled by the learning rate)
61. Optimization Procedure
Objective - Minimize the error (the difference between the predicted and actual output). This is done by varying the weights of the different network terms
Different algorithms:
− Stochastic gradient descent
− The L-BFGS algorithm
− Conjugate gradient
Outline of the forward and backward propagation algorithm used to learn the weights (a minimal NumPy sketch follows):
Initialize the weights
Predict the output
Calculate the error
Backward propagate the error to update the weights
− The magnitude and direction of each weight change are determined by the partial derivatives as well as the learning rate
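A minimal NumPy sketch of this outline (our own illustration, assuming NumPy; the architecture, learning rate and step count are illustrative): a tiny 2-4-1 MLP trained on XOR with full-batch gradient descent.

# Sketch: forward and backward passes for a 2-4-1 sigmoid MLP on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # initialize the weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5                                       # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: predict the output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Error between the prediction and the target.
    err = out - y
    # Backward pass: partial derivatives at every layer (chain rule).
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent update: w <- w - lr * dE/dw.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(3))  # should approach [[0], [1], [1], [0]]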
62. Model
The output of the optimization process is the built network with its learnt weights
This is the trained model
63. Part 3 - Introduction to the Long Short-Term Memory Network – A Deep Learning Algorithm
65. LSTM
Used for sequence-to-sequence learning
RNN - A network with loops in it, allowing information to persist
LSTM - A special kind of RNN, capable of learning long-term dependencies
Has additional special structures as part of each repeating module
Practical applications:
− Machine translation
− Question answering systems
− Conversational systems
72. LSTM Variants
Many variants exist, based on different connectivity patterns
GRU
− Combines the forget and input gates into a single "update gate"
− Merges the cell state and hidden state, and makes some other changes
Attention mechanisms
− Provide additional context while making predictions
73. Part 4 – Building a Text Classifier using LSTM
74. Available Deep Learning Frameworks
Multiple deep learning frameworks across different languages
Active community support
Popular ones include
− In Python
• Theano
• Keras
• TensorFlow
− In Lua
• Torch
− In Java
• Deeplearning4j
75. Key Steps
− Load and clean the text corpus
− Tokenize the text and convert it to integer sequences
− Pad the sequences to a fixed length
− Define the network: an embedding layer, an LSTM layer and a dense output layer
− Train the model and evaluate it on held-out data
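A minimal Keras sketch of these steps (assuming a classic Keras install; the vocabulary size, sequence length and layer sizes are illustrative assumptions, and train_texts / train_labels are hypothetical placeholders for your own dataset):

# Sketch: a minimal LSTM binary text classifier in Keras.
# vocab_size, maxlen and layer sizes are illustrative; train_texts and
# train_labels are hypothetical placeholders for your own data.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

vocab_size, maxlen = 10000, 100

# 1. Tokenize the text and convert it to padded integer sequences.
tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(train_texts)
X = pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=maxlen)

# 2. Define the network: embedding -> LSTM -> dense sigmoid output.
model = Sequential([
    Embedding(vocab_size, 64, input_length=maxlen),
    LSTM(64),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# 3. Train, holding out part of the data for validation.
model.fit(X, train_labels, epochs=5, batch_size=32, validation_split=0.2)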
79. Learnings
Standard systems are sufficient to get started
However, for effective results more compute power is required
Patience – each algorithm takes a long time to train, hence patience is important
Incremental training helps
Deep learning needs lots of data; traditional algorithms work well for standard / small data sizes
Data cleaning and curation still helps
80. Advantages of Using Deep Learning
Automated extraction of complex features, for example:
− Feature extraction for images
− Implication: one need not be a deep domain expert to build solutions
− Performance that took years of feature tuning by domain experts can now be achieved with no feature engineering
Similar algorithms can be used across domains – image, speech and text understanding
81. Recap
Why deep learning algorithms?
− Industrial applications
− Technical capabilities
The working of the basic building block of deep learning algorithms
− The multi-layer perceptron
Introduction to LSTM
A text classifier using LSTM
Advantages of using deep learning
84. Attributions
Images used in this presentation are from different sources on the net
The reference list cites the sources of these images as well as the sources of some of the explanations