Deep Learning: Cutting through the Myths and the Hype
Siby Jose Plathottam
Postdoctoral Appointee, Energy Systems
splathottam@anl.gov
Outline
• AI vs Machine Learning vs Deep learning
• Why all the hype?
• Deep Learning basics
• Data and compute
• Addressing interpretability
• Research tracks
[Figure: nested circles — Deep Learning (DL) ⊂ Machine Learning (ML) ⊂ Artificial Intelligence (AI).]
AI vs Machine Learning vs Deep learning
What is Artificial Intelligence?
“... making a machine behave in ways that would be called intelligent if a human
were so behaving.” McCarthy, Minsky et al. (Dartmouth Conference, 1956)
What is Machine Learning?
“…seeking to provide knowledge to computers through data, observations and
interacting with the world. That acquired knowledge allows computers to correctly
generalize to new settings.” Yoshua Bengio
What is Deep Learning?
“A sub-field of ML which uses the artificial neuron as the basic computing model.”
(my own definition)
Why all the hype for Deep Learning?
Examples of intelligent behavior | Example task | Deep learning solution | Breakthrough year
Visual perception | Image recognition | AlexNet, ResNet, NASNet | 2012
Visual perception | Object detection | YOLO, R-CNN, SSD | 2015
Natural language processing | Speech recognition/synthesis | Google Assistant | 2011/2016
Natural language processing | Language translation | Neural Machine Translation | 2015
Game playing | Board games | AlphaGo, AlphaZero | 2015/2016
Game playing | Strategy computer games | OpenAI Five, AlphaStar | 2018
Medical diagnostics | Retinal diagnosis | U-Net (DeepMind) | 2018
Medical diagnostics | Cancer detection | LYNA (Lymph Node Assistant) | 2018
Scientific discovery | Protein folding | AlphaFold | 2018
Creativity | Image synthesis | StyleGAN, BigGAN | 2015/2018
The breakthroughs behind the hype

Image recognition: ILSVRC top-5 error rate falls sharply after the first use of deep learning.
Human error: 5.1% | AlexNet (2012): 15.4% | ResNet (2015): 3.5% | SENet (2017): 2.25%
Figure reference: image-net.org

Speech recognition: SWBD-1 word error rate levels out in the mid-2000s, then drops again after the first use of deep learning, reaching human parity with Microsoft's system combination + LSTM.
Figure reference: ‘The deep learning revolution in automatic speech recognition’, Ananth, Sankar, ODSC India 2018
The breakthroughs behind the hype (cont.)

Board games: comparing Elo ratings for Go, chess, and shogi (2017). AlphaZero surpasses the previous best programs: AlphaGo Zero (Go), Stockfish (chess), and Elmo (shogi).
Data reference: A general reinforcement learning algorithm that masters chess, shogi and Go through self-play, Silver et al., Science, 2018

Autonomous vehicles (which use deep learning for perception and planning): rapidly growing real-world miles at both L3/L4 autonomy (Waymo) and L1/L2 (Tesla Autopilot).
Figure references: Waymo’s fleet reaches 4 million self-driven miles, Waymo Team, Medium, 2017; Tesla Autopilot Miles, Lex Fridman, MIT HCAI, 2019
The mathematics behind the hype:
Universal approximation theorem*
What we know: A neural network can approximate any continuous
function to any desired precision using a finite number of hidden units.
What we don’t know: The optimal way to compute the neural network
parameters (weights & biases) for the function we want to approximate.
What we mostly use now: Backpropagation with gradient descent.
*Reference: Approximation by superpositions of a sigmoidal function, G. Cybenko, MCSS, 1989
1. Forward pass to compute the network output.
2. Generate an error term by comparing the network output $y_i$ with a teaching signal $t_i$:
   $\text{error} = \frac{1}{N} \sum_{i=1}^{N} (t_i - y_i)^2$
3. Use the error to compute and apply gradients.
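To make the three steps concrete, here is a minimal NumPy sketch that trains a one-hidden-layer network by backpropagation with gradient descent; the network size, learning rate, and target function are illustrative assumptions, not from the slides.

```python
# Minimal sketch: forward pass, MSE against a teaching signal, backprop + GD.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)   # inputs
T = np.sin(X)                                        # teaching signal t_i

# One hidden layer of 16 tanh units, linear output (the UAT setting)
W1, b1 = rng.normal(scale=0.5, size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr = 0.05

for step in range(2000):
    # 1. Forward pass to compute the network output
    H = np.tanh(X @ W1 + b1)
    Y = H @ W2 + b2
    # 2. Error term: mean squared difference from the teaching signal
    error = np.mean((T - Y) ** 2)
    # 3. Backpropagate the error and apply gradient-descent updates
    dY = 2 * (Y - T) / len(X)
    dW2, db2 = H.T @ dY, dY.sum(axis=0)
    dH = (dY @ W2.T) * (1 - H ** 2)          # tanh derivative
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final MSE: {error:.4f}")
```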
Deep Learning basics: Training vs Inference
Pipeline: Raw data → Preprocessing → Create model → Training → Inference.
• Training uses input data plus target data (e.g., an image labeled “Homer”). These steps take the most time.
• Inference uses input data only and returns a prediction (e.g., class ‘Homer’: 90%, class ‘Marge’: 10%). This takes the least time; real-time inference is possible for many applications.
Deep Learning basics: Building Deep Neural Networks
Build directed acyclic graphs of the desired complexity from a few core units:
Unit → Layers → Groups of layers (cells) → Models

Unit, e.g., a feedforward neuron:
$y = \max\left(0, \sum_{i=1}^{n} w_i X_i + b\right)$ (with ReLU activation)
$y = \dfrac{1}{1 + e^{-\left(\sum_{i=1}^{n} w_i X_i + b\right)}}$ (with sigmoid activation)

Layers: densely connected, convolutional.
Models: MLP/CNN, AE, residual networks, VAE, GAN.
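To illustrate the hierarchy, a minimal Keras-style sketch (layer sizes and task are assumptions) composing core units into a residual cell inside a small model graph:

```python
# Minimal sketch: core layers composed into a residual "cell" inside a model.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32,))
x = layers.Dense(64, activation="relu")(inputs)      # densely connected layer

# A "cell": a group of layers with a skip connection (residual block)
h = layers.Dense(64, activation="relu")(x)
h = layers.Dense(64)(h)
x = layers.Activation("relu")(layers.Add()([x, h]))  # add skip, then ReLU

outputs = layers.Dense(10, activation="softmax")(x)  # output layer
model = tf.keras.Model(inputs, outputs)              # the directed acyclic graph
model.summary()
```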
Deep Learning basics: All three learning problems
• Supervised: classification, regression
• Unsupervised: deep generative models, segmentation
• Reinforcement: DeepRL, world models
What motivated me to pursue Deep Learning?
Successful applications.
Approximating optimal control trajectories with a single layer neural network.
[Figure: a network with a single hidden layer of three tanh units. The inputs (Fdr_f, ωr_f, Fdr_0, ωr_0, Fdr, ωr, t) pass through data normalization; the outputs y1, y2, y3 give the optimal currents Iqs_opt and Ids_opt.]

The analytical form expresses the optimal currents $i_{ds}(t)$ and $i_{qs}^{opt}(t)$ in terms of the machine parameters ($L_r$, $L_m$, $R_r$, $K_A$, $C_A$) and an auxiliary variable $x(t)$. However, to find $x$ we need to find the roots of the equation $dE_{loss}^{total}/dx = 0$, which has no closed-form solution; the trained network approximates the optimal trajectory directly, sidestepping this root-finding step.
equation:
Deep Learning Challenges
Data
• We may not have enough of it!
• Available data may be biased.
Interpretability
• Disparity between what we believe/want the neural network to observe and what it actually observes.
• Unintended emergent behavior.
Computing requirements
• Millions of parameters need to be updated at each iteration.
Data requirements: Why does it need so much?
Parameters in production models: 1M to 50M.

Three possibilities when training an ML model:
• Underfitting: easily solvable in deep learning, thanks to the universal approximation theorem.
• Optimal fitting: this is what we need.
• Overfitting: deep neural networks are prone to it!
Data requirements: Biased data is the real villain
Untrained deep learning models can’t be pre-conditioned towards the task they have to learn.
If the data is biased towards a particular outcome, the predictions made by the model will be biased towards that same outcome.
Data requirements: Solutions
• Transfer learning: Networks trained for a particular task (e.g., image recognition) can be retrained for a similar task (e.g., object detection) with far less data.
• Data augmentation: Apply transformations to the original dataset to generate additional samples.
• Regularization: Make the neural network work harder to solve the problem.
(A sketch combining all three follows the figure below.)
Transfer learning example. Figure reference: Deep Learning basics, Lex Fridman, MIT HCAI, 2019
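A hedged Keras sketch combining the three solutions; the base network, input shape, and head size are assumptions, not from the slides.

```python
# Minimal sketch: pretrained base (transfer learning) + augmentation + dropout.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                     # reuse learned features as-is

model = tf.keras.Sequential([
    layers.RandomFlip("horizontal", input_shape=(224, 224, 3)),  # augmentation
    layers.RandomRotation(0.1),
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),                   # regularization
    layers.Dense(5, activation="softmax"), # new task-specific head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```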
Computing: Is Deep Learning computationally feasible?
Neural network operations are parallelizable: tens of thousands of matrix operations per clock cycle through vector processing.

$y_1 = w_{11} x_1 + w_{12} x_2 + w_{13} x_3$
$y_2 = w_{21} x_1 + w_{22} x_2 + w_{23} x_3$
or, in matrix form, $Y = WX$

Executed through SSE/AVX instructions on a CPU, or streaming multiprocessors on a GPU (e.g., NVIDIA cuDNN).

Neural network models are also parallelizable: hundreds of models can run on distributed GPU or CPU nodes.
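A minimal NumPy sketch of the vectorization above: the element-wise loops collapse into one matrix product that SIMD units or GPU kernels execute in parallel.

```python
# Minimal sketch: per-output loops vs. one vectorized matrix product.
import numpy as np

W = np.random.randn(2, 3)      # weights w_jk
x = np.random.randn(3)         # inputs x_1..x_3

# Element-wise form: y_j = sum_k w_jk * x_k
y_loop = np.array([sum(W[j, k] * x[k] for k in range(3)) for j in range(2)])

# Matrix form: Y = WX (dispatched to vectorized BLAS/GPU kernels)
y_vec = W @ x
assert np.allclose(y_loop, y_vec)
```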
Computing: Is Deep Learning computationally feasible?
Neural network models are parallelizable: distribute training over hundreds of GPU or CPU nodes, either data-parallel or model-parallel.

Data-parallel training:
• Data pipelines: each pipeline randomly samples from the training dataset.
• Worker nodes: each worker node calculates gradients independently but shares the model parameters.
• Parameter node: uses the gradients from the worker nodes to update the model parameters.
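A toy NumPy sketch of one data-parallel step under stated assumptions: a linear model stands in for the network, and four simulated workers share one parameter vector.

```python
# Minimal sketch: workers compute gradients on random samples; the
# parameter "node" averages them and updates the shared weights.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)                                     # shared model parameters
X, t = rng.normal(size=(1024, 4)), rng.normal(size=1024)

def worker_gradient(w, idx):
    """Gradient of MSE on this worker's random mini-batch."""
    Xb, tb = X[idx], t[idx]
    return 2 * Xb.T @ (Xb @ w - tb) / len(idx)

grads = [worker_gradient(w, rng.choice(1024, size=64)) for _ in range(4)]
w -= 0.01 * np.mean(grads, axis=0)                  # parameter-node update
```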
Interpretability: Is Deep Learning a black box?
Myth: Deep learning uses magic, so we can’t understand what is going on.

Let us compare three models. Notation: $X$ = model inputs; $w, b$ = model parameters; $n$ = number of input features; $m$ = number of hidden neurons; $e$ = error input; $K_P, K_I, K_D$ = controller gains.

Linear regression model:
$y = \sum_{i=1}^{n} w_i X_i + b$

Perceptron model (single-hidden-layer ANN):
$y = \dfrac{1}{1 + e^{-\left( \sum_{j=1}^{m} w_j^{o} \, f^{H}\!\left( \sum_{i=1}^{n} w_{ji}^{H} X_i + b_j^{H} \right) + b^{o} \right)}}$

PID controller model:
$y = K_P e + K_I \int e \, dt + K_D \dfrac{de}{dt}$

Question: if we visually inspect the model parameters, in which model is it easier to quantify the effect of each parameter on the output?
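A small NumPy sketch of the comparison, using toy random parameters rather than a trained model: in the linear model each weight is the effect of its input everywhere, while in the ANN the effect of an input depends on where it is evaluated.

```python
# Minimal sketch: input effects readable off a linear model vs. a one-hidden-layer ANN.
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                                  # input features, hidden neurons

# Linear regression: y = sum_i w_i * X_i + b
w_lin, b_lin = rng.normal(size=n), 0.1
print("linear effects:", w_lin)              # each w_i IS the effect of X_i

# Perceptron: tanh hidden layer f^H, sigmoid output
W_h, b_h = rng.normal(size=(m, n)), rng.normal(size=m)
w_o, b_o = rng.normal(size=m), 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def perceptron(x):
    h = np.tanh(W_h @ x + b_h)
    return sigmoid(w_o @ h + b_o)

# The effect of X_i on y now depends on where you evaluate it:
eps = 1e-6
for x in [np.zeros(n), np.ones(n)]:
    grads = [(perceptron(x + eps * np.eye(n)[i]) - perceptron(x)) / eps
             for i in range(n)]
    print("ANN effects at", x, ":", np.round(grads, 3))
```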
Interpretability: Adversarial inputs
Adversarial inputs: interspersing adversarial noise with specific statistical properties into clean data.
Figure reference: www.pluribus-one.it/research/sec-ml/wild-patterns
However: the attacker needs access to the network architecture and the trained weights.
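A minimal sketch of such a white-box attack in the FGSM style; the trained Keras classifier `model` and the label format are assumptions, not from the slides.

```python
# Minimal sketch: the gradient of the loss w.r.t. the INPUT points toward
# a loss-increasing perturbation — hence the need for architecture + weights.
import tensorflow as tf

def fgsm_perturb(model, x, y_true, epsilon=0.01):
    """Return x plus a small perturbation that increases the model's loss."""
    x = tf.convert_to_tensor(x)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        tape.watch(x)                      # treat the input as a variable
        loss = loss_fn(y_true, model(x))   # loss on the clean input
    grad = tape.gradient(loss, x)          # requires the trained weights
    return x + epsilon * tf.sign(grad)     # step in the loss-increasing direction
```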
Interpretability: Solutions
Intermediate outputs: ask networks to provide intermediate outputs.
Figure reference: Clinically applicable deep learning for diagnosis and referral in retinal disease, De Fauw et al., Nature Medicine, 2018
Heat maps: visualize activations from individual layers or neurons.
Figure reference: Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet, Brendel and Bethge, ICLR 2019
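A minimal Keras sketch of reading out intermediate activations for a heat map; the probed layer name is an assumption about MobileNetV2's layer naming, and the random tensor stands in for a real image.

```python
# Minimal sketch: build a probe model that exposes an intermediate layer,
# then average its activations over channels to get a coarse spatial map.
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")
probe = tf.keras.Model(inputs=model.input,
                       outputs=model.get_layer("block_13_expand_relu").output)

image = tf.random.uniform((1, 224, 224, 3))        # stand-in for a real image
activations = probe(image)                         # (1, H, W, channels)
heat_map = tf.reduce_mean(activations, axis=-1)    # average over channels
print(heat_map.shape)                              # coarse spatial "heat map"
```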
Deep Learning R&D Tracks: Applied
• Applications: new domains (e.g., cancer research); benchmarks (e.g., ImageNet, CINIC-10)
• Infrastructure: training performance (e.g., Horovod); optimized inference (e.g., XLA, TensorRT)
• Architecture: modified architectures (e.g., Transformers); architecture and hyperparameter search (e.g., ANL DeepHyper)
• Loss: e.g., Wasserstein distance
• Hybrid models: e.g., DeepRL

Deep Learning R&D Tracks (cont.): Fundamental
• Understanding: interpretable models (e.g., bag-of-local-features models); exploring failure modes (e.g., adversarial attacks)
• Architectures: beyond gradient descent (e.g., one-shot learning); novel architectures (e.g., Neural Ordinary Differential Equations, Neural Turing Machines)
Power System Application: Load Modelling
Can you model a consumer load at the individual smart meter level?
[Figure: distribution plots of normalized consumer load for particular hours over a 30-day period.]

Synthetic load curves at the consumer level from a deep generative model: sample latent variables $z_1, \ldots, z_m$ from a known distribution (e.g., a unit Gaussian), where the latent-space features are interpretable (e.g., time of day, cloudiness index), and decode them into forecasts $x_{t-n}, \ldots, x_{t-1}, x_t$, one per time block.
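A toy sketch of the sampling side; the decoder here is a hypothetical stand-in for a trained generator, and the latent size and horizon are assumptions.

```python
# Minimal sketch: draw z ~ N(0, I) and decode it into a synthetic load curve.
import numpy as np

latent_dim, horizon = 8, 24            # m latent features, n time blocks

def decoder(z):
    """Stand-in for a trained generator mapping latent z to a load curve."""
    t = np.linspace(0, 2 * np.pi, horizon)
    return 0.5 + 0.3 * np.sin(t + z[0]) * (1 + 0.1 * z[1])  # toy daily shape

z = np.random.standard_normal(latent_dim)   # sample from the unit Gaussian
load_curve = decoder(z)                      # synthetic normalized load
```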
Power System Applications (cont.)
Accelerating simulations
Concept: Train neural networks to approximate the dynamic behavior of dynamic components within a certain operating region.
Potential advantages: allows us to accelerate parts of the simulation on a dedicated GPU; reduces the number of ODEs to integrate.

Intelligent agents
Concept: Deep reinforcement learning agents for supervisory control. Allow agents to learn through interactions with a power system simulator. Currently being researched by GEIRI NA.
Concluding remarks
Deep Learning is a powerful machine learning tool for developing Artificial Narrow Intelligence programs.
Developing Deep Learning solutions is a non-trivial engineering problem.
The performance of a Deep Learning model is intimately tied to the quality and quantity of its training data.
Neural network parameters cannot be interpreted by direct observation; they require specialized software tools.