SlideShare a Scribd company logo
1 of 32
Download to read offline
1
CNRS & Université Paris-Saclay
BALÁZS KÉGL
DEEP LEARNING	

A STORY OF HARDWARE, DATA, AND
TECHNIQUES&TRICKS
2
The bumpy 60-year history that
led to the current state of the
art
and an overview of what you
can do with it
3
DEEP LEARNING = THREE INTERTWINING STORY
techniques / tricks hardware data
1957-69

dawn
perceptron early mainframes toy linear, small images, XOR
1986-95

golden age
early NNs workstations MNIST
2006-

deep learning
deep NNs GPU,TPU, Intel Xeon Phi Imagenet
Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities
between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts
at the bottom. The GPUs communicate only at certain layers. The network’s input is 150,528-dimensional, and
the number of neurons in the network’s remaining layers is given by 253,440–186,624–64,896–64,896–43,264–
4096–4096–1000.
neurons in a kernel map). The second convolutional layer takes as input the (response-normalized
and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 ⇥ 5 ⇥ 48.
The third, fourth, and fifth convolutional layers are connected to one another without any intervening
pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 ⇥ 3 ⇥
256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth
• Classification problem y = f(x)
4
DATA-DRIVEN INFERENCE
x
f y
‘Stomorhina’
f y
‘Scaeva’
x
• Classification problem y = f(x)
• No model to fit, but a large set of (x, y) pairs	

• The source is typically observation + human labeling
• And a loss function L(y, ypred)
5
DATA-DRIVEN INFERENCE
6
THE PERCEPTRON (ROSENBLATT 1957)
Weights were encoded in potentiometers, and weight
updates during learning were performed by electric motors.
7
THE PERCEPTRON (ROSENBLATT 1957)
Based on Rosenblatt's
statements, The New York
Times reported the perceptron
to be "the embryo of an
electronic computer that [the
Navy] expects will be able to
walk, talk, see, write, reproduce
itself and be conscious of its
existence."
8
MINSKY-PAPERT (1969)
It is impossible to linearly
separate the XOR function
The first winter: 1969 - 86
Is it?
• We knew in the early seventies that the nonlinearity
problem was possible to overcome (Grossberg 72)	

• It was also suggested by several authors that error
gradients could be back propagated through the chain
rule	

• So why the winter? floating point multiplication was
expensive
9
MULTI-LAYER PERCEPTRON
The XOR net
10
BACK PROPAGATION
• Convolutional nets	

• The first algorithmic tricks: initialization, weight decay,
early stopping	

• Some limited understanding of the theory	

• First commercial success:AT&T check reader (Bottou,
LeCun, Burges, Nohl, Bengio, Haffner, 1996)
11
THE GOLDEN AGE (1986-95)
12
THE AT&T CHECK READER
• Reading checks is more
than character
recognition	

• If all steps are
differentiable, the whole
system can be trained
end-to-end by backdrop	

• Does it ring a bell?
(Google’s TensorFlow)
13
THE AT&T CHECK READER
• The mainstream narrative	

• Nonconvexity, local minima, lack of theoretical understanding - BS,
looking for your key where its lit	

• The vanishing gradient - true	

• Lots of nuts and bolts - partially true but would not explain if it was
worth the effort	

• Strong competitors with an order of magnitude less engineering:
the support vector machine, forests, boosting - true
14
THE SECOND WINTER (1996-2006-2012)
• The real story	

• We didn’t have the computational power and architecture and
large enough data to train deep nets	

• Random forests are way less high-maintenance and they are on par
with single-hidden-layer (shallow) nets, even today	

• We missed some of the tricks due to lack of critical mass in
research: trial and error is expensive	

• NNs didn’t disappear from industry: the check reader processes
20M checks per day, today
15
THE SECOND WINTER (1996-2006-2012)
16
FIRST TAKE-HOME MESSAGE
Before you jump on
the deep learning
bandwagon: scikit-
learn forests +
xgboost gets 

>90% performance
on >90% of the
industrial problems,
cautious estimate
• NNs are back on the research agenda
17
2006: A NEW WAVE BEGINS
18
2009: IMAGENET
“We believe that a large-scale ontology of images is a critical
resource for developing advanced, large-scale content-based
image search and image understanding algorithms, as well as for
providing critical training and benchmarking data for such
algorithms.” (Fei Fei Li et al CVPR09)	

!
• 80K hierarchical categories	

• 80M images of size >100x100	

• labeled by 50K Amazon Turks
19
2009: IMAGENET
20
GPUS (2004 - )
21
GPUS (2004 - )
• dropout, ReLU, max-pooling, data augmentation, batch normalization,
automatic differentiation, end-to-end training, lots of layers	

• Krizhevsky, Sutskever, Hinton (2012): 1.2M images, 60M parameters,
6 days training on two GPUs
22
TECHNIQUES & TRICKS
23
IMAGENET COMPETITIONS
24
SECOND TAKE-HOME MESSAGE
To make deep learning shine,
you need huge labeled data sets and time to train
• Imagenet (80M >100x100 color images, 80K classes)	

• FaceBook (300M photos/day)	

• Google (300h of video/minute)
25
SECOND TAKE-HOME MESSAGE
To make deep learning shine,
you need huge labeled data sets and time to train
• Theano	

• TensorFlow	

• Keras	

• Caffe	

• Torch
26
TODAY: EASY-TO-USE LIBRARIES
27
TODAY: HARDWARE
Google TPU
28
COMMERCIAL APPLICATIONS
29
GOOGLE IMAGE SEARCH
30
FACE RECOGNITION/DETECTION	

A 6B$ MARKET IN 2020
31
SELF-DRVING CARS
32

More Related Content

Similar to A historical introduction to deep learning: hardware, data, and tricks

AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhereinside-BigData.com
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsSeldon
 
Neural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsNeural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsHéloïse Nonne
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...BigDataEverywhere
 
Science and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraScience and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraLarry Smarr
 
Quantum Computing and its security implications
Quantum Computing and its security implicationsQuantum Computing and its security implications
Quantum Computing and its security implicationsInnoTech
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural NetworksDean Wyatte
 
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic)  : Dr. Purnima PanditSoft computing (ANN and Fuzzy Logic)  : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima PanditPurnima Pandit
 
Quantum Information FINAL.pptx
Quantum Information FINAL.pptxQuantum Information FINAL.pptx
Quantum Information FINAL.pptxgitrahekno
 
Big Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
Big Data - Yesterday, Today and Tomorrow by John Mashey, TechviserBig Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
Big Data - Yesterday, Today and Tomorrow by John Mashey, TechviserAngela Hey
 
Deep Learning Training at Intel
Deep Learning Training at IntelDeep Learning Training at Intel
Deep Learning Training at IntelAtul Vaish
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaFacultad de Informática UCM
 
Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...
Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...
Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...Larry Smarr
 
ETE444-lec1-nano-introduction.pdf
ETE444-lec1-nano-introduction.pdfETE444-lec1-nano-introduction.pdf
ETE444-lec1-nano-introduction.pdfmashiur
 
Quantum Computing Applications in Power Systems
Quantum Computing Applications in Power SystemsQuantum Computing Applications in Power Systems
Quantum Computing Applications in Power SystemsPower System Operation
 
Quantum computation - Introduction
Quantum computation - IntroductionQuantum computation - Introduction
Quantum computation - IntroductionAakash Martand
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in PythonKv Sagar
 
Quantum nature poli_mi_ddm_200115
Quantum nature poli_mi_ddm_200115Quantum nature poli_mi_ddm_200115
Quantum nature poli_mi_ddm_200115domenico di mola
 

Similar to A historical introduction to deep learning: hardware, data, and tricks (20)

AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhere
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative models
 
Neural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsNeural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for Physicists
 
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
Big Data Everywhere Chicago: High Performance Computing - Contributions Towar...
 
Science and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraScience and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated Era
 
Quantum Computing and its security implications
Quantum Computing and its security implicationsQuantum Computing and its security implications
Quantum Computing and its security implications
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic)  : Dr. Purnima PanditSoft computing (ANN and Fuzzy Logic)  : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
 
Quantum computers
Quantum computersQuantum computers
Quantum computers
 
Quantum Information FINAL.pptx
Quantum Information FINAL.pptxQuantum Information FINAL.pptx
Quantum Information FINAL.pptx
 
Big Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
Big Data - Yesterday, Today and Tomorrow by John Mashey, TechviserBig Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
Big Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
 
Deep Learning Training at Intel
Deep Learning Training at IntelDeep Learning Training at Intel
Deep Learning Training at Intel
 
Barcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de RiquezaBarcelona Supercomputing Center, Generador de Riqueza
Barcelona Supercomputing Center, Generador de Riqueza
 
Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...
Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...
Metacomputer Architecture of the Global LambdaGrid: How Personal Light Paths ...
 
ETE444-lec1-nano-introduction.pdf
ETE444-lec1-nano-introduction.pdfETE444-lec1-nano-introduction.pdf
ETE444-lec1-nano-introduction.pdf
 
Quantum Computing Applications in Power Systems
Quantum Computing Applications in Power SystemsQuantum Computing Applications in Power Systems
Quantum Computing Applications in Power Systems
 
Quantum computation - Introduction
Quantum computation - IntroductionQuantum computation - Introduction
Quantum computation - Introduction
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Python
 
Quantum nature poli_mi_ddm_200115
Quantum nature poli_mi_ddm_200115Quantum nature poli_mi_ddm_200115
Quantum nature poli_mi_ddm_200115
 

More from Balázs Kégl

Data-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural netsData-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural netsBalázs Kégl
 
Model-based reinforcement learning and self-driving engineering systems
Model-based reinforcement learning and self-driving engineering systemsModel-based reinforcement learning and self-driving engineering systems
Model-based reinforcement learning and self-driving engineering systemsBalázs Kégl
 
Managing the AI process: putting humans (back) in the loop
Managing the AI process: putting humans (back) in the loopManaging the AI process: putting humans (back) in the loop
Managing the AI process: putting humans (back) in the loopBalázs Kégl
 
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
   DARMDN: Deep autoregressive mixture density nets for dynamical system mode...   DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...Balázs Kégl
 
Build your own data challenge, or just organize team work
Build your own data challenge, or just organize team workBuild your own data challenge, or just organize team work
Build your own data challenge, or just organize team workBalázs Kégl
 
RAMP: Collaborative challenge with code submission
RAMP: Collaborative challenge with code submissionRAMP: Collaborative challenge with code submission
RAMP: Collaborative challenge with code submissionBalázs Kégl
 
Deep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesDeep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesBalázs Kégl
 
What is wrong with data challenges
What is wrong with data challengesWhat is wrong with data challenges
What is wrong with data challengesBalázs Kégl
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)Balázs Kégl
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsBalázs Kégl
 
The Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceThe Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceBalázs Kégl
 

More from Balázs Kégl (11)

Data-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural netsData-driven hypothesis generation using deep neural nets
Data-driven hypothesis generation using deep neural nets
 
Model-based reinforcement learning and self-driving engineering systems
Model-based reinforcement learning and self-driving engineering systemsModel-based reinforcement learning and self-driving engineering systems
Model-based reinforcement learning and self-driving engineering systems
 
Managing the AI process: putting humans (back) in the loop
Managing the AI process: putting humans (back) in the loopManaging the AI process: putting humans (back) in the loop
Managing the AI process: putting humans (back) in the loop
 
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
   DARMDN: Deep autoregressive mixture density nets for dynamical system mode...   DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
 
Build your own data challenge, or just organize team work
Build your own data challenge, or just organize team workBuild your own data challenge, or just organize team work
Build your own data challenge, or just organize team work
 
RAMP: Collaborative challenge with code submission
RAMP: Collaborative challenge with code submissionRAMP: Collaborative challenge with code submission
RAMP: Collaborative challenge with code submission
 
Deep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesDeep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiatives
 
What is wrong with data challenges
What is wrong with data challengesWhat is wrong with data challenges
What is wrong with data challenges
 
The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)The systemic challenges in data science initiatives (and some solutions)
The systemic challenges in data science initiatives (and some solutions)
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physics
 
The Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data ScienceThe Paris-Saclay Center for Data Science
The Paris-Saclay Center for Data Science
 

Recently uploaded

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 

A historical introduction to deep learning: hardware, data, and tricks

  • 1. 1 CNRS & Université Paris-Saclay BALÁZS KÉGL DEEP LEARNING A STORY OF HARDWARE, DATA, AND TECHNIQUES&TRICKS
  • 2. 2 The bumpy 60-year history that led to the current state of the art and an overview of what you can do with it
  • 3. 3 DEEP LEARNING = THREE INTERTWINING STORY techniques / tricks hardware data 1957-69
 dawn perceptron early mainframes toy linear, small images, XOR 1986-95
 golden age early NNs workstations MNIST 2006-
 deep learning deep NNs GPU,TPU, Intel Xeon Phi Imagenet Figure 2: An illustration of the architecture of our CNN, explicitly showing the delineation of responsibilities between the two GPUs. One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network’s input is 150,528-dimensional, and the number of neurons in the network’s remaining layers is given by 253,440–186,624–64,896–64,896–43,264– 4096–4096–1000. neurons in a kernel map). The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 ⇥ 5 ⇥ 48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third convolutional layer has 384 kernels of size 3 ⇥ 3 ⇥ 256 connected to the (normalized, pooled) outputs of the second convolutional layer. The fourth
  • 4. • Classification problem y = f(x) 4 DATA-DRIVEN INFERENCE x f y ‘Stomorhina’ f y ‘Scaeva’ x
  • 5. • Classification problem y = f(x) • No model to fit, but a large set of (x, y) pairs • The source is typically observation + human labeling • And a loss function L(y, ypred) 5 DATA-DRIVEN INFERENCE
  • 6. 6 THE PERCEPTRON (ROSENBLATT 1957) Weights were encoded in potentiometers, and weight updates during learning were performed by electric motors.
  • 7. 7 THE PERCEPTRON (ROSENBLATT 1957) Based on Rosenblatt's statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."
  • 8. 8 MINSKY-PAPERT (1969) It is impossible to linearly separate the XOR function The first winter: 1969 - 86 Is it?
  • 9. • We knew in the early seventies that the nonlinearity problem was possible to overcome (Grossberg 72) • It was also suggested by several authors that error gradients could be back propagated through the chain rule • So why the winter? floating point multiplication was expensive 9 MULTI-LAYER PERCEPTRON The XOR net
  • 11. • Convolutional nets • The first algorithmic tricks: initialization, weight decay, early stopping • Some limited understanding of the theory • First commercial success:AT&T check reader (Bottou, LeCun, Burges, Nohl, Bengio, Haffner, 1996) 11 THE GOLDEN AGE (1986-95)
  • 13. • Reading checks is more than character recognition • If all steps are differentiable, the whole system can be trained end-to-end by backdrop • Does it ring a bell? (Google’s TensorFlow) 13 THE AT&T CHECK READER
  • 14. • The mainstream narrative • Nonconvexity, local minima, lack of theoretical understanding - BS, looking for your key where its lit • The vanishing gradient - true • Lots of nuts and bolts - partially true but would not explain if it was worth the effort • Strong competitors with an order of magnitude less engineering: the support vector machine, forests, boosting - true 14 THE SECOND WINTER (1996-2006-2012)
  • 15. • The real story • We didn’t have the computational power and architecture and large enough data to train deep nets • Random forests are way less high-maintenance and they are on par with single-hidden-layer (shallow) nets, even today • We missed some of the tricks due to lack of critical mass in research: trial and error is expensive • NNs didn’t disappear from industry: the check reader processes 20M checks per day, today 15 THE SECOND WINTER (1996-2006-2012)
  • 16. 16 FIRST TAKE-HOME MESSAGE Before you jump on the deep learning bandwagon: scikit- learn forests + xgboost gets 
 >90% performance on >90% of the industrial problems, cautious estimate
  • 17. • NNs are back on the research agenda 17 2006: A NEW WAVE BEGINS
  • 18. 18 2009: IMAGENET “We believe that a large-scale ontology of images is a critical resource for developing advanced, large-scale content-based image search and image understanding algorithms, as well as for providing critical training and benchmarking data for such algorithms.” (Fei Fei Li et al CVPR09) !
  • 19. • 80K hierarchical categories • 80M images of size >100x100 • labeled by 50K Amazon Turks 19 2009: IMAGENET
  • 22. • dropout, ReLU, max-pooling, data augmentation, batch normalization, automatic differentiation, end-to-end training, lots of layers • Krizhevsky, Sutskever, Hinton (2012): 1.2M images, 60M parameters, 6 days training on two GPUs 22 TECHNIQUES & TRICKS
  • 24. 24 SECOND TAKE-HOME MESSAGE To make deep learning shine, you need huge labeled data sets and time to train
  • 25. • Imagenet (80M >100x100 color images, 80K classes) • FaceBook (300M photos/day) • Google (300h of video/minute) 25 SECOND TAKE-HOME MESSAGE To make deep learning shine, you need huge labeled data sets and time to train
  • 26. • Theano • TensorFlow • Keras • Caffe • Torch 26 TODAY: EASY-TO-USE LIBRARIES
  • 32. 32