SlideShare a Scribd company logo
1 of 31
Download to read offline
Successes and Frontiers of Deep Learning
Sebastian Ruder

Insight @ NUIG
Aylien
Insight@DCU Deep Learning Workshop, 21 May 2018
Agenda
1. Deep Learning fundamentals
2. Deep Learning successes
3. Frontiers
• Unsupervised learning and transfer learning
2
Deep Learning fundamentals
AI and Deep Learning
4
Artificial Intelligence
Machine Learning
Deep Learning
A Deep Neural Network
• A sequence of linear transformations (matrix multiplications)

with non-linear activation functions in between
• Maps from an input to (typically) output probabilities
5
Deep Learning successes
Image recognition
• AlexNet in 2012
• Deeper architectures, more connections:

Inception, ResNet, DenseNet, ResNeXt
• SotA (May ’18): 2.4% Top-5, 14.6% Top-1
7
Sources: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf, http://kaiminghe.com/cvpr16resnet/cvpr2016_deep_residual_learning_kaiminghe.pdf, https://
research.fb.com/publications/exploring-the-limits-of-weakly-supervised-pretraining/
Image recognition
• Many applications, e.g. Google Photos
8
Automatic speech recognition
• Production systems are based on a pipeline, using NNs
• End-to-end models are catching up (Google reports 

5.6% WER)
• Advances enable voice 

as pervasive input
• 50% of all searches 

will use voice by 2020
9
Sources: https://www.techleer.com/articles/30-google-gives-just-49-word-error-rate-in-speech-recognition/, https://research.googleblog.com/2017/12/improving-end-to-end-models-for-speech.html, https://
www.forbes.com/sites/forbesagencycouncil/2017/11/27/optimizing-for-voice-search-is-more-important-than-ever/#7db41e484a7b
Speech synthesis
• WaveNet, Parallel WaveNet enable human-like speech synthesis
• Adapted from PixelRNNs, PixelCNNs
• Based on fully convolutional nets with different dilation factors
• Vast number of applications, e.g. personal assistants
10
Sources: https://deepmind.com/blog/wavenet-generative-model-raw-audio/, https://deepmind.com/blog/wavenet-launches-google-assistant/, https://arxiv.org/abs/1711.10433, https://arxiv.org/pdf/1601.06759.pdf
Machine translation
• State-of-the-art: Neural machine translation
• Transformer, based on self-attention; RNMT+
• Combined with other improvements: human parity

on English-Chinese (Microsoft)
• Babel-fish earbuds (Google)
• Phrase-based still better on low-resource languages
• Exciting future directions:
• Unsupervised MT
• But: Only superficial understanding; not close to a

‘real’ understanding of language yet
11
Sources: https://arxiv.org/abs/1706.03762, https://arxiv.org/abs/1804.09849, https://arxiv.org/abs/1803.05567, https://www.citi.io/2018/03/09/top-breakthrough-technologies-for-2018-babel-fish-earbuds/, https://
arxiv.org/abs/1711.00043, https://arxiv.org/abs/1710.11041, https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/
Making ML more accessible
• Accessible libraries such as

TensorFlow, PyTorch, Keras,
etc.
• Mobile, on-device ML
• Opens up many applications:
• Predicting disease in
Cassava plants
• Sorting cucumbers
• Identifying wild animals in
trap images
• Identifying hot dogs
12
Sources: https://medium.com/tensorflow/highlights-from-tensorflow-developer-summit-2018-cd86615714b2, https://cloud.google.com/blog/big-data/2016/08/how-a-japanese-cucumber-farmer-is-using-deep-learning-
and-tensorflow, https://news.developer.nvidia.com/automatically-identify-wild-animals-in-camera-trap-images/, https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow-
keras-react-native-ef03260747f3
Reading comprehension
Sources: https://blogs.microsoft.com/ai/microsoft-creates-ai-can-read-document-answer-questions-well-person/, http://www.alizila.com/alibaba-ai-model-tops-humans-in-reading-comprehension/, http://
www.newsweek.com/robots-can-now-read-better-humans-putting-millions-jobs-risk-781393, http://money.cnn.com/2018/01/15/technology/reading-robot-alibaba-microsoft-stanford/index.html
13
Reading comprehension — success?
• Constrained task
• Models exploit superficial pattern-matching
• Can be easily fooled

—> adversarial examples
14
Sources: http://u.cs.biu.ac.il/~yogo/squad-vs-human.pdf, http://www.aclweb.org/anthology/D17-1215
Adversarial examples
• Adversarial examples: intentionally designed examples to cause the model to
make a mistake
• Active research area, no general-purpose defence yet
• E.g. obfuscated gradients beat 7/8 defences at ICLR 2018
• Implications for other disciplines (language, speech)
15
Sources: https://www.sri.inf.ethz.ch/riai2017/Explaining%20and%20Harnessing%20Adversarial%20Examples.pdf, https://blog.openai.com/adversarial-example-research/, https://github.com/anishathalye/obfuscated-
gradients, https://thegradient.pub/adversarial-speech/
Frontiers
Unsupervised learning
• Unsupervised learning

is important
• Still elusive
• Most successful recent

direction: Generative

Adversarial Networks (GANs)
• Still hard to train
• Mainly useful so far for 

generative tasks in

computer vision
17
Sources: http://ruder.io/highlights-nips-2016/
Semi-supervised learning
• Advances in semi-supervised learning
(SSL)
• 19.8% Top-5 error on ImageNet with

only 10% of the labels
• However:
• Simple baselines—when properly
tuned—or heavy regularised

models are often competitive
• Transfer learning often works better
• Performance can quickly degrade

with a domain shift
18
Sources: https://github.com/CuriousAI/mean-teacher/blob/master/nips_2017_slides.pdf, https://arxiv.org/abs/1804.09170
Semi-supervised learning
• New methods for SSL and domain adaptation rarely compare against classic
SSL algorithms
• Re-evaluate general-purpose bootstrapping approaches with NNs vs. state-of-
the-art
• Classic tri-training (with some modifications) 

performs best
• Tri-training with neural networks is expensive
• Propose a more time- and space-efficient

version, Multi-task tri-training
19
S. Ruder, B. Plank. Strong Baselines for Neural Semi-supervised Learning under Domain Shift. ACL 2018.
Transfer learning
• Solve a problem by leveraging knowledge from a related problem
• Most useful when few labeled examples are available
• Most common use-case: Fine-tune a pre-trained ImageNet model
20
Sources: https://www.saagie.com/blog/object-detection-part1
Transfer learning
1. Remove final classification layer
21
Transfer learning
22
1. Remove final classification layer
2. Replace with own classification layer
10
Transfer learning
23
1. Remove final classification layer
2. Replace with own classification layer
3. Freeze other model parameters
10
Transfer learning
24
1. Remove final classification layer
2. Replace with own classification layer
3. Freeze other model parameters
4. Train final layer on new labeled examples
10
Train
Transfer learning for NLP
• Transfer via pre-trained word embeddings is common
• Only affects first layer
• Recent methods still initialize most parameters randomly
• Universal Language Model Fine-tuning (ULMFit)
25
J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
Transfer learning for NLP
• Transfer via pre-trained word embeddings is common
• Only affects first layer
• Recent methods still initialize most parameters randomly
• Universal Language Model Fine-tuning (ULMFit)
1. Pre-train a language model (LM) on a large corpus, e.g. Wikipedia
26
J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
Transfer learning for NLP
• Transfer via pre-trained word embeddings is common
• Only affects first layer
• Recent methods still initialize most parameters randomly
• Universal Language Model Fine-tuning (ULMFit)
1. Pre-train a language model (LM) on a large corpus, e.g. Wikipedia
2. Fine-tune the LM on unlabelled data with new techniques
27
J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
Transfer learning for NLP
• Transfer via pre-trained word embeddings is common
• Only affects first layer
• Recent methods still initialize most parameters randomly
• Universal Language Model Fine-tuning (ULMFit)
1. Pre-train a language model (LM) on a large corpus, e.g. Wikipedia
2. Fine-tune the LM on unlabelled data with new techniques
3. Fine-tune classifier on labeled examples
28
J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
Transfer learning for NLP
• State-of-the-art on six widely used text classification datasets
• Enables training with orders of magnitude less data
29
J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
Many more frontiers
• Reasoning
• Bias
• Interpretability
• Adversarial examples
• Generalization
30
Thanks for your attention!
More info: http://ruder.io/
Twitter: @seb_ruder
Aylien: jobs@aylien.com — We are hiring!

More Related Content

What's hot

CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools Lifeng (Aaron) Han
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringTraian Rebedea
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationSeonghyun Kim
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information RetrievalRoelof Pieters
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPMachine Learning Prague
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Saurabh Kaushik
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingSeonghyun Kim
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLPSatyam Saxena
 
Practical machine learning - Part 1
Practical machine learning - Part 1Practical machine learning - Part 1
Practical machine learning - Part 1Traian Rebedea
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsRoelof Pieters
 
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...Ahmed Magdy Ezzeldin, MSc.
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Saurabh Kaushik
 
Recurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovRecurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovBhaskar Mitra
 
Recent Advances in NLP
  Recent Advances in NLP  Recent Advances in NLP
Recent Advances in NLPAnuj Gupta
 
Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.Tasnim Ara Islam
 
AINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, CoutoAINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, CoutoLidia Pivovarova
 

What's hot (20)

CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Practical machine learning - Part 1
Practical machine learning - Part 1Practical machine learning - Part 1
Practical machine learning - Part 1
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
 
Recurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovRecurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas Mikolov
 
NLP Project Full Cycle
NLP Project Full CycleNLP Project Full Cycle
NLP Project Full Cycle
 
Aspects of NLP Practice
Aspects of NLP PracticeAspects of NLP Practice
Aspects of NLP Practice
 
Recent Advances in NLP
  Recent Advances in NLP  Recent Advances in NLP
Recent Advances in NLP
 
Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.
 
AINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, CoutoAINL 2016: Castro, Lopez, Cavalcante, Couto
AINL 2016: Castro, Lopez, Cavalcante, Couto
 

Similar to Successes and Frontiers of Deep Learning

GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersYoung Seok Kim
 
Gpt1 and 2 model review
Gpt1 and 2 model reviewGpt1 and 2 model review
Gpt1 and 2 model reviewSeoung-Ho Choi
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Fwdays
 
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupYves Peirsman
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH WarNik Chow
 
Proactive Empirical Assessment of New Language Feature Adoption via Automated...
Proactive Empirical Assessment of New Language Feature Adoption via Automated...Proactive Empirical Assessment of New Language Feature Adoption via Automated...
Proactive Empirical Assessment of New Language Feature Adoption via Automated...Raffi Khatchadourian
 
Some perspectives from the Astropy Project
Some perspectives from the Astropy ProjectSome perspectives from the Astropy Project
Some perspectives from the Astropy ProjectKelle Cruz
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Fwdays
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Andres Garcia-Silva
 
Pathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and ChallengesPathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and ChallengesTao Xie
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science CommunicationIsabelle Augenstein
 
NLP Workshop Presentation at Universitat de Barcelona
NLP Workshop Presentation at Universitat de BarcelonaNLP Workshop Presentation at Universitat de Barcelona
NLP Workshop Presentation at Universitat de BarcelonaSergiPons5
 
How to do science in a large IT company (ICPC World Finals 2021, Moscow)
How to do science in a large IT company (ICPC World Finals 2021, Moscow)How to do science in a large IT company (ICPC World Finals 2021, Moscow)
How to do science in a large IT company (ICPC World Finals 2021, Moscow)Alexander Borzunov
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science researchAnubhav Jain
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignAnubhav Jain
 

Similar to Successes and Frontiers of Deep Learning (20)

GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask Learners
 
MILA DL & RL summer school highlights
MILA DL & RL summer school highlights MILA DL & RL summer school highlights
MILA DL & RL summer school highlights
 
Gpt1 and 2 model review
Gpt1 and 2 model reviewGpt1 and 2 model review
Gpt1 and 2 model review
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP MeetupDealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
Dealing with Data Scarcity in Natural Language Processing - Belgium NLP Meetup
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH
 
Proactive Empirical Assessment of New Language Feature Adoption via Automated...
Proactive Empirical Assessment of New Language Feature Adoption via Automated...Proactive Empirical Assessment of New Language Feature Adoption via Automated...
Proactive Empirical Assessment of New Language Feature Adoption via Automated...
 
Some perspectives from the Astropy Project
Some perspectives from the Astropy ProjectSome perspectives from the Astropy Project
Some perspectives from the Astropy Project
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
 
Pathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and ChallengesPathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and Challenges
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
 
NLP Workshop Presentation at Universitat de Barcelona
NLP Workshop Presentation at Universitat de BarcelonaNLP Workshop Presentation at Universitat de Barcelona
NLP Workshop Presentation at Universitat de Barcelona
 
FLOSS Case Studies
FLOSS Case StudiesFLOSS Case Studies
FLOSS Case Studies
 
How to do science in a large IT company (ICPC World Finals 2021, Moscow)
How to do science in a large IT company (ICPC World Finals 2021, Moscow)How to do science in a large IT company (ICPC World Finals 2021, Moscow)
How to do science in a large IT company (ICPC World Finals 2021, Moscow)
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
 

More from Sebastian Ruder

Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain ShiftStrong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain ShiftSebastian Ruder
 
On the Limitations of Unsupervised Bilingual Dictionary Induction
On the Limitations of Unsupervised Bilingual Dictionary InductionOn the Limitations of Unsupervised Bilingual Dictionary Induction
On the Limitations of Unsupervised Bilingual Dictionary InductionSebastian Ruder
 
Neural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftNeural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftSebastian Ruder
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep LearningSebastian Ruder
 
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila CastilhoHuman Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila CastilhoSebastian Ruder
 
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiMachine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiSebastian Ruder
 
Hashtagger+: Real-time Social Tagging of Streaming News - Dr. Georgiana Ifrim
Hashtagger+: Real-time Social Tagging of Streaming News - Dr. Georgiana IfrimHashtagger+: Real-time Social Tagging of Streaming News - Dr. Georgiana Ifrim
Hashtagger+: Real-time Social Tagging of Streaming News - Dr. Georgiana IfrimSebastian Ruder
 
Making sense of word senses: An introduction to word-sense disambiguation and...
Making sense of word senses: An introduction to word-sense disambiguation and...Making sense of word senses: An introduction to word-sense disambiguation and...
Making sense of word senses: An introduction to word-sense disambiguation and...Sebastian Ruder
 
Spoken Dialogue Systems and Social Talk - Emer Gilmartin
Spoken Dialogue Systems and Social Talk - Emer GilmartinSpoken Dialogue Systems and Social Talk - Emer Gilmartin
Spoken Dialogue Systems and Social Talk - Emer GilmartinSebastian Ruder
 
Modeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverModeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverSebastian Ruder
 
Funded PhD/MSc. Opportunities at AYLIEN
Funded PhD/MSc. Opportunities at AYLIENFunded PhD/MSc. Opportunities at AYLIEN
Funded PhD/MSc. Opportunities at AYLIENSebastian Ruder
 
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...Sebastian Ruder
 
Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Sebastian Ruder
 
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Sebastian Ruder
 
Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...
Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...
Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...Sebastian Ruder
 
A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis
A Hierarchical Model of Reviews for Aspect-based Sentiment AnalysisA Hierarchical Model of Reviews for Aspect-based Sentiment Analysis
A Hierarchical Model of Reviews for Aspect-based Sentiment AnalysisSebastian Ruder
 
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...Sebastian Ruder
 

More from Sebastian Ruder (17)

Strong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain ShiftStrong Baselines for Neural Semi-supervised Learning under Domain Shift
Strong Baselines for Neural Semi-supervised Learning under Domain Shift
 
On the Limitations of Unsupervised Bilingual Dictionary Induction
On the Limitations of Unsupervised Bilingual Dictionary InductionOn the Limitations of Unsupervised Bilingual Dictionary Induction
On the Limitations of Unsupervised Bilingual Dictionary Induction
 
Neural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain ShiftNeural Semi-supervised Learning under Domain Shift
Neural Semi-supervised Learning under Domain Shift
 
Optimization for Deep Learning
Optimization for Deep LearningOptimization for Deep Learning
Optimization for Deep Learning
 
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila CastilhoHuman Evaluation: Why do we need it? - Dr. Sheila Castilho
Human Evaluation: Why do we need it? - Dr. Sheila Castilho
 
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiMachine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
 
Hashtagger+: Real-time Social Tagging of Streaming News - Dr. Georgiana Ifrim
Hashtagger+: Real-time Social Tagging of Streaming News - Dr. Georgiana IfrimHashtagger+: Real-time Social Tagging of Streaming News - Dr. Georgiana Ifrim
Hashtagger+: Real-time Social Tagging of Streaming News - Dr. Georgiana Ifrim
 
Making sense of word senses: An introduction to word-sense disambiguation and...
Making sense of word senses: An introduction to word-sense disambiguation and...Making sense of word senses: An introduction to word-sense disambiguation and...
Making sense of word senses: An introduction to word-sense disambiguation and...
 
Spoken Dialogue Systems and Social Talk - Emer Gilmartin
Spoken Dialogue Systems and Social Talk - Emer GilmartinSpoken Dialogue Systems and Social Talk - Emer Gilmartin
Spoken Dialogue Systems and Social Talk - Emer Gilmartin
 
Modeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John GloverModeling documents with Generative Adversarial Networks - John Glover
Modeling documents with Generative Adversarial Networks - John Glover
 
Funded PhD/MSc. Opportunities at AYLIEN
Funded PhD/MSc. Opportunities at AYLIENFunded PhD/MSc. Opportunities at AYLIEN
Funded PhD/MSc. Opportunities at AYLIEN
 
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
 
Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...
 
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
 
Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...
Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...
Idiom Token Classification using Sentential Distributed Semantics (Giancarlo ...
 
A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis
A Hierarchical Model of Reviews for Aspect-based Sentiment AnalysisA Hierarchical Model of Reviews for Aspect-based Sentiment Analysis
A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis
 
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
 

Recently uploaded

3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docx3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docxUlahVanessaBasa
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsMarkus Roggen
 
Loudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxLoudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxpriyankatabhane
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfGABYFIORELAMALPARTID1
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
AICTE activity on Water Conservation spreading awareness
AICTE activity on Water Conservation spreading awarenessAICTE activity on Water Conservation spreading awareness
AICTE activity on Water Conservation spreading awareness1hk20is002
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxPayal Shrivastava
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlshansessene
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptAmirRaziq1
 
DETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptxDETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptx201bo007
 
Interpreting SDSS extragalactic data in the era of JWST
Interpreting SDSS extragalactic data in the era of JWSTInterpreting SDSS extragalactic data in the era of JWST
Interpreting SDSS extragalactic data in the era of JWSTAlexander F. Mayer
 
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...jana861314
 
lect1 introduction.pptx microbiology ppt
lect1 introduction.pptx microbiology pptlect1 introduction.pptx microbiology ppt
lect1 introduction.pptx microbiology pptzbyb6vmmsd
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerLuis Miguel Chong Chong
 
The Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and FunctionThe Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and FunctionJadeNovelo1
 
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary MicrobiologyLAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary MicrobiologyChayanika Das
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPRPirithiRaju
 

Recently uploaded (20)

3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docx3.-Acknowledgment-Dedication-Abstract.docx
3.-Acknowledgment-Dedication-Abstract.docx
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
 
Loudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptxLoudspeaker- direct radiating type and horn type.pptx
Loudspeaker- direct radiating type and horn type.pptx
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
AICTE activity on Water Conservation spreading awareness
AICTE activity on Water Conservation spreading awarenessAICTE activity on Water Conservation spreading awareness
AICTE activity on Water Conservation spreading awareness
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptx
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.ppt
 
DETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptxDETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptx
 
Interpreting SDSS extragalactic data in the era of JWST
Interpreting SDSS extragalactic data in the era of JWSTInterpreting SDSS extragalactic data in the era of JWST
Interpreting SDSS extragalactic data in the era of JWST
 
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
Speed Breeding in Vegetable Crops- innovative approach for present era of cro...
 
lect1 introduction.pptx microbiology ppt
lect1 introduction.pptx microbiology pptlect1 introduction.pptx microbiology ppt
lect1 introduction.pptx microbiology ppt
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of Cancer
 
The Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and FunctionThe Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and Function
 
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary MicrobiologyLAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 

Successes and Frontiers of Deep Learning

  • 1. Successes and Frontiers of Deep Learning Sebastian Ruder
 Insight @ NUIG Aylien Insight@DCU Deep Learning Workshop, 21 May 2018
  • 2. Agenda 1. Deep Learning fundamentals 2. Deep Learning successes 3. Frontiers • Unsupervised learning and transfer learning 2
  • 4. AI and Deep Learning 4 Artificial Intelligence Machine Learning Deep Learning
  • 5. A Deep Neural Network • A sequence of linear transformations (matrix multiplications)
 with non-linear activation functions in between • Maps from an input to (typically) output probabilities 5
  • 7. Image recognition • AlexNet in 2012 • Deeper architectures, more connections:
 Inception, ResNet, DenseNet, ResNeXt • SotA (May ’18): 2.4% Top-5, 14.6% Top-1 7 Sources: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf, http://kaiminghe.com/cvpr16resnet/cvpr2016_deep_residual_learning_kaiminghe.pdf, https:// research.fb.com/publications/exploring-the-limits-of-weakly-supervised-pretraining/
  • 8. Image recognition • Many applications, e.g. Google Photos 8
  • 9. Automatic speech recognition • Production systems are based on a pipeline, using NNs • End-to-end models are catching up (Google reports 
 5.6% WER) • Advances enable voice 
 as pervasive input • 50% of all searches 
 will use voice by 2020 9 Sources: https://www.techleer.com/articles/30-google-gives-just-49-word-error-rate-in-speech-recognition/, https://research.googleblog.com/2017/12/improving-end-to-end-models-for-speech.html, https:// www.forbes.com/sites/forbesagencycouncil/2017/11/27/optimizing-for-voice-search-is-more-important-than-ever/#7db41e484a7b
  • 10. Speech synthesis • WaveNet, Parallel WaveNet enable human-like speech synthesis • Adapted from PixelRNNs, PixelCNNs • Based on fully convolutional nets with different dilation factors • Vast number of applications, e.g. personal assistants 10 Sources: https://deepmind.com/blog/wavenet-generative-model-raw-audio/, https://deepmind.com/blog/wavenet-launches-google-assistant/, https://arxiv.org/abs/1711.10433, https://arxiv.org/pdf/1601.06759.pdf
  • 11. Machine translation • State-of-the-art: Neural machine translation • Transformer, based on self-attention; RNMT+ • Combined with other improvements: human parity
 on English-Chinese (Microsoft) • Babel-fish earbuds (Google) • Phrase-based still better on low-resource languages • Exciting future directions: • Unsupervised MT • But: Only superficial understanding; not close to a
 ‘real’ understanding of language yet 11 Sources: https://arxiv.org/abs/1706.03762, https://arxiv.org/abs/1804.09849, https://arxiv.org/abs/1803.05567, https://www.citi.io/2018/03/09/top-breakthrough-technologies-for-2018-babel-fish-earbuds/, https:// arxiv.org/abs/1711.00043, https://arxiv.org/abs/1710.11041, https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/
  • 12. Making ML more accessible • Accessible libraries such as
 TensorFlow, PyTorch, Keras, etc. • Mobile, on-device ML • Opens up many applications: • Predicting disease in Cassava plants • Sorting cucumbers • Identifying wild animals in trap images • Identifying hot dogs 12 Sources: https://medium.com/tensorflow/highlights-from-tensorflow-developer-summit-2018-cd86615714b2, https://cloud.google.com/blog/big-data/2016/08/how-a-japanese-cucumber-farmer-is-using-deep-learning- and-tensorflow, https://news.developer.nvidia.com/automatically-identify-wild-animals-in-camera-trap-images/, https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow- keras-react-native-ef03260747f3
  • 13. Reading comprehension Sources: https://blogs.microsoft.com/ai/microsoft-creates-ai-can-read-document-answer-questions-well-person/, http://www.alizila.com/alibaba-ai-model-tops-humans-in-reading-comprehension/, http:// www.newsweek.com/robots-can-now-read-better-humans-putting-millions-jobs-risk-781393, http://money.cnn.com/2018/01/15/technology/reading-robot-alibaba-microsoft-stanford/index.html 13
  • 14. Reading comprehension — success? • Constrained task • Models exploit superficial pattern-matching • Can be easily fooled
 —> adversarial examples 14 Sources: http://u.cs.biu.ac.il/~yogo/squad-vs-human.pdf, http://www.aclweb.org/anthology/D17-1215
  • 15. Adversarial examples • Adversarial examples: intentionally designed examples to cause the model to make a mistake • Active research area, no general-purpose defence yet • E.g. obfuscated gradients beat 7/8 defences at ICLR 2018 • Implications for other disciplines (language, speech) 15 Sources: https://www.sri.inf.ethz.ch/riai2017/Explaining%20and%20Harnessing%20Adversarial%20Examples.pdf, https://blog.openai.com/adversarial-example-research/, https://github.com/anishathalye/obfuscated- gradients, https://thegradient.pub/adversarial-speech/
  • 17. Unsupervised learning • Unsupervised learning
 is important • Still elusive • Most successful recent
 direction: Generative
 Adversarial Networks (GANs) • Still hard to train • Mainly useful so far for 
 generative tasks in
 computer vision 17 Sources: http://ruder.io/highlights-nips-2016/
  • 18. Semi-supervised learning • Advances in semi-supervised learning (SSL) • 19.8% Top-5 error on ImageNet with
 only 10% of the labels • However: • Simple baselines—when properly tuned—or heavy regularised
 models are often competitive • Transfer learning often works better • Performance can quickly degrade
 with a domain shift 18 Sources: https://github.com/CuriousAI/mean-teacher/blob/master/nips_2017_slides.pdf, https://arxiv.org/abs/1804.09170
  • 19. Semi-supervised learning • New methods for SSL and domain adaptation rarely compare against classic SSL algorithms • Re-evaluate general-purpose bootstrapping approaches with NNs vs. state-of- the-art • Classic tri-training (with some modifications) 
 performs best • Tri-training with neural networks is expensive • Propose a more time- and space-efficient
 version, Multi-task tri-training 19 S. Ruder, B. Plank. Strong Baselines for Neural Semi-supervised Learning under Domain Shift. ACL 2018.
  • 20. Transfer learning • Solve a problem by leveraging knowledge from a related problem • Most useful when few labeled examples are available • Most common use-case: Fine-tune a pre-trained ImageNet model 20 Sources: https://www.saagie.com/blog/object-detection-part1
  • 21. Transfer learning 1. Remove final classification layer 21
  • 22. Transfer learning 22 1. Remove final classification layer 2. Replace with own classification layer 10
  • 23. Transfer learning 23 1. Remove final classification layer 2. Replace with own classification layer 3. Freeze other model parameters 10
  • 24. Transfer learning 24 1. Remove final classification layer 2. Replace with own classification layer 3. Freeze other model parameters 4. Train final layer on new labeled examples 10 Train
  • 25. Transfer learning for NLP • Transfer via pre-trained word embeddings is common • Only affects first layer • Recent methods still initialize most parameters randomly • Universal Language Model Fine-tuning (ULMFit) 25 J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
  • 26. Transfer learning for NLP • Transfer via pre-trained word embeddings is common • Only affects first layer • Recent methods still initialize most parameters randomly • Universal Language Model Fine-tuning (ULMFit) 1. Pre-train a language model (LM) on a large corpus, e.g. Wikipedia 26 J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
  • 27. Transfer learning for NLP • Transfer via pre-trained word embeddings is common • Only affects first layer • Recent methods still initialize most parameters randomly • Universal Language Model Fine-tuning (ULMFit) 1. Pre-train a language model (LM) on a large corpus, e.g. Wikipedia 2. Fine-tune the LM on unlabelled data with new techniques 27 J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
  • 28. Transfer learning for NLP • Transfer via pre-trained word embeddings is common • Only affects first layer • Recent methods still initialize most parameters randomly • Universal Language Model Fine-tuning (ULMFit) 1. Pre-train a language model (LM) on a large corpus, e.g. Wikipedia 2. Fine-tune the LM on unlabelled data with new techniques 3. Fine-tune classifier on labeled examples 28 J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
  • 29. Transfer learning for NLP • State-of-the-art on six widely used text classification datasets • Enables training with orders of magnitude less data 29 J. Howard*, S. Ruder*. Universal Language Model Fine-tuning for Text Classification. ACL 2018.
  • 30. Many more frontiers • Reasoning • Bias • Interpretability • Adversarial examples • Generalization 30
  • 31. Thanks for your attention! More info: http://ruder.io/ Twitter: @seb_ruder Aylien: jobs@aylien.com — We are hiring!