SlideShare a Scribd company logo
1 of 20
Download to read offline
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
A Pragmatic Introduction to
Natural Language Processing models
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
https://medium.com/@julsimon
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Problem statement
• Since the dawn of AI, NLP has been a major research field
• Text classification, machine translation, text generation, chat bots, voice assistants, etc.
• NLP apps require a language model in order to predict the next word
• Given a sequence of words (w1, … , wn), predict wn+1 that has the highest probability
• Vocabulary size can be hundreds of thousands of words
… in millions of documents
• Can we build a compact mathematical representation of language,
that would help us with a variety of downstream NLP tasks?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
« You shall know a word by the company it keeps », Firth (1957)
• Initial idea: build word vectors from co-occurrence counts
• Also called embeddings
• High dimensional: at least 50, up to 300
• Much more compact than an n*n matrix, which would be extremely sparse
• Words with similar meanings should have similar vectors
• “car” ≈ “automobile” ≈ “sedan”
• The distance between related vectors should be similar
• distance (“Paris”, ”France”) ≈ distance(“Berlin”, ”Germany”)
• distance(“hot”, ”hotter”) ≈ distance(“cold”, ”colder”)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
High-level view
1. Start from a large text corpus (100s of millions of words, even billions)
2. Preprocess the corpus
• Tokenize: « hello, world! » → « <BOS>hello<SP>world<SP>!<EOS>»
• Multi-word entities: « Rio de Janeiro » → « rio_de_janeiro »
3. Build the vocabulary
4. Learn vector representations for all words
… or simply use (or fine-tune) pre-trained vectors (more on this later)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Word2Vec (2013)
https://arxiv.org/abs/1301.3781
https://code.google.com/archive/p/word2vec/
• Continuous bag of words (CBOW):
• the model predicts the current word from surrounding context words
• Word order doesn’t matter (hence the ‘bag of words’)
• Skipgram
• the model uses the current word to predict the surrounding window of
context words
• This may work better when little data is available
• CBOW trains faster, skipgram is more accurate
• C code, based on shallow neural network
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Global Vectors aka GloVe (2014)
https://nlp.stanford.edu/projects/glove/
https://github.com/stanfordnlp/GloVe
https://www.quora.com/How-is-GloVe-different-from-word2vec
• Performance generally similar to Word2Vec
• Several pre-trained embedding collections:
up to 840 billion tokens, 2.2 million vocabulary, 300 dimensions
• C code, based on matrix factorization
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
FastText (2016)
https://arxiv.org/abs/1607.04606 + https://arxiv.org/abs/1802.06893
https://fasttext.cc/
https://www.quora.com/What-is-the-main-difference-between-word2vec-and-fastText
• Extension of Word2Vec: each word is a set of subwords, aka n-grams
• « Computer », n=5 : <START>Comp , compu, omput, mpute, puter, uter<END>
• A word vector is the average of its subword vectors
• Good for unknown/mispelled words, as they share subwords with known words
• « Computerized » and « Cmputer » should be close to « Computer »
• Three modes
• Unsupervised learning: compute embeddings (294 languages available)
• Supervised learning: use / fine-tune embeddings for multi-label, multi-class classification
• Also language detection for 170 languages
• Multithreaded C++ code, with Python API
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
https://dl.acm.org/citation.cfm?id=3146354
https://aws.amazon.com/blogs/machine-learning/enhanced-text-classification-and-
word-vectors-using-amazon-sagemaker-blazingtext/
• Amazon-invented algorithm,
available in Amazon SageMaker
• Extends FastText with GPU capabilities
• Unsupervised learning: word vectors
• 20x faster
• CBOW and skip-gram with subword support
• Batch skip-gram for distributed training
• Supervised learning: text classification
• 100x faster
• Models are compatible with FastText
BlazingText (2017)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Limitations of Word2Vec (and family)
• Some words have different meanings (aka polysemy)
• « Kevin, stop throwing rocks! » vs. « Machine Learning rocks »
• Word2Vec encodes the different meanings of a word into the same vector
• Bidirectional context is not taken into account
• Previous words (left-to-right) and next words (right-to-left)
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Embeddings from Language Models aka ELMo (02/2018)
https://arxiv.org/abs/1802.05365
https://allennlp.org/elmo
• ELMo generates a context-aware vector
for each word
• Character-level CNN + two unidirectional LSTMs
• Bidirectional context, no cheating possible
(can’t peek at the next word)
• “Deep” embeddings, reflecting output from all layers
• No vocabulary, no vector file:
you need to use the model itself
• Reference implementation
with TensorFlow
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bidirectional EncoderRepresentationsfrom Transformers aka BERT (10/2018)
https://arxiv.org/abs/1810.04805
https://github.com/google-research/bert
https://www.quora.com/What-are-the-main-differences-between-the-word-embeddings-of-ELMo-BERT-Word2vec-and-GloVe
• BERT improves on ELMo
• Replace LSTM with Transformers,
which deal better with long-term dependencies
• True bidirectional architecture: left-to-right and right-to-left
contexts are learned by the same network
• 15% of words are randomly masked during training
to prevent cheating
• Pre-trained models: BERT Base and BERT Large
• Masked word prediction
• Next sentence prediction
• Reference implementation with TensorFlow
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Limitations of BERT
• BERT cannot handle more than 512 input tokens
• BERT masks words during training, but not during fine-tuning
(aka training/fine-tuning discrepancy)
• BERT isn’t trained to predict the next word, so it’s not great at text generation
• BERT doesn’t learn dependencies for masked words
• Train « I am going to <MASK> my <MASK> » on « walk » / « dog », « eat » / « breakfast », and « debug » / « code ».
• BERT could legitimately predict « I am going to eat my code » or « I am going to debug my dog » :-/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
XLNet (06/2019)
https://arxiv.org/abs/1906.08237
https://github.com/zihangdai/xlnet
• XLNet beats BERT at 20 tasks
• XLNet uses bidirectional context,
but words are randomly permuted
• No cheating possible
• No masking required
• XLNet Base and XLNet Large
• Reference implementation with TensorFlow
07/2019: ERNIE 2.0 (Baidu)
beats BERT & XLNet
https://github.com/PaddlePaddle/ERNIE/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Summing things up
• Word2Vec and friends
• Try pre-trained embeddings first
• Check that the training corpus is similar to your own data
• Same language, similar vocabulary
• Remember that subword models will help with unknown / mispelled words
• If you have exotic requirements AND lots of data, training is not expensive
• ElMo, BERT, XLNet
• Training is very expensive: several days using several GPUs
• Fine-tuning is cheap: just a few GPU hours for SOTA results
• Fine-tuning scripts and pre-trained models are available: start there!
• The ”best model” is the one that works best on your business problem
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Demo: finding similarities and analogies
with Gluon NLP and pre-trained GloVe embeddings
https://gitlab.com/juliensimon/dlnotebooks/gluonnlp/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Demo: embeddings with ELMo on TensorFlow
https://gitlab.com/juliensimon/dlnotebooks/blob/master/nlp/ELMO%20TensorFlow.ipynb
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Getting started on AWS
https://ml.aws
https://aws.amazon.com/marketplace/solutions/machine-learning/natural-
language-processing → off the shelf algos and models that might just save you from
this madness ;)
https://aws.amazon.com/sagemaker
https://github.com/awslabs/amazon-sagemaker-examples
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Julien Simon
Global Evangelist, AI & Machine Learning
@julsimon
https://medium.com/@julsimon

More Related Content

What's hot

AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...Julien SIMON
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Julien SIMON
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Julien SIMON
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)Julien SIMON
 
Build, train, and deploy ML models at scale.pdf
Build, train, and deploy ML models at scale.pdfBuild, train, and deploy ML models at scale.pdf
Build, train, and deploy ML models at scale.pdfAmazon Web Services
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)Julien SIMON
 
Workshop: Build Deep Learning Applications with TensorFlow and SageMaker
Workshop: Build Deep Learning Applications with TensorFlow and SageMakerWorkshop: Build Deep Learning Applications with TensorFlow and SageMaker
Workshop: Build Deep Learning Applications with TensorFlow and SageMakerAmazon Web Services
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Julien SIMON
 
Build Text Analytics Solutions with Amazon Comprehend and Amazon Translate
Build Text Analytics Solutions with Amazon Comprehend and Amazon TranslateBuild Text Analytics Solutions with Amazon Comprehend and Amazon Translate
Build Text Analytics Solutions with Amazon Comprehend and Amazon TranslateAmazon Web Services
 
Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Julien SIMON
 
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)Julien SIMON
 
Speed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithmsSpeed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithmsJulien SIMON
 
Machine Learning on AWS (December 2018)
Machine Learning on AWS (December 2018)Machine Learning on AWS (December 2018)
Machine Learning on AWS (December 2018)Julien SIMON
 
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMakerDeep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMakerAmazon Web Services
 
AWS re:Invent 2018 - AIM302 - Machine Learning at the Edge
AWS re:Invent 2018 - AIM302  - Machine Learning at the Edge AWS re:Invent 2018 - AIM302  - Machine Learning at the Edge
AWS re:Invent 2018 - AIM302 - Machine Learning at the Edge Julien SIMON
 
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SIAmazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SIAmazon Web Services
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...AWS Summits
 
Workshop: Build an Image-Based Automatic Alert System with Amazon Rekognition
Workshop: Build an Image-Based Automatic Alert System with Amazon RekognitionWorkshop: Build an Image-Based Automatic Alert System with Amazon Rekognition
Workshop: Build an Image-Based Automatic Alert System with Amazon RekognitionAmazon Web Services
 
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...Amazon Web Services
 
AI & ML on AWS: State of the Union
AI & ML on AWS: State of the UnionAI & ML on AWS: State of the Union
AI & ML on AWS: State of the UnionJulien SIMON
 

What's hot (20)

AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
 
Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)Building smart applications with AWS AI services (October 2019)
Building smart applications with AWS AI services (October 2019)
 
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...Train and Deploy Machine Learning Workloads with AWS Container Services (July...
Train and Deploy Machine Learning Workloads with AWS Container Services (July...
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
 
Build, train, and deploy ML models at scale.pdf
Build, train, and deploy ML models at scale.pdfBuild, train, and deploy ML models at scale.pdf
Build, train, and deploy ML models at scale.pdf
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
Workshop: Build Deep Learning Applications with TensorFlow and SageMaker
Workshop: Build Deep Learning Applications with TensorFlow and SageMakerWorkshop: Build Deep Learning Applications with TensorFlow and SageMaker
Workshop: Build Deep Learning Applications with TensorFlow and SageMaker
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)
 
Build Text Analytics Solutions with Amazon Comprehend and Amazon Translate
Build Text Analytics Solutions with Amazon Comprehend and Amazon TranslateBuild Text Analytics Solutions with Amazon Comprehend and Amazon Translate
Build Text Analytics Solutions with Amazon Comprehend and Amazon Translate
 
Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)
 
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
Machine Learning: From Notebook to Production with Amazon Sagemaker (April 2018)
 
Speed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithmsSpeed up your Machine Learning workflows with build-in algorithms
Speed up your Machine Learning workflows with build-in algorithms
 
Machine Learning on AWS (December 2018)
Machine Learning on AWS (December 2018)Machine Learning on AWS (December 2018)
Machine Learning on AWS (December 2018)
 
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMakerDeep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
 
AWS re:Invent 2018 - AIM302 - Machine Learning at the Edge
AWS re:Invent 2018 - AIM302  - Machine Learning at the Edge AWS re:Invent 2018 - AIM302  - Machine Learning at the Edge
AWS re:Invent 2018 - AIM302 - Machine Learning at the Edge
 
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SIAmazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
Amazon SageMaker Deep Dive - Meetup AWS Toulouse at D2SI
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 
Workshop: Build an Image-Based Automatic Alert System with Amazon Rekognition
Workshop: Build an Image-Based Automatic Alert System with Amazon RekognitionWorkshop: Build an Image-Based Automatic Alert System with Amazon Rekognition
Workshop: Build an Image-Based Automatic Alert System with Amazon Rekognition
 
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...
Workshop: Using Amazon ML Services for Video Transcription and Translation Wo...
 
AI & ML on AWS: State of the Union
AI & ML on AWS: State of the UnionAI & ML on AWS: State of the Union
AI & ML on AWS: State of the Union
 

Similar to A pragmatic introduction to natural language processing models (October 2019)

NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsSanghamitra Deb
 
Introduction to GluonCV
Introduction to GluonCVIntroduction to GluonCV
Introduction to GluonCVApache MXNet
 
Envisioning the Future of Language Workbenches
Envisioning the Future of Language WorkbenchesEnvisioning the Future of Language Workbenches
Envisioning the Future of Language WorkbenchesMarkus Voelter
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsOVHcloud
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchrohitcse52
 
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...Amazon Web Services
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jKevin Watters
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...Lucidworks
 
Java And Community Support
Java And Community SupportJava And Community Support
Java And Community SupportWilliam Grosso
 
Building Your Smart Applications with Machine Learning on AWS | AWS Webinar
Building Your Smart Applications with Machine Learning on AWS | AWS WebinarBuilding Your Smart Applications with Machine Learning on AWS | AWS Webinar
Building Your Smart Applications with Machine Learning on AWS | AWS WebinarAmazon Web Services
 
Programming Languages #devcon2013
Programming Languages #devcon2013Programming Languages #devcon2013
Programming Languages #devcon2013Iván Montes
 
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex Pollexy
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex PollexyMCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex Pollexy
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex PollexyAmazon Web Services
 
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und ExpertenMaschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und ExpertenAWS Germany
 
Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017Amazon Web Services
 
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...Amazon Web Services
 
Tensors for topic modeling and deep learning on AWS Sagemaker
Tensors for topic modeling and deep learning on AWS SagemakerTensors for topic modeling and deep learning on AWS Sagemaker
Tensors for topic modeling and deep learning on AWS SagemakerAnima Anandkumar
 
AWS 機器學習 I ─ 人工智慧 AI
AWS 機器學習 I ─ 人工智慧 AIAWS 機器學習 I ─ 人工智慧 AI
AWS 機器學習 I ─ 人工智慧 AIAmazon Web Services
 
How Amazon AI Can Help You Transform Your Education Business | AWS Webinar
How Amazon AI Can Help You Transform Your Education Business | AWS WebinarHow Amazon AI Can Help You Transform Your Education Business | AWS Webinar
How Amazon AI Can Help You Transform Your Education Business | AWS WebinarAmazon Web Services
 

Similar to A pragmatic introduction to natural language processing models (October 2019) (20)

NLP and Machine Learning for non-experts
NLP and Machine Learning for non-expertsNLP and Machine Learning for non-experts
NLP and Machine Learning for non-experts
 
GluonCV
GluonCVGluonCV
GluonCV
 
Introduction to GluonCV
Introduction to GluonCVIntroduction to GluonCV
Introduction to GluonCV
 
Envisioning the Future of Language Workbenches
Envisioning the Future of Language WorkbenchesEnvisioning the Future of Language Workbenches
Envisioning the Future of Language Workbenches
 
Dean4j@Njug5
Dean4j@Njug5Dean4j@Njug5
Dean4j@Njug5
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
 
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...
Training Chatbots and Conversational Artificial Intelligence Agents with Amaz...
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
 
Java And Community Support
Java And Community SupportJava And Community Support
Java And Community Support
 
Building Your Smart Applications with Machine Learning on AWS | AWS Webinar
Building Your Smart Applications with Machine Learning on AWS | AWS WebinarBuilding Your Smart Applications with Machine Learning on AWS | AWS Webinar
Building Your Smart Applications with Machine Learning on AWS | AWS Webinar
 
Programming Languages #devcon2013
Programming Languages #devcon2013Programming Languages #devcon2013
Programming Languages #devcon2013
 
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex Pollexy
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex PollexyMCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex Pollexy
MCL331_Building a Virtual Assistant with Amazon Polly and Amazon Lex Pollexy
 
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und ExpertenMaschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
Maschinelles Lernen auf AWS für Entwickler, Data Scientists und Experten
 
Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017Machine Learning State of the Union - MCL210 - re:Invent 2017
Machine Learning State of the Union - MCL210 - re:Invent 2017
 
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...
Accelerating Apache MXNet Models on Apple Platforms Using Core ML - MCL311 - ...
 
Tensors for topic modeling and deep learning on AWS Sagemaker
Tensors for topic modeling and deep learning on AWS SagemakerTensors for topic modeling and deep learning on AWS Sagemaker
Tensors for topic modeling and deep learning on AWS Sagemaker
 
AWS 機器學習 I ─ 人工智慧 AI
AWS 機器學習 I ─ 人工智慧 AIAWS 機器學習 I ─ 人工智慧 AI
AWS 機器學習 I ─ 人工智慧 AI
 
How Amazon AI Can Help You Transform Your Education Business | AWS Webinar
How Amazon AI Can Help You Transform Your Education Business | AWS WebinarHow Amazon AI Can Help You Transform Your Education Business | AWS Webinar
How Amazon AI Can Help You Transform Your Education Business | AWS Webinar
 

More from Julien SIMON

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Julien SIMON
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Julien SIMON
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)Julien SIMON
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...Julien SIMON
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Julien SIMON
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Julien SIMON
 
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Julien SIMON
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Julien SIMON
 
Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Julien SIMON
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Julien SIMON
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Julien SIMON
 
Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Julien SIMON
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Julien SIMON
 

More from Julien SIMON (16)

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
 
Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)Build, train and deploy ML models with Amazon SageMaker (May 2019)
Build, train and deploy ML models with Amazon SageMaker (May 2019)
 
Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)Become a Machine Learning developer with AWS (Avril 2019)
Become a Machine Learning developer with AWS (Avril 2019)
 
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...Solve complex business problems with Amazon Personalize and Amazon Forecast (...
Solve complex business problems with Amazon Personalize and Amazon Forecast (...
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
 
Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)Optimize your Machine Learning workloads (April 2019)
Optimize your Machine Learning workloads (April 2019)
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
 
Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)Optimize your machine learning workloads on AWS (March 2019)
Optimize your machine learning workloads on AWS (March 2019)
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
 

Recently uploaded

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

A pragmatic introduction to natural language processing models (October 2019)

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. A Pragmatic Introduction to Natural Language Processing models Julien Simon Global Evangelist, AI & Machine Learning @julsimon https://medium.com/@julsimon
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Problem statement • Since the dawn of AI, NLP has been a major research field • Text classification, machine translation, text generation, chat bots, voice assistants, etc. • NLP apps require a language model in order to predict the next word • Given a sequence of words (w1, … , wn), predict wn+1 that has the highest probability • Vocabulary size can be hundreds of thousands of words … in millions of documents • Can we build a compact mathematical representation of language, that would help us with a variety of downstream NLP tasks?
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. « You shall know a word by the company it keeps », Firth (1957) • Initial idea: build word vectors from co-occurrence counts • Also called embeddings • High dimensional: at least 50, up to 300 • Much more compact than an n*n matrix, which would be extremely sparse • Words with similar meanings should have similar vectors • “car” ≈ “automobile” ≈ “sedan” • The distance between related vectors should be similar • distance (“Paris”, ”France”) ≈ distance(“Berlin”, ”Germany”) • distance(“hot”, ”hotter”) ≈ distance(“cold”, ”colder”)
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. High-level view 1. Start from a large text corpus (100s of millions of words, even billions) 2. Preprocess the corpus • Tokenize: « hello, world! » → « <BOS>hello<SP>world<SP>!<EOS>» • Multi-word entities: « Rio de Janeiro » → « rio_de_janeiro » 3. Build the vocabulary 4. Learn vector representations for all words … or simply use (or fine-tune) pre-trained vectors (more on this later)
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Word2Vec (2013) https://arxiv.org/abs/1301.3781 https://code.google.com/archive/p/word2vec/ • Continuous bag of words (CBOW): • the model predicts the current word from surrounding context words • Word order doesn’t matter (hence the ‘bag of words’) • Skipgram • the model uses the current word to predict the surrounding window of context words • This may work better when little data is available • CBOW trains faster, skipgram is more accurate • C code, based on shallow neural network
  • 7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Global Vectors aka GloVe (2014) https://nlp.stanford.edu/projects/glove/ https://github.com/stanfordnlp/GloVe https://www.quora.com/How-is-GloVe-different-from-word2vec • Performance generally similar to Word2Vec • Several pre-trained embedding collections: up to 840 billion tokens, 2.2 million vocabulary, 300 dimensions • C code, based on matrix factorization
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. FastText (2016) https://arxiv.org/abs/1607.04606 + https://arxiv.org/abs/1802.06893 https://fasttext.cc/ https://www.quora.com/What-is-the-main-difference-between-word2vec-and-fastText • Extension of Word2Vec: each word is a set of subwords, aka n-grams • « Computer », n=5 : <START>Comp , compu, omput, mpute, puter, uter<END> • A word vector is the average of its subword vectors • Good for unknown/mispelled words, as they share subwords with known words • « Computerized » and « Cmputer » should be close to « Computer » • Three modes • Unsupervised learning: compute embeddings (294 languages available) • Supervised learning: use / fine-tune embeddings for multi-label, multi-class classification • Also language detection for 170 languages • Multithreaded C++ code, with Python API
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. https://dl.acm.org/citation.cfm?id=3146354 https://aws.amazon.com/blogs/machine-learning/enhanced-text-classification-and- word-vectors-using-amazon-sagemaker-blazingtext/ • Amazon-invented algorithm, available in Amazon SageMaker • Extends FastText with GPU capabilities • Unsupervised learning: word vectors • 20x faster • CBOW and skip-gram with subword support • Batch skip-gram for distributed training • Supervised learning: text classification • 100x faster • Models are compatible with FastText BlazingText (2017)
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Limitations of Word2Vec (and family) • Some words have different meanings (aka polysemy) • « Kevin, stop throwing rocks! » vs. « Machine Learning rocks » • Word2Vec encodes the different meanings of a word into the same vector • Bidirectional context is not taken into account • Previous words (left-to-right) and next words (right-to-left)
  • 11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Embeddings from Language Models aka ELMo (02/2018) https://arxiv.org/abs/1802.05365 https://allennlp.org/elmo • ELMo generates a context-aware vector for each word • Character-level CNN + two unidirectional LSTMs • Bidirectional context, no cheating possible (can’t peek at the next word) • “Deep” embeddings, reflecting output from all layers • No vocabulary, no vector file: you need to use the model itself • Reference implementation with TensorFlow
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Bidirectional EncoderRepresentationsfrom Transformers aka BERT (10/2018) https://arxiv.org/abs/1810.04805 https://github.com/google-research/bert https://www.quora.com/What-are-the-main-differences-between-the-word-embeddings-of-ELMo-BERT-Word2vec-and-GloVe • BERT improves on ELMo • Replace LSTM with Transformers, which deal better with long-term dependencies • True bidirectional architecture: left-to-right and right-to-left contexts are learned by the same network • 15% of words are randomly masked during training to prevent cheating • Pre-trained models: BERT Base and BERT Large • Masked word prediction • Next sentence prediction • Reference implementation with TensorFlow
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Limitations of BERT • BERT cannot handle more than 512 input tokens • BERT masks words during training, but not during fine-tuning (aka training/fine-tuning discrepancy) • BERT isn’t trained to predict the next word, so it’s not great at text generation • BERT doesn’t learn dependencies for masked words • Train « I am going to <MASK> my <MASK> » on « walk » / « dog », « eat » / « breakfast », and « debug » / « code ». • BERT could legitimately predict « I am going to eat my code » or « I am going to debug my dog » :-/
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. XLNet (06/2019) https://arxiv.org/abs/1906.08237 https://github.com/zihangdai/xlnet • XLNet beats BERT at 20 tasks • XLNet uses bidirectional context, but words are randomly permuted • No cheating possible • No masking required • XLNet Base and XLNet Large • Reference implementation with TensorFlow 07/2019: ERNIE 2.0 (Baidu) beats BERT & XLNet https://github.com/PaddlePaddle/ERNIE/
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Summing things up • Word2Vec and friends • Try pre-trained embeddings first • Check that the training corpus is similar to your own data • Same language, similar vocabulary • Remember that subword models will help with unknown / mispelled words • If you have exotic requirements AND lots of data, training is not expensive • ElMo, BERT, XLNet • Training is very expensive: several days using several GPUs • Fine-tuning is cheap: just a few GPU hours for SOTA results • Fine-tuning scripts and pre-trained models are available: start there! • The ”best model” is the one that works best on your business problem
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Demo: finding similarities and analogies with Gluon NLP and pre-trained GloVe embeddings https://gitlab.com/juliensimon/dlnotebooks/gluonnlp/
  • 18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Demo: embeddings with ELMo on TensorFlow https://gitlab.com/juliensimon/dlnotebooks/blob/master/nlp/ELMO%20TensorFlow.ipynb
  • 19. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Getting started on AWS https://ml.aws https://aws.amazon.com/marketplace/solutions/machine-learning/natural- language-processing → off the shelf algos and models that might just save you from this madness ;) https://aws.amazon.com/sagemaker https://github.com/awslabs/amazon-sagemaker-examples
  • 20. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Thank you! © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Julien Simon Global Evangelist, AI & Machine Learning @julsimon https://medium.com/@julsimon