Moshe Wasserblat
Intel AI Lab
NLP MeetUp, Aug. 2020
BIO
● NICE Systems
  ● Led the Speech & Text Analytics research group
  ● First company to productize Speech2Text, ED, and Voice Biometrics in the call center
● Intel
  ● Innovate for our products
  ● Collaborate with top academics
  ● Explore compute features that disrupt our HW
AGENDA
● Efficiency
  ● Large model intro
  ● Inference efficiency: models with lower computational complexity
  ● Examples
  ● SustaiNLP Workshop at EMNLP, Nov. 2020
● Data challenges
  ● Extensibility: address a new domain with limited data and minimal supervision
  ● Weakly-supervised ABSA example
The advantages of BERT
1. Efficient transfer learning
Leverage a large model that was pre-trained on a generic task with a large amount of data, then adapt it to a specific task with a small amount of data. The result is high accuracy with less task-specific data.
2. Context embeddings
Produces vectors that represent each word in the context of its sentence, e.g. “bank” in “river bank” vs. “investment bank”.
[Diagram: input sentence → 12/24 stacked transformer encoder layers (110/330M parameters) → context embeddings → task-specific classifier → task output]
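The parameter figures above can be checked with a little arithmetic. The sketch below counts the parameters of a BERT-style encoder from its layer sizes. The vocabulary size (30,522), hidden sizes (768 and 1024), feed-forward sizes (3072 and 4096), and 512 positions are assumptions taken from the publicly released uncased BERT checkpoints, not from the slides.

```python
def bert_param_count(layers, hidden, ffn, vocab=30522, positions=512, segments=2):
    """Approximate parameter count of a BERT-style encoder (weights + biases).

    Default sizes are assumptions based on the released uncased checkpoints.
    """
    layer_norm = 2 * hidden                          # scale + shift
    embeddings = (vocab + positions + segments) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden)       # Q, K, V and output projections
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    per_layer = attention + layer_norm + feed_forward + layer_norm
    pooler = hidden * hidden + hidden                # dense layer on the first token
    return embeddings + layers * per_layer + pooler


base = bert_param_count(layers=12, hidden=768, ffn=3072)
large = bert_param_count(layers=24, hidden=1024, ffn=4096)
print(f"BERT-base:  {base / 1e6:.0f}M parameters")
print(f"BERT-large: {large / 1e6:.0f}M parameters")
```

Under these assumptions the counts come out at roughly 109M and 335M, in line with the rounded figures on the slide. Most of the parameters sit in the stacked encoder layers, which is why the layer-level techniques later in the deck pay off.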
Pre-trained LMs have become extremely large and deep
[Chart: number of parameters (in billions, axis 2.5 to 15) of pre-trained LMs, with T5 at 11B. Source: HuggingFace]
• Heavy computation
• Large memory footprint
• Hard to train/fine-tune
• Hard to deploy
How should we put these monsters in production?
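To make the memory-footprint point concrete, here is a rough back-of-the-envelope sketch. It only counts the bytes needed to store the weights (no activations, optimizer state, or runtime overhead). The parameter counts are the rounded figures from the slides, and the function name is an illustrative assumption.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="fp32"):
    """Memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Rounded parameter counts as quoted on the slides.
models = [("BERT-base", 110e6), ("BERT-large", 330e6), ("T5", 11e9)]

for name, params in models:
    sizes = ", ".join(
        f"{dtype}: {weight_memory_gb(params, dtype):.2f} GB" for dtype in BYTES_PER_PARAM
    )
    print(f"{name:<10} {sizes}")
```

At 32-bit precision an 11B-parameter model needs about 44 GB for its weights alone, which already exceeds the memory of many single accelerators and most edge devices.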
Year 2020: from accuracy to efficiency
[Chart: number of parameters (in millions, axis 0 to 120) for BERT, ALBERT, MobileBERT, DistilBERT, and TinyBERT]
Vectors for optimization
• Quantization of weights to int8 or another lower-precision representation
• Pruning of individual weights and of whole structures (complete layers, self-attention heads)
• Early prediction of samples, using predictors attached to shallow layers
• Sharing the weights of the self-attention and feed-forward (FF) modules across all model blocks
• Training smaller models using distillation and other novel techniques
• Replacing Transformer modules and searching for the best architecture using Neural Architecture Search
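Of the techniques listed above, early prediction is perhaps the least familiar, so here is a minimal sketch of the idea. Each layer gets its own small predictor, and inference stops at the first predictor that is confident enough. The layers, exit heads, and the 0.9 threshold are toy stand-ins chosen for illustration, not taken from any specific paper or library.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(hidden, layers, exit_heads, threshold=0.9):
    """Run layers in order; stop as soon as an exit head is confident enough.

    Returns (predicted_class, number_of_layers_used).
    """
    if not layers:
        raise ValueError("need at least one layer")
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        hidden = layer(hidden)
        probs = softmax(head(hidden))
        if max(probs) >= threshold:
            break
    return probs.index(max(probs)), depth


# Toy stand-ins (hypothetical): each "layer" amplifies the signal,
# each "head" reads two logits straight from the hidden state.
layers = [lambda h: [2.0 * x for x in h]] * 4
exit_heads = [lambda h: [h[0], h[1]]] * 4

easy = early_exit_predict([1.5, -1.5], layers, exit_heads)   # clear-cut input
hard = early_exit_predict([0.1, -0.1], layers, exit_heads)   # ambiguous input
print("easy input (class, layers used):", easy)
print("hard input (class, layers used):", hard)
```

Easy inputs leave after the first layer while ambiguous ones use the full stack, so the average cost per sample drops without shrinking the model itself.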
Quantization
• Quantization of BERT models to 16/8-bit weights: up to 4x compression (32-bit to 8-bit) with minimal loss in accuracy
• See: “We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs”
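As an illustration of where the 4x figure comes from, the sketch below applies symmetric per-tensor quantization to a handful of weights. This is one common scheme, picked here as an assumption; the slides do not say which scheme the cited deployment used, and the example weights are made up.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.003, 0.9, -0.55]   # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print(f"max round-trip error: {max_error:.4f} (bounded by scale / 2 = {scale / 2:.4f})")
print(f"storage: 32 bits -> 8 bits per weight, i.e. {32 // 8}x smaller")
```

Each weight is stored in 8 bits instead of 32, and the rounding error per weight is bounded by half the scale, which is why accuracy often degrades only slightly.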
Pruning
For some tasks, up to 9 of the top layers of a 12-layer model can be pruned without degrading performance by more than 3%.
Source: Poor Man's BERT: Smaller and Faster Transformer Models
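Structurally, dropping top layers is a very small operation. The sketch below represents a hypothetical 12-layer encoder as a list of layer names and keeps only the layers closest to the input; the function name and the representation are assumptions for illustration.

```python
def drop_top_layers(layers, num_to_drop):
    """Keep the bottom layers; discard the `num_to_drop` layers closest to the output."""
    if not 0 <= num_to_drop < len(layers):
        raise ValueError("num_to_drop must leave at least one layer")
    return layers[: len(layers) - num_to_drop]


# Hypothetical 12-layer encoder, represented by layer names only.
encoder = [f"layer_{i}" for i in range(1, 13)]
pruned = drop_top_layers(encoder, 9)

print(pruned)
print(f"encoder layers kept: {len(pruned)}/{len(encoder)}")
```

In practice the truncated model would then be fine-tuned on the target task, since the remaining layers were never trained to feed a classifier directly.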
Distillation
[Diagram: a teacher model supervises a small BERT student through a loss. Student examples: TinyBERT, MobileBERT, DistilBERT. Signals the loss can match: hard labels, probabilities/logits, embeddings, attentions]
Naïve approach (Thieves on Sesame Street, Krishna et al., ICLR 2020)
[Diagram: a teacher and a student, each with an FF classifier for fine-tuning. The teacher produces pseudo labels for unlabeled examples; labeled examples carry annotated labels; the student is trained with the task loss. Example inputs: “Mulan is highly recommended” and “The movie was good as the book”, both with output Sent: POS]
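The data flow of this naïve approach can be sketched in a few lines. The teacher below is a keyword rule standing in for a fine-tuned BERT classifier, and the split of sentences into labeled and unlabeled sets is arbitrary; both are assumptions made only so the example runs on its own.

```python
def build_student_training_set(labeled, unlabeled, teacher_predict):
    """Combine annotated examples with teacher-labeled (pseudo-labeled) examples."""
    pseudo_labeled = [(text, teacher_predict(text)) for text in unlabeled]
    return list(labeled) + pseudo_labeled


# Hypothetical stand-in for a fine-tuned teacher model.
def toy_teacher(text):
    return "POS" if "recommended" in text or "good" in text else "NEG"


labeled = [("The movie was good as the book", "POS")]
unlabeled = ["Mulan is highly recommended", "Two hours I will never get back"]

training_set = build_student_training_set(labeled, unlabeled, toy_teacher)
for text, label in training_set:
    print(f"{label}: {text}")
```

The student is then trained on this combined set with an ordinary task loss, treating the teacher's predictions as if they were annotations.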
Distillation*: mimic the teacher's output probabilities
[Diagram: teacher and student with an FF classifier for fine-tuning. The teacher scores unlabeled examples, and the student is trained with a total loss that combines the task loss with a distillation loss (MSE**)]
• Works surprisingly well
• Great for low-resource tasks
* Hinton et al. ** Tang et al.
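The loss combination in the diagram can be written out directly. The sketch below uses cross-entropy on the annotated label as the task loss and mean squared error between student and teacher logits as the distillation loss. The equal weighting (`alpha=0.5`) and the example logits are assumptions, not values from the talk.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def task_loss(student_logits, label):
    """Cross-entropy against the annotated (hard) label."""
    return -math.log(softmax(student_logits)[label])


def distill_loss_mse(student_logits, teacher_logits):
    """Mean squared error between student and teacher logits."""
    n = len(student_logits)
    return sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits)) / n


def total_loss(student_logits, teacher_logits, label, alpha=0.5):
    """Weighted sum of the task loss and the distillation loss (alpha is an assumption)."""
    return (alpha * task_loss(student_logits, label)
            + (1 - alpha) * distill_loss_mse(student_logits, teacher_logits))


teacher = [2.0, -1.0]   # made-up teacher logits for one example
student = [1.2, -0.3]   # made-up student logits for the same example
print(f"task loss:    {task_loss(student, 0):.4f}")
print(f"distill loss: {distill_loss_mse(student, teacher):.4f}")
print(f"total loss:   {total_loss(student, teacher, 0):.4f}")
```

For unlabeled examples only the distillation term is available, which is what lets the student learn from data that has no annotations.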
BERT 2 BERT
Note: performance is cited from the original paper
Can we do more?
• LSTM/CNN: >100x
• Or CBOW: >1000x
Real use-case example
• Named Entity Recognition (NER) is a widely used information extraction task in many industrial applications and use cases
• Ramping up on a new domain can be difficult
  • Lots of unlabeled data, but little or no labeled data, and often not enough to train a model with good performance
Solution A
• Hire a linguist or data scientist to tune/build a model
• Hire annotators to label more data, or buy a similar dataset
• Time/compute resource limitations
Solution B
• Pre-trained language models such as BERT, GPT, and ELMo are great in low-resource scenarios
• They require large compute and memory resources and suffer from high inference latency
• Deploying such models in production or on edge devices is a major issue
Named Entity Recognition (CoNLL-2003)
[Chart: accuracy (axis 65 to 95) vs. number of training samples (150, 300, 750, 3000) for BERT, Distil LSTM, and Distil ID-CNN]
Model              BERT   Distil LSTM   Distil ID-CNN
Compression rate   x1     x36           x36
• Train a small LSTM/CNN model using BERT
• Utilize unlabeled data via the teacher
• The student is competitive with the teacher
Peter et al. NeurIPS19
Text Classification
[Chart: accuracy (axis 78 to 94) on AG News (0.4K samples), Dair's Emotions (16K samples), IMDB (1K samples), and STS-2 (7K samples) for BERT, Distill LSTM, and Distill CBOW]
Model              BERT   Distill LSTM   Distill CBOW
Compression rate   x1     x100           x1500
• Train a small CBOW model using BERT
• Utilize unlabeled data via the teacher
• The student is competitive with the teacher on specific datasets
Wasserblat, more details coming soon
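To show how small a CBOW (continuous bag-of-words) student is, here is a minimal sketch of its forward pass: average the word vectors, then apply one linear layer. The 2-dimensional embeddings and the weights are hand-picked, hypothetical values; in the setup described above they would be learned from the teacher's outputs.

```python
def cbow_logits(tokens, embeddings, weights, bias):
    """Average the word vectors, then apply a single linear layer."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return list(bias)                  # no known words: fall back to the bias
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [sum(w * x for w, x in zip(row, mean)) + b for row, b in zip(weights, bias)]


# Hypothetical 2-dimensional embeddings and a 2-class linear layer.
embeddings = {"great": [1.0, 0.0], "awful": [-1.0, 0.0], "movie": [0.0, 0.2]}
weights = [[-2.0, 0.0],   # class 0: negative
           [2.0, 0.0]]    # class 1: positive
bias = [0.0, 0.0]

logits = cbow_logits("great movie".split(), embeddings, weights, bias)
print("predicted class:", logits.index(max(logits)))
```

The model ignores word order entirely and has no attention or recurrence, which is where the very high compression rates come from.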
Takeaways
• Compact models perform as well as pre-trained LMs in low-resource scenarios, with superior inference speed and a high compression rate
• Practical tips:
  • Set a simpler classifier as the baseline
  • Fine-tune DistilBERT/BERT on your task
  • Plenty of labeled data: go with DistilBERT or another compact pre-trained model
  • Little labeled data: distill BERT into a simpler NN and compare it to BERT
SustaiNLP Workshop 2020
• Data and training efficiency: models requiring less training data and/or fewer computational resources and/or less time
• Inference efficiency: models with lower computational complexity of prediction/inference
https://sites.google.com/view/sustainlp2020
AGENDA
● Efficiency
  ● Large model intro
  ● Inference efficiency: models with lower computational complexity
  ● Examples
  ● SustaiNLP Workshop 2020
● Data challenges
  ● Extensibility: address a new domain with limited data and minimal supervision
  ● Weakly-supervised ABSA example
NLP today
● A separate model is created for each individual task and domain
● Needs a large team of domain experts and a large amount of labeled data, and is very time-consuming
● Hard to scale and adapt solutions across different domains
● No adaptation to the business environment
ABSA example and usage
Example: “the owner is super friendly and service is fast”
[Annotation: aspect terms (ASP): “owner”, “service”; opinion terms: “friendly”, “fast”]
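The kind of output ABSA produces for this sentence can be illustrated with a deliberately simple toy. The sketch below pairs each aspect term with the nearest opinion term that follows it, using small hand-written lexicons. It is not the published algorithm: the lexicons, the polarity labels, and the nearest-following heuristic are all assumptions made for illustration.

```python
def pair_aspects_with_opinions(sentence, aspect_lexicon, opinion_lexicon):
    """Pair each aspect term with the nearest opinion term that follows it."""
    tokens = sentence.lower().split()
    pairs = []
    for i, token in enumerate(tokens):
        if token in aspect_lexicon:
            following = [t for t in tokens[i + 1:] if t in opinion_lexicon]
            if following:
                pairs.append((token, following[0]))
    return pairs


# Hypothetical lexicons; a real system would learn or adapt these per domain.
aspects = {"owner", "service"}
opinions = {"friendly": "POS", "fast": "POS", "slow": "NEG"}

sentence = "the owner is super friendly and service is fast"
pairs = pair_aspects_with_opinions(sentence, aspects, opinions)
for aspect, opinion in pairs:
    print(f"{aspect} -> {opinion} ({opinions[opinion]})")
```

Keeping the aspect and opinion terms as explicit pairs is what makes the result easy to inspect: every sentiment label can be traced back to the words that produced it.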
The advantages of the algorithm
• Aspect-based SA: produces knowledge about specific aspects, which enables targeted business insight.
• Unsupervised, domain-adaptive: an unsupervised method that does not require costly, manually tagged training data.
• Explainable AI: displaying the relation between opinion terms and aspects makes the results interpretable.
• ABSA was recommended among the top 10 ML code examples on Azure and is included by Microsoft in its NLP Recipes
• Published at EMNLP 2019
• ABSA was used by the University of British Columbia and the British Columbia CDC to analyze COVID-19-related tweets in North America. See Jang et al., 2020.
Efficient Deep Learning in Natural Language Processing Production, with Moshe Wasserblat, Intel AI

Efficient Deep Learning in Natural Language Processing Production, with Moshe Wasserblat, Intel AI

  • 1. Moshe Wasserblat, Intel AI Lab. NLP MeetUp, Aug. 2020
  • 2. BIO ● NICE Systems ● Led the Speech & Text Analytics research group ● First company to productize Speech2Text, ED, and Voice Biometrics in the call center ● Intel ● Innovate for our products ● Collaborate with top academia ● Explore compute features that disrupt our HW
  • 3. AGENDA ● Efficiency ● Large model intro ● Inference efficiency: models with lower computational complexity ● Examples ● SustaiNLP Workshop at EMNLP, Nov. 2020 ● Data challenges ● Extensibility: addressing a new domain with limited data and minimal supervision ● Weakly-supervised ABSA example
  • 5. The advantages of BERT 1. Efficient transfer learning: leverage a large model that was pre-trained on a generic task with a large amount of data, then adapt it to a specific task with a small amount of data, reaching high accuracy with less task data. 2. Context embeddings: produces vectors that represent each word in the context of its sentence, e.g. "bank" in “river bank” vs. “investment bank”. Diagram elements: input sentence; 12/24 stacked layers of transformer encoder (110/330M parameters); context embeddings; task-specific classifier; task output
  • 6. Pre-trained LMs have become extremely large and deep [Chart: number of parameters (billions) per model, up to T5 at 11B. Source: HuggingFace]
  • 7. • Heavy computation • Large memory footprint • Hard to train/fine-tune • Hard to deploy. How should we put these monsters in production?
  • 8. Year 2020: from accuracy to efficiency [Chart: number of parameters (millions, axis 0 to 120) for BERT, ALBERT, MobileBERT, DistilBERT, and TinyBERT]
  • 9. Vectors for optimization • Quantization of weights to int8 or another lower-precision representation • Pruning of individual weights and of structures (complete layers, self-attention heads) • Early prediction of samples using predictors attached to shallow layers • Sharing the weights of self-attention and FF modules across all model blocks • Training smaller models using distillation and other novel techniques • Replacing Transformer modules and searching for the best architecture using Neural Architecture Search
  • 10. Quantization • Quantization of BERT models to 16/8-bit weights: 4x compression (8-bit relative to 32-bit floats), minimal loss in accuracy. Reference: We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
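As a rough illustration of the idea on slide 10, the sketch below applies symmetric per-tensor int8 quantization to a small list of floats. It is a minimal pure-Python sketch under stated assumptions, not the scheme used in the cited work: the function names `quantize_int8` and `dequantize` and the toy weights are hypothetical, and real deployments typically rely on a framework's quantization tooling instead.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 0.9]  # toy values, not real BERT weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Rounding to the nearest integer bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of the four bytes of a 32-bit float, which is where a 4x size reduction comes from; the only extra state is the single float `scale`.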
  • 11. Pruning: for some tasks, it is possible to prune up to 9 of the top layers from a 12-layer model without degrading performance by more than 3%. Reference: Poor Man's BERT: Smaller and Faster Transformer Models
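The top-layer pruning described on slide 11 can be sketched structurally as slicing the encoder stack before fine-tuning. This is a toy sketch, assuming the model is a plain list of layer callables; `drop_top_layers`, `run_encoder`, and `make_toy_layer` are hypothetical names, and the toy layers only record their index rather than computing anything.

```python
def drop_top_layers(layers, num_to_drop):
    """Return the encoder stack without its top `num_to_drop` layers."""
    if not 0 <= num_to_drop < len(layers):
        raise ValueError("must keep at least one layer")
    return layers[: len(layers) - num_to_drop]


def run_encoder(layers, x):
    """Apply the layers bottom-up, as a stacked encoder would."""
    for layer in layers:
        x = layer(x)
    return x


def make_toy_layer(index):
    # Hypothetical stand-in for a transformer block: records which layer ran.
    return lambda trace: trace + [index]


full_stack = [make_toy_layer(i) for i in range(12)]
pruned_stack = drop_top_layers(full_stack, 9)  # keep only the 3 lowest layers
trace = run_encoder(pruned_stack, [])
```

In practice the pruned model would then be fine-tuned on the downstream task, and the accuracy drop measured per task, since the slide notes the result holds only for some tasks.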
  • 13. Naïve approach (Thieves on Sesame Street, Krishna et al., ICLR 2020). Diagram elements: FF classifier for fine-tuning; teacher; student; unlabeled examples with pseudo labels; labeled examples with annotated labels; task loss. Example sentences: “Mulan is highly recommended” (Sent: POS) and “The movie was good as the book” (Sent: POS)
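The naïve approach on slide 13 amounts to letting the teacher label the unlabeled pool and training the student on the union of annotated and pseudo-labeled examples. The sketch below shows only that data-assembly step. It assumes a hypothetical `toy_teacher` keyword rule standing in for a fine-tuned BERT teacher, and the assignment of the two slide sentences to the labeled and unlabeled sets is also an assumption.

```python
def build_student_training_set(labeled, unlabeled, teacher_predict):
    """Combine annotated examples with teacher-generated pseudo labels."""
    pseudo_labeled = [(text, teacher_predict(text)) for text in unlabeled]
    return list(labeled) + pseudo_labeled


def toy_teacher(text):
    # Hypothetical stand-in for a fine-tuned teacher model's hard prediction.
    positive_cues = ("good", "recommended")
    return "POS" if any(cue in text for cue in positive_cues) else "NEG"


labeled = [("Mulan is highly recommended", "POS")]  # annotated label
unlabeled = ["The movie was good as the book"]      # no human label
train_set = build_student_training_set(labeled, unlabeled, toy_teacher)
```

The student is then trained with the ordinary task loss on `train_set`, treating the pseudo labels exactly like annotated ones.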
  • 14. Distillation*: mimic the teacher's output probabilities. Diagram elements: FF classifier for fine-tuning; teacher; student; unlabeled examples; distill loss (MSE**); task loss; total loss. • Works surprisingly well • Great for low-resource tasks. *Hinton et al. **Tang et al.
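The loss structure on slide 14 (a distillation loss plus a task loss, summed into a total loss) can be written out as follows. This is a minimal sketch, not the exact formulation from the cited papers: the mixing weight `alpha`, the choice to apply MSE to raw logits, and the function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Task loss: negative log-probability of the annotated label."""
    return -math.log(softmax(logits)[label])


def mse(student_logits, teacher_logits):
    """Distillation loss: mean squared error between student and teacher outputs."""
    pairs = zip(student_logits, teacher_logits)
    return sum((s - t) ** 2 for s, t in pairs) / len(student_logits)


def total_loss(student_logits, teacher_logits, label=None, alpha=0.5):
    """Weighted sum of task and distillation losses (alpha is an assumed knob)."""
    distill = mse(student_logits, teacher_logits)
    if label is None:
        # Unlabeled example: only the teacher signal is available.
        return distill
    return alpha * cross_entropy(student_logits, label) + (1 - alpha) * distill
```

Because the distillation term needs no human label, unlabeled examples still contribute a training signal, which is why the approach suits low-resource tasks.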
  • 15. BERT 2 BERT. Note: performance is cited from the original paper
  • 16. Can we do more? LSTM/CNN: >100x, or CBOW: >1000x
  • 17. Real use-case example • Named Entity Recognition (NER) is a widely used Information Extraction task in many industrial applications and use cases • Ramping up on a new domain can be difficult: lots of unlabeled data, little or no labeled data, and often not enough to train a model with good performance. Solution A: • Hire a linguist or data scientist to tune/build a model • Hire annotators to label more data, or buy a similar dataset • Time/compute resource limitations. Solution B: • Pre-trained language models such as BERT, GPT, and ELMo are great in low-resource scenarios • They require large compute and memory resources and suffer from high inference latency • Deploying such models in production or on edge devices is a major issue
  • 18. Named Entity Recognition (CoNLL-2003) [Chart: accuracy (axis 65 to 95) vs. number of samples (150, 300, 750, 3000) for BERT, Distil LSTM, and Distil ID-CNN; compression rates x1, x36, and x36 respectively] • Train a small LSTM/CNN model using BERT • Utilize unlabeled data via the teacher • Student is competitive with the teacher. Peter et al. NeurIPS19
  • 19. Text Classification [Chart: accuracy (axis 78 to 94) on AG News (0.4K samples), Dair's Emotions (16K samples), IMDB (1K samples), and SST-2 (7K samples) for BERT, Distill LSTM, and Distill CBOW; compression rates x1, x100, and x1500 respectively] • Train a small CBOW model using BERT • Utilize unlabeled data via the teacher • Student is competitive with the teacher on specific datasets. Wasserblat, more details coming soon
  • 20. Takeaways • Compact models perform as well as pre-trained LMs in low-resource scenarios, with superior inference speed and a high compression rate • Practical tips: • Set a simpler classifier as a baseline • Fine-tune DistilBERT/BERT on your task • Plenty of labeled data: go with DistilBERT or other compact pre-trained models • Little labeled data: distill BERT into a simpler NN and compare it to BERT
  • 21. • Data and training efficiency: models requiring less training data and/or fewer computational resources and/or less time • Inference efficiency: models with lower computational complexity of prediction/inference. https://sites.google.com/view/sustainlp2020
  • 22. AGENDA ● Efficiency ● Large model intro ● Inference efficiency: models with lower computational complexity ● Examples ● SustaiNLP Workshop 2020 ● Data challenges ● Extensibility: addressing a new domain with limited data and minimal supervision ● Weakly-supervised ABSA example
  • 23. NLP today ● A separate model is created for each individual task and domain ● This needs a large team of domain experts and a large amount of labeled data, and is very time consuming ● Solutions are hard to scale and adapt across different domains ● No adaptation to the business environment
  • 24. ABSA example and usage: “the owner is super friendly and service is fast” [Diagram: the sentence annotated with ASP (aspect) and opinion tags; highlighted terms: “friendly”, “fast”]
  • 25. The advantages of the algo. ● Aspect-based SA: produces knowledge about specific aspects, which enables targeted business insight ● Unsupervised, domain adaptive: an unsupervised method that does not require costly manually tagged data for training ● Explainable AI: displaying the relation between opinion terms and aspects makes the results interpretable • ABSA recommended among the Top 10 ML Code Examples on Azure and included by MSFT in their NLP Recipes • Published in EMNLP19 • ABSA used by the University of British Columbia and the British Columbia CDC to analyze COVID-19 related tweets in North America. See Jang et al., 2020.