18. ● Size Doesn’t Guarantee Diversity
○ Internet data overrepresents younger users and those from developed countries.
○ Training data is often sourced by scraping only a few specific sites (e.g., Reddit).
○ Structural factors, such as moderation practices, further skew who is represented.
○ The current practice of filtering datasets can further attenuate the voices of specific groups.
● Static Data/Changing Social Views
○ The risk of ‘value-lock’, where LM-reliant technology reifies older, less-inclusive understandings.
○ Movements with no significant media attention will not be captured at all.
○ Given the compute costs, it is likely not feasible to fully retrain LMs frequently enough to track changing social views.
● Encoding Bias
○ Large LMs exhibit various kinds of bias, including stereotypical associations or
negative sentiment towards specific groups.
○ Issues with training data: unreliable news sites, banned subreddits, etc.
○ Model auditing relies on automated systems that are not reliable themselves.
● Documentation Debt
○ Datasets are both undocumented and too large to document post hoc.
19. “An LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.”
http://faculty.washington.edu/ebender/papers/Stochastic_Parrots.pdf
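To make the “stochastic parrot” picture concrete, here is a toy sketch (an illustration, not anything from the paper): a bigram model that stitches word sequences together purely from observed co-occurrence statistics, with no representation of meaning. The corpus and names are made up for the example.

    import random
    from collections import defaultdict

    # Toy corpus of observed linguistic forms.
    corpus = "the cat sat on the mat and the cat saw the dog".split()

    # Probabilistic information about how forms combine:
    # record which word follows which in the training data.
    follows = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev].append(nxt)

    def parrot(start: str, length: int = 8) -> str:
        """Stitch together a sequence from observed statistics alone."""
        words = [start]
        for _ in range(length):
            candidates = follows.get(words[-1])
            if not candidates:
                break
            words.append(random.choice(candidates))  # no reference to meaning
        return " ".join(words)

    print(parrot("the"))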
21. Large multilingual models
● GPT-3 (up to 175B parameters)
● ruGPT-3 (1.3B+)
● Chinese CPM (2.6B)
● T5 (up to 11B)
● mT5 (up to 13B, 101 languages)
● mBERT (110M, 104 languages)
● mBART (680M, 25 languages, trained for MT)
● MARGE (960M, 26 languages)
● XLM, XLM-R (570M, 100 languages)
● Turing-NLG (17B)
● T-ULRv2 (550M, 94 languages)
● M2M-100 (12B, 100 languages, trained for MT)
● MoE Transformer (600B*, 100 languages → en, trained for MT)
● Switch Transformer (1.6T*, same 101 languages as mT5)
22. Positive language transfer (MoE Transformer)
“GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding”
https://arxiv.org/abs/2006.16668
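The conditional-computation idea behind GShard’s Mixture-of-Experts layers can be sketched as follows: each token is routed to its top-2 experts, so only a fraction of the total (600B*) parameters runs per token. This is a minimal illustration, not the paper’s implementation; all shapes and names here are assumptions for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, n_tokens, k = 16, 4, 5, 2

    tokens = rng.normal(size=(n_tokens, d_model))
    gate_w = rng.normal(size=(d_model, n_experts))  # learned gating weights
    experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    gates = softmax(tokens @ gate_w)  # (n_tokens, n_experts) routing scores
    out = np.zeros_like(tokens)
    for t in range(n_tokens):
        top_k = np.argsort(gates[t])[-k:]              # top-2 experts per token
        weights = gates[t, top_k] / gates[t, top_k].sum()
        for w, e in zip(weights, top_k):
            out[t] += w * (tokens[t] @ experts[e])     # only k experts ever run

    print(out.shape)  # (5, 16)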
23. Positive language transfer (M2M-100)
“Introducing the First AI Model That Translates 100 Languages Without Relying on English”
https://about.fb.com/news/2020/10/first-multilingual-machine-translation-model/
“Beyond English-Centric Multilingual Machine Translation”
https://arxiv.org/abs/2010.11125
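The released M2M-100 checkpoints can be run, for instance, through Hugging Face Transformers. The sketch below assumes the facebook/m2m100_418M checkpoint and translates French directly into Chinese, with no English pivot:

    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
    tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

    tokenizer.src_lang = "fr"  # source language
    encoded = tokenizer("La vie est belle.", return_tensors="pt")
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id("zh"),  # target language
    )
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))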
36. Meena (2.6B model, Google)
“Towards a Conversational Agent that Can Chat About…Anything”
https://ai.googleblog.com/2020/01/towards-conversational-agent-that-can.html
https://arxiv.org/abs/2001.09977
37. BlenderBot (up to 9.4B model, FB)
“A state-of-the-art open source chatbot”
https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot
https://arxiv.org/abs/2004.13637
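Since the weights are open, smaller BlenderBot variants can be tried directly. A minimal sketch of getting a reply, assuming the facebook/blenderbot-400M-distill checkpoint on Hugging Face:

    from transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

    name = "facebook/blenderbot-400M-distill"  # a smaller released variant
    tokenizer = BlenderbotTokenizer.from_pretrained(name)
    model = BlenderbotForConditionalGeneration.from_pretrained(name)

    inputs = tokenizer("Hello, how are you today?", return_tensors="pt")
    reply_ids = model.generate(**inputs)
    print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))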
38. Alexa Prize Grand Challenge 3
https://www.amazon.science/latest-news/amazon-announces-2020-alexa-prize-winner-emory-university
Their ultimate goal is to meet the Grand Challenge: earn a composite score of 4.0 or higher (out of 5) from the judges, and have the judges find that at least two-thirds of their conversations with the socialbot in the final round of judging remain coherent and engaging for 20 minutes.
Emora, the Emory University chatbot, earned first place with a 3.81 average rating and an average conversation duration of 7 minutes and 37 seconds.
40. Cloud: The democratization of AI
“Across all these different technology areas, 93 percent are using cloud-based AI capabilities, while 78 percent employ open-source AI capabilities. For example, online marketplace Etsy has shifted its AI experimentation to a cloud provider to dramatically increase its computing power and number of experiments. Learning how to manage and integrate these disparate tools and techniques is fundamental for success.”
Deloitte’s State of AI in the Enterprise, 3rd Edition
https://www2.deloitte.com/us/en/insights/focus/cognitive-technologies/state-of-ai-and-intelligent-automation-in-business-survey.html