18. ● Size Doesn’t Guarantee Diversity
○ Internet data overrepresents younger users and those from developed countries.
○ Training data is often sourced by scraping only a few specific sites (e.g., Reddit).
○ Structural factors, such as moderation practices, further skew who is represented.
○ The current practice of filtering datasets can further attenuate the voices of specific groups.
● Static Data/Changing Social Views
○ The risk of ‘value-lock’, where LM-reliant technology reifies older, less-inclusive understandings.
○ Movements with no significant media attention will not be captured at all.
○ Given the compute costs, it is likely not feasible to fully retrain LMs frequently enough to track changing social views.
● Encoding Bias
○ Large LMs exhibit various kinds of bias, including stereotypical associations or
negative sentiment towards specific groups.
○ Issues with training data: unreliable news sites, banned subreddits, etc.
○ Model auditing relies on automated systems that are not reliable themselves.
● Documentation Debt
○ Datasets are both undocumented and too large to document post hoc.
19. “An LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.”
http://faculty.washington.edu/ebender/papers/Stochastic_Parrots.pdf
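To make the “stochastic parrot” picture concrete, here is a toy sketch (an illustration, not anything from the paper): a bigram model that stitches word sequences together purely from observed co-occurrence statistics, with no representation of meaning. The corpus and names are made up for the example.

    import random
    from collections import defaultdict

    # Toy corpus of observed linguistic forms.
    corpus = "the cat sat on the mat and the cat saw the dog".split()

    # Probabilistic information about how forms combine:
    # record which word follows which in the training data.
    follows = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev].append(nxt)

    def parrot(start: str, length: int = 8) -> str:
        """Stitch together a sequence from observed statistics alone."""
        words = [start]
        for _ in range(length):
            candidates = follows.get(words[-1])
            if not candidates:
                break
            words.append(random.choice(candidates))  # no reference to meaning
        return " ".join(words)

    print(parrot("the"))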
21. Large multilingual models
● GPT-3 (up to 175B parameters)
● ruGPT-3 (1.3B+)
● Chinese CPM (2.6B)
● T5 (up to 11B)
● mT5 (up to 13B, 101 languages)
● mBERT (110M, 104 languages)
● mBART (680M, 25 languages, trained for MT)
● MARGE (960M, 26 languages)
● XLM, XLM-R (570M, 100 languages)
● Turing-NLG (17B)
● T-ULRv2 (550M, 94 languages)
● M2M-100 (12B, 100 languages, trained for MT)
● MoE Transformer (600B*, 100 languages → en, trained for MT)
● Switch Transformer (1.6T*, same 101 languages as mT5)
22. Positive language transfer (MoE Transformer)
“GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding”
https://arxiv.org/abs/2006.16668
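The conditional-computation idea behind GShard’s Mixture-of-Experts layers can be sketched as follows: each token is routed to its top-2 experts, so only a fraction of the total (600B*) parameters runs per token. This is a minimal illustration, not the paper’s implementation; all shapes and names here are assumptions for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, n_tokens, k = 16, 4, 5, 2

    tokens = rng.normal(size=(n_tokens, d_model))
    gate_w = rng.normal(size=(d_model, n_experts))  # learned gating weights
    experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    gates = softmax(tokens @ gate_w)  # (n_tokens, n_experts) routing scores
    out = np.zeros_like(tokens)
    for t in range(n_tokens):
        top_k = np.argsort(gates[t])[-k:]              # top-2 experts per token
        weights = gates[t, top_k] / gates[t, top_k].sum()
        for w, e in zip(weights, top_k):
            out[t] += w * (tokens[t] @ experts[e])     # only k experts ever run

    print(out.shape)  # (5, 16)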
23. Positive language transfer (M2M-100)
“Introducing the First AI Model That Translates 100 Languages Without Relying on English”
https://about.fb.com/news/2020/10/first-multilingual-machine-translation-model/
“Beyond English-Centric Multilingual Machine Translation”
https://arxiv.org/abs/2010.11125
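The released M2M-100 checkpoints can be run, for instance, through Hugging Face Transformers. The sketch below assumes the facebook/m2m100_418M checkpoint and translates French directly into Chinese, with no English pivot:

    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
    tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

    tokenizer.src_lang = "fr"  # source language
    encoded = tokenizer("La vie est belle.", return_tensors="pt")
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id("zh"),  # target language
    )
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))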
36. Meena (2.6B model, Google)
“Towards a Conversational Agent that Can Chat About…Anything”
https://ai.googleblog.com/2020/01/towards-conversational-agent-that-can.html
https://arxiv.org/abs/2001.09977
37. BlenderBot (up to 9.4B model, FB)
“A state-of-the-art open source chatbot”
https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot
https://arxiv.org/abs/2004.13637
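Since the weights are open, smaller BlenderBot variants can be tried directly. A minimal sketch of getting a reply, assuming the facebook/blenderbot-400M-distill checkpoint on Hugging Face:

    from transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

    name = "facebook/blenderbot-400M-distill"  # a smaller released variant
    tokenizer = BlenderbotTokenizer.from_pretrained(name)
    model = BlenderbotForConditionalGeneration.from_pretrained(name)

    inputs = tokenizer("Hello, how are you today?", return_tensors="pt")
    reply_ids = model.generate(**inputs)
    print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))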
38. Alexa Prize Grand Challenge 3
https://www.amazon.science/latest-news/amazon-announces-2020-alexa-prize-winner-emory-university
Their ultimate goal is to meet the Grand Challenge: earn a composite score of 4.0 or higher (out of 5) from the judges, and have the judges find that at least two-thirds of their conversations with the socialbot in the final round of judging remain coherent and engaging for 20 minutes.
Emora, the Emory University chatbot, earned first place with a 3.81 average rating and an average conversation duration of 7 minutes and 37 seconds.
40. Cloud: The democratization of AI
“Across all these different technology areas, 93 percent are using cloud-based AI capabilities, while 78 percent employ open-source AI capabilities. For example, online marketplace Etsy has shifted its AI experimentation to a cloud provider to dramatically increase its computing power and number of experiments. Learning how to manage and integrate these disparate tools and techniques is fundamental for success.”
Deloitte’s State of AI in the Enterprise, 3rd Edition
https://www2.deloitte.com/us/en/insights/focus/cognitive-technologies/state-of-ai-and-intelligent-automation-in-business-survey.html