PEGASUS:
Pre-training with Extracted Gap-sentences
for Abstractive Summarization
Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu (2019.12, ICML 2020)
NLP: 김한길, 문경언, 주정헌
2020.07.19.
Photo by Sudan Ouyang on Unsplash
Two main approaches to the summarization task
2
https://www.slideshare.net/aclanthology/abigail-see-2017-get-to-the-point-summarization-with-pointergenerator-networks
Abstractive summarization:
● More difficult
● More flexible
● Recent approach
Extractive summarization:
● Easier
● Too restrictive, though unlikely to produce an absurd summary
● Most past work has been extractive
PEGASUS
• PEGASUS
Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models
• A large Transformer-based encoder-decoder model
with a self-supervised pre-training objective (called gap-sentences generation, GSG)
to improve fine-tuning performance on abstractive summarization
• Achieving state-of-the-art results on 12 diverse summarization datasets.
3
What is a good pre-training objective
tailored for abstractive text summarization?
Key question
Gap Sentences Generation (GSG)
4
Pre-training Objectives
tailored for abstractive text summarization
• Problem: Insufficient (labeled) data for a new domain
• Benefit: time/money cost ↓, performance ↑
Transfer learning
5
(vs. end-to-end training from scratch)
e.g. ① Pre-training, ② Fine-tuning
Transfer learning: BERT
6
→ Can get general linguistic knowledge
Pre-training objective: MLM
7
https://amitness.com/2020/05/self-supervised-learning-nlp/
Pre-training objective for summarization
8
What is a good pre-training objective
tailored for abstractive text summarization?
Key question
A pre-training objective that
more closely resembles the downstream task
will lead to better downstream performance
Hypothesis
(i.e. generality (reusability) ↓, performance ↑)
Pre-training objective for summarization
9
But it is not easy…
(Figure: pre-training on a large unlabeled data set, then fine-tuning on the summarization task)
Pre-training objective for summarization:
Gap Sentences Generation
GSG: Self-Supervised Objective for Summarization
• ① Masking sentences from a document
and generating these gap-sentences from the rest of the document
10
https://towardsdatascience.com/pegasus-google-state-of-the-art-abstractive-summarization-model-627b1bbbc5ce
② Choosing putatively important sentences as the gap-sentences (a pseudo-summary)
How? → ROUGE1-F1
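As a rough illustration of the GSG objective, here is a minimal Python sketch of how one input/target pair could be built, assuming sentences are already split and scored; the score list, gap ratio handling, and [MASK1] string are illustrative, not the paper's exact implementation:

# Minimal GSG sketch: mask the highest-scoring sentences and use their
# concatenation as the generation target (the pseudo-summary).
def make_gsg_example(sentences, scores, gap_ratio=0.3, mask_token="[MASK1]"):
    # sentences: list of sentence strings; scores: one importance score per sentence
    n_gaps = max(1, round(len(sentences) * gap_ratio))
    selected = set(sorted(range(len(sentences)),
                          key=lambda i: scores[i], reverse=True)[:n_gaps])
    model_input = " ".join(mask_token if i in selected else s
                           for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(selected))
    return model_input, target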
Which sentences are important? → ROUGE1-F1
• ROUGE (Recall-Oriented Understudy for Gisting Evaluation) & F-score
• ROUGE
• Metrics for evaluating automatic summarization or machine translation
• By comparing an automatically produced summary against a set of reference summaries
• Diverse variations: ROUGE-N (N-gram overlap), ROUGE-L (longest common subsequence; does not require consecutive matches)
• e.g.
Reference: police killed the gunman
- Output 1: police kill the gunman
- Output 2: the gunman kill police
Recall
- ROUGE-1: Output 1 = Output 2 = 3/4 (both contain “police”, “the”, “gunman”)
- ROUGE-L:
- Output 1 = 3/4 (“police the gunman”)
- Output 2 = 2/4 (“the gunman”)
11
http://www.ccs.neu.edu/home/vip/teach/DMcourse/5_topicmodel_summ/notes_slides/What-is-ROUGE.pdf
https://huffon.github.io/2019/12/07/rouge/
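The numbers above can be reproduced in a few lines; below is a recall-oriented sketch written from the standard ROUGE definitions, not from any particular library:

# Recall-oriented ROUGE-1 and ROUGE-L for the worked example above.
def rouge1_recall(reference, candidate):
    ref, cand = reference.split(), candidate.split()
    overlap = sum(min(ref.count(w), cand.count(w)) for w in set(ref))
    return overlap / len(ref)

def lcs_len(a, b):
    # Dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rougeL_recall(reference, candidate):
    ref, cand = reference.split(), candidate.split()
    return lcs_len(ref, cand) / len(ref)

ref = "police killed the gunman"
print(rouge1_recall(ref, "police kill the gunman"))  # 0.75
print(rouge1_recall(ref, "the gunman kill police"))  # 0.75 (same words, different order)
print(rougeL_recall(ref, "police kill the gunman"))  # 0.75 ("police the gunman")
print(rougeL_recall(ref, "the gunman kill police"))  # 0.5  ("the gunman")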
Pre-training objective for summarization:
Gap Sentences Generation
12
Both GSG and MLM are applied simultaneously to this example as pre-training objectives
MASK1: GSG
MASK2: MLM
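For intuition, a toy example of what one training instance could look like with both objectives applied; the text and mask positions are invented for illustration:

# Invented example: GSG replaces a whole sentence with [MASK1];
# MLM replaces individual tokens elsewhere with [MASK2].
document    = "Pegasus is mythical. It is pre-trained on web text. It summarizes well."
model_input = "Pegasus is [MASK2]. [MASK1] It summarizes well."
gsg_target  = "It is pre-trained on web text."  # the masked gap-sentence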
Experiment
13
Experiments
Pre-training corpora
• C4: text from 350M Web-pages (750GB)
• HugeNews: 1.5B articles (3.8TB) collected from news and news-like websites from 2013-2019
Downstream Tasks/Datasets
• TensorFlow Summarization Datasets for reproducibility
Experiments
1. Pre-training ablation experiments on choices of pre-training corpus, objective, and vocabulary size,
using PEGASUSBASE (223M) instead of PEGASUSLARGE (568M)
2. Larger Model Results
3. Fine-tuning with low-resource data
4. Qualitative Observations
14
Pre-training ablation experiments:
6.1.1. Corpus
• Pre-training on HugeNews (1.5B news-like documents)
→ more effective on the two news downstream datasets
• Pre-training on C4 (350M Web-pages)
→ more effective on the non-news informal datasets (WikiHow and Reddit TIFU)
✓ Pretraining models transfer more effectively to downstream tasks when their domains are aligned better.
Pre-training ablation experiments:
6.1.2. Pre-training Objectives
How to select the “important sentences” as gap-sentences? 6 strategies
• Random: Uniformly select m sentences at random.
• Lead: Select the first m sentences.
• Principal: Select the top-m scored sentences according to importance,
measured by ROUGE1-F1 (Lin, 2004) between each sentence and the rest of the document
• (Ind) sentences scored independently / (Seq) sentences selected sequentially by greedily maximizing
ROUGE1-F1
• (Uniq) n-grams treated as a set / (Orig) identical n-grams double-counted (see the sketch below)
16
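A rough sketch of the Ind-Orig variant under these definitions; the ROUGE1-F1 here follows the standard formula with original (double-counting) unigram matching and is not the paper's code:

# Ind-Orig sketch: score each sentence independently by ROUGE1-F1 against
# the rest of the document, counting repeated unigrams (Orig, not Uniq).
from collections import Counter

def rouge1_f1(reference_tokens, candidate_tokens):
    if not reference_tokens or not candidate_tokens:
        return 0.0
    overlap = sum((Counter(reference_tokens) & Counter(candidate_tokens)).values())
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall) if overlap else 0.0

def select_principal_ind_orig(sentences, m):
    scored = []
    for i, s in enumerate(sentences):
        rest = [w for j, other in enumerate(sentences) if j != i
                for w in other.split()]
        scored.append((rouge1_f1(rest, s.split()), i))
    top = sorted(scored, reverse=True)[:m]
    return sorted(i for _, i in top)  # indices of the m gap-sentences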
Pre-training ablation experiments:
6.1.2. Pre-training Objectives
- Comparison of six variants: Lead, Random, Ind-Orig, Ind-Uniq, Seq-Orig, Seq-Uniq
- MLM alone < Lead < Random < … < Ind-Orig
- MLM & Ind-Orig vs. Ind-Orig alone:
MLM improved fine-tuning performance at early pre-training
checkpoints (100k–200k steps),
but inhibited further gains with more pre-training steps (500k)
✓ MLM is not included in PEGASUSLARGE
- GSG gap-sentences ratio: masking 30% of sentences worked best
Pre-training ablation experiments:
6.1.3. Effect of Vocabulary
- Two tokenizers:
- Byte-pair encoding (BPE)
- SentencePiece unigram algorithm (vocabulary sizes from 32k to 256k)
- Best option: Unigram 96k in the large model
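As a concrete illustration of the winning option, training a 96k unigram vocabulary with the standard sentencepiece Python API might look like this; the corpus path and model prefix are placeholders, not the paper's setup:

import sentencepiece as spm

# Train a SentencePiece unigram tokenizer with a 96k vocabulary.
# "corpus.txt" stands in for the raw pre-training text.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="pegasus_unigram_96k",
    vocab_size=96000,
    model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="pegasus_unigram_96k.model")
print(sp.encode("Pre-training with extracted gap-sentences.", out_type=str))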
6.2 Larger Model Results
PEGASUSBASE (223M) → PEGASUSLARGE (568M)
• Number of layers for Transformer blocks L = 12 → 16
• Hidden size H = 768 → 1024
• Feed-forward layer size F = 3072 → 4096
• Number of self-attention heads A = 12 → 16
Optimization: Adafactor for both pre-training and fine-tuning, with square-root learning-rate decay and dropout 0.1
GSG
• Left 20% of selected sentences unchanged in the input to encourage the model to copy from the source
• Increased the GSR to 45% to achieve a similar number of “gaps” as the optimal 30% found above
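The base-to-large change can be summarized as a small configuration sketch; the field names are illustrative, not taken from the paper's code:

from dataclasses import dataclass

@dataclass
class PegasusConfig:
    num_layers: int    # Transformer blocks L, encoder and decoder
    hidden_size: int   # H
    ffn_size: int      # F
    num_heads: int     # A
    gap_sentences_ratio: float  # GSR used during pre-training

BASE = PegasusConfig(12, 768, 3072, 12, 0.30)    # ~223M parameters
LARGE = PegasusConfig(16, 1024, 4096, 16, 0.45)  # ~568M parameters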
6.2 Larger Model Results
The improvement from a Transformer model without pretraining (TransformerBASE) to PEGASUSLARGE
was more significant on smaller datasets
✓ Small text summarization datasets benefit the most from pre-training
(ROUGE1-F1 / ROUGE2-F1 / ROUGEL-F1 scores)
6.3 Zero and Low-Resource Summarization
- On 8 out of 12 datasets, with just 100 fine-tuning examples, PEGASUSLARGE ≥ TransformerBASE
The dashed lines are TransformerBASE models,
equivalent in capacity to PEGASUSBASE, trained on the full supervised datasets with no pre-training
6.4 Qualitative Observations and Human Evaluation
① Outputs from both PEGASUSLARGE models were
at least as good as the reference
summaries in all cases.
② At low levels of supervision,
PEGASUSLARGE (HugeNews) was
not measurably worse than
human summaries
on XSum and CNN/DailyMail.
③ The Reddit TIFU case, however,
perhaps due to its diverse writing
styles, required full supervision.
Workers were asked to rate the summaries on a 1-5 scale;
a paired t-test was used to assess whether scores differed significantly from the human summaries.
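A minimal sketch of that significance test, assuming paired 1-5 ratings of the same documents; the scores below are invented, and scipy.stats.ttest_rel is the standard paired t-test:

from scipy import stats

# Paired 1-5 ratings of the same documents: model summary vs. human summary.
# These values are invented for illustration only.
model_scores = [4, 3, 5, 4, 4, 3, 5, 4]
human_scores = [4, 4, 5, 3, 4, 4, 5, 4]

t_stat, p_value = stats.ttest_rel(model_scores, human_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A high p-value means the model's ratings are not measurably different from human ones.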
Conclusion
• Proposed a new pre-training objective, GSG (gap-sentences generation),
tailored for abstractive text summarization
• Identified a good gap-sentence selection strategy: principal sentence selection (Ind-Orig)
• Demonstrated the effects of pre-training corpora, gap-sentences ratios, and vocabulary sizes
• Achieved state-of-the-art results on all 12 diverse downstream datasets
• Showed that the model was able to adapt to unseen summarization datasets
very quickly, achieving strong results with as few as 1000 examples
Thanks!
24