[DL輪読会]Are Pre-trained Convolutions Better than Pre-trained Transformers? (2021)
Deep Learning JP
2021/06/04 Deep Learning JP: http://deeplearning.jp/seminar-2/
1.
DEEP LEARNING JP [DL Papers] http://deeplearning.jp/
“Are Pre-trained Convolutions Better than Pre-trained Transformers? (2021)”
Itsuki Okimura, Matsuo Lab, B4
2.
Agenda
1. Bibliographic information
2. Overview
3. Problem statement
4. Prior work
5. Proposed method
6. Experimental results
7. Discussion
8. Summary
3.
1. Bibliographic information
• Title: Are Pre-trained Convolutions Better than Pre-trained Transformers?
• Source: arXiv (https://arxiv.org/abs/2105.03322)
• Authors: Yi Tay, Mostafa Dehghani, Jai Gupta, and others at Google Research
• Reason for selection: raises a question about the currently dominant Transformer architecture
4.
2. Overview
• Compares a CNN-based pre-trained model, in which the self-attention layers of a Transformer are replaced with convolution layers, against conventional pre-trained models
• Claims that, across seven downstream tasks, the CNN-based pre-trained models match or exceed the performance of conventional pre-trained models
• Also points out that CNN-based pre-training has advantages in runtime and scalability over conventional Transformer-based pre-training
• Argues that pre-training and the Transformer architecture should be discussed separately
5.
3. Problem statement
• In recent NLP, pre-trained models such as BERT, GPT-n, and T5 have been released one after another
• Almost no recent pre-trained models are not based on the Transformer (*)
Q: Can architectures with different inductive biases benefit from pre-training in the same way?
(Slide notes: in NLP, pre-trained models and the Transformer architecture tend to be discussed as one and the same; the experiments use CNNs, which are computationally efficient, operate locally, and are not recurrent)
6.
4. Prior work: Pay Less Attention with Lightweight and Dynamic Convolutions (ICLR 2019, https://arxiv.org/pdf/1901.10430.pdf)
• Building on earlier work that applies a convolution to each feature dimension independently (depthwise convolution), it proposes Lightweight convolution, which further reduces parameters by sharing convolution weights across channels, and, as an extension, Dynamic convolution, which computes the convolution weights dynamically at each time step
• Achieved high machine-translation accuracy without self-attention (3rd place on the WMT En-De BLEU ranking at the time)
7.
4. Prior work: depthwise convolution
• A convolution applied with independent parameters per channel:
$\mathrm{DepthwiseConv}(X, W_{c,:}, i, c) = \sum_{j=1}^{k} W_{c,j} \cdot X_{(i+j-\lceil\frac{k+1}{2}\rceil),\,c}$
(Figure source: https://qiita.com/koreyou/items/328fa92a1d3a7e680376#fn4)
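The formula above can be sketched in plain Python (an illustration only, not the paper's code; the function name, zero padding outside the sequence, and 0-indexed positions are my own conventions):

```python
import math

def depthwise_conv(X, W, i, c):
    """Depthwise 1-D convolution output at position i, channel c.
    X: sequence of timesteps, each a list of channel values.
    W: one kernel per channel; W[c] has k taps.
    Out-of-range positions are treated as zero padding."""
    k = len(W[c])
    total = 0.0
    for j in range(1, k + 1):
        t = i + j - math.ceil((k + 1) / 2)  # centre the kernel on position i
        if 0 <= t < len(X):
            total += W[c][j - 1] * X[t][c]
    return total
```

Because each channel has its own kernel and channels never mix, the parameter count is k per channel rather than k·d² as in a full convolution.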
8.
4. Prior work: Lightweight convolution
• Divides the channels into groups of H and performs a depthwise convolution with parameters shared within each group and softmax-normalized along the kernel dimension:
$\mathrm{LightweightConv}(X, W_{\lceil\frac{cH}{d}\rceil,:}, i, c) = \sum_{j=1}^{k} \mathrm{softmax}(W_{\lceil\frac{cH}{d}\rceil,:})_{j} \cdot X_{(i+j-\lceil\frac{k+1}{2}\rceil),\,c}$
(Figure source: https://qiita.com/koreyou/items/328fa92a1d3a7e680376#fn4)
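A minimal sketch of the shared, softmax-normalized kernel (my own hypothetical implementation; the 0-indexed head lookup `c * H // d` is an assumed analogue of ⌈cH/d⌉):

```python
import math

def softmax(v):
    """Numerically stable softmax over a list of tap weights."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def lightweight_conv(X, W, i, c, H):
    """Depthwise convolution where channel c uses the kernel of its
    head (H heads over d channels), with softmax-normalized taps."""
    d = len(X[0])
    kern = softmax(W[c * H // d])  # 0-indexed analogue of ceil(cH/d)
    k = len(kern)
    total = 0.0
    for j in range(1, k + 1):
        t = i + j - math.ceil((k + 1) / 2)
        if 0 <= t < len(X):
            total += kern[j - 1] * X[t][c]
    return total
```

Sharing one kernel across each group of channels cuts the parameter count from per-channel to per-head, and the softmax makes the taps a normalized weighting over the window, much like attention weights.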
9.
4. Prior work: Dynamic convolution
• Computes the Lightweight convolution parameters dynamically from the input features:
$\mathrm{DynamicConv}(X, i, c) = \mathrm{LightweightConv}(X, f(X_i)_{h,:}, i, c)$, where $f(X_i)_{h,j} = \sum_{c=1}^{d} W_{h,j,c}\,X_{i,c}$
(Figure source: https://qiita.com/koreyou/items/328fa92a1d3a7e680376#fn4)
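A self-contained, single-head sketch of the idea (hypothetical names and conventions; the real module is multi-headed and vectorized): the kernel taps for position i are a linear function of X[i], then softmax-normalized and applied depthwise around i.

```python
import math

def dynamic_conv(X, WQ, i, c):
    """Single-head dynamic convolution sketch.
    X: sequence of timesteps, each a list of d channel values.
    WQ: k rows of d weights; WQ[j] produces the raw tap for position j
    of the kernel as a linear function of the current timestep X[i]."""
    d, k = len(X[0]), len(WQ)
    # f(X_i): one raw tap per kernel position j, computed from X[i]
    taps = [sum(WQ[j][cc] * X[i][cc] for cc in range(d)) for j in range(k)]
    m = max(taps)
    e = [math.exp(t - m) for t in taps]
    kern = [x / sum(e) for x in e]  # softmax-normalize the taps
    total = 0.0
    for j in range(1, k + 1):
        t = i + j - math.ceil((k + 1) / 2)
        if 0 <= t < len(X):
            total += kern[j - 1] * X[t][c]
    return total
```

Unlike self-attention, the weighting depends only on the current timestep, not on pairwise comparisons with every other position, which is where the O(N) cost comes from.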
10.
4. Prior work: dilated convolution
• A convolution whose kernel taps are spaced apart (dilation rate r):
$\mathrm{DilatedConv}(X, W_{c,:}, i, c) = \sum_{j=1}^{k} W_{c,j} \cdot X_{(i+r\cdot(j-\lceil\frac{k+1}{2}\rceil)),\,c}$
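A sketch under the same conventions as the earlier examples (zero padding, 0-indexed positions; the centring of the dilated taps is my assumption):

```python
import math

def dilated_conv(X, W, i, c, r):
    """Depthwise convolution whose k kernel taps are spaced r positions
    apart; r=1 recovers the plain depthwise convolution."""
    k = len(W[c])
    total = 0.0
    for j in range(1, k + 1):
        t = i + r * (j - math.ceil((k + 1) / 2))  # taps at stride r around i
        if 0 <= t < len(X):
            total += W[c][j - 1] * X[t][c]
    return total
```

Spacing the taps widens the receptive field from k to roughly r·k positions without adding parameters, which is why stacking dilated layers with growing filter sizes (as in the proposed method below) can cover long contexts.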
11.
5. Proposed method: a pre-trained model with a CNN architecture
• Replaces the Transformer's Q, K, V projections with GLU (gated linear unit) layers and the self-attention layers with convolution layers, then pre-trains in a seq2seq fashion
• The convolution used is one of Lightweight convolution or Dynamic convolution (filter size 7 each), or Dilated convolution (12 layers with filter sizes [4, 4, 7, 7, 15, 15, 15, 15, 31, 31, 31])
• The loss is optimized as token-level cross-entropy
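For reference, a GLU layer computes (xW + b) gated elementwise by sigmoid(xV + c). A minimal sketch (hypothetical minimal form with column-wise weight lists, not the paper's implementation):

```python
import math

def glu(x, W, b, V, c):
    """Gated linear unit: (xW + b) elementwise-multiplied by
    sigmoid(xV + c). W and V are lists of weight columns, one
    per output dimension; b and c are the matching biases."""
    lin = [sum(xi * wi for xi, wi in zip(x, col)) + bj
           for col, bj in zip(W, b)]
    gate = [sum(xi * vi for xi, vi in zip(x, col)) + cj
            for col, cj in zip(V, c)]
    # sigmoid gate lets the layer learn which linear outputs to pass
    return [l * (1.0 / (1.0 + math.exp(-g))) for l, g in zip(lin, gate)]
```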
12.
6. Experimental results: setup
• Both the T5-based convolutional models and the Transformer model are prepared with and without pre-training
• Pre-training uses the Colossal Cleaned CommonCrawl Corpus (C4) for 524k steps with a batch size of 128
• Fine-tuned on seven tasks: toxicity detection (CIVIL COMMENTS, WIKI TOXIC), sentiment classification (IMDb, SST-2, S140), topic classification (AGNews), and question classification (TREC)
• The effect of pre-training is assessed from each model's downstream-task scores with and without pre-training
13.
6. Experimental results
• Across the seven tasks, which span a wide range of domains:
(1) convolutions without pre-training are competitive and frequently outperform Transformers without pre-training
(2) pre-trained convolutions outperform pre-trained Transformers on six of the seven tasks
14.
6. Experimental results
• (3) Among the pre-trained convolutional models, Dilated convolution and Dynamic convolution outperform Lightweight convolution
• (4) A model that performs (relatively) well without pre-training does not necessarily perform best once pre-trained
15.
7. Discussion: tasks the model struggles with
• Tasks that model relationships between multiple passages are difficult
– possibly because there is no mechanism corresponding to self-attention for capturing long-range dependencies
(Example) SQuAD, a reading-comprehension task that generates the correct answer given a paragraph and a question:
- pre-trained Transformer: F1 90%
- pre-trained CNN: F1 70%
MultiNLI, a task that judges the entailment relation between two sentences:
- pre-trained Transformer: accuracy 84%
- pre-trained CNN: accuracy 75%
- augmenting the encoder with a cross-attention layer over the two sentences reaches 83%
* The authors suggest a dual encoder would help, but changing the encoder architecture for individual tasks seems questionable (presenter's view)
16.
7. Discussion: training does not slow down as sequences get longer
• Self-attention costs $O(N^2)$ in the sequence length N, whereas convolution costs $O(N)$
• Convolutions are not only consistently faster (even for short sequences) but also scale better than Transformers
• FLOPs efficiency does not degrade as sequences grow longer
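A back-of-envelope illustration of the scaling claim (operation counts only, not a benchmark; the default filter size 7 follows the slide's setup):

```python
def attention_ops(n):
    """Self-attention compares every pair of positions: O(N^2)."""
    return n * n

def conv_ops(n, k=7):
    """A convolution touches k kernel taps per position: O(N)."""
    return n * k

# Doubling the sequence length quadruples the attention cost
# but only doubles the convolution cost.
```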
17.
7. Discussion: summary
- Strengths: runtime and scalability
- Weaknesses: modeling relationships between multiple passages
• The claim is not that CNN-based architectures must replace Transformer-based ones, but that architectures should be explored from a wider set of options
• Pre-training and architecture should be discussed separately
18.
8. Summary
• Compared a CNN-based pre-trained model, in which the self-attention layers of a Transformer are replaced with convolution layers, against conventional pre-trained models
• Across seven downstream tasks, the CNN-based pre-trained models are claimed to match or exceed conventional pre-trained models
• CNN-based pre-training also has advantages in runtime and scalability over conventional Transformer-based pre-training
• Pre-training and the Transformer architecture should be discussed separately
19.
Impressions (presenter)
• Seems strong for classification, but likely to struggle across a broader range of tasks
• What would happen with more layers, so that the receptive field covers the entire input sequence?
• It is curious that, despite being weak at capturing relationships between multiple passages, prior work shows summarization working reasonably well
-> perhaps summarization with existing pre-trained models underweights the document's overall tree structure