DEEP LEARNING JP
[DL Papers] TIMs: Transformers with Competitive Ensembles of Independent Mechanisms
XIN ZHANG, Matsuo Lab
http://deeplearning.jp/
Paper Information
● Title
○ Transformers with Competitive Ensembles of Independent Mechanisms
● Authors
○ Alex Lamb, Di He, Anirudh Goyal, Guolin Ke, Chien-Feng Liao, Mirco Ravanelli, Yoshua Bengio
● Affiliations: Mila, University of Montreal, Microsoft Research Asia, Research Center for Information Technology Innovation, Academia Sinica
● arXiv, Feb 2021
● Summary
○ A study that improves the Transformer architecture based on independent mechanisms.
○ Realized by splitting the latent representation into chunks that are handled as multiple modules through the attention mechanism.
1. Introduction
Both modularity and IM are schemes in which modules respond to the inputs relevant to them,
but IM additionally emphasizes that a module does not respond to unrelated inputs.
- A possible source of out-of-distribution (OOD) generalization?
Related concepts: Independent Causal Mechanisms (ICM),
Modularity and Independent Mechanisms (IM),
Recurrent Independent Mechanisms (RIMs)
Focusing on the Transformer
The attention mechanism learns relations between positions, so the model can respond to specific inputs.
However, the representation a Transformer learns ends up as one entangled whole.
Nothing forces it to be decomposed, and this can be a cause of degraded generalization.
2. Method
Transformer Independent Mechanisms (TIMs)
- TIMs: split the Transformer's latent representation and introduce a modular structure.
- For example, with 3 mechanisms, the representation at each position is split into three equal parts that are stacked at the same position (a code sketch follows the figure below).
[Figure: hidden states at positions P1–P6 split across mechanisms M1–M3, compared with a standard Transformer]
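A minimal sketch of this splitting step, assuming PyTorch; the tensor sizes and the names (d_model, n_mech) are illustrative, not the paper's configuration.

# Minimal sketch, not the authors' implementation: split each position's
# hidden vector into equal-sized chunks, one per mechanism (here 3).
import torch

batch, seq_len, d_model, n_mech = 2, 6, 384, 3   # hypothetical sizes
d_mech = d_model // n_mech                        # units per mechanism

h = torch.randn(batch, seq_len, d_model)          # standard Transformer hidden states
# Reshape so a mechanism axis sits next to the position axis:
# (batch, positions, mechanisms, units per mechanism)
h_mech = h.view(batch, seq_len, n_mech, d_mech)

print(h_mech.shape)  # torch.Size([2, 6, 3, 128])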
Transformer Independent Mechanisms (TIMs)
- 1. A mechanism that induces competition among the mechanisms at the same position:
  a. Map each mechanism's latent representation h to a single scalar
  b. Compute scores with a softmax to decide the priority among the mechanisms
- 2. Share information across positions:
  a. PositionAttention
  b. Update h using the scores as weights (see the sketch below)
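A hedged sketch of Steps 1–2 (competition and PositionAttention), assuming PyTorch; the layer names (to_scalar, pos_attn) and sharing one attention module across mechanisms are simplifications for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

batch, seq_len, n_mech, d_mech = 2, 6, 3, 128
h = torch.randn(batch, seq_len, n_mech, d_mech)

# Step 1: map each mechanism's state to a scalar and softmax across mechanisms,
# so the mechanisms at a given position compete for the update.
to_scalar = nn.Linear(d_mech, 1)
scores = F.softmax(to_scalar(h), dim=2)           # (batch, seq, n_mech, 1)

# Step 2: attention over positions within each mechanism (here one shared
# attention module as a simplification), then a score-weighted residual update.
pos_attn = nn.MultiheadAttention(embed_dim=d_mech, num_heads=4, batch_first=True)
h_flat = h.permute(0, 2, 1, 3).reshape(batch * n_mech, seq_len, d_mech)
attn_out, _ = pos_attn(h_flat, h_flat, h_flat)
attn_out = attn_out.reshape(batch, n_mech, seq_len, d_mech).permute(0, 2, 1, 3)

h = h + scores * attn_out                         # higher-scoring mechanisms update more
print(h.shape, scores.shape)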
Transformer Independent Mechanisms (TIMs)
- 3. Share information among the mechanisms at the same position (see the sketch below):
  a. MechanismAttention: small in scale, with 2 heads and 32 units.
  b. Without this part, the model reduces to running multiple Transformers in parallel.
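A hedged sketch of Step 3 (MechanismAttention), assuming PyTorch. The 2-head / 32-unit sizing follows the slide; the down/up projections and the residual update are assumptions made for illustration.

import torch
import torch.nn as nn

batch, seq_len, n_mech, d_mech = 2, 6, 3, 128
h = torch.randn(batch, seq_len, n_mech, d_mech)

# Attend across the mechanism axis at each position (the "sequence" has length
# n_mech), in a small 32-unit space with 2 heads as described on the slide.
down = nn.Linear(d_mech, 32)
up = nn.Linear(32, d_mech)
mech_attn = nn.MultiheadAttention(embed_dim=32, num_heads=2, batch_first=True)

x = down(h).reshape(batch * seq_len, n_mech, 32)  # one group of 3 mechanisms per position
out, _ = mech_attn(x, x, x)
h = h + up(out).reshape(batch, seq_len, n_mech, d_mech)
print(h.shape)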
Transformer Independent Mechanisms (TIMs)
- 4. Along the two axes, the mechanism axis and the position axis:
  a. A feed-forward network (FFN) applies its transformation and updates the latent variable h (see the sketch below).
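A hedged sketch of the Step 4 FFN update, assuming PyTorch. Using one FFN per mechanism, applied position-wise with a residual connection, is an assumption made for illustration.

import torch
import torch.nn as nn

batch, seq_len, n_mech, d_mech = 2, 6, 3, 128
h = torch.randn(batch, seq_len, n_mech, d_mech)

# One illustrative FFN per mechanism (assumed, not confirmed by the slide).
ffns = nn.ModuleList([
    nn.Sequential(nn.Linear(d_mech, 4 * d_mech), nn.ReLU(), nn.Linear(4 * d_mech, d_mech))
    for _ in range(n_mech)
])

# Each mechanism's chunk is transformed by its own FFN and added back residually.
h = h + torch.stack([ffns[k](h[:, :, k]) for k in range(n_mech)], dim=2)
print(h.shape)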
A Transformer layer can be replaced almost entirely by a TIM layer
[Figure: one TIM layer as a pipeline over the mechanism and position axes — Step 0: split into mechanism modules per position; Steps 1–2: competition and PositionAttention; Step 3: MechanismAttention; Step 4: FFN]
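Putting the steps together, a compact sketch (assuming PyTorch) of how one TIM layer could be wired in the Step 1–4 order shown in the figure; normalization, dropout, and the exact parameter sharing are omitted or assumed, so this is structure only, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TIMLayerSketch(nn.Module):
    def __init__(self, n_mech=3, d_mech=128, d_comm=32):
        super().__init__()
        self.to_scalar = nn.Linear(d_mech, 1)                                # Step 1
        self.pos_attn = nn.MultiheadAttention(d_mech, 4, batch_first=True)  # Step 2
        self.down = nn.Linear(d_mech, d_comm)
        self.up = nn.Linear(d_comm, d_mech)
        self.mech_attn = nn.MultiheadAttention(d_comm, 2, batch_first=True) # Step 3
        self.ffn = nn.Sequential(nn.Linear(d_mech, 4 * d_mech), nn.ReLU(),
                                 nn.Linear(4 * d_mech, d_mech))             # Step 4

    def forward(self, h):  # h: (batch, seq, n_mech, d_mech)
        b, s, m, d = h.shape
        # Step 1: competition scores across mechanisms at each position.
        c = F.softmax(self.to_scalar(h), dim=2)
        # Step 2: attention over positions, weighted by the competition scores.
        x = h.permute(0, 2, 1, 3).reshape(b * m, s, d)
        a, _ = self.pos_attn(x, x, x)
        h = h + c * a.reshape(b, m, s, d).permute(0, 2, 1, 3)
        # Step 3: small attention across the mechanisms at the same position.
        y = self.down(h).reshape(b * s, m, -1)
        y, _ = self.mech_attn(y, y, y)
        h = h + self.up(y).reshape(b, s, m, d)
        # Step 4: feed-forward update of each mechanism's chunk.
        return h + self.ffn(h)

h = torch.randn(2, 6, 3, 128)
print(TIMLayerSketch()(h).shape)  # torch.Size([2, 6, 3, 128])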
3. Experiments
- Can TIM learn mechanism modules with reasonable, meaningful specialization?
  3.1 Image Transformer
  3.2 Speech Enhancement
- Can a model with independent mechanisms be leveraged for quantitative accuracy gains?
  3.3 BERT Pre-training and Fine-Tuning
  3.4 CATER Occluded Object Tracking
3.1 Image Transformer
- The two mechanism modules are expected to learn different kinds of information
- Checked by visualization
  - Left: a toy dataset that combines MNIST and CIFAR images
    - Visualized by activating one module at a time
  - Right: the CIFAR-10 dataset
    - The separation also emerges for object vs. background
3.2 Speech Enhancement
- The task of improving the quality of speech recordings that contain real-world noise
- A TIM with two mechanism modules was used so that speech and noise are detected separately
- With 1/8 of the parameters, it reaches SOTA on metrics that evaluate speech quality.
3.3 BERT Pre-training and Fine-Tuning
- Pre-trained in the same way as BERT, then fine-tuned on multiple datasets and compared on each accuracy
- Of the 12 Transformer layers, converting 9 to TIMs while keeping the first two and the last layer standard worked best (see the sketch below)
- NoComp: without competition (the scores from Step 1)
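A small sketch of the layer layout described above; the layer names are placeholders for illustration, not a real library API.

# Hedged sketch of the 12-layer stack on the slide: the first two and the last
# layer stay standard Transformer layers, the nine in between become TIM layers.
def build_layers(n_layers=12, n_standard_front=2, n_standard_back=1):
    layers = []
    for i in range(n_layers):
        if i < n_standard_front or i >= n_layers - n_standard_back:
            layers.append("TransformerLayer")  # placeholder for a standard block
        else:
            layers.append("TIMLayer")          # placeholder for a TIM block
    return layers

print(build_layers().count("TIMLayer"))  # 9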
3.4 CATER Occluded Object Tracking
- The input is a video; the task is to track a target object and output where the object is when the video ends
- Six frames per second are sampled from the video and fed in as a sequence
- The scene is divided into a 6 x 6 grid, and the final location is treated as a 36-way classification problem (see the sketch below)
- Accuracy improves substantially over LSTM and Transformer baselines
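A tiny sketch of the 36-way labeling described above; the row-major coordinate convention is an assumption for illustration.

def cell_to_class(row, col, grid=6):
    # Map a (row, col) cell of a grid x grid layout to a class id in [0, grid * grid).
    assert 0 <= row < grid and 0 <= col < grid
    return row * grid + col

print(cell_to_class(0, 0), cell_to_class(5, 5))  # 0 35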
4. Related Work
Independent Mechanisms and Modularity in Transformers
- Several studies make individual Transformer heads carry specific kinds of information.
- There is probably no prior work that explicitly splits the Transformer's latent representation into multiple modules.
  - Group Transformer: introduces Group Linear layers.
  - Universal Transformer: adds a gating mechanism that behaves much like TIMs' competition mechanism.
  - Switch Transformers (figure)
  - RIMs
    - operate along the time axis
    - TIMs operate over both space and time
[Figure: Switch Transformers, with 1.6 trillion parameters]
5. Conclusion
Discussion & Future work
- TIMs:TransformerにIndependent Mechanismsを導入した。独立メカニズムの概念はすごく賛成
- 実験
- Image transformerの可視化実験と音声増強の実験では、わかりやすい結果が得られた
- 一方で、BERTでははっきりした良さがいまいち。
- タスクの性質を考慮する必要がある
- 「世界が独立したメカニズムによって動かされている」という仮説がある
- 実際のタスクはどのぐらい関連するのか?が重要になる
22
Appendix
Explanatory article: TIMs
