6. Attention, also referred to as enthrallment, is the behavioral and cognitive process
of selectively concentrating on a discrete aspect of information, whether deemed
subjective or objective, while ignoring other perceivable information. It is a state of
arousal. It is the taking possession by the mind, in clear and vivid form, of one out
of what seem several simultaneous objects or trains of thought. Focalization, the
concentration of consciousness, is of its essence. Attention has also been described
as the allocation of limited cognitive processing resources.
8. Recurrent Neural Network
(Figure: an RNN reads the definition of attention token by token, "attention" → "also" → "referred" → … → "resources")
Problem: no parallel computation, poor long-range dependencies
9. Convolutional Neural Network
(Figure: a convolution filter slides over the sentence; each filter covers only a local window of tokens)
Problem: limited long-range dependencies, computationally inefficient
10. Attention mechanism
Parallel computation, long-range dependencies, explainable
15. Attention mechanism
Fig. from Vaswani et al. Attention Is All You Need. ArXiv. 2017
The information {K:V} will be related to some query Q. Using this, we compute the
similarity between K and Q and apply it to V. Then the values V most directly
related to Q contribute more to the output.
1. Compute the similarity between Q and K.
2. Normalize so that very large values do not dominate.
3. Similarity → weights (sum = 1).
4. Multiply the weights by V.
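The four steps above can be sketched as single-head scaled dot-product attention; the shapes and random inputs below are illustrative, not from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Steps 1-4 from the slide, for single-head attention.

    Q: (n_q, d_k), K: (n_kv, d_k), V: (n_kv, d_v)
    """
    d_k = Q.shape[-1]
    # 1. Similarity between Q and K (dot product).
    scores = Q @ K.T
    # 2. Scale by sqrt(d_k) so large values do not dominate the softmax.
    scores = scores / np.sqrt(d_k)
    # 3. Softmax turns similarities into weights that sum to 1 per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # 4. Multiply the weights by V.
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)       # (2, 4)
print(w.sum(axis=-1))  # each row of weights sums to 1
```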
16. e.g. Attention mechanism with Seq2Seq
(Machine translation, Encoder-Decoder, Attention)
(Figure: encoder RNN feeding into decoder RNN)
The encoder's information flow depends on the previous timestep's hidden state and
the current timestep's input.
Only the encoder's last hidden state is passed to the decoder.
The decoder's information flow depends only on the previous timestep.
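The dependency structure above can be sketched with a toy RNN recurrence (weights, sizes, and the zero decoder input are illustrative only; a real decoder would also consume the previously emitted token):

```python
import numpy as np

def rnn_step(h_prev, x, Wh, Wx):
    """One RNN step: the new state depends on the previous state and the input."""
    return np.tanh(h_prev @ Wh + x @ Wx)

rng = np.random.default_rng(3)
d_h, d_x, T = 4, 3, 5
Wh, Wx = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_x, d_h))

# Encoder: each step depends on the previous hidden state and the current input.
h = np.zeros(d_h)
for x in rng.normal(size=(T, d_x)):
    h = rnn_step(h, x, Wh, Wx)

# Only the encoder's last hidden state reaches the decoder...
dec_h = h
# ...and each decoder step depends only on the previous decoder state.
for _ in range(T):
    dec_h = rnn_step(dec_h, np.zeros(d_x), Wh, Wx)
print(dec_h.shape)  # (4,)
```

Everything the decoder knows about the source sentence is squeezed through that single final encoder state, which is the bottleneck attention removes.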
17. e.g. Attention mechanism with Seq2Seq
(Machine translation, Encoder-Decoder, Attention)
(Figure: the decoder attends over all encoder hidden states and combines (⊕) the attended context with its own state)
Attention → long-range dependency
18. e.g. Attention mechanism with Seq2Seq
Fig from Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR. 2015
(Machine translation, Encoder-Decoder, Attention)
Attention
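A minimal sketch of the additive (Bahdanau-style) attention used in this setting, assuming learned parameters `Wa`, `Ua`, `va` (names and sizes here are illustrative):

```python
import numpy as np

def additive_attention(decoder_state, encoder_states, Wa, Ua, va):
    """Score every encoder state against the current decoder state,
    then build a weighted context vector.

    decoder_state: (d_dec,), encoder_states: (T, d_enc)
    Wa: (d_att, d_dec), Ua: (d_att, d_enc), va: (d_att,)
    """
    # e_t = va^T tanh(Wa s + Ua h_t) for each encoder step t
    scores = np.tanh(decoder_state @ Wa.T + encoder_states @ Ua.T) @ va  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # alignment weights, sum to 1
    context = weights @ encoder_states  # weighted sum, fed into the ⊕ above
    return context, weights

rng = np.random.default_rng(1)
T, d_enc, d_dec, d_att = 5, 8, 6, 4
ctx, w = additive_attention(
    rng.normal(size=d_dec), rng.normal(size=(T, d_enc)),
    rng.normal(size=(d_att, d_dec)), rng.normal(size=(d_att, d_enc)),
    rng.normal(size=d_att),
)
print(ctx.shape)  # (8,)
```

Because the context is a weighted sum over all encoder states, every decoder step can reach every source position directly, which is the long-range dependency shown in the figure.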
19. e.g. Style-token
Fig. from Wang et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. ArXiv. 2018
(Text to speech, Encoder-Decoder, Style transfer, Attention)
(Figure: two encoders; attention over the global style tokens (GST, randomly initialized) produces a style embedding that is combined (⊕) before the decoder)
Demo: https://google.github.io/tacotron/publications/global_style_tokens/
26. Self-attention
Fig. from Wang et al. Non-local Neural Networks. ArXiv. 2017.
The information at positions i and j will be related to each other. Compute the
similarity between every pair of positions and apply it as a weight; then the
relationships between all positions can be learned. (Long-range dependency!)
1. Compute the similarity between pixels i and j.
2. Multiply by the value at pixel j.
3. Normalization term.
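The three steps above can be sketched for a feature map with dot-product similarity and softmax as the normalization term (the feature-map size and random input are illustrative):

```python
import numpy as np

def non_local_self_attention(x):
    """Non-local (self-attention) operation over a feature map.

    x: (H, W, C) feature map; output has the same shape.
    """
    H, W, C = x.shape
    flat = x.reshape(H * W, C)  # one row per pixel position
    # 1. Similarity between every pair of positions i, j.
    scores = flat @ flat.T      # (HW, HW)
    # 3. Normalization term: softmax over j, so each row sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # 2. Multiply each weight for (i, j) by the value at position j and sum.
    y = weights @ flat          # (HW, C)
    return y.reshape(H, W, C)

x = np.random.default_rng(2).normal(size=(4, 4, 3))
y = non_local_self_attention(x)
print(y.shape)  # (4, 4, 3)
```

Every output position is a weighted sum over all input positions, so distant pixels interact in a single operation.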
27. e.g. Self-Attention GAN
(Image generation, GAN, Self-attention)
(Figure: Generator: latent z → transposed-conv blocks with self-attention (⊕) → image x'; Discriminator: image x → conv blocks with self-attention (⊕) → FC → probability)
28. Fig. from Zhang et al. Self-Attention Generative Adversarial Networks. ArXiv. 2018.
e.g. Self-Attention GAN
(Image generation, GAN, Self-attention)
31. Reference
- Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR. 2015
- Wang et al. Non-local Neural Networks. ArXiv. 2017
- Vaswani et al. Attention Is All You Need. ArXiv. 2017
- Wang et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. ArXiv. 2018
- Zhang et al. Self-Attention Generative Adversarial Networks. ArXiv. 2018
- Blog post explaining "Attention Is All You Need"
(https://mchromiak.github.io/articles/2017/Sep/12/Transformer-Attention-is-all-you-need/)
- Video explaining "Attention Is All You Need"
(https://www.youtube.com/watch?v=iDulhoQ2pro)