Transformer & BERT
Review by Hyunwoong (github.com/gusdnd852)
Presentation Outline
1. The Transformer Network (Backbone of BERT)
1.1 Attention Mechanism (Seq2Seq with Attention)
1.2 Transformer Architecture (Transformer)
2. Background: Emergence of BERT (Introduction & Related Work)
2.1 Word Embedding (Word Representation in ML)
2.2 Embedding from Language Model (ELMo)
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
3. BERT & Experiments (Proposed Method & Experimental Results)
1.1 Attention Mechanism (Seq2Seq with Attention)
1.1 Attention Mechanism (Seq2Seq with Attention)
This is the plain Seq2Seq model.
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
If the sequence is long, it performs very poorly.
But why?
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
The context vector has a fixed size, so it cannot hold all of the information.
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
The network gets confused:
"So much information in such a small vector. Which part is important?"
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
So, we will use the output of every encoder time step.
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
First, we collect the encoder hidden states.
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
Then we compute the dot product between each encoder hidden state and the
decoder output (initially, the output for BOS) to score how related the words are.
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
We call this framework Query, Key, Value.
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
Query
Key Key Key Key
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
Query
Key Key Key Key
Then we apply a softmax to normalize the dot-product scores into [0, 1].
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
Multiply each score with its corresponding value (the encoder hidden states).
Query
Key Key Key Key
Value Value Value Value
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
Sum all the weighted values. We call the result the alignment (attention) vector.
Query
Key Key Key Key
Value Value Value Value
Align (attention)
Reference: Raimi Karim
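To make these steps concrete, a minimal PyTorch sketch of this dot-product attention (the tensor names and shapes are illustrative assumptions, not code from the slides):

import torch
import torch.nn.functional as F

# Toy shapes: 4 encoder time steps, hidden size 8.
encoder_states = torch.randn(4, 8)  # keys and values: one hidden state per source word
decoder_output = torch.randn(8)     # query: current decoder hidden state (e.g. for BOS)

scores = encoder_states @ decoder_output                        # dot product with each key -> [4]
weights = F.softmax(scores, dim=0)                              # normalize scores into [0, 1]
attention = (weights.unsqueeze(1) * encoder_states).sum(dim=0)  # weighted sum of values -> [8]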
1.1 Attention Mechanism (Seq2Seq with Attention)
The alignment (attention) vector is used as input to the decoder.
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
The alignment (attention) vector is used as input to the decoder.
Neural Machine Translation by Jointly Learning to Align and Translate (2014)
Reference: Raimi Karim
1.1 Attention Mechanism (Seq2Seq with Attention)
The alignment (attention) vector is used as input to the decoder.
Effective Approaches to Attention-based NMT (2015)
Reference: Raimi Karim
1.2 Transformer Architecture
1. Parallelism (replacing recurrence with positional encoding)
2. Self-attention (not only encoder–decoder attention)
1.2 Transformer Architecture
Input data format
There are <s>, </s>, <PAD>, and <UNK> tokens (max_len = 6).
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Input data format
The data tensor has this shape (zero-initialized), with d_model = 4 and max_len = 6:
I : [0, 0, 0, 0]
am : [0, 0, 0, 0]
kim : [0, 0, 0, 0]
<PAD> : [0, 0, 0, 0]
<PAD> : [0, 0, 0, 0]
<PAD> : [0, 0, 0, 0]
data = torch.zeros(batch_size, max_len, d_model)
size = data.size()  # [?, 6, 4], where ? is the batch size (the number of sentences here)
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Input Embedding
nn.Embedding(vocab_size, d_model) works like this.
Reference: Kim Dong Hwa
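A minimal usage sketch (the vocabulary size and token ids are made-up values for illustration):

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=100, embedding_dim=4)  # vocab_size=100, d_model=4
tokens = torch.tensor([[1, 2, 3, 0, 0, 0]])  # "I am kim <PAD> <PAD> <PAD>" as hypothetical ids
embedded = embedding(tokens)                 # [1, 6, 4]: one d_model vector per token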
1.2 Transformer Architecture
Positional Encoding – Option 1: a learned weight matrix
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Positional Encoding – Option 2: sinusoid functions
Reference: Kim Dong Hwa
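A minimal sketch of the sinusoid option, following the Transformer paper's formula PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), with cosine on the odd dimensions:

import torch

def sinusoid_encoding(max_len: int, d_model: int) -> torch.Tensor:
    pos = torch.arange(max_len).unsqueeze(1).float()  # [max_len, 1]
    i = torch.arange(0, d_model, 2).float()           # even dimension indices
    angle = pos / torch.pow(10000, i / d_model)       # [max_len, d_model / 2]
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)                    # even dimensions: sine
    pe[:, 1::2] = torch.cos(angle)                    # odd dimensions: cosine
    return pe                                         # added to the token embeddings

pe = sinusoid_encoding(max_len=6, d_model=4)  # matches the toy sizes above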
1.2 Transformer Architecture
Embedding Dropout
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Multi-Head Attention (step-by-step figures, including applying the mask)
Reference: Kim Dong Hwa
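Since the figures are not reproduced here, a minimal sketch of scaled dot-product attention with multiple heads; the learned W_q, W_k, W_v, W_o projections of the real layer are omitted for brevity, and the mask convention is an assumption:

import math
import torch
import torch.nn.functional as F

def multi_head_attention(q, k, v, n_head, mask=None):
    # q, k, v: [batch, len, d_model]; d_model is split across n_head smaller heads
    b, _, d_model = q.size()
    d_head = d_model // n_head

    def split(x):  # [batch, len, d_model] -> [batch, n_head, len, d_head]
        return x.view(b, -1, n_head, d_head).transpose(1, 2)

    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)  # [b, n_head, len_q, len_k]
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)      # masked positions get ~zero weight
    out = F.softmax(scores, dim=-1) @ v                   # weighted sum of the values
    return out.transpose(1, 2).contiguous().view(b, -1, d_model)  # concatenate the heads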
1.2 Transformer Architecture
Add & Norm
Reference: Kim Dong Hwa
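A minimal sketch of the Add & Norm step, i.e. a residual connection around the sublayer followed by layer normalization:

import torch
import torch.nn as nn

d_model = 4
norm = nn.LayerNorm(d_model)

def add_and_norm(x, sublayer):
    return norm(x + sublayer(x))  # residual connection, then layer normalization

x = torch.randn(1, 6, d_model)
out = add_and_norm(x, sublayer=lambda t: t * 0.5)  # dummy sublayer for illustration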
1.2 Transformer Architecture
Position-wise Feed-Forward
Reference: Kim Dong Hwa
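A minimal sketch of the position-wise feed-forward network: two linear layers with a ReLU in between, applied to every position independently (the toy d_model and d_ff sizes are assumptions):

import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model=4, d_ff=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),   # expand
            nn.ReLU(),
            nn.Linear(d_ff, d_model),   # project back
        )

    def forward(self, x):   # x: [batch, len, d_model]
        return self.net(x)  # the same weights are applied at every position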
1.2 Transformer Architecture
Add & Norm
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Decoder
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Masked Multi-Head Attention (step-by-step figures, including applying the mask)
Reference: Kim Dong Hwa
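A minimal sketch of the look-ahead (causal) mask the decoder uses, matching the mask convention of the attention sketch above:

import torch

max_len = 6
# Lower-triangular matrix: position i may attend only to positions <= i.
causal_mask = torch.tril(torch.ones(max_len, max_len)).bool()
# Passed as `mask`, future positions become -1e9 before the softmax,
# so each word cannot peek at the words that come after it.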
1.2 Transformer Architecture
Add & Norm
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Encoder–Decoder Multi-Head Attention (queries from the decoder; keys and values from the encoder)
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Add & Norm
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Position-wise Feed-Forward
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Add & Norm
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Linear & Softmax
Reference: Kim Dong Hwa
1.2 Transformer Architecture
Summary
Reference: Kim Dong Hwa
2.1 Word Embedding (Word Representation in ML)
How do we represent a word in a computer?
2.1 Word Embedding (Word Representation in ML)
Sparse Representation (One-Hot)
… but we want a dense representation!
2.1 Word Embedding (Word Representation in ML)
Dense Representation (Word2Vec): CBOW and Skip-Gram
2.1 Word Embedding (Word Representation in ML)
Dense Representation (Improved Word2Vec: subword n-grams)
"apple" : "ap", "app", "appl", "apple"
2.1 Word Embedding (Word Representation in ML)
Word Embedding with Pre-training
2.1 Word Embedding (Word Representation in ML)
But these embeddings cannot capture the contextual meaning of a word.
2.1 Word Embedding (Word Representation in ML)
I love an apple. It is more delicious than a banana.
I love the Apple. It is better than Samsung.
The same embedding vector for both… is that OK?
2.1 Word Embedding (Word Representation in ML)
School, Home, Hospital, Church, Temple …
These will have similar embedding vectors, because the models only consider
which words appear together.
[going, to, _____, I, am] appear together with each of them:
4/28: Today, I am going to _____ , I am so happy.
4/29: Today, I am going to _____ , I am so sad…
4/30: Today, I am going to _____ , I am so excited!
2.1 Word Embedding (Word Representation in ML)
But think again… Is a church really similar to a temple?
Is a school similar to a home? …Really?
2.2 Embedding from Language Model (ELMo)
Language Model: assigns a probability to a sequence by
considering the previous words and predicting the next word.
2.2 Embedding from Language Model (ELMo)
Word2Vec considers neighboring words.
A Language Model considers the previous words, assigning a probability to the sequence.
2.2 Embedding from Language Model (ELMo)
Embedding from Language Model (ELMo)
ELMo uses two language models, each a multi-layer RNN (LSTM), together called a biLM.
2.2 Embedding from Language Model (ELMo)
Embedding from Language Model (ELMo)
To embed the word "play", ELMo uses the output of each layer
inside the dotted rectangle above.
2.2 Embedding from Language Model (ELMo)
Embedding from Language Model (ELMo)
1. Concatenate the forward LM and backward LM outputs
2. Multiply each layer's output by a learned weight
3. Add every layer's output together
4. Scale the result by multiplying a constant
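A minimal sketch of that combination, following ELMo's weighted sum of layer outputs (the layer outputs here are random stand-ins):

import torch
import torch.nn.functional as F

n_layers, seq_len, d = 3, 5, 8
layer_outputs = torch.randn(n_layers, seq_len, d)  # concatenated fwd+bwd LM states per layer

s = F.softmax(torch.randn(n_layers), dim=0)  # learned, softmax-normalized layer weights
gamma = torch.tensor(1.0)                    # learned scale constant

elmo = gamma * (s.view(-1, 1, 1) * layer_outputs).sum(dim=0)  # [seq_len, d]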
2.2 Embedding from Language Model (ELMo)
Embedding from Language Model (ELMo)
We can use the ELMo representation together with the embedding for downstream tasks.
Now the model knows the word's contextual meaning.
2.2 Embedding from Language Model (ELMo)
Embedding from Language Model (ELMo)
But there is still a big problem:
we still lack data specific to a particular task.
2.2 Embedding from Language Model (ELMo)
We still lack data specific to particular tasks:
Text Classification, Named Entity Recognition, Machine Translation, Question Answering.
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
But we have a lot of unlabeled text data.
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
We want a universal NLU model that, like a human, can learn from unlabeled data.
So GPT takes a semi-supervised approach, unsupervised pre-training followed by
supervised fine-tuning, to solve this problem (the lack of task-specific datasets).
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
Unsupervised Pre-train
Given an unsupervised corpus of tokens U = u1, u2, …, un,
we maximize the standard language-modeling objective:
L1(U) = Σi log P(ui | ui−k, …, ui−1; Θ)
where k is the context window size and Θ are the neural network's parameters.
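A minimal sketch of this objective as a next-token cross-entropy loss (the logits are stand-ins for a model's output; maximizing L1 is minimizing this loss):

import torch
import torch.nn.functional as F

vocab, seq_len = 100, 6
logits = torch.randn(1, seq_len, vocab)         # model's per-position next-word predictions
tokens = torch.randint(0, vocab, (1, seq_len))  # the corpus tokens u1..un

# Predict token i+1 from positions up to i: shift logits and targets by one.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1))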
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
Unsupervised Pre-train
h0 = U·We + Wp
hl = transformer_block(hl−1), for each layer l
P(u) = softmax(hn·WeT)
where We is the token embedding matrix, Wp the positional embedding matrix,
and P(u) is the model's output (the next-word prediction).
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
Unsupervised Pre-train
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
Supervised Fine Tune
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
Supervised Fine Tune
Given a supervised corpus of tokens x1, x2, …, xm with label y,
the final transformer activation passes through one added linear output layer
to predict y: P(y | x1, …, xm) = softmax(h_l^m·Wy)
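A minimal sketch of that fine-tuning head (names and sizes are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_classes = 8, 2
linear_y = nn.Linear(d_model, n_classes)  # the added output layer Wy

h_last = torch.randn(1, d_model)          # final transformer state for the last token
log_p_y = F.log_softmax(linear_y(h_last), dim=-1)
loss_l2 = F.nll_loss(log_p_y, torch.tensor([1]))  # the supervised objective L2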
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
Auxiliary Loss
Language modeling is also included as an auxiliary objective during fine-tuning,
which improves learning: L3(C) = L2(C) + λ·L1(C)
2.3 Unsupervised Pre-train, Supervised Fine Tune (GPT)
Framework
3. BERT & Experiments
3. BERT & Experiments
Pre-training
Reference: Kim Dong Hwa
3. BERT & Experiments
Pre-training: Segmentation
Reference: Kim Dong Hwa
3. BERT & Experiments
Preprocessing: Next Sentence
Reference: Kim Dong Hwa
3. BERT & Experiments
Preprocessing: Length & Mask
Reference: Kim Dong Hwa
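A minimal sketch of BERT's masking recipe from the paper: 15% of token positions are chosen; of those, 80% become [MASK], 10% become a random token, and 10% stay unchanged (the ids and special-token handling are simplified assumptions):

import torch

MASK_ID, VOCAB = 4, 100
tokens = torch.randint(5, VOCAB, (1, 12))  # a toy sentence of ids
labels = torch.full_like(tokens, -100)     # -100 = position ignored by the loss

chosen = torch.rand(tokens.shape) < 0.15   # pick 15% of the positions
labels[chosen] = tokens[chosen]            # only these positions are predicted

roll = torch.rand(tokens.shape)
tokens[chosen & (roll < 0.8)] = MASK_ID    # 80% of chosen: replace with [MASK]
use_random = chosen & (roll >= 0.8) & (roll < 0.9)
tokens[use_random] = torch.randint(5, VOCAB, tokens.shape)[use_random]  # 10%: random token
# The remaining 10% of chosen positions keep their original token.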
3. BERT & Experiments
Preprocessing: The Training Data
Reference: Kim Dong Hwa
3. BERT & Experiments
BERT vs Transformer
Reference: Kim Dong Hwa
3. BERT & Experiments
Token Embedding, Segment Embedding, Positional Embedding
Reference: Kim Dong Hwa
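A minimal sketch of how the three combine: BERT's input representation is the element-wise sum of token, segment, and position embeddings (the toy sizes are assumptions):

import torch
import torch.nn as nn

vocab, max_len, d_model = 100, 12, 8
tok_emb = nn.Embedding(vocab, d_model)
seg_emb = nn.Embedding(2, d_model)        # segment A = 0, segment B = 1
pos_emb = nn.Embedding(max_len, d_model)  # learned positions, not sinusoids

tokens = torch.randint(0, vocab, (1, max_len))
segments = torch.cat([torch.zeros(1, 6), torch.ones(1, 6)], dim=1).long()
positions = torch.arange(max_len).unsqueeze(0)

x = tok_emb(tokens) + seg_emb(segments) + pos_emb(positions)  # [1, max_len, d_model]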
3. BERT & Experiments
Norm
Reference: Kim Dong Hwa
3. BERT & Experiments
GELU
Reference: Kim Dong Hwa
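BERT replaces ReLU with GELU in the feed-forward layers. A minimal sketch using the tanh approximation of GELU(x) = x·Φ(x) that the original BERT code uses:

import math
import torch

def gelu(x: torch.Tensor) -> torch.Tensor:
    # tanh approximation of x * Phi(x)
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))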
3. BERT & Experiments
Forwarding
Reference: Kim Dong Hwa
3. BERT & Experiments
Loss
Reference: Kim Dong Hwa
3. BERT & Experiments
Next Sentence Loss
Reference: Kim Dong Hwa
3. BERT & Experiments
Mask Loss
Reference: Kim Dong Hwa
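A minimal sketch of combining the two pre-training losses: next-sentence cross-entropy plus masked-LM cross-entropy over the masked positions only (the logits are stand-ins for the model's outputs):

import torch
import torch.nn.functional as F

vocab, seq_len = 100, 12
mlm_logits = torch.randn(1, seq_len, vocab)  # per-token vocabulary predictions
nsp_logits = torch.randn(1, 2)               # is-next / not-next prediction

mlm_labels = torch.full((1, seq_len), -100)  # -100 everywhere except masked positions
mlm_labels[0, 3] = 42                        # e.g. one masked token's original id
nsp_label = torch.tensor([1])

mask_loss = F.cross_entropy(mlm_logits.view(-1, vocab), mlm_labels.view(-1), ignore_index=-100)
next_sentence_loss = F.cross_entropy(nsp_logits, nsp_label)
loss = mask_loss + next_sentence_loss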
3. BERT & Experiments
Fine Tune
Reference: Kim Dong Hwa
3. BERT & Experiments
Experiment Results
Reference: Kim Dong Hwa
