Human Interface Lab.
WFST for Speech Recognition
and Utterance Verification
Won Ik Cho
05 April, 2017
Contents
• Task : Utterance verification
• WFST in Speech recognition
• WFST : Theory and examples
  - Structures of FSA and WFST
  - Operations
• Speech recognition revisited
  - Construction of decoding network
  - Decoding with WFST
• WFST level utterance verification
Task : Utterance verification
• How can we judge whether a given sentence is a question or not?
  - From the raw data?
• Real-time WFST level detection?
WFST in Speech recognition
• WFSTs give a common and natural representation for the major components of speech recognition systems :
  - Hidden Markov models (HMMs)
  - Context-dependency models
  - Pronunciation dictionaries
  - Statistical grammars
  - Word/phone lattices
• Why WFST?
  - Efficient algorithms exist
  - A unified framework to represent different layers of knowledge
  - Can be optimized at the training phase
• WFST in KALDI
  - Decoding graph : min(det(H ∘ C ∘ L ∘ G))
    H: mapping from PDFs (in KALDI, transition-ids that encode the pdf-id) to context-dependent phones
    C: mapping from context-dependent phones to phones
    L: mapping from phones to words
    G: grammar or language model
• What are ∘, det and min?
WFST : Theory and examples
• Finite state automata (acceptors)
  - Represent a possibly infinite set of strings
    (ex) {ab}
    Numbers in circles : state labels
    Labels on arcs : symbols
  - The set of strings can be infinite, simply by making a loop
    (ex) {aab}, {aaab}, …
  - A string is ‘accepted’ if there is a path from the initial state to a final state with that sequence of symbols on it
    They are called acceptors since they accept each string that can be read along such a path
  - Epsilon symbol : ‘no symbol there’
    Usually the symbol numbered 0
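The acceptance rule above can be sketched in a few lines of Python. This is only an illustration: the toy automaton (strings of one or more `a` followed by `b`), the `<eps>` spelling of the epsilon label and the function name `accepts` are assumptions made for this example, not taken from the slides.

```python
from collections import deque

EPS = "<eps>"  # epsilon label: the arc consumes no input symbol

# Hypothetical toy acceptor for a+ b (ab, aab, aaab, ...).
# ARCS[state] is a list of (label, next_state) pairs.
ARCS = {
    0: [("a", 1)],
    1: [("a", 1), ("b", 2)],  # the self-loop on state 1 makes the set infinite
    2: [(EPS, 3)],            # epsilon arc: move on without reading a symbol
    3: [],
}
INITIAL, FINALS = 0, {3}


def accepts(string):
    """Return True if some path from INITIAL to a final state spells `string`."""
    queue = deque([(INITIAL, 0)])  # (state, number of symbols consumed)
    seen = set()
    while queue:
        state, pos = queue.popleft()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(string) and state in FINALS:
            return True
        for label, nxt in ARCS[state]:
            if label == EPS:
                queue.append((nxt, pos))
            elif pos < len(string) and string[pos] == label:
                queue.append((nxt, pos + 1))
    return False
```

The search keeps (state, position) pairs so that epsilon arcs can be followed without consuming input.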
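• Weight sets as semirings
  - Ring : (R, ⊕, ⊗) with identity elements $\bar{0}$ (for ⊕) and $\bar{1}$ (for ⊗)
  - Semiring : a ring that does not require an additive inverse for each element
    Sum (⊕) : combines the weights of alternative paths to compute the weight of a sequence
    Product (⊗) : combines the weights of the arcs to compute the weight of a path
  Figure notation : state label in each circle, symbol/weight on each arc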
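As a rough illustration of how the two operations are used, the sketch below defines the probability, log and tropical semirings that appear in the figures of this deck. The class and function names (`Semiring`, `path_weight`, `sequence_weight`) are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # combines arcs along one path
    zero: float                             # identity of plus
    one: float                              # identity of times


def _log_add(a, b):
    """-log(exp(-a) + exp(-b)), computed in a numerically stable way."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))


PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
LOG = Semiring(_log_add, lambda a, b: a + b, math.inf, 0.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)


def path_weight(sr, arc_weights):
    """Product of the arc weights along one path."""
    total = sr.one
    for w in arc_weights:
        total = sr.times(total, w)
    return total


def sequence_weight(sr, paths):
    """Sum over all paths that carry the same symbol sequence."""
    total = sr.zero
    for arc_weights in paths:
        total = sr.plus(total, path_weight(sr, arc_weights))
    return total
```

With negative log probabilities as weights, the log semiring gives the same answer as the probability semiring (in the log domain), while the tropical semiring keeps only the best path, which is the Viterbi approximation.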
• Weighted finite state automata
  Figure : a toy finite-state language model, and the possible pronunciations of ‘data’ in a real language model
  A weighted finite state automaton consists of :
  - A set of states
  - An initial state
  - A set of final states
  - A set of transitions between states
  - Each transition has a source state, a destination state, a label and a weight
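The definition above maps directly onto a small data structure. The sketch below is a minimal, epsilon-free version in the probability semiring; the class names and the arc probabilities for the pronunciations of ‘data’ are illustrative assumptions, not values read from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    label: str
    weight: float


@dataclass
class WFSA:
    states: set
    initial: int
    finals: set
    arcs: list

    def weight(self, symbols):
        """Sum over accepting paths of the product of arc weights."""
        # current[state] = total weight of the partial paths reaching `state`
        current = {self.initial: 1.0}
        for sym in symbols:
            nxt = {}
            for arc in self.arcs:
                if arc.label == sym and arc.src in current:
                    nxt[arc.dst] = nxt.get(arc.dst, 0.0) + current[arc.src] * arc.weight
            current = nxt
        return sum(w for q, w in current.items() if q in self.finals)


# Hypothetical pronunciation automaton for 'data' (weights are made up).
DATA = WFSA(
    states={0, 1, 2, 3, 4},
    initial=0,
    finals={4},
    arcs=[
        Arc(0, 1, "d", 1.0),
        Arc(1, 2, "ey", 0.5), Arc(1, 2, "ae", 0.5),
        Arc(2, 3, "t", 0.3), Arc(2, 3, "dx", 0.7),
        Arc(3, 4, "ax", 1.0),
    ],
)
```

Calling `DATA.weight(["d", "ey", "dx", "ax"])` multiplies the weights along the single matching path; a phone string with no accepting path gets weight 0.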
• Weighted finite state transducers
  - A WFSA with an input label, an output label and a weight on each transition
  - (ex) Transduces a phone string, which can be read along a path from the start state to a final state, to a word string
    The word is output by the transition that consumes the first phone of that pronunciation
• Weighted finite state transducers contain more information than WFSAs
  - Can represent a relationship between two levels of representation
    (ex) between phones and words / between HMMs and context-independent phones
  - Pronunciation transducers for more than one word can be combined without losing word identity
• Elementary operations
  - Combine transducers in parallel or in series
    Union (parallel)
    Concatenation (series)
  - Two weighted automata are equivalent if they associate the same weight with each input string
• Composition
• Determinization
• Weight pushing
• Minimization
• Composition
  - Transducer operation for combining different levels of representation
    A ∘ B maps x to z whenever A maps x to y and B maps y to z, and the weights are combined with ⊗
  - Key operation for model combination
  Figure : composition in the log probability semiring
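A minimal sketch of composition, assuming epsilon-free transducers and the tropical semiring (weights are added along a path). The dictionary layout and the function name `compose` are assumptions made for this example; real toolkits such as OpenFst also handle epsilon labels, which need special filtering that is not shown here.

```python
from collections import deque


def compose(a, b):
    """Compose two epsilon-free WFSTs in the tropical semiring.

    Each machine is a dict with keys
      "start":  start state
      "finals": {state: final weight}
      "arcs":   {state: [(input, output, weight, next_state), ...]}
    States of the result are pairs (state of a, state of b).
    """
    start = (a["start"], b["start"])
    result = {"start": start, "finals": {}, "arcs": {}}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        qa, qb = state
        result["arcs"][state] = []
        if qa in a["finals"] and qb in b["finals"]:
            result["finals"][state] = a["finals"][qa] + b["finals"][qb]
        for in_label, mid, w1, next_a in a["arcs"].get(qa, []):
            for mid2, out_label, w2, next_b in b["arcs"].get(qb, []):
                if mid != mid2:
                    continue  # output of a must match input of b
                nxt = (next_a, next_b)
                result["arcs"][state].append((in_label, out_label, w1 + w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return result


# Toy machines: A maps a->x and b->y, B maps x->P only.
A = {"start": 0, "finals": {1: 0.0},
     "arcs": {0: [("a", "x", 0.5, 1), ("b", "y", 0.3, 1)]}}
B = {"start": 0, "finals": {1: 0.25},
     "arcs": {0: [("x", "P", 1.0, 1)]}}
AB = compose(A, B)
```

Only state pairs reachable from the start pair are created; states that cannot reach a final state are not trimmed in this sketch.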
• Determinization
  - ‘Deterministic’ here means deterministic on the input symbol
  - An automaton is deterministic if
    1) it has a unique initial state, and
    2) no two transitions leaving any state share the same input label
  - Key operation for removing redundant paths
  Figure : determinization in the tropical semiring
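The two conditions can be checked mechanically. The sketch below only tests whether a machine is already deterministic; it does not perform weighted determinization itself. The function name and the arc layout are assumptions made for this example.

```python
def is_deterministic(initial_states, arcs):
    """Check the two conditions for input-determinism.

    initial_states: iterable of initial states
    arcs: iterable of (source_state, input_label, destination_state)
    """
    # 1) a unique initial state
    if len(set(initial_states)) != 1:
        return False
    # 2) no two transitions leaving a state share the same input label
    seen = set()
    for src, label, _dst in arcs:
        if (src, label) in seen:
            return False
        seen.add((src, label))
    return True
```

For example, two arcs labelled `a` leaving state 0 towards different states make the automaton non-deterministic, which is the kind of redundancy determinization removes.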
• Weight pushing
  - Creates an equivalent pushed (stochastic) machine
    The operation ensures that the FST is stochastic
    Stochastic FST : the weights of the transitions leaving each state sum to one
  - Useful as the first step of minimization, and for redistributing weight among transitions to improve pruned search
  Figure : weight pushing in the probability semiring
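A minimal sketch of weight pushing in the probability semiring, assuming an acyclic automaton so that the total weight from each state to the final states can be computed by simple recursion. The function name `push_weights`, the data layout and the toy arc weights are assumptions made for this example.

```python
import math
from functools import lru_cache


def push_weights(initial, finals, arcs):
    """Push weights toward the initial state (probability semiring, acyclic only).

    finals: {state: final weight}
    arcs:   list of (src, label, weight, dst)
    Returns (initial_weight, new_finals, new_arcs).
    """
    outgoing = {}
    for src, _label, w, dst in arcs:
        outgoing.setdefault(src, []).append((w, dst))

    @lru_cache(maxsize=None)
    def d(q):
        # total weight of all paths from q to a final state
        return finals.get(q, 0.0) + sum(w * d(dst) for w, dst in outgoing.get(q, []))

    # Reweight each arc by the potential of its destination over its source.
    new_arcs = [(s, l, w * d(t) / d(s), t) for s, l, w, t in arcs if d(s) > 0]
    new_finals = {q: rho / d(q) for q, rho in finals.items() if d(q) > 0}
    return d(initial), new_finals, new_arcs


# Toy automaton with unnormalized weights.
TOY_ARCS = [(0, "a", 2.0, 1), (0, "b", 6.0, 2), (1, "c", 1.0, 3), (2, "c", 0.5, 3)]
init_w, pushed_finals, pushed_arcs = push_weights(0, {3: 1.0}, TOY_ARCS)
```

After pushing, the weights leaving every state (including the final weight) sum to one, and the weight of every complete path is unchanged once the returned initial weight is multiplied back in.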
• Minimization
  - Any deterministic weighted automaton can be minimized
  - The minimized automaton B of A has the least number of states and transitions among all deterministic automata equivalent to A
  - Key operation for size reduction
  Figure : a deterministic weighted automaton, the same automaton after weight pushing in the tropical semiring, and the equivalent minimal weighted automaton
Speech recognition revisited
• WFST in KALDI
  - Decoding graph : min(det(H ∘ C ∘ L ∘ G))
    H: mapping from PDFs to context-dependent phones
    C: mapping from context-dependent phones to phones
    L: mapping from phones to words
    G: grammar or language model
  - H ∘ C ∘ L ∘ G : mapping from PDFs to words, constrained by the language model
• Construction of decoding network
  - Decoding graph : min(det(H ∘ C ∘ L ∘ G))
    1) The decoder finds word pronunciations in its lexicon and substitutes them into the grammar (which might be restricted to trigrams)
    2) The decoder identifies the correct context-dependent models to use for each phone in context (which might have to be triphonic)
    3) The decoder substitutes them to create an HMM-level transducer (with particular model topologies)
  - In KALDI, the graph is built by the mkgraph script (utils/mkgraph.sh)
  - G : probabilistic grammar or language model acceptor
    Stochastic n-gram models can be represented compactly by finite-state models
    Input : word
    Weight : history-dependent word probability
  Figure : word bigram transducer model, with arcs weighted by $-\log \hat{p}(w_2 \mid w_1)$ and backoff arcs carrying the backoff weight
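To make the arc weights concrete, the sketch below builds the word arcs of a bigram acceptor from a tiny corpus, with one state per history word and cost $-\log \hat{p}(w_2 \mid w_1)$. It uses plain maximum-likelihood estimates and has no smoothing or backoff arcs, so it is a simplification of the model in the figure; the function name and the `<s>` / `</s>` markers are assumptions made for this example.

```python
import math
from collections import Counter


def bigram_arcs(sentences):
    """Return arcs (history_state, word, cost, next_state) of a bigram acceptor.

    The state is the previous word; cost = -log p(w2 | w1) from raw counts.
    """
    pair_counts, history_counts = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            pair_counts[(w1, w2)] += 1
            history_counts[w1] += 1
    return [
        (w1, w2, -math.log(count / history_counts[w1]), w2)
        for (w1, w2), count in sorted(pair_counts.items())
    ]


CORPUS = [["is", "it", "true"], ["it", "is", "true"]]
G_ARCS = bigram_arcs(CORPUS)
```

Because the next state is just the word that was read, the number of states grows with the vocabulary, not with the number of sentences, which is the compactness the slide refers to.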
  - L : pronunciation lexicon
    Input : context-independent phone (phoneme)
    Output : word
    Weight : pronunciation probability
    Non-deterministic because of homophones
    • Ex) read <-> red
    Disambiguation symbols are added
    • Removed at the last stage
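The idea of disambiguation symbols can be sketched as follows: every pronunciation shared by more than one word gets a distinct auxiliary symbol `#1`, `#2`, … appended, so that the phone string alone identifies the word. This is a simplified illustration with a made-up function name and toy lexicon; it does not reproduce KALDI's own lexicon script, which also handles pronunciations that are prefixes of other pronunciations.

```python
from collections import defaultdict


def add_disambiguation(lexicon):
    """Append #k to every pronunciation that is shared by several words.

    lexicon: list of (word, [phones]); returns a new list in the same order.
    """
    words_by_pron = defaultdict(list)
    for word, phones in lexicon:
        words_by_pron[tuple(phones)].append(word)

    next_index = defaultdict(int)
    result = []
    for word, phones in lexicon:
        key = tuple(phones)
        new_phones = list(phones)
        if len(words_by_pron[key]) > 1:  # homophones: tag each one differently
            next_index[key] += 1
            new_phones.append(f"#{next_index[key]}")
        result.append((word, new_phones))
    return result


LEXICON = [
    ("read", ["r", "eh", "d"]),
    ("red", ["r", "eh", "d"]),
    ("reed", ["r", "iy", "d"]),
]
TAGGED = add_disambiguation(LEXICON)
```

The auxiliary symbols carry no acoustic meaning, which is why they are removed again at the last stage of graph construction.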
  - C : context-dependency transducer
    Input : context-dependent phone (triphone)
    Output : context-independent phone (phone)
  Figure : non-deterministic and deterministic versions of C
  Arc label notation : Triphone:Phone/LeftContext_RightContext
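The mapping that C encodes can be illustrated on a plain phone list. The sketch below writes each triphone as `phone/left_right`, following the Phone/LeftContext_RightContext notation of the slide; the function names and the `<eps>` boundary marker are assumptions made for this example, and it works on a single string instead of building the transducer.

```python
def to_triphones(phones, boundary="<eps>"):
    """Label each phone with its left and right neighbours: phone/left_right."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i]}/{padded[i - 1]}_{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def to_phones(triphones):
    """Direction of C: context-dependent label in, context-independent phone out."""
    return [label.split("/")[0] for label in triphones]


TRIPHONES = to_triphones(["d", "ey", "t", "ax"])
```

Here `TRIPHONES` is `['d/<eps>_ey', 'ey/d_t', 't/ey_ax', 'ax/t_<eps>']`, and `to_phones` recovers the original phone string.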
  - Decoding graph : min(det(H ∘ C ∘ L ∘ G))
    C ∘ L ∘ G : transducer that maps from context-dependent phones to word strings restricted to grammar G
    • Determinizable if C, L and G are determinizable
    • G is determinizable if G is an n-gram language model
    • L may not be determinizable if L has ambiguities
    • Revised $\tilde{L}$ with auxiliary homophone tagging
    • Modified $\tilde{C}$ that pairs the context-independent auxiliary symbols in the lexicon with new context-dependent auxiliary symbols
  - $\tilde{C} \circ \tilde{L} \circ G$ : revised determinizable and minimizable transducer
  - H : HMM topology transducer (maps states to phonemes)
    Input : state
    Output : context-dependent phone (triphone)
    Weight : HMM transition probability
  Figure : monophone case without self-loops, as used in KALDI
  - Decoding graph : $\min(\det(\tilde{H} \circ \tilde{C} \circ \tilde{L} \circ G))$
  - $\tilde{C} \circ \tilde{L} \circ G$ : revised determinizable and minimizable transducer
    • H : closure of the union of the individual HMMs
    • $\tilde{H}$ : H with self-loops added that have auxiliary distribution names as input labels and auxiliary context-dependent phones as output labels
  - $\tilde{H} \circ \tilde{C} \circ \tilde{L} \circ G$ : transducer that maps from distributions to word strings restricted to G
  Standardized integrated transducer : the unique deterministic, minimal transducer for which the weights of all transitions leaving any state sum to 1 in probability
  Figure : grammar G and lexicon $\tilde{L}$
  Figure : $\tilde{L} \circ G$ and $\det(\tilde{L} \circ G)$
  Figure : $\min_{\text{tropical}}(\det(\tilde{L} \circ G))$ and $\min_{\text{log}}(\det(\tilde{L} \circ G))$ (conjectured to be best for the pruning efficiency of a standard Viterbi beam search)
  - Weight and label pushing
    Makes the outgoing arcs of each state a stochastic distribution
    • Output labels are no longer synchronized in the WFST
  - Decoding graph construction
    Determinization of WFSTs can fail
    The final HCLG needs to be guaranteed to be stochastic
    • Needed for optimal pruning
  - Decoding with WFSTs
    Finding the best path : solving $W' = \arg\max_W P(X \mid W) P(W)$
    • Compose the recognizer as HCLG, which maps states to word sequences
    • Decode by aligning the feature vectors X with HCLG
    • $\arg\max_W \; X \circ (H \circ C \circ L \circ G)$
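Aligning the feature vectors with the graph amounts to a best-path search in the tropical semiring. The sketch below is a bare-bones Viterbi search over a toy graph, assuming every arc consumes exactly one frame and that the acoustic costs per frame are already given as negative log likelihoods. The function name, the data layout and all the numbers are hypothetical; a real decoder also handles epsilon arcs, self-loops and beam pruning.

```python
import math


def viterbi_decode(frames, arcs, start, finals):
    """Best path through a decoding graph, one arc per frame.

    frames: list of {input_label: acoustic cost}
    arcs:   list of (src, input_label, output_word_or_None, graph_cost, dst)
    finals: {state: final cost}
    Returns (total cost, tuple of output words) of the cheapest accepting path.
    """
    # tokens[state] = (best cost so far, words emitted along that path)
    tokens = {start: (0.0, ())}
    for frame in frames:
        new_tokens = {}
        for src, ilabel, word, graph_cost, dst in arcs:
            if src not in tokens or ilabel not in frame:
                continue
            cost, words = tokens[src]
            cost += graph_cost + frame[ilabel]
            if word is not None:
                words = words + (word,)
            if dst not in new_tokens or cost < new_tokens[dst][0]:
                new_tokens[dst] = (cost, words)
        tokens = new_tokens
    return min(
        ((cost + finals[q], words) for q, (cost, words) in tokens.items() if q in finals),
        default=(math.inf, ()),
    )


# Toy graph: two competing one-word paths ("yes" and "no"), two frames long.
TOY_GRAPH = [
    (0, 1, "yes", 0.1, 1), (1, 2, None, 0.0, 2),
    (0, 3, "no", 0.2, 3), (3, 4, None, 0.0, 2),
]
TOY_FRAMES = [{1: 1.0, 3: 0.5}, {2: 0.2, 4: 2.0}]
best_cost, best_words = viterbi_decode(TOY_FRAMES, TOY_GRAPH, start=0, finals={2: 0.0})
```

Keeping only the cheapest token per state is what makes this the tropical-semiring (Viterbi) approximation of the arg max above.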
WFST level utterance verification
• Real-time WFST level detection?
• WFST level detection
  - Operations in making the graph
    Make a new decoding graph based on a new corpus
    • For example, from a Q&A style corpus
  - Operations in searching the path
    Detect the utterance type by giving higher scores to paths related to the objective
    • Lattice structure and classification algorithms in NLP can be considered
  Figure : example hypothesis judged ‘Not a question!’ (low score assigned by the classification algorithm)
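One way to read "giving higher scores to paths related to the objective" is to re-rank the decoder's hypotheses with a classifier score. The sketch below does this on an n-best list; it is only a guess at how such a step could look, and the function names, the stand-in classifier, the weight and the costs are all hypothetical.

```python
def rescore(nbest, question_score, weight=1.0):
    """Re-rank hypotheses so that likely questions get a lower cost.

    nbest: list of (words, cost), lower cost is better
    question_score: callable returning a value in [0, 1] for a word tuple
    """
    rescored = [(cost - weight * question_score(words), words) for words, cost in nbest]
    rescored.sort()
    return [(words, cost) for cost, words in rescored]


def toy_question_score(words):
    """Stand-in for a real classifier: 1.0 if the hypothesis starts like a question."""
    return 1.0 if words and words[0] in {"is", "are", "do", "what", "where"} else 0.0


NBEST = [(("it", "is", "raining"), 10.0), (("is", "it", "raining"), 10.4)]
RERANKED = rescore(NBEST, toy_question_score)
```

In this toy case the question-like hypothesis moves to the top after rescoring; the same adjustment could in principle be applied to lattice arcs instead of whole hypotheses.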
Summary
• WFSTs give a common and natural representation for the major components of speech recognition systems.
• In a speech recognition system, the WFST is the decoding graph, which maps PDFs to words based on the language model.
• WFST-based utterance verification involves changing the weights in graphs such as C, L or G, or reweighting the lattice structure.
Reference
• M. Mohri, F. Pereira, and M. Riley, “Speech recognition with weighted finite-state transducers,” in Springer Handbook of Speech Processing, Springer Berlin Heidelberg, pp. 559-584, 2008.
• OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language, Part I. Theory and Algorithms. http://www.openfst.org/twiki/pub/FST/FstHltTutorial/tutorial_part1.pdf
• M. Hannemann, Weighted Finite State Transducers in Automatic Speech Recognition, ZRE lecture 15, Apr. 2015. http://www.fit.vutbr.cz/study/courses/ZRE/public/pred/10_wfst_lvcsr/zre_lecture_asr_wfst_2015.pdf
• T. Hanneforth, Finite-state Machines: Theory and Applications, Dec. 2008. http://tagh.de/tom/wp-content/uploads/fsm_weightedautomata.pdf
• A helpful explanation of KALDI decoding graph construction: http://vpanayotov.blogspot.kr/2012/06/kaldi-decoding-graph-construction.html
Thank you!

More Related Content

What's hot

What's hot (20)

Compiler Design- Machine Independent Optimizations
Compiler Design- Machine Independent OptimizationsCompiler Design- Machine Independent Optimizations
Compiler Design- Machine Independent Optimizations
 
CS571: Language Models
CS571: Language ModelsCS571: Language Models
CS571: Language Models
 
Block-based Speech to Speech Translation
Block-based Speech to Speech TranslationBlock-based Speech to Speech Translation
Block-based Speech to Speech Translation
 
Knowledge Representation & Reasoning
Knowledge Representation & ReasoningKnowledge Representation & Reasoning
Knowledge Representation & Reasoning
 
Cs6503 theory of computation lesson plan
Cs6503 theory of computation  lesson planCs6503 theory of computation  lesson plan
Cs6503 theory of computation lesson plan
 
Peephole optimization techniques in compiler design
Peephole optimization techniques in compiler designPeephole optimization techniques in compiler design
Peephole optimization techniques in compiler design
 
Siamese networks
Siamese networksSiamese networks
Siamese networks
 
Text summarization
Text summarization Text summarization
Text summarization
 
Language models
Language modelsLanguage models
Language models
 
ASIC
ASICASIC
ASIC
 
GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask Learners
 
Ll(1) Parser in Compilers
Ll(1) Parser in CompilersLl(1) Parser in Compilers
Ll(1) Parser in Compilers
 
Semantic analysis
Semantic analysisSemantic analysis
Semantic analysis
 
Context free grammars
Context free grammarsContext free grammars
Context free grammars
 
Instruction types
Instruction typesInstruction types
Instruction types
 
Tic tac toe simple ai game
Tic tac toe simple ai gameTic tac toe simple ai game
Tic tac toe simple ai game
 
Encoder, decoder, multiplexers and demultiplexers
Encoder, decoder, multiplexers and demultiplexersEncoder, decoder, multiplexers and demultiplexers
Encoder, decoder, multiplexers and demultiplexers
 
Classical Planning
Classical PlanningClassical Planning
Classical Planning
 
Word embedding
Word embedding Word embedding
Word embedding
 
Vlsi design
Vlsi designVlsi design
Vlsi design
 

Similar to WFST

An Approach To Verilog-VHDL Interoperability For Synchronous Designs
An Approach To Verilog-VHDL Interoperability For Synchronous DesignsAn Approach To Verilog-VHDL Interoperability For Synchronous Designs
An Approach To Verilog-VHDL Interoperability For Synchronous DesignsDawn Cook
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Spark Summit
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler DesignKuppusamy P
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition子毅 楊
 
Can programming be liberated from the von neumann style?
Can programming be liberated from the von neumann style?Can programming be liberated from the von neumann style?
Can programming be liberated from the von neumann style?Oriol López Massaguer
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitShubham Verma
 
Ch 2.pptx
Ch 2.pptxCh 2.pptx
Ch 2.pptxwoldu2
 
Finals-review.pptx
Finals-review.pptxFinals-review.pptx
Finals-review.pptxamara jyothi
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
Hossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on GraphsHossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on Graphsknowdiff
 
Architectural_Synthesis_for_DSP_Structured_Datapaths
Architectural_Synthesis_for_DSP_Structured_DatapathsArchitectural_Synthesis_for_DSP_Structured_Datapaths
Architectural_Synthesis_for_DSP_Structured_DatapathsShereef Shehata
 
Slides:Coercion Quantification
Slides:Coercion QuantificationSlides:Coercion Quantification
Slides:Coercion QuantificationNingningXIE1
 
Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Frank Kienle
 
Halide - 2
Halide - 2 Halide - 2
Halide - 2 Kobe Yu
 
Speaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderAkira Tamamori
 
NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured predictionzukun
 
70 C o m m u n i C at i o n s o f t h E a C m j u.docx
70    C o m m u n i C at i o n s  o f  t h E  a C m       j u.docx70    C o m m u n i C at i o n s  o f  t h E  a C m       j u.docx
70 C o m m u n i C at i o n s o f t h E a C m j u.docxevonnehoggarth79783
 
Speech recognition final
Speech recognition finalSpeech recognition final
Speech recognition finalArchit Vora
 

Similar to WFST (20)

An Approach To Verilog-VHDL Interoperability For Synchronous Designs
An Approach To Verilog-VHDL Interoperability For Synchronous DesignsAn Approach To Verilog-VHDL Interoperability For Synchronous Designs
An Approach To Verilog-VHDL Interoperability For Synchronous Designs
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
 
Lexical analysis - Compiler Design
Lexical analysis - Compiler DesignLexical analysis - Compiler Design
Lexical analysis - Compiler Design
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition
 
Can programming be liberated from the von neumann style?
Can programming be liberated from the von neumann style?Can programming be liberated from the von neumann style?
Can programming be liberated from the von neumann style?
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
 
Ch 2.pptx
Ch 2.pptxCh 2.pptx
Ch 2.pptx
 
Finals-review.pptx
Finals-review.pptxFinals-review.pptx
Finals-review.pptx
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Hossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on GraphsHossein Taghavi : Codes on Graphs
Hossein Taghavi : Codes on Graphs
 
Architectural_Synthesis_for_DSP_Structured_Datapaths
Architectural_Synthesis_for_DSP_Structured_DatapathsArchitectural_Synthesis_for_DSP_Structured_Datapaths
Architectural_Synthesis_for_DSP_Structured_Datapaths
 
Slides:Coercion Quantification
Slides:Coercion QuantificationSlides:Coercion Quantification
Slides:Coercion Quantification
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
 
Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...
 
Halide - 2
Halide - 2 Halide - 2
Halide - 2
 
Speaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet VocoderSpeaker Dependent WaveNet Vocoder
Speaker Dependent WaveNet Vocoder
 
NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured prediction
 
70 C o m m u n i C at i o n s o f t h E a C m j u.docx
70    C o m m u n i C at i o n s  o f  t h E  a C m       j u.docx70    C o m m u n i C at i o n s  o f  t h E  a C m       j u.docx
70 C o m m u n i C at i o n s o f t h E a C m j u.docx
 
Speech recognition final
Speech recognition finalSpeech recognition final
Speech recognition final
 
Cdma 101
Cdma 101Cdma 101
Cdma 101
 

More from WarNik Chow

2206 FAccT_inperson
2206 FAccT_inperson2206 FAccT_inperson
2206 FAccT_inpersonWarNik Chow
 
2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech datasetWarNik Chow
 
2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2eWarNik Chow
 
2102 Redone seminar
2102 Redone seminar2102 Redone seminar
2102 Redone seminarWarNik Chow
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH WarNik Chow
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categoriesWarNik Chow
 
2010 HCLT Hate Speech
2010 HCLT Hate Speech2010 HCLT Hate Speech
2010 HCLT Hate SpeechWarNik Chow
 

More from WarNik Chow (20)

2312 PACLIC
2312 PACLIC2312 PACLIC
2312 PACLIC
 
2311 EAAMO
2311 EAAMO2311 EAAMO
2311 EAAMO
 
2211 HCOMP
2211 HCOMP2211 HCOMP
2211 HCOMP
 
2211 APSIPA
2211 APSIPA2211 APSIPA
2211 APSIPA
 
2211 AACL
2211 AACL2211 AACL
2211 AACL
 
2210 CODI
2210 CODI2210 CODI
2210 CODI
 
2206 FAccT_inperson
2206 FAccT_inperson2206 FAccT_inperson
2206 FAccT_inperson
 
2206 Modupop!
2206 Modupop!2206 Modupop!
2206 Modupop!
 
2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset
 
2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e
 
2106 PRSLLS
2106 PRSLLS2106 PRSLLS
2106 PRSLLS
 
2106 JWLLP
2106 JWLLP2106 JWLLP
2106 JWLLP
 
2106 ACM DIS
2106 ACM DIS2106 ACM DIS
2106 ACM DIS
 
2104 Talk @SSU
2104 Talk @SSU2104 Talk @SSU
2104 Talk @SSU
 
2103 ACM FAccT
2103 ACM FAccT2103 ACM FAccT
2103 ACM FAccT
 
2102 Redone seminar
2102 Redone seminar2102 Redone seminar
2102 Redone seminar
 
2011 NLP-OSS
2011 NLP-OSS2011 NLP-OSS
2011 NLP-OSS
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
 
2010 HCLT Hate Speech
2010 HCLT Hate Speech2010 HCLT Hate Speech
2010 HCLT Hate Speech
 

Recently uploaded

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 

Recently uploaded (20)

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik

WFST

  • 1. 휴먼인터페이스 연구실 Human Interface Lab. WFST for Speech Recognition and Utterance Verification Won Ik Cho 05 April, 2017
  • 2. Contents
      • Task: Utterance verification
      • WFST in speech recognition
      • WFST: Theory and examples
          - Structures of FSA and WFST
          - Operations
      • Speech recognition revisited
          - Construction of decoding network
          - Decoding with WFST
      • WFST-level utterance verification
  • 3. Task: Utterance verification
  • 4. Task: Utterance verification
      • How can we judge whether ‘this sentence’ is a question or not? Raw data?
  • 5. Task: Utterance verification [figure]
  • 6. Task: Utterance verification
      • Real-time WFST-level detection?
  • 7. WFST in speech recognition
  • 8. WFST in speech recognition
      • WFSTs give a common and natural representation for the major components of speech recognition systems:
          - Hidden Markov models (HMMs)
          - Context-dependency models
          - Pronunciation dictionaries
          - Statistical grammars
          - Word/phone lattices
      • Why WFST?
          - Efficient algorithms exist
          - A unified framework for representing different layers of knowledge
          - Can be optimized at the training phase
  • 9. WFST in speech recognition
      • WFST in KALDI
          - Decoding graph: min(det(H ∘ C ∘ L ∘ G))
          - H: mapping from PDFs to context labels
          - C: mapping from context labels to phones
          - L: mapping from phones to words
          - G: grammar or language model
      • What are ∘, det, and min?
  • 10. WFST: Theory and examples
  • 11. WFST: Theory and examples
      • Finite state automata (acceptors)
          - Represent a possibly infinite set of strings, since they accept each string that can be read along a path
            (ex) {ab}; numbers in circles are state labels, labels on arcs are symbols
          - The set of accepted strings can be infinite, simply by making a loop
            (ex) {aab}, {aaab}, …
          - A string is ‘accepted’ if there is a path with that sequence of symbols on it
          - Epsilon symbol: ‘no symbol there’; usually the symbol numbered 0
  • 12. WFST: Theory and examples
      • Weight sets as semirings
          - Ring: (R, ⊕, ⊗) with identity elements 0̄ and 1̄
          - Semiring: a ring that does not require an additive inverse for each element
          - Sum (⊕): computes the weight of a sequence, combining the weights of its alternative paths
          - Product (⊗): computes the weight of a path from the weights of its transitions
      • Figure notation: state label, or symbol/weight
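The roles of ⊕ and ⊗ can be illustrated with a minimal sketch. The `Semiring` class and the helper names below are hypothetical and not taken from the slides or from any library. The probability, tropical, and log semirings are the ones named on later slides.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # ⊕: combines alternative paths
    times: Callable[[float, float], float]  # ⊗: extends a path by one transition
    zero: float                             # identity element of ⊕
    one: float                              # identity element of ⊗


def _log_add(a: float, b: float) -> float:
    """-log(exp(-a) + exp(-b)), computed in a numerically stable way."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))


PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)
LOG = Semiring(_log_add, lambda a, b: a + b, math.inf, 0.0)


def path_weight(sr: Semiring, arc_weights) -> float:
    """⊗-product of the transition weights along one path."""
    w = sr.one
    for a in arc_weights:
        w = sr.times(w, a)
    return w


def sequence_weight(sr: Semiring, paths) -> float:
    """⊕-sum over all paths that carry the same symbol sequence."""
    total = sr.zero
    for p in paths:
        total = sr.plus(total, path_weight(sr, p))
    return total


# Two alternative paths for one sequence, as probabilities and as -log costs.
paths_p = [[0.5, 0.4], [0.1]]
paths_c = [[-math.log(x) for x in p] for p in paths_p]
print(sequence_weight(PROBABILITY, paths_p))  # total probability of the sequence
print(sequence_weight(TROPICAL, paths_c))     # cost of the best single path
print(sequence_weight(LOG, paths_c))          # -log of the total probability
```

In the tropical semiring ⊕ is `min`, so only the best path survives. This is why the tropical semiring corresponds to Viterbi-style search.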
  • 13. WFST: Theory and examples
      • Weighted finite state automata
          - Figures: toy finite-state language model; possible pronunciations of ‘data’; in a real language model
      • A weighted finite state automaton consists of:
          - A set of states
          - An initial state
          - A set of final states
          - A set of transitions between states
          - Each transition: source state / destination state / label / weight
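The components listed above map directly onto a small data structure. The sketch below is a hypothetical illustration: the `WFSA` class is not a library API, and the pronunciation probabilities for ‘data’ are made-up values. It scores a symbol sequence in the tropical semiring and does not handle epsilon transitions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class WFSA:
    initial: int                               # the initial state
    finals: dict                               # final state -> final cost
    arcs: list = field(default_factory=list)   # (source, destination, label, cost)

    def add_arc(self, src, dst, label, prob):
        # Store -log(probability) so that costs add along a path.
        self.arcs.append((src, dst, label, -math.log(prob)))

    def cost(self, symbols):
        """Cost of the cheapest accepting path for `symbols` (inf if rejected)."""
        frontier = {self.initial: 0.0}
        for sym in symbols:
            nxt = {}
            for src, dst, label, w in self.arcs:
                if label == sym and src in frontier:
                    c = frontier[src] + w
                    if c < nxt.get(dst, math.inf):
                        nxt[dst] = c
            frontier = nxt
        return min(
            (c + self.finals[q] for q, c in frontier.items() if q in self.finals),
            default=math.inf,
        )


# Toy pronunciation automaton for 'data' with assumed probabilities.
data = WFSA(initial=0, finals={4: 0.0})
data.add_arc(0, 1, "d", 1.0)
data.add_arc(1, 2, "ey", 0.5)
data.add_arc(1, 2, "ae", 0.5)
data.add_arc(2, 3, "t", 0.3)
data.add_arc(2, 3, "dx", 0.7)
data.add_arc(3, 4, "ax", 1.0)

print(data.cost(["d", "ey", "dx", "ax"]))  # accepted: finite cost
print(data.cost(["d", "ey"]))              # not accepted: inf
```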
  • 14. WFST: Theory and examples
      • Weighted finite state transducers:
          - A WFSA with an input label, an output label, and a weight on each transition
          - Example: transduces a phone string, read along a path from the start state to a final state, into a word string
          - The word is output by the transition that consumes the first phone of that pronunciation
  • 15. WFST: Theory and examples
      • Weighted finite state transducers carry more information than WFSAs
          - Can represent a relationship between two levels of representation
            (ex) between phones and words, or between HMMs and context-independent phones
          - Make it possible to combine the pronunciation transducers for more than one word without losing word identity
  • 16. WFST: Theory and examples
      • Elementary operations
          - Combine transducers in parallel (union) or in series (concatenation)
          - Two weighted automata are equivalent if they associate the same weight with each input string
      • Composition
      • Determinization
      • Weight pushing
      • Minimization
  • 17. WFST: Theory and examples
      • Composition
          - Transducer operation for combining different levels of representation
          - Key operation for model combination
          - Figure: composition in the log probability semiring
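Composition can be sketched for the simplest case, where neither transducer has epsilon labels. This is an illustrative assumption-laden sketch, not the slide's figure. The tuple layout, the function name `compose`, and the toy transducers are hypothetical. Weights are treated as costs, so ⊗ is addition, as in both the log and tropical semirings.

```python
from collections import deque


def compose(t1, t2):
    """Epsilon-free composition of two transducers.

    Each transducer is (initial, finals, arcs) where
      finals: dict state -> final cost
      arcs:   dict state -> list of (input label, output label, cost, next state)
    A composed arc exists when an output label of t1 matches an input label of t2.
    """
    init1, fin1, arcs1 = t1
    init2, fin2, arcs2 = t2
    start = (init1, init2)
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        q1, q2 = queue.popleft()
        if q1 in fin1 and q2 in fin2:
            finals[(q1, q2)] = fin1[q1] + fin2[q2]
        out = arcs.setdefault((q1, q2), [])
        for i1, o1, w1, n1 in arcs1.get(q1, []):
            for i2, o2, w2, n2 in arcs2.get(q2, []):
                if o1 == i2:
                    nxt = (n1, n2)
                    out.append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return start, finals, arcs


# Hypothetical toy transducers: T1 maps a,b -> x,y and T2 maps x,y -> X,Y.
T1 = (0, {2: 0.0}, {0: [("a", "x", 1.0, 1)], 1: [("b", "y", 2.0, 2)]})
T2 = (0, {2: 0.0}, {0: [("x", "X", 0.5, 1)], 1: [("y", "Y", 0.25, 2)]})
start, finals, arcs = compose(T1, T2)
print(arcs[start])  # one arc mapping "a" directly to "X" with the summed cost
```

States of the result are pairs of states from the two inputs. Only pairs reachable from the start pair are built.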
  • 18. WFST: Theory and examples
      • Determinization
          - ‘Deterministic’ here means deterministic on the input symbol
          - An automaton is deterministic if
            1) it has a unique initial state, and
            2) no two transitions leaving any state share the same input label
          - Key operation for redundant path removal
          - Figure: determinization in the tropical semiring
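The two conditions above can be checked mechanically. The sketch below only tests whether a machine is already deterministic. It does not perform determinization itself. The arc layout and the function name are hypothetical.

```python
def is_deterministic(initial_states, arcs):
    """Check the two conditions for determinism on the input symbol.

    initial_states: iterable of initial states
    arcs: iterable of (source, input label, output label, weight, destination)
    """
    # Condition 1: a unique initial state.
    if len(set(initial_states)) != 1:
        return False
    # Condition 2: no two arcs leaving a state share an input label.
    seen = set()
    for src, ilabel, *_rest in arcs:
        if (src, ilabel) in seen:
            return False
        seen.add((src, ilabel))
    return True


# State 0 has two outgoing arcs with input label "a": not deterministic.
nondet = [(0, "a", "x", 0.5, 1), (0, "a", "y", 0.3, 2)]
det = [(0, "a", "x", 0.3, 1), (1, "b", "y", 0.2, 2)]
print(is_deterministic([0], nondet), is_deterministic([0], det))
```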
  • 19. WFST: Theory and examples
      • Weight pushing
          - Creates an equivalent pushed (stochastic) machine
          - Operation that ensures the FST is stochastic
          - Stochastic FST: the weights sum to one at each state
          - Useful as the first step of minimization, and for redistributing weight among transitions to improve pruned search
          - Figure: weight pushing in the probability semiring
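A minimal sketch of weight pushing in the probability semiring follows, assuming an acyclic automaton so that a simple recursion suffices. The function name, argument layout, and toy automaton are hypothetical. Each state's total mass d(q) over all paths to a final state is computed, and every arc weight p is replaced by p · d(destination) / d(source).

```python
def push_weights(initial, finals, arcs):
    """Weight pushing toward the initial state (probability semiring, acyclic only).

    finals: dict state -> final probability
    arcs:   list of (source, destination, label, probability)
    Returns (total mass left at the initial state, new finals, new arcs).
    """
    out = {}
    for src, dst, _label, p in arcs:
        out.setdefault(src, []).append((dst, p))

    memo = {}

    def d(q):
        # Total probability mass of all paths from q to a final state.
        if q not in memo:
            memo[q] = finals.get(q, 0.0) + sum(p * d(n) for n, p in out.get(q, []))
        return memo[q]

    new_arcs = [
        (s, t, l, p * d(t) / d(s)) for s, t, l, p in arcs if d(s) > 0 and d(t) > 0
    ]
    new_finals = {q: r / d(q) for q, r in finals.items() if d(q) > 0}
    return d(initial), new_finals, new_arcs


# Toy automaton whose outgoing weights do not sum to one.
ARCS = [(0, 1, "a", 0.2), (0, 2, "b", 0.2), (1, 3, "c", 0.5), (2, 3, "d", 1.0)]
total, new_finals, new_arcs = push_weights(0, {3: 1.0}, ARCS)

out_sums = {}
for s, _t, _l, p in new_arcs:
    out_sums[s] = out_sums.get(s, 0.0) + p
print(total, out_sums)  # after pushing, each state's outgoing weights sum to 1
```

The weight of every complete path is unchanged once the returned total is multiplied back in, so the pushed machine is equivalent to the original.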
  • 20. WFST: Theory and examples
      • Minimization
          - Any deterministic weighted automaton can be minimized
          - The minimized automaton B of A has the least number of states and transitions among all deterministic automata equivalent to A
          - Key operation for size reduction
          - Figures: deterministic WA; after weight pushing in the tropical semiring; equivalent minimal WA
  • 22. Speech recognition revisited
      • WFST in KALDI
          - Decoding graph: min(det(H ∘ C ∘ L ∘ G))
          - H: mapping from PDFs to context labels
          - C: mapping from context labels to phones
          - L: mapping from phones to words
          - G: grammar or language model
          - H ∘ C ∘ L ∘ G: mapping from PDFs to words based on the language model
  • 23. Speech recognition revisited
      • Construction of decoding network
          - Decoding graph: min(det(H ∘ C ∘ L ∘ G)) (mkgraph function in KALDI)
            1) The decoder finds word pronunciations in its lexicon and substitutes them into the grammar (might be restricted to trigrams)
            2) The decoder identifies the correct context-dependent models to use for each phone in context (might have to be triphonic)
            3) The decoder substitutes them to create an HMM-level transducer (particular model topologies)
  • 24. Speech recognition revisited
      • Construction of decoding network
          - G: probabilistic grammar or language model acceptor
          - Stochastic n-gram models can be represented compactly by finite-state models
          - Input: word
          - Weight: history-dependent word probability
          - Figure: word bigram transducer model, with arc weights −log(p̂(w2|w1)) and backoff weights
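The arc weights of a backoff bigram acceptor can be sketched as follows. This is a hypothetical illustration with made-up probabilities. It assumes one state per history word plus a single backoff state named `"BO"`, and uses `"<eps>"` as the label of backoff arcs.

```python
import math


def bigram_arcs(unigram, bigram, backoff):
    """Arcs (source, destination, label, cost) of a backoff bigram acceptor.

    unigram: dict w -> p(w)
    bigram:  dict (w1, w2) -> p(w2 | w1) for bigrams seen in training
    backoff: dict w1 -> backoff weight for history w1
    """
    arcs = []
    for (w1, w2), p in bigram.items():
        arcs.append((w1, w2, w2, -math.log(p)))           # seen bigram: -log p(w2|w1)
    for w1, alpha in backoff.items():
        arcs.append((w1, "BO", "<eps>", -math.log(alpha)))  # backoff arc
    for w, p in unigram.items():
        arcs.append(("BO", w, w, -math.log(p)))           # unigram arc from backoff state
    return arcs


# Assumed toy model values, for illustration only.
unigram = {"is": 0.5, "it": 0.5}
bigram = {("is", "it"): 0.8}
backoff = {"is": 0.2, "it": 1.0}
arcs = bigram_arcs(unigram, bigram, backoff)

# Cost of the unseen bigram "it is": backoff arc followed by the unigram arc.
cost = {(s, d, l): c for s, d, l, c in arcs}
unseen = cost[("it", "BO", "<eps>")] + cost[("BO", "is", "is")]
print(unseen)
```

An unseen bigram is scored by taking the backoff arc and then the unigram arc, so its cost is the sum of the two.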
  • 25. Speech recognition revisited
      • Construction of decoding network
          - L: pronunciation lexicon
          - Input: context-independent phone (phoneme)
          - Output: word
          - Weight: pronunciation probability
  • 26. Speech recognition revisited
      • Construction of decoding network
          - L: pronunciation lexicon
          - Non-deterministic because of homophones
              • Ex) read <-> red ?
          - Disambiguation symbols are added
              • Removed at the last stage
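The homophone case can be sketched as follows. This is a simplified, hypothetical illustration, not KALDI's actual procedure. It only tags pronunciations shared by several words with auxiliary symbols `#1`, `#2`, … and ignores other sources of ambiguity, such as one pronunciation being a prefix of another.

```python
from collections import defaultdict


def add_disambig(lexicon):
    """Append #1, #2, ... to pronunciations that are shared by several words.

    lexicon: list of (word, phones) pairs, phones being a sequence of phone symbols
    """
    by_pron = defaultdict(list)
    for word, phones in lexicon:
        by_pron[tuple(phones)].append(word)

    counter = defaultdict(int)
    tagged = []
    for word, phones in lexicon:
        phones = tuple(phones)
        if len(by_pron[phones]) > 1:
            counter[phones] += 1
            tagged.append((word, phones + (f"#{counter[phones]}",)))
        else:
            tagged.append((word, phones))
    return tagged


# 'read' (past tense) and 'red' share one pronunciation in this toy lexicon.
lexicon = [
    ("read", ("r", "eh", "d")),
    ("red", ("r", "eh", "d")),
    ("data", ("d", "ey", "t", "ax")),
]
tagged = add_disambig(lexicon)
for word, phones in tagged:
    print(word, " ".join(phones))
```

After tagging, every input phone sequence identifies a single word, which is what determinization needs. The auxiliary symbols are removed again once the graph is built.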
  • 27. Speech recognition revisited
      • Construction of decoding network
          - C: context-dependency transducer
          - Input: context-dependent phone (triphone)
          - Output: context-independent phone (phone)
          - Figures: non-deterministic and deterministic versions; arc labels read Triphone:Phone/LeftContext_RightContext
  • 28. Speech recognition revisited
      • Construction of decoding network
          - Decoding graph: min(det(H ∘ C ∘ L ∘ G))
          - C ∘ L ∘ G: transducer that maps from context-dependent phones to word strings restricted to grammar G
              • Determinizable if C, L, and G are determinizable
              • G is determinizable if G is an n-gram language model
              • L may not be determinizable if L has ambiguities
              • Revised L̃ with auxiliary homophone tagging
              • Modified C̃ that pairs the context-independent auxiliary symbols in the lexicon with new context-dependent auxiliary symbols
          - C̃ ∘ L̃ ∘ G: revised determinizable and minimizable transducer
  • 29. Speech recognition revisited
      • Construction of decoding network
          - H: HMM topology transducer (maps states to phonemes)
          - Input: state
          - Output: context-dependent phone (triphone)
          - Weight: HMM transition probability
          - Figure: monophone case without self-loops; used in KALDI
  • 30. Speech recognition revisited
      • Construction of decoding network
          - Decoding graph: min(det(H̃ ∘ C̃ ∘ L̃ ∘ G))
          - C̃ ∘ L̃ ∘ G: revised determinizable and minimizable transducer
              • H: closure of the union of the individual HMMs
              • H̃: H with self-loops added that carry auxiliary distribution-name input labels and auxiliary context-phone output labels
          - H̃ ∘ C̃ ∘ L̃ ∘ G: transducer that maps from distributions to word strings restricted to G
          - Standardized integrated transducer: the unique deterministic, minimal transducer for which the weights of all transitions leaving any state sum to 1 in probability
  • 31. Speech recognition revisited
      • Construction of decoding network
          - Figures: grammar G; lexicon L̃
  • 32. Speech recognition revisited
      • Construction of decoding network
          - Figures: L̃ ∘ G; det(L̃ ∘ G)
  • 33. Speech recognition revisited
      • Construction of decoding network
          - Figures: min_tropical(det(L̃ ∘ G)); min_log(det(L̃ ∘ G))
          - Conjectured to be best for the pruning efficiency of a standard Viterbi beam search
  • 34. Speech recognition revisited
      • Construction of decoding network
          - Weight and label pushing
              • Makes the outgoing arcs of each state a stochastic distribution
              • Output labels are no longer synchronized in the WFST
          - Decoding the graph construction
          - Decoding with WFSTs
  • 35. Speech recognition revisited
      • Construction of decoding network
          - Weight and label pushing
          - Decoding the graph construction
              • Determinization for WFSTs can fail
              • Need to guarantee that the final HCLG is stochastic (needed for optimal pruning)
          - Decoding with WFSTs
  • 36. Speech recognition revisited
      • Construction of decoding network
          - Weight and label pushing
          - Decoding the graph construction
          - Decoding with WFSTs: finding the best path, i.e. solving W′ = argmax_W P(X|W) P(W)
              • Compose the recognizer as HCLG, which maps states to word sequences
              • Decode by aligning the feature vectors X with HCLG
              • argmax_W X ∘ (H ∘ C ∘ L ∘ G)
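The link between the probabilistic criterion and a best-path search can be written out as a short derivation. This is a sketch under the assumption that weights are stored as negative log probabilities and combined in the tropical semiring, so costs add along a path and the best path is the one with minimum total cost.

```latex
\begin{align}
W' &= \arg\max_{W} \; P(X \mid W)\, P(W) \\
   &= \arg\min_{W} \; \bigl[ -\log P(X \mid W) - \log P(W) \bigr]
\end{align}
```

Under this assumption, the first term is the acoustic cost accumulated while aligning X with the graph, and the second is the language model cost carried by G. The best word sequence is then read off the minimum-cost path through X ∘ (H ∘ C ∘ L ∘ G).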
  • 38. WFST-level utterance verification
      • Real-time WFST-level detection?
  • 39. WFST-level utterance verification
      • WFST-level detection
          - Operations in making the graph: make a new decoding graph based on a new corpus
              • For example, from a Q&A-style corpus
          - Operations in searching the path: detect the utterance by giving higher scores to paths related to the objective
              • Lattice structures and classification algorithms from NLP can be considered
          - Figure annotation: ‘Not a question!’ (low score assigned by the classification algorithm)
  • 40. Summary
      • WFSTs give a common and natural representation for the major components of speech recognition systems.
      • In a speech recognition system, the WFST is the decoding graph, which maps PDFs to words based on the language model.
      • WFST-based utterance verification can involve changing weights in graphs such as C, L, or G, or reweighting the lattice structure.
  • 41. Reference
      • M. Mohri, F. Pereira, and M. Riley, “Speech recognition with weighted finite-state transducers,” in Springer Handbook of Speech Processing, Springer Berlin Heidelberg, pp. 559-584, 2008.
      • OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language, Part I. Theory and Algorithms. http://www.openfst.org/twiki/pub/FST/FstHltTutorial/tutorial_part1.pdf
      • M. Hannemann, Weighted Finite State Transducers in Automatic Speech Recognition, ZRE lecture 15, Apr., 2015. http://www.fit.vutbr.cz/study/courses/ZRE/public/pred/10_wfst_lvcsr/zre_lecture_asr_wfst_2015.pdf
      • T. Hanneforth, Finite-state Machines: Theory and Applications, Dec., 2008. http://tagh.de/tom/wp-content/uploads/fsm_weightedautomata.pdf
      • Kind explanation on KALDI decoding: http://vpanayotov.blogspot.kr/2012/06/kaldi-decoding-graph-construction.html