3. 4
[Lafferty+, 01] Conditional Random Fields: Probabilistic Models
for Segmenting and Labeling Sequence Data. John Lafferty, Andrew
McCallum, Fernando Pereira. Proceedings of ICML’01, 2001.
[Collins, 02] Discriminative Training Methods for Hidden Markov
Models: Theory and Experiments with Perceptron Algorithms.
Michael Collins. Proceedings of EMNLP’02, 2002.
[Morency+, 07] Latent-Dynamic Discriminative Models for
Continuous Gesture Recognition. Louis-Philippe Morency, Ariadna
Quattoni, Trevor Darrell. Proceedings of CVPR’07, 2007.
[Sun+, 09] Latent Variable Perceptron Algorithm for Structured
Classification. Xu Sun, Takuya Matsuzaki, Daisuke Okanohara,
Jun’ichi Tsujii. Proceedings of IJCAI’09, 2009.
17. Structured Perceptron
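‣ Score function: $F(y_i \mid x_i, \Theta) = \Theta \cdot f(y_i, x_i)$
‣ For each training example $(x_i, y_i^*)$:
    $y_i = \operatorname{argmax}_{y} F(y \mid x_i, \Theta^i)$
    if $y_i \neq y_i^*$: $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$
    if $y_i = y_i^*$: $\Theta^{i+1} = \Theta^i$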
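The update rule on this slide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the label set `LABELS`, the feature templates in `features`, the toy data, and the brute-force `predict` are all hypothetical choices made for this example and are not taken from the slides.

```python
from collections import Counter
from itertools import product

LABELS = ("A", "B")  # hypothetical label set for the toy example


def features(y, x):
    """f(y, x): counts of (token, label) and (previous label, label) pairs."""
    f = Counter()
    prev = "<s>"
    for token, label in zip(x, y):
        f[("emit", token, label)] += 1
        f[("trans", prev, label)] += 1
        prev = label
    return f


def score(theta, y, x):
    """F(y | x, Theta) = Theta . f(y, x)."""
    return sum(theta.get(k, 0.0) * v for k, v in features(y, x).items())


def predict(theta, x):
    """argmax_y F(y | x, Theta) by enumerating every label sequence (toy sizes only)."""
    return max(product(LABELS, repeat=len(x)), key=lambda y: score(theta, y, x))


def train(data, epochs=5):
    theta = {}
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = predict(theta, x)
            if y_hat != y_gold:
                # Theta <- Theta + f(y*, x) - f(y, x); unchanged when the prediction is correct
                for k, v in features(y_gold, x).items():
                    theta[k] = theta.get(k, 0.0) + v
                for k, v in features(y_hat, x).items():
                    theta[k] = theta.get(k, 0.0) - v
    return theta


# Toy training set: (input tokens, gold label sequence)
data = [(("a", "b", "a"), ("A", "B", "A")), (("b", "b"), ("B", "B"))]
theta = train(data)
```

Enumerating all label sequences is exponential in the input length, so it only works for toy inputs. With features that decompose over adjacent labels, as assumed here, the argmax would normally be computed with Viterbi decoding instead.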
18. Structured Perceptron
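Update rule:
    $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$
Taking the inner product of both sides with $f(y_i^*, x_i) - f(y_i, x_i)$:
    $\Theta^{i+1} \cdot \bigl(f(y_i^*, x_i) - f(y_i, x_i)\bigr) = \Theta^i \cdot \bigl(f(y_i^*, x_i) - f(y_i, x_i)\bigr) + \| f(y_i^*, x_i) - f(y_i, x_i) \|^2$
    $\Leftrightarrow F(y_i^* \mid x_i, \Theta^{i+1}) - F(y_i \mid x_i, \Theta^{i+1}) = F(y_i^* \mid x_i, \Theta^i) - F(y_i \mid x_i, \Theta^i) + \underbrace{\| f(y_i^*, x_i) - f(y_i, x_i) \|^2}_{\geq 0}$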
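The identity above can be checked numerically. The vectors below are made-up values chosen only for illustration (they do not come from the slides); `theta`, `f_gold`, and `f_pred` stand in for $\Theta^i$, $f(y_i^*, x_i)$, and $f(y_i, x_i)$.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


theta = [0.5, 1.0, 2.0]    # Theta^i (hypothetical values)
f_gold = [1.0, 0.0, 1.0]   # f(y*, x) (hypothetical values)
f_pred = [0.0, 2.0, 1.0]   # f(y, x)  (hypothetical values)

diff = [g - p for g, p in zip(f_gold, f_pred)]     # f(y*, x) - f(y, x)
theta_next = [t + d for t, d in zip(theta, diff)]  # Theta^{i+1}

margin_before = dot(theta, diff)      # F(y*|x,Theta^i)     - F(y|x,Theta^i)
margin_after = dot(theta_next, diff)  # F(y*|x,Theta^{i+1}) - F(y|x,Theta^{i+1})
sq_norm = dot(diff, diff)             # ||f(y*, x) - f(y, x)||^2

print(margin_before, margin_after, sq_norm)  # -1.5 3.5 5.0
```

In this example the wrong output scored higher before the update (gap of -1.5), and one update raises the gap by exactly the squared norm (5.0), giving 3.5.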
19. Structured Perceptron
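Update rule:
    $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$
After the update, the score gap between $y_i^*$ and $y_i$ does not decrease:
    $F(y_i^* \mid x_i, \Theta^{i+1}) - F(y_i \mid x_i, \Theta^{i+1}) = F(y_i^* \mid x_i, \Theta^i) - F(y_i \mid x_i, \Theta^i) + \underbrace{\| f(y_i^*, x_i) - f(y_i, x_i) \|^2}_{\geq 0}$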
33. Latent Variable Perceptron
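‣ For each training example $(x_i, y_i^*)$:
    $h_i = \operatorname{argmax}_{h} F(h \mid x_i, \Theta^i)$, $\quad y_i = \mathrm{Proj}(h_i)$
    if $y_i \neq y_i^*$: $\Theta^{i+1} = \Theta^i + f(h_i^*, x_i) - f(h_i, x_i)$
    if $y_i = y_i^*$: $\Theta^{i+1} = \Theta^i$
‣ Here $h_i^*$ is the best latent sequence consistent with the gold labels:
    $h_i^* = \operatorname{argmax}_{h:\, \mathrm{Proj}(h) = y_i^*} F(h \mid x_i, \Theta^i)$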
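The latent variable update can be sketched in the same style. Everything concrete here is an assumption made for illustration: the latent label inventory `LATENT` (two hypothetical sub-labels per label), the projection `proj` that keeps the first character of each latent label, the feature templates, and the brute-force search over latent sequences.

```python
from collections import Counter
from itertools import product

# Hypothetical latent sub-labels for each observable label
LATENT = {"A": ("A1", "A2"), "B": ("B1", "B2")}
ALL_LATENT = tuple(h for hs in LATENT.values() for h in hs)


def proj(h):
    """Proj(h): map a latent sequence to its label sequence ("A1" -> "A")."""
    return tuple(s[0] for s in h)


def features(h, x):
    """f(h, x): counts of (token, latent) and (previous latent, latent) pairs."""
    f = Counter()
    prev = "<s>"
    for token, latent in zip(x, h):
        f[("emit", token, latent)] += 1
        f[("trans", prev, latent)] += 1
        prev = latent
    return f


def score(theta, h, x):
    """F(h | x, Theta) = Theta . f(h, x)."""
    return sum(theta.get(k, 0.0) * v for k, v in features(h, x).items())


def best(theta, x, candidates):
    """Highest-scoring latent sequence among the candidates (brute force)."""
    return max(candidates, key=lambda h: score(theta, h, x))


def predict(theta, x):
    return proj(best(theta, x, product(ALL_LATENT, repeat=len(x))))


def train(data, epochs=5):
    theta = {}
    for _ in range(epochs):
        for x, y_gold in data:
            h_hat = best(theta, x, product(ALL_LATENT, repeat=len(x)))
            if proj(h_hat) != y_gold:
                # h*: best latent sequence whose projection equals the gold labels
                h_gold = best(theta, x, product(*(LATENT[y] for y in y_gold)))
                for k, v in features(h_gold, x).items():
                    theta[k] = theta.get(k, 0.0) + v
                for k, v in features(h_hat, x).items():
                    theta[k] = theta.get(k, 0.0) - v
    return theta


data = [(("a", "b", "a"), ("A", "B", "A")), (("b", "b"), ("B", "B"))]
theta = train(data)
```

The only difference from the structured perceptron sketch is the target of the update: since the gold latent sequence is not observed, the current model picks $h_i^*$ itself, restricted to sequences that project onto $y_i^*$.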
34. mistake bound
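‣ Suppose the training data $\{(x_i, y_i^*)\}_{i=1}^{d}$ is separable with margin $\delta > 0$.
‣ Then the number of mistakes is bounded:
    $\#\text{mistakes} \leq \dfrac{2 T^2 M^2}{\delta^2}$
‣ Here $M = \max_{i,y} \| f(y, x_i) \|_2$.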
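To get a feel for the bound, the following sketch evaluates $2T^2M^2/\delta^2$ for made-up inputs. The feature vectors and the values of `T` and `delta` are hypothetical; in particular `T` is simply treated as a given constant here.

```python
import math


def max_feature_norm(feature_vectors):
    """M: the largest 2-norm among the given feature vectors f(y, x_i)."""
    return max(math.sqrt(sum(a * a for a in v)) for v in feature_vectors)


def mistake_bound(T, M, delta):
    """Upper bound on the number of mistakes: 2 * T^2 * M^2 / delta^2."""
    return 2 * T ** 2 * M ** 2 / delta ** 2


# Hypothetical feature vectors, one per (y, x_i) pair
vectors = [[3.0, 4.0], [1.0, 0.0]]
M = max_feature_norm(vectors)
bound = mistake_bound(T=2, M=M, delta=1.0)
print(M, bound)  # 5.0 200.0
```

The bound grows quadratically in $T$ and $M$ and shrinks quadratically as the margin $\delta$ grows, so halving the margin quadruples the bound.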