Introduction HMM Window Based MaxEnt CRF Summary References
Machine Learning for Sequential
Data: A Review
MD2K Reading Group
March 12, 2015
Classical Supervised Learning
Given a train set {(x1, y1), (x2, y2), ..., (xn, yn)}
x – features (independent variables), scalar or vector, i.e. |x| ≥ 1; y ∈ Y –
label/class (dependent variable), scalar, i.e. |y| = 1
Learn a model h ∈ H such that y = h(x)
Example: character classification, x – image of a handwritten character,
y ∈ {A, B, ..., Z}
Figure: training instances x1, ..., xn with labels y1, ..., yn (no edges between labels)
Sequential Supervised Learning (SSL)
Given a train set {(x1,1:n, y1,1:n), (x2,1:n, y2,1:n), ..., (xl,1:n, yl,1:n)}
l – number of training instances, each of length n (the instances need not all be of
the same length, i.e. n can vary)
x – features (independent variables), scalar or vector; y ∈ Y – labels/classes
(dependent variables)
Learn a model h ∈ H such that yl = h(xl)
SSL is different from time-series prediction and from sequence classification
Leverages sequential patterns and interactions (solid lines – left to right; dotted –
right to left)
Example: POS tagging, x – 'the dog saw a cat' (English sentence),
y = {D, N, V, D, N}
Figure: label sequence yl,1, ..., yl,n over observations xl,1, ..., xl,n, with edges between adjacent labels
1. Hidden Markov Models
2. Window-based Approaches
3. Maximum Entropy Models
4. Conditional Random Fields
Hidden Markov Models (HMM)
$p(y|x) = \frac{p(x|y) \times p(y)}{p(x)}$  (Bayes' rule, single class)
$\propto p(x|y) \times p(y)$  (since $p(x)$ is the same across all classes)
$= p(x, y)$
$= p(x_1|x_2, \dots, x_n, y) \times p(x_2|x_3, \dots, x_n, y) \times \dots \times p(y)$  (chain rule)
$= p(x_1|y) \times p(x_2|y) \times \dots \times p(y)$  (Naive Bayes assumption)
$\Rightarrow p(y|x) \propto p(y) \prod_{i=1}^{n} p(x_i|y)$  (Naive Bayes model, single class)
For a label sequence:
$p(y|x) \propto \prod_{i=1}^{n} p(y_i) \times p(x_i|y_i)$  (predict the whole sequence; $x$, $y$ are vectors)
$= \prod_{i=1}^{n} p(y_i|y_{i-1}) \times p(x_i|y_i)$  (first-order Markov property; tack on a dummy $y_0$)
$p(x) = \sum_{y \in Y} \prod_{i=1}^{n} p(y_i|y_{i-1}) \times p(x_i|y_i)$
($Y$ – all possible combinations of $y$ sequences)
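The factorization above can be checked numerically. A minimal sketch with a toy two-state HMM; all probability tables here are assumed for illustration (the initial distribution pi plays the role of the tacked-on y0 term):

```python
from itertools import product

# Toy HMM with two hidden states and two observation symbols (assumed numbers).
pi = [0.6, 0.4]                     # p(y_1)
A = [[0.7, 0.3], [0.2, 0.8]]        # A[i][j] = p(y_t = j | y_{t-1} = i)
B = [[0.9, 0.1], [0.3, 0.7]]        # B[i][k] = p(x_t = k | y_t = i)

def joint_prob(x, y):
    """p(x, y) = p(y_1) p(x_1|y_1) * prod_t p(y_t|y_{t-1}) p(x_t|y_t)."""
    p = pi[y[0]] * B[y[0]][x[0]]
    for t in range(1, len(x)):
        p *= A[y[t - 1]][y[t]] * B[y[t]][x[t]]
    return p

def marginal_prob(x):
    """p(x): sum the joint over all |Y|^n label sequences, as on the slide."""
    return sum(joint_prob(x, y) for y in product(range(2), repeat=len(x)))

x = [0, 1, 0]
print(joint_prob(x, [0, 0, 1]))
print(marginal_prob(x))
```

The brute-force sum over all label sequences is exponential in n; in practice the forward algorithm computes the same marginal in O(n) time.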
HMM (contd...)
Figure: HMM graphical model – hidden chain y1 → ... → yn, each yt emitting observation xt
HMMs are generative models, i.e. they model the joint probability p(x, y)
Predict the whole sequence
Model only the first-order Markov property, which is not suitable for many
real-world applications
xt influences only yt; HMMs cannot model dependencies like p(xt|yt−1, yt, yt+1),
in which xt would influence {yt−1, yt, yt+1}
Sliding Window Approach
A sliding window considers a window of features to make each decision, e.g. yt is
predicted from xt−1, xt, xt+1
Predicts a single class at a time
Can use any existing supervised learning algorithm without modification,
e.g. SVM, logistic regression, etc.
Cannot model dependencies between the y labels (either short or long range)
Figure: sliding window – each yt predicted from a window of observations around xt
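The windowing itself is mechanical. A minimal sketch (window of width 3; the padding value and function name are hypothetical) of turning a sequence into per-position feature rows that any standard classifier can consume:

```python
def window_features(xs, half_width=1, pad=None):
    """For each position t, collect (x_{t-h}, ..., x_t, ..., x_{t+h})."""
    rows = []
    for t in range(len(xs)):
        row = []
        for offset in range(-half_width, half_width + 1):
            i = t + offset
            # Positions hanging off either end of the sequence get a pad value.
            row.append(xs[i] if 0 <= i < len(xs) else pad)
        rows.append(row)
    return rows

xs = ['the', 'dog', 'saw', 'a', 'cat']
print(window_features(xs))
# each row pairs with the label y_t, e.g. D, N, V, D, N from the POS example
```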
Recurrent Sliding Window Approach
Similar to the sliding window approach
Models short-range dependencies by using the previous decision (yt−1) when
making the current decision (yt)
Problem: the y values are needed both when training and when testing
Figure: recurrent sliding window – each yt predicted from a window of observations plus the previous decision yt−1
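At test time the true yt−1 is unavailable, so a common choice is to feed the model's own previous prediction back in. A hedged sketch of that decoding loop; toy_classify is a hypothetical stand-in for a trained classifier:

```python
def predict_sequence(xs, classify, start_label=None):
    """Recurrent sliding window decoding: y_hat_{t-1} becomes a feature for t."""
    ys = []
    prev = start_label
    for x in xs:
        y = classify(x, prev)   # classifier sees current x and previous decision
        ys.append(y)
        prev = y                # feed the prediction back in
    return ys

# Toy rule standing in for a trained classifier (hypothetical labels B/I/O).
def toy_classify(x, prev):
    return 'B' if x == 'b' else ('I' if prev == 'B' else 'O')

print(predict_sequence(['a', 'b', 'a', 'a'], toy_classify))  # ['O', 'B', 'I', 'O']
```

During training, one can either use the gold yt−1 (teacher forcing) or the model's own predictions; the mismatch between the two is exactly the problem the slide notes.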
Maximum Entropy Model (MaxEnt)
Based on the Principle of Maximum Entropy (Jaynes, 1957):
if only incomplete information about a probability distribution
is available, the least biased assumption is the distribution
that is as uniform as possible given the available information
Uniform distribution – maximize entropy (primal problem)
Model the available information – expressed as constraints over
the training data (dual problem)
Discriminative model, i.e. models p(y|x)
Predicts a single class
MaxEnt (contd...)
I. Model the known (dual problem)
Train set = {(x1, y1), (x2, y2), ..., (xN, yN)} (given)
$\tilde{p}(x, y) = \frac{1}{N} \times$ number of times $(x, y)$ occurs in the train set
(i.e. the empirical joint probability table)
$f_i(x, y) = \begin{cases} 1, & \text{if } y = k \text{ AND } x = x_k \\ 0, & \text{otherwise} \end{cases}$
(e.g. y = physical activity AND x = HR ≥ 110 bpm; 1 ≤ i ≤ m, m – number of features)
$\tilde{E}(f_i) = \sum_{x,y} \tilde{p}(x, y) \times f_i(x, y)$  (expected value of $f_i$ under the training data)
$E(f_i) = \sum_{x,y} p(x, y) \times f_i(x, y)$  (expected value of $f_i$ under the model distribution)
$= \sum_{x,y} p(y|x) \times p(x) \times f_i(x, y)$
$= \sum_{x,y} p(y|x) \times \tilde{p}(x) \times f_i(x, y)$  (replace $p(x)$ with $\tilde{p}(x)$)
so we only need to learn the conditional probability, as opposed to the joint probability
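The empirical expectation is just a frequency-weighted sum over the train set. A minimal sketch with a hypothetical train set mirroring the slide's heart-rate feature (all data values assumed):

```python
from collections import Counter

def empirical_expectation(train, f):
    """E~(f) = sum_{x,y} p~(x,y) f(x,y), with p~ the empirical frequency."""
    counts = Counter(train)           # the empirical joint probability table
    N = len(train)
    return sum((n / N) * f(x, y) for (x, y), n in counts.items())

# Hypothetical (HR, label) pairs; f fires when y = activity AND HR >= 110 bpm.
train = [(125, 'activity'), (95, 'rest'), (130, 'activity'), (88, 'rest')]
f = lambda x, y: 1.0 if y == 'activity' and x >= 110 else 0.0
print(empirical_expectation(train, f))
```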
MaxEnt (contd...)
$\tilde{E}(f_i) = E(f_i) \;\Rightarrow\; \sum_{x,y} \tilde{p}(x, y) \times f_i(x, y) = \sum_{x,y} p(y|x) \times \tilde{p}(x) \times f_i(x, y)$
(the goal is to find the best conditional probability $p^*(y|x)$)
II. Make zero assumptions about the unknown (primal problem)
$H(y|x) = -\sum_{(x,y) \in (X \times Y)} p(x, y) \log p(y|x)$  (conditional entropy)
III. Objective function and Lagrange multipliers
$\Lambda(p^*(y|x), \bar{\lambda}) = H(y|x) + \sum_{i=1}^{m} \lambda_i \left( E(f_i) - \tilde{E}(f_i) \right) + \lambda_{m+1} \left( \sum_{y \in Y} p(y|x) - 1 \right)$
(objective function)
$p^*_{\bar{\lambda}}(y|x) = \frac{1}{Z_{\bar{\lambda}}(x)} \exp\left( \sum_{i=1}^{m} \lambda_i f_i(x, y) \right)$
(maximize the conditional distribution subject to the constraints)
$p^*_{\bar{\lambda}}(y_t|y_{t-1}, x) = \frac{1}{Z_{\bar{\lambda}}(y_{t-1}, x)} \exp\left( \sum_{i=1}^{m} \lambda_i f_i(x, y) \right)$
(inducing the Markov property results in the Maximum Entropy Markov Model (MEMM))
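The closed form p*(y|x) is a softmax over weighted feature scores. A minimal sketch with hypothetical binary features echoing the slide's heart-rate example; the weights are assumed, not trained:

```python
import math

def maxent_prob(x, labels, features, lambdas):
    """p(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z(x)."""
    scores = {y: math.exp(sum(l * f(x, y) for l, f in zip(lambdas, features)))
              for y in labels}
    Z = sum(scores.values())          # normalizer Z(x)
    return {y: s / Z for y, s in scores.items()}

# Hypothetical features; in practice the lambdas come from maximizing H(y|x).
features = [
    lambda x, y: 1.0 if y == 'physical_activity' and x['hr'] >= 110 else 0.0,
    lambda x, y: 1.0 if y == 'rest' and x['hr'] < 110 else 0.0,
]
lambdas = [2.0, 2.0]                  # assumed weights

p = maxent_prob({'hr': 125}, ['physical_activity', 'rest'], features, lambdas)
print(p)
```

The MEMM variant would simply add yt−1 as an extra argument to each feature function and normalize per (yt−1, x) pair.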
Conditional Random Fields (CRF)
Discriminative model, i.e. models p(y|x)
The conditional probability p(y|x) is modeled as a product of
factors $\Psi_k(x_k, y_k)$
Factors have a log-linear representation:
$\Psi_k(x_k, y_k) = \exp(\lambda_k \times \phi_k(x_k, y_k))$
Predicts the whole sequence
$p(y|x) = \frac{1}{Z(x)} \prod_{C \in \mathcal{C}} \Psi_C(x_C, y_C)$  (CRF general form; $\mathcal{C}$ – the set of cliques)
Linear Chain CRF
Figure: linear-chain CRF – feature factors φF link each yt to xt; transition factors φT link adjacent labels yt−1, yt
$p(y_t|x_t) = \frac{1}{Z(x)} \exp\left( \lambda_F \times \phi_F(y_t, x_t) + \lambda_T \times \phi_T(y_t, y_{t-1}) \right)$
(individual prediction)
$p(y|x) = \frac{1}{Z(x)} \prod_{i=1}^{n} \exp\left( \lambda_F \times \phi_F(y_i, x_i) + \lambda_T \times \phi_T(y_i, y_{i-1}) \right)$
(predict the whole sequence; tack on $y_0$)
$p(y|x) = \frac{1}{Z(x)} \prod_{i=1}^{n} \exp\left( \sum_{j=1}^{k} \lambda_j \times \phi_j(y_i, y_{i-1}, x_i) \right)$
(general form of linear-chain CRFs)
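Z(x) is what turns the product of exponentiated scores into a distribution. A brute-force sketch (toy potentials, all weights assumed) that enumerates every label sequence; this is only feasible for short chains, and real implementations use the forward algorithm instead:

```python
import math
from itertools import product

def score(ys, xs, lam_F, phi_F, lam_T, phi_T, y0=0):
    """sum_i lam_F*phi_F(y_i, x_i) + lam_T*phi_T(y_i, y_{i-1}), with y_0 tacked on."""
    prev, s = y0, 0.0
    for y, x in zip(ys, xs):
        s += lam_F * phi_F(y, x) + lam_T * phi_T(y, prev)
        prev = y
    return s

def crf_prob(ys, xs, labels, **kw):
    # Z(x): sum of exp(score) over every possible label sequence.
    Z = sum(math.exp(score(cand, xs, **kw))
            for cand in product(labels, repeat=len(xs)))
    return math.exp(score(ys, xs, **kw)) / Z

# Toy potentials: phi_F rewards y matching x, phi_T rewards staying in state.
phi_F = lambda y, x: 1.0 if y == x else 0.0
phi_T = lambda y, yp: 1.0 if y == yp else 0.0

xs = [0, 1, 1]
p = crf_prob([0, 1, 1], xs, labels=(0, 1),
             lam_F=2.0, phi_F=phi_F, lam_T=1.0, phi_T=phi_T)
print(p)
```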
CRF (contd...)
Figure: CRF with additional factors φ1, φ2 linking yt to non-adjacent nodes
$p(y_t|x, y_{1:t-1}) = \frac{1}{Z(x)} \exp\big( \lambda_1 \times \phi_1(y_t, x_t) + \lambda_2 \times \phi_2(y_t, y_{t-1}) + \lambda_3 \times \phi_3(y_t, x_2) + \lambda_4 \times \phi_4(y_t, x_{t-1}) + \lambda_5 \times \phi_5(y_t, x_{t+1}) + \lambda_6 \times \phi_6(y_t, y_1) \big)$
(additional features; these induce loops)
Figure: Sample CRF
Model Space
Figure: Graphical models for sequential data [4]
For further reading, refer to [3, 4, 2, 1]
[1] Berger, A. A brief maxent tutorial. www.cs.cmu.edu/afs/cs/user/aberger/www/html/tutorial/tutorial.html
[2] Blake, A., Kohli, P., and Rother, C. Markov Random Fields for Vision and Image Processing. MIT Press, 2011.
[3] Dietterich, T. G. Machine learning for sequential data: A review. In Structural, Syntactic, and Statistical Pattern Recognition. Springer, 2002, pp. 15–30.
[4] Klinger, R., and Tomanek, K. Classical probabilistic models and conditional random fields. TU Dortmund, Algorithm Engineering, 2007.

More Related Content

What's hot

Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionMichael Stumpf
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputszukun
 
Batch mode reinforcement learning based on the synthesis of artificial trajec...
Batch mode reinforcement learning based on the synthesis of artificial trajec...Batch mode reinforcement learning based on the synthesis of artificial trajec...
Batch mode reinforcement learning based on the synthesis of artificial trajec...Université de Liège (ULg)
 
Lecture 03: Machine Learning for Language Technology - Linear Classifiers
Lecture 03: Machine Learning for Language Technology - Linear ClassifiersLecture 03: Machine Learning for Language Technology - Linear Classifiers
Lecture 03: Machine Learning for Language Technology - Linear ClassifiersMarina Santini
 
Lecture 5: Structured Prediction
Lecture 5: Structured PredictionLecture 5: Structured Prediction
Lecture 5: Structured PredictionMarina Santini
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingEnthought, Inc.
 
Beyond function approximators for batch mode reinforcement learning: rebuildi...
Beyond function approximators for batch mode reinforcement learning: rebuildi...Beyond function approximators for batch mode reinforcement learning: rebuildi...
Beyond function approximators for batch mode reinforcement learning: rebuildi...Université de Liège (ULg)
 
Contribution of Fixed Point Theorem in Quasi Metric Spaces
Contribution of Fixed Point Theorem in Quasi Metric SpacesContribution of Fixed Point Theorem in Quasi Metric Spaces
Contribution of Fixed Point Theorem in Quasi Metric SpacesAM Publications,India
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsChristian Robert
 
Loss Calibrated Variational Inference
Loss Calibrated Variational InferenceLoss Calibrated Variational Inference
Loss Calibrated Variational InferenceTomasz Kusmierczyk
 
Johan Suykens: "Models from Data: a Unifying Picture"
Johan Suykens: "Models from Data: a Unifying Picture" Johan Suykens: "Models from Data: a Unifying Picture"
Johan Suykens: "Models from Data: a Unifying Picture" ieee_cis_cyprus
 
Module ii sp
Module ii spModule ii sp
Module ii spVijaya79
 

What's hot (20)

Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model Selection
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputs
 
Batch mode reinforcement learning based on the synthesis of artificial trajec...
Batch mode reinforcement learning based on the synthesis of artificial trajec...Batch mode reinforcement learning based on the synthesis of artificial trajec...
Batch mode reinforcement learning based on the synthesis of artificial trajec...
 
Lecture 03: Machine Learning for Language Technology - Linear Classifiers
Lecture 03: Machine Learning for Language Technology - Linear ClassifiersLecture 03: Machine Learning for Language Technology - Linear Classifiers
Lecture 03: Machine Learning for Language Technology - Linear Classifiers
 
Lecture 5: Structured Prediction
Lecture 5: Structured PredictionLecture 5: Structured Prediction
Lecture 5: Structured Prediction
 
Slides ihp
Slides ihpSlides ihp
Slides ihp
 
Slides dauphine
Slides dauphineSlides dauphine
Slides dauphine
 
Slides compiegne
Slides compiegneSlides compiegne
Slides compiegne
 
ma112011id535
ma112011id535ma112011id535
ma112011id535
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
 
Nested sampling
Nested samplingNested sampling
Nested sampling
 
Beyond function approximators for batch mode reinforcement learning: rebuildi...
Beyond function approximators for batch mode reinforcement learning: rebuildi...Beyond function approximators for batch mode reinforcement learning: rebuildi...
Beyond function approximators for batch mode reinforcement learning: rebuildi...
 
Contribution of Fixed Point Theorem in Quasi Metric Spaces
Contribution of Fixed Point Theorem in Quasi Metric SpacesContribution of Fixed Point Theorem in Quasi Metric Spaces
Contribution of Fixed Point Theorem in Quasi Metric Spaces
 
Madrid easy
Madrid easyMadrid easy
Madrid easy
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
 
Loss Calibrated Variational Inference
Loss Calibrated Variational InferenceLoss Calibrated Variational Inference
Loss Calibrated Variational Inference
 
Johan Suykens: "Models from Data: a Unifying Picture"
Johan Suykens: "Models from Data: a Unifying Picture" Johan Suykens: "Models from Data: a Unifying Picture"
Johan Suykens: "Models from Data: a Unifying Picture"
 
Module ii sp
Module ii spModule ii sp
Module ii sp
 

Viewers also liked

Overview of solutions for machine monitoring
Overview of solutions for machine monitoringOverview of solutions for machine monitoring
Overview of solutions for machine monitoringIvan Zgela
 
Complex System Engineering
Complex System EngineeringComplex System Engineering
Complex System EngineeringEmmanuel Fuchs
 
Faulty radiographs
Faulty     radiographsFaulty     radiographs
Faulty radiographsmelbia shine
 
FINITE STATE MACHINE AND CHOMSKY HIERARCHY
FINITE STATE MACHINE AND CHOMSKY HIERARCHYFINITE STATE MACHINE AND CHOMSKY HIERARCHY
FINITE STATE MACHINE AND CHOMSKY HIERARCHYnishimanglani
 

Viewers also liked (6)

Overview of solutions for machine monitoring
Overview of solutions for machine monitoringOverview of solutions for machine monitoring
Overview of solutions for machine monitoring
 
Theory of machines
Theory of machinesTheory of machines
Theory of machines
 
Complex System Engineering
Complex System EngineeringComplex System Engineering
Complex System Engineering
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Faulty radiographs
Faulty     radiographsFaulty     radiographs
Faulty radiographs
 
FINITE STATE MACHINE AND CHOMSKY HIERARCHY
FINITE STATE MACHINE AND CHOMSKY HIERARCHYFINITE STATE MACHINE AND CHOMSKY HIERARCHY
FINITE STATE MACHINE AND CHOMSKY HIERARCHY
 

Similar to Machine Learning Models for Sequential Data

Conditional Random Fields
Conditional Random FieldsConditional Random Fields
Conditional Random Fieldslswing
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodFrank Nielsen
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimizationhelalmohammad2
 
Cheatsheet supervised-learning
Cheatsheet supervised-learningCheatsheet supervised-learning
Cheatsheet supervised-learningSteve Nouri
 
02-VariableLengthCodes_pres.pdf
02-VariableLengthCodes_pres.pdf02-VariableLengthCodes_pres.pdf
02-VariableLengthCodes_pres.pdfJunZhao68
 
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Christian Robert
 
Monte Carlo Methods
Monte Carlo MethodsMonte Carlo Methods
Monte Carlo MethodsJames Bell
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical MethodsChristian Robert
 
Gaussian process in machine learning
Gaussian process in machine learningGaussian process in machine learning
Gaussian process in machine learningVARUN KUMAR
 
isabelle_webinar_jan..
isabelle_webinar_jan..isabelle_webinar_jan..
isabelle_webinar_jan..butest
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
Statement of stochastic programming problems
Statement of stochastic programming problemsStatement of stochastic programming problems
Statement of stochastic programming problemsSSA KPI
 
(DL hacks輪読) Deep Kernel Learning
(DL hacks輪読) Deep Kernel Learning(DL hacks輪読) Deep Kernel Learning
(DL hacks輪読) Deep Kernel LearningMasahiro Suzuki
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted ParaproductsVjekoslavKovac1
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsFrank Nielsen
 

Similar to Machine Learning Models for Sequential Data (20)

Conditional Random Fields
Conditional Random FieldsConditional Random Fields
Conditional Random Fields
 
QMC: Operator Splitting Workshop, Using Sequences of Iterates in Inertial Met...
QMC: Operator Splitting Workshop, Using Sequences of Iterates in Inertial Met...QMC: Operator Splitting Workshop, Using Sequences of Iterates in Inertial Met...
QMC: Operator Splitting Workshop, Using Sequences of Iterates in Inertial Met...
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimization
 
Cheatsheet supervised-learning
Cheatsheet supervised-learningCheatsheet supervised-learning
Cheatsheet supervised-learning
 
rinko2010
rinko2010rinko2010
rinko2010
 
02-VariableLengthCodes_pres.pdf
02-VariableLengthCodes_pres.pdf02-VariableLengthCodes_pres.pdf
02-VariableLengthCodes_pres.pdf
 
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
 
Monte Carlo Methods
Monte Carlo MethodsMonte Carlo Methods
Monte Carlo Methods
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
Gaussian process in machine learning
Gaussian process in machine learningGaussian process in machine learning
Gaussian process in machine learning
 
talk MCMC & SMC 2004
talk MCMC & SMC 2004talk MCMC & SMC 2004
talk MCMC & SMC 2004
 
isabelle_webinar_jan..
isabelle_webinar_jan..isabelle_webinar_jan..
isabelle_webinar_jan..
 
1 - Linear Regression
1 - Linear Regression1 - Linear Regression
1 - Linear Regression
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
5.n nmodels i
5.n nmodels i5.n nmodels i
5.n nmodels i
 
Statement of stochastic programming problems
Statement of stochastic programming problemsStatement of stochastic programming problems
Statement of stochastic programming problems
 
(DL hacks輪読) Deep Kernel Learning
(DL hacks輪読) Deep Kernel Learning(DL hacks輪読) Deep Kernel Learning
(DL hacks輪読) Deep Kernel Learning
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted Paraproducts
 
k-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture modelsk-MLE: A fast algorithm for learning statistical mixture models
k-MLE: A fast algorithm for learning statistical mixture models
 

More from BBKuhn

Sound shredding moustafa
Sound shredding moustafaSound shredding moustafa
Sound shredding moustafaBBKuhn
 
Smoking soujanya
Smoking soujanyaSmoking soujanya
Smoking soujanyaBBKuhn
 
Presentation yamin
Presentation yaminPresentation yamin
Presentation yaminBBKuhn
 
Md2k 0219 shang
Md2k 0219 shangMd2k 0219 shang
Md2k 0219 shangBBKuhn
 
2014.chi.structured labeling to facilitate concept evolution in machine learning
2014.chi.structured labeling to facilitate concept evolution in machine learning2014.chi.structured labeling to facilitate concept evolution in machine learning
2014.chi.structured labeling to facilitate concept evolution in machine learningBBKuhn
 
Md2 k 04_19_2015
Md2 k 04_19_2015Md2 k 04_19_2015
Md2 k 04_19_2015BBKuhn
 
March19 tun
March19 tunMarch19 tun
March19 tunBBKuhn
 
March12 rahman
March12 rahmanMarch12 rahman
March12 rahmanBBKuhn
 
March12 chatterjee
March12 chatterjeeMarch12 chatterjee
March12 chatterjeeBBKuhn
 
March12 alzantot
March12 alzantotMarch12 alzantot
March12 alzantotBBKuhn
 
March5 gao
March5 gaoMarch5 gao
March5 gaoBBKuhn
 
March5 bargar
March5 bargarMarch5 bargar
March5 bargarBBKuhn
 
MD2K Presentation to Stanford Mobilize (1/22/15)
MD2K Presentation to Stanford Mobilize (1/22/15)MD2K Presentation to Stanford Mobilize (1/22/15)
MD2K Presentation to Stanford Mobilize (1/22/15)BBKuhn
 

More from BBKuhn (13)

Sound shredding moustafa
Sound shredding moustafaSound shredding moustafa
Sound shredding moustafa
 
Smoking soujanya
Smoking soujanyaSmoking soujanya
Smoking soujanya
 
Presentation yamin
Presentation yaminPresentation yamin
Presentation yamin
 
Md2k 0219 shang
Md2k 0219 shangMd2k 0219 shang
Md2k 0219 shang
 
2014.chi.structured labeling to facilitate concept evolution in machine learning
2014.chi.structured labeling to facilitate concept evolution in machine learning2014.chi.structured labeling to facilitate concept evolution in machine learning
2014.chi.structured labeling to facilitate concept evolution in machine learning
 
Md2 k 04_19_2015
Md2 k 04_19_2015Md2 k 04_19_2015
Md2 k 04_19_2015
 
March19 tun
March19 tunMarch19 tun
March19 tun
 
March12 rahman
March12 rahmanMarch12 rahman
March12 rahman
 
March12 chatterjee
March12 chatterjeeMarch12 chatterjee
March12 chatterjee
 
March12 alzantot
March12 alzantotMarch12 alzantot
March12 alzantot
 
March5 gao
March5 gaoMarch5 gao
March5 gao
 
March5 bargar
March5 bargarMarch5 bargar
March5 bargar
 
MD2K Presentation to Stanford Mobilize (1/22/15)
MD2K Presentation to Stanford Mobilize (1/22/15)MD2K Presentation to Stanford Mobilize (1/22/15)
MD2K Presentation to Stanford Mobilize (1/22/15)
 

Recently uploaded

Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Formation of low mass protostars and their circumstellar disks

Machine Learning Models for Sequential Data

  • 1. Introduction HMM Window Based MaxEnt CRF Summary References
    Machine Learning for Sequential Data: A Review
    MD2K Reading Group
    March 12, 2015
  • 2. Classical Supervised Learning
    Given a training set {(x1, y1), (x2, y2), ..., (xn, yn)}
    x – features, independent variables, scalar or vector, i.e. |x| ≥ 1; y ∈ Y – labels/classes, dependent variables, scalar, i.e. |y| = 1
    Learn a model h ∈ H such that y = h(x)
    Example: character classification, x – image of a handwritten character, y ∈ {A, B, ..., Z}
    (Figure: each (x, y) pair is classified independently; no edges between labels.)
  • 3. Sequential Supervised Learning (SSL)
    Given a training set (x1,1:n, y1,1:n), (x2,1:n, y2,1:n), ..., (xl,1:n, yl,1:n)
    l – number of training instances, each of length n (instances need not all share the same length, i.e. n can vary)
    x – features, independent variables, scalar or vector; y ∈ Y – labels/classes, dependent variables
    Learn a model h ∈ H such that yl = h(xl)
    SSL is different from time-series prediction and from sequence classification
    Leverage sequential patterns and interactions (solid lines – left to right; dotted – right to left)
    Example: POS tagging, x – ‘the dog saw a cat’ (English sentence), y = {D, N, V, D, N}
    (Figure: chain of (x, y) pairs with edges between neighbouring labels.)
  • 4. Outline
    1 Hidden Markov Models
    2 Window based Approaches
    3 Maximum Entropy Models
    4 Conditional Random Fields
  • 5. Hidden Markov Models (HMM)
    p(y|x) = p(x|y) × p(y) / p(x)   (Bayes’ rule, single class)
    ∝ p(x|y) × p(y)   (since p(x) is the same across all classes)
    = p(x, y)
    = p(x1|x2, ..., xn, y) × p(x2|x3, ..., xn, y) × ... × p(y)
    = p(x1|y) × p(x2|y) × ... × p(y)   (Naive Bayes assumption)
    ∝ p(y) ∏_{i=1}^{n} p(xi|y)   (Naive Bayes model, single class)
    p(y|x) = ∏_{i=1}^{n} p(yi) × p(xi|yi)   (predict whole sequence; x, y are vectors)
    = ∏_{i=1}^{n} p(yi|yi−1) × p(xi|yi)   (first-order Markov property; tack on y0)
    p(x) = Σ_{y∈Y} ∏_{i=1}^{n} p(yi|yi−1) × p(xi|yi)   (Y – all possible combinations of y sequences)
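The factorization p(y|x) ∝ ∏ p(yi|yi−1) p(xi|yi) on this slide is exactly what Viterbi decoding exploits. A minimal pure-Python sketch, with made-up transition/emission tables (all parameters and the state/observation alphabets are illustrative, not from the slides):

```python
from math import log

# Toy HMM with hypothetical parameters: 2 hidden states, 3 observation symbols
pi = [0.6, 0.4]                          # p(y1)
A = [[0.7, 0.3], [0.4, 0.6]]             # p(y_i | y_{i-1})
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # p(x_i | y_i)

def viterbi(obs):
    """Most likely label sequence under p(y|x) ∝ prod_i p(y_i|y_{i-1}) p(x_i|y_i)."""
    S = len(pi)
    delta = [log(pi[s]) + log(B[s][obs[0]]) for s in range(S)]  # best log-prob per state
    back = []                                                   # backpointers
    for x in obs[1:]:
        ptr, new = [], []
        for s in range(S):
            best = max(range(S), key=lambda p: delta[p] + log(A[p][s]))
            ptr.append(best)
            new.append(delta[best] + log(A[best][s]) + log(B[s][x]))
        back.append(ptr)
        delta = new
    path = [max(range(S), key=lambda s: delta[s])]
    for ptr in reversed(back):              # follow backpointers to recover the path
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi([0, 1, 2]))  # → [0, 0, 1]
```

Working in log space avoids underflow on long sequences; the same tables drive the forward algorithm for p(x).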
  • 6. HMM (contd.)
    (Figure: chain y1 → ... → yn, each yt emitting xt.)
    HMMs are generative models, i.e. they model the joint probability p(x, y)
    Predicts the whole sequence
    Models only the first-order Markov property, which is not suitable for many real-world applications
    xt only influences yt; cannot model dependencies like p(xt|yt−1, yt, yt+1), which would let xt influence {yt−1, yt, yt+1}
  • 7. Sliding Window Approach
    Sliding windows consider a window of features to make a decision, e.g. yt looks at xt−1, xt, xt+1
    Predicts a single class at a time
    Can use any existing supervised learning algorithm without modification, e.g. SVM, logistic regression, etc.
    Cannot model dependencies between the y labels (either short or long range)
    (Figure: each yt connected to the window xt−1, xt, xt+1; no edges between labels.)
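A sketch of how the window is materialized so that any off-the-shelf classifier applies; the window size of 3 and zero padding at the sequence ends are arbitrary choices for illustration:

```python
def window_features(xs, pad=0):
    """Turn a sequence [x1..xn] into rows (x_{t-1}, x_t, x_{t+1}) for a standard classifier."""
    padded = [pad] + list(xs) + [pad]   # pad so the first and last positions get full windows
    return [padded[t - 1:t + 2] for t in range(1, len(xs) + 1)]

rows = window_features([5, 7, 9, 11])
# Each row, paired with its label y_t, can be fed to any supervised learner (SVM, logistic regression, ...)
print(rows)  # → [[0, 5, 7], [5, 7, 9], [7, 9, 11], [9, 11, 0]]
```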
  • 8. Recurrent Sliding Window Approach
    Similar to the sliding window approach
    Models short-range dependencies by using the previous decision (yt−1) when making the current decision (yt)
    Problem: needs y values at both training and test time
    (Figure: sliding windows plus an edge from yt−1 to yt.)
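The recurrent variant can be sketched as a left-to-right decoding loop; `classify` here is a hypothetical per-position classifier, and the toy rule standing in for it is invented for illustration. At training time the true yt−1 is typically available, which is the train/test mismatch the slide flags:

```python
def recurrent_decode(xs, classify, y0=0):
    """Decode left to right, feeding each predicted label back in as a feature."""
    ys, prev = [], y0
    for x in xs:
        y = classify(x, prev)   # decision uses the current input and the previous decision
        ys.append(y)
        prev = y
    return ys

# Toy rule standing in for a learned classifier: label 1 if x is large or the previous label was 1
print(recurrent_decode([1, 9, 2, 8], lambda x, p: int(x > 5 or p == 1)))  # → [0, 1, 1, 1]
```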
  • 9. Maximum Entropy Model (MaxEnt)
    Based on the Principle of Maximum Entropy (Jaynes, 1957) – if only incomplete information about a probability distribution is available, the unbiased choice is the distribution that is as uniform as possible given the available information
    Uniform distribution – maximum entropy (primal problem)
    Model the available information – expressed as constraints over the training data (dual problem)
    Discriminative model, i.e. models p(y|x)
    Predicts a single class
  • 10. MaxEnt (contd.)
    I. Model the known (dual problem)
    Training set = {(x1, y1), (x2, y2), ..., (xn, yn)}   (given)
    p̃(x, y) = (1/N) × number of times (x, y) occurs in the training set   (i.e. the empirical joint probability table)
    fi(x, y) = 1 if y = k AND x = xk, 0 otherwise   (e.g. y = physical activity AND x = HR ≥ 110 bpm; 1 ≤ i ≤ m, m – number of features)
    Ẽ(fi) = Σ_{x,y} p̃(x, y) × fi(x, y)   (expected value of fi under the training data)
    E(fi) = Σ_{x,y} p(x, y) × fi(x, y)   (expected value of fi under the model distribution)
    = Σ_{x,y} p(y|x) × p(x) × fi(x, y)
    = Σ_{x,y} p(y|x) × p̃(x) × fi(x, y)   (replace p(x) with p̃(x))
    We then only need to learn the conditional probability, as opposed to the joint probability
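The empirical quantities on this slide can be computed directly from counts. A small sketch using the slide's heart-rate example; the training pairs and the HR ≥ 110 bpm feature threshold are illustrative assumptions:

```python
from collections import Counter

# Hypothetical training set of (heart rate, activity label) pairs
train = [(115, 'active'), (120, 'active'), (70, 'rest'), (72, 'rest'), (118, 'active')]
N = len(train)

# Empirical joint probability table p~(x, y) = count(x, y) / N
p_tilde = {pair: c / N for pair, c in Counter(train).items()}

# The slide's example feature: fires when y = 'active' AND HR >= 110 bpm
f = lambda x, y: 1.0 if (y == 'active' and x >= 110) else 0.0

# Empirical expectation E~[f] = sum_{x,y} p~(x, y) f(x, y)
E_tilde = sum(p * f(x, y) for (x, y), p in p_tilde.items())
print(round(E_tilde, 3))  # → 0.6
```

Training then searches for weights whose model expectation E(fi) matches each Ẽ(fi).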
  • 11. MaxEnt (contd.)
    Constraints: Ẽ(fi) = E(fi), i.e. Σ_{x,y} p̃(x, y) × fi(x, y) = Σ_{x,y} p(y|x) × p̃(x) × fi(x, y)   (goal: find the best conditional probability p∗(y|x))
    II. Make zero assumptions about the unknown (primal problem)
    H(y|x) = − Σ_{(x,y)∈X×Y} p(x, y) log p(y|x)   (conditional entropy)
    III. Objective function and Lagrange multipliers
    Λ(p∗(y|x), λ̄) = H(y|x) + Σ_{i=1}^{m} λi (E(fi) − Ẽ(fi)) + λ_{m+1} (Σ_{y∈Y} p(y|x) − 1)   (objective function)
    p∗_λ̄(y|x) = (1/Z_λ̄(x)) exp(Σ_{i=1}^{m} λi fi(x, y))   (the conditional distribution maximizing entropy subject to the constraints)
    p∗_λ̄(yt|yt−1, x) = (1/Z_λ̄(yt−1, x)) exp(Σ_{i=1}^{m} λi fi(x, y))   (inducing the Markov property yields the Maximum Entropy Markov Model, MEMM)
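The closed-form solution p∗(y|x) = exp(Σ λi fi(x, y)) / Z(x) is easy to evaluate once weights are fixed. A sketch reusing the slide's heart-rate feature; the second feature and both weights are made-up stand-ins for learned Lagrange multipliers:

```python
from math import exp

def features(x, y):
    # f1: fires when y = 'active' and heart rate >= 110 bpm (the slide's example)
    # f2: a hypothetical complementary feature for the 'rest' label
    return [1.0 if (y == 'active' and x >= 110) else 0.0,
            1.0 if (y == 'rest' and x < 110) else 0.0]

lam = [2.0, 1.5]   # hypothetical learned weights

def p(y, x, labels=('active', 'rest')):
    """p*(y|x) = exp(sum_i lam_i f_i(x, y)) / Z(x)."""
    score = lambda yy: exp(sum(l * f for l, f in zip(lam, features(x, yy))))
    return score(y) / sum(score(yy) for yy in labels)  # Z(x) normalizes over the labels

print(round(p('active', 120), 3))  # → 0.881
```

Chaining such a model over positions, conditioned on yt−1, gives the MEMM on this slide.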
  • 12. Conditional Random Fields (CRF)
    Discriminative model, i.e. models p(y|x)
    The conditional probability p(y|x) is modeled as a product of factors ψk(xk, yk)
    Factors have a log-linear representation: ψk(xk, yk) = exp(λk × φk(xk, yk))
    Predicts the whole sequence
    p(y|x) = (1/Z(x)) ∏_{C} Ψ_C(x_C, y_C)   (CRF general form; product over cliques C)
  • 13. Linear Chain CRF
    (Figure: chain CRF with feature factors φF between yt and xt and transition factors φT between yt−1 and yt.)
    p(yt|xt) = (1/Z(x)) exp(λF × φF(yt, xt) + λT × φT(yt, yt−1))   (individual prediction)
    p(y|x) = (1/Z(x)) ∏_{i=1}^{n} exp(λF × φF(yi, xi) + λT × φT(yi, yi−1))   (predict whole sequence; tack on y0)
    p(y|x) = (1/Z(x)) ∏_{i=1}^{n} exp(Σ_{j=1}^{k} λj × φj(yi, yi−1, xi))   (general form of linear chain CRFs)
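The linear-chain form can be sketched directly, computing Z(x) by brute force over all label sequences (fine for tiny examples; real implementations use forward–backward). The feature functions φF/φT, the weights, and the binary label set are toy choices, not from the slides:

```python
from math import exp
from itertools import product

LABELS = (0, 1)
lamF, lamT = 1.0, 0.5
phiF = lambda y, x: 1.0 if y == x else 0.0          # emission-style feature
phiT = lambda y, yprev: 1.0 if y == yprev else 0.0  # transition feature

def score(ys, xs):
    """sum_i (lamF*phiF(y_i, x_i) + lamT*phiT(y_i, y_{i-1})), with y0 tacked on as 0."""
    s, prev = 0.0, 0
    for y, x in zip(ys, xs):
        s += lamF * phiF(y, x) + lamT * phiT(y, prev)
        prev = y
    return s

def p(ys, xs):
    """p(y|x) = exp(score(y, x)) / Z(x), Z(x) summed over every label sequence."""
    Z = sum(exp(score(c, xs)) for c in product(LABELS, repeat=len(xs)))
    return exp(score(ys, xs)) / Z

xs = [0, 1, 1]
print(round(p([0, 1, 1], xs), 3))
```

Because Z(x) sums over whole sequences, the probabilities over all 2^n labelings sum to one, unlike the per-position MEMM normalization.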
  • 14. CRF (contd.)
    (Figure: chain CRF with additional factors φ1, φ2 connecting yt to distant inputs and labels.)
    p(yt|x, y1:t−1) = (1/Z(x)) exp(λ1 × φ1(yt, xt) + λ2 × φ2(yt, yt−1) + λ3 × φ3(yt, x2) + λ4 × φ4(yt, xt−1) + λ5 × φ5(yt, xt+1) + λ6 × φ6(yt, y1))   (additional features; these induce loops)
  • 15. Figure: Sample CRF
  • 16. Model Space
    Figure: Graphical models for sequential data [4]
    For further reading, refer to [3, 4, 2, 1]
  • 17. References
    [1] Berger, A. A brief MaxEnt tutorial. www.cs.cmu.edu/afs/cs/user/aberger/www/html/tutorial/tutorial.html.
    [2] Blake, A., Kohli, P., and Rother, C. Markov Random Fields for Vision and Image Processing. MIT Press, 2011.
    [3] Dietterich, T. G. Machine learning for sequential data: A review. In Structural, Syntactic, and Statistical Pattern Recognition. Springer, 2002, pp. 15–30.
    [4] Klinger, R., and Tomanek, K. Classical probabilistic models and conditional random fields. TU Dortmund, Algorithm Engineering, 2007.