PRML 2.4-2.5

The exponential family
&
Nonparametric methods	
 
June 11, 2014
by Shinichi TAMURA
NONPARAMETRIC METHODS	
 THE EXPONENTIAL FAMILY	
 
Today's topics
1. The exponential family
   1. What is the exponential family?
   2. Maximum likelihood for EF
   3. How to decide priors for EF
2. Nonparametric methods
   1. What is the point of nonparametric methods?
   2. Kernel density estimator
   3. Nearest-neighbour methods
 
 
The Exponential Family
Almost all of the distributions we have studied so far belong to a single class, namely the exponential family.

Parametric distributions
•  In the exponential family: Bernoulli, multinomial, Gaussian, beta, gamma, von Mises, etc.
•  Outside the exponential family: Gaussian mixture, etc.
 
 
The Exponential Family
The exponential family over x, given η, is the class of distributions of the form

$$p(x \mid \eta) = h(x)\, g(\eta) \exp\{\eta^{T} u(x)\}$$

•  η: the natural parameter
•  g(η): the normalizing constant
•  exp{η^T u(x)}: the only factor in which x and η interact
 
 
The Exponential Family
E.g. 1) The Bernoulli distribution

$$p(x \mid \eta) = \mu^{x} (1 - \mu)^{1 - x} = \sigma(-\eta) \exp(\eta x)$$

where

$$\eta = \ln \frac{\mu}{1 - \mu}, \qquad u(x) = x, \qquad h(x) = 1, \qquad g(\eta) = \sigma(-\eta),$$

and σ is the logistic sigmoid function.
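The Bernoulli identity above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are illustrative, and σ is assumed to be the logistic sigmoid σ(a) = 1 / (1 + exp(−a)).

```python
import math

def sigmoid(a):
    """Logistic sigmoid, sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def natural_from_mean(mu):
    """Natural parameter eta = ln(mu / (1 - mu)), for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def bernoulli_standard(x, mu):
    """Standard form mu^x (1 - mu)^(1 - x) for x in {0, 1}."""
    return mu ** x * (1.0 - mu) ** (1 - x)

def bernoulli_natural(x, eta):
    """Exponential-family form h(x) g(eta) exp(eta * u(x)) with
    h(x) = 1, g(eta) = sigma(-eta) and u(x) = x."""
    return sigmoid(-eta) * math.exp(eta * x)

mu = 0.3  # illustrative value
eta = natural_from_mean(mu)
for x in (0, 1):
    # The two columns should agree up to rounding.
    print(x, bernoulli_standard(x, mu), bernoulli_natural(x, eta))
```

Both forms should return the same probability for x = 0 and x = 1, because σ(−η) = 1 − µ when η is the log-odds of µ.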
 
 
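The Exponential Family
E.g. 2) The multinomial distribution

$$p(x \mid \eta) = \prod_{k=1}^{M} \mu_k^{x_k} = \exp(\eta^{T} x)$$

where

$$\eta = (\ln \mu_1, \ldots, \ln \mu_M)^{T}, \qquad u(x) = x, \qquad h(x) = 1, \qquad g(\eta) = 1$$

$$\Rightarrow\ \sum_{k=1}^{M} \exp(\eta_k) = \sum_{k=1}^{M} \mu_k = 1$$

It's inconvenient! The components of η are tied together by this constraint, so they are not independent.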
 
 
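The Exponential Family
E.g. 2) The multinomial distribution

Remove the constraint by substituting

$$\mu_M = 1 - \sum_{k=1}^{M-1} \mu_k, \qquad x_M = 1 - \sum_{k=1}^{M-1} x_k$$

$$\begin{aligned}
p(x \mid \mu) &= \exp\left\{ \sum_{k=1}^{M-1} x_k \ln \mu_k + \left(1 - \sum_{k=1}^{M-1} x_k\right) \ln\left(1 - \sum_{k=1}^{M-1} \mu_k\right) \right\} \\
&= \exp\left\{ \sum_{k=1}^{M-1} x_k \ln \frac{\mu_k}{1 - \sum_{j=1}^{M-1} \mu_j} + \ln\left(1 - \sum_{k=1}^{M-1} \mu_k\right) \right\} \\
&= \left(1 - \sum_{k=1}^{M-1} \mu_k\right) \exp\left\{ \sum_{k=1}^{M-1} x_k \ln \frac{\mu_k}{1 - \sum_{j=1}^{M-1} \mu_j} \right\}.
\end{aligned}$$

Therefore...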
 
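The Exponential Family
E.g. 2') The multinomial distribution without the constraint

$$p(x \mid \eta) = \prod_{k=1}^{M} \mu_k^{x_k} = \left(1 + \sum_{k=1}^{M-1} \exp(\eta_k)\right)^{-1} \exp(\eta^{T} x)$$

where

$$\eta = \left( \ln \frac{\mu_1}{1 - \sum_{j} \mu_j}, \ldots, \ln \frac{\mu_{M-1}}{1 - \sum_{j} \mu_j}, 0 \right)^{T},$$

$$u(x) = x, \qquad h(x) = 1, \qquad g(\eta) = \left(1 + \sum_{k=1}^{M-1} \exp(\eta_k)\right)^{-1},$$

with the sums over j running from 1 to M − 1.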
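The unconstrained parameterization can be sketched in a few lines of Python. This is an illustrative check rather than anything from the slides: the function names are hypothetical, x is assumed to be a one-hot vector of length M, and the inverse map `mu_from_natural` (a softmax) is a standard result that the slide does not show.

```python
import math

def natural_from_mu(mu):
    """eta_k = ln(mu_k / (1 - sum_{j<M} mu_j)) for k < M, and eta_M = 0."""
    mu_last = 1.0 - sum(mu[:-1])
    return [math.log(m / mu_last) for m in mu[:-1]] + [0.0]

def g(eta):
    """Normalizing constant g(eta) = (1 + sum_{k<M} exp(eta_k))^(-1)."""
    return 1.0 / (1.0 + sum(math.exp(e) for e in eta[:-1]))

def density(x, eta):
    """p(x | eta) = g(eta) exp(eta^T x) for a one-hot vector x."""
    return g(eta) * math.exp(sum(e * xi for e, xi in zip(eta, x)))

def mu_from_natural(eta):
    """Inverse map (softmax): mu_k = g(eta) exp(eta_k)."""
    norm = g(eta)
    return [norm * math.exp(e) for e in eta]

mu = [0.2, 0.5, 0.3]  # illustrative probabilities, M = 3
eta = natural_from_mu(mu)
for k in range(len(mu)):
    one_hot = [1 if i == k else 0 for i in range(len(mu))]
    print(k, mu[k], density(one_hot, eta))
```

For each one-hot x, the density should reproduce the corresponding µ_k, and `mu_from_natural(eta)` should recover the original probabilities.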
 
 
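The Exponential Family
E.g. 3) The Gaussian distribution

$$\begin{aligned}
p(x \mid \eta) &= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\} \\
&= (2\pi)^{-1/2} (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right) \exp\left\{ \begin{pmatrix} \eta_1 & \eta_2 \end{pmatrix} \begin{pmatrix} x \\ x^2 \end{pmatrix} \right\}
\end{aligned}$$

where

$$\eta = \left( \frac{\mu}{\sigma^2},\ -\frac{1}{2\sigma^2} \right)^{T}, \qquad u(x) = (x, x^2)^{T},$$

$$h(x) = (2\pi)^{-1/2}, \qquad g(\eta) = (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right).$$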
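As a sanity check on the Gaussian example, the sketch below evaluates both forms of the density. It is an illustration under stated assumptions, not the author's code: the function names are made up, and the constant (2π)^(−1/2) is assigned to h(x), so that g(η) = (−2η_2)^(1/2) exp(η_1² / (4η_2)).

```python
import math

def natural_params(mu, sigma2):
    """eta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def gaussian_standard(x, mu, sigma2):
    """Usual Gaussian density with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def gaussian_natural(x, eta):
    """h(x) g(eta) exp(eta^T u(x)) with u(x) = (x, x^2)."""
    eta1, eta2 = eta
    h = (2.0 * math.pi) ** -0.5  # assumption: the 2*pi constant lives in h(x)
    g = math.sqrt(-2.0 * eta2) * math.exp(eta1 ** 2 / (4.0 * eta2))
    return h * g * math.exp(eta1 * x + eta2 * x * x)

mu, sigma2 = 1.5, 0.7  # illustrative values
eta = natural_params(mu, sigma2)
for x in (-1.0, 0.0, 2.3):
    # The two columns should agree up to rounding.
    print(x, gaussian_standard(x, mu, sigma2), gaussian_natural(x, eta))
```

Moving the constant (2π)^(−1/2) from h(x) into g(η) would give the same density; only the bookkeeping changes.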
 
 
Maximum likelihood for EF
OK, we know what the EF looks like.
Then, how do we estimate the parameter η?

Maximize the likelihood! (The frequentist way.)
 
 
Maximum likelihood for EF
Suppose we have i.i.d. data X = {x_1, ..., x_N}.
The log-likelihood of η is

$$\begin{aligned}
\ln p(X \mid \eta) &= \ln \prod_{n=1}^{N} p(x_n \mid \eta) \\
&= \ln \prod_{n=1}^{N} h(x_n)\, g(\eta) \exp\{\eta^{T} u(x_n)\} \\
&= \sum_{n=1}^{N} \ln h(x_n) + N \ln g(\eta) + \eta^{T} \sum_{n=1}^{N} u(x_n).
\end{aligned}$$

$$\therefore\ \nabla_{\eta} \ln p(X \mid \eta) = N \nabla_{\eta} \ln g(\eta) + \sum_{n=1}^{N} u(x_n)$$

Set this gradient to zero to find the maximum.
 
Maximum likelihood for EF
Therefore

$$-\nabla_{\eta} \ln g(\eta_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^{N} u(x_n).$$

Here, η_ML depends on the data only through Σ_n u(x_n), so Σ_n u(x_n) is called the “sufficient statistic”.

We need to store only Σ_n u(x_n), not the whole data set, for estimation.
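To make the stationarity condition concrete, here is a small Python sketch for the Bernoulli case, where g(η) = σ(−η) and u(x) = x. It is an illustration with hypothetical function names: the closed form η_ML = ln(m / (1 − m)), with m the sample mean, follows from −d ln g / dη = σ(η), and the gradient is checked by finite differences rather than derived symbolically.

```python
import math

def neg_log_g(eta):
    """-ln g(eta) for the Bernoulli, where g(eta) = sigma(-eta) = 1 / (1 + exp(eta))."""
    return math.log1p(math.exp(eta))

def grad_neg_log_g(eta, eps=1e-6):
    """Central finite-difference approximation of -d/d(eta) ln g(eta)."""
    return (neg_log_g(eta + eps) - neg_log_g(eta - eps)) / (2.0 * eps)

def bernoulli_eta_ml(data):
    """Solve -d ln g / d eta = (1/N) sum_n u(x_n) for eta, with u(x) = x.
    The left side is sigma(eta), so eta_ML is the log-odds of the sample mean.
    Requires at least one 0 and one 1 in the data."""
    mean_u = sum(data) / len(data)
    return math.log(mean_u / (1.0 - mean_u))

data = [1, 0, 1, 1]  # illustrative observations
eta_ml = bernoulli_eta_ml(data)
# The last two printed numbers should agree: gradient at eta_ML vs. sample mean of u(x).
print(eta_ml, grad_neg_log_g(eta_ml), sum(data) / len(data))
```

Only the sum of the observations (and N) is needed to compute η_ML, which is the point of the sufficient statistic.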
 
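Maximum likelihood for EF
E.g.) Gaussian distribution
With $g(\eta) = (-2\eta_2)^{1/2} \exp\{\eta_1^2 / (4\eta_2)\}$ and $u(x) = (x, x^2)^{T}$,

$$-\nabla \ln g(\eta) = \begin{pmatrix} -\dfrac{\eta_1}{2\eta_2} \\[2ex] -\dfrac{1}{2\eta_2} + \dfrac{\eta_1^2}{4\eta_2^2} \end{pmatrix} = \begin{pmatrix} \mu \\ \sigma^2 + \mu^2 \end{pmatrix}.$$

$$\therefore\ \mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n} x_n, \qquad \sigma^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n} x_n^2 - \left( \frac{1}{N} \sum_{n} x_n \right)^2.$$

That's what we already know.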
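The Gaussian result can be computed from the sufficient statistics alone. The sketch below is a minimal illustration with hypothetical function names: it accumulates N, Σ x_n and Σ x_n² in one pass and then applies the two formulas above.

```python
def sufficient_statistics(data):
    """Accumulate N, sum of x and sum of x^2 in a single pass over the data."""
    n, s1, s2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        s1 += x
        s2 += x * x
    return n, s1, s2

def gaussian_ml(n, s1, s2):
    """mu_ML = s1 / N and sigma^2_ML = s2 / N - (s1 / N)^2."""
    mean_x = s1 / n
    mean_x2 = s2 / n
    return mean_x, mean_x2 - mean_x ** 2

n, s1, s2 = sufficient_statistics([1.0, 2.0, 3.0, 4.0])  # illustrative data
print(gaussian_ml(n, s1, s2))
```

Subtracting two nearly equal numbers can lose precision when the mean is large compared with the spread, so a production implementation would usually prefer a numerically stable update; this version is kept close to the formula on the slide.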
 
Maximum likelihood for EF
By the way, we want to know the relation between η_ML and η.
 
 
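Maximum likelihood for EF
Taking the gradient of the normalization condition

$$\int h(x)\, g(\eta) \exp\{\eta^{T} u(x)\}\, dx = 1$$

with respect to η gives

$$\nabla g(\eta) \int h(x) \exp\{\eta^{T} u(x)\}\, dx + \int h(x)\, g(\eta) \exp\{\eta^{T} u(x)\}\, u(x)\, dx = 0$$

$$\Leftrightarrow\ -\nabla \ln g(\eta) = E[u(x)].$$

(The first integral equals 1/g(η), so the first term is ∇ ln g(η); the second term is E[u(x)].)

This is similar to

$$-\nabla_{\eta} \ln g(\eta_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^{N} u(x_n).$$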
 
 
Maximum likelihood for EF
According to the law of large numbers (LLN), the sample mean converges to the expectation as N → ∞, so η_ML converges to η.

$$-\nabla_{\eta} \ln g(\eta_{\mathrm{ML}}) = \frac{1}{N} \sum_{n=1}^{N} u(x_n)$$

$$-\nabla \ln g(\eta) = E[u(x)]$$

The right-hand sides converge, (1/N) Σ_n u(x_n) → E[u(x)], and hence so do the left-hand sides, η_ML → η.
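The convergence can be observed in a small simulation. The following Python sketch is illustrative only: it assumes Gaussian data with µ = 1 and σ = 1, uses u(x) = (x, x²), and fixes the random seed so that runs are repeatable; the function names are not from the slides.

```python
import random

def mean_sufficient_statistics(samples):
    """Sample mean of u(x) = (x, x^2)."""
    n = len(samples)
    return (sum(samples) / n, sum(x * x for x in samples) / n)

def expected_sufficient_statistics(mu, sigma):
    """E[u(x)] = (mu, sigma^2 + mu^2) for a Gaussian."""
    return (mu, sigma ** 2 + mu ** 2)

rng = random.Random(0)  # fixed seed, chosen arbitrarily
mu, sigma = 1.0, 1.0    # assumed "true" parameters for the simulation
samples = [rng.gauss(mu, sigma) for _ in range(100000)]

for n in (10, 1000, 100000):
    # The sample mean of u(x) should drift towards E[u(x)] as n grows.
    print(n, mean_sufficient_statistics(samples[:n]))
print("expected", expected_sufficient_statistics(mu, sigma))
```

With small N the sample means fluctuate noticeably; with large N they should settle close to (µ, σ² + µ²), which is what drives η_ML towards η.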
 
 
Priors for EF
If you want to use Bayesian inference, a prior distribution is needed.

Then, how do we choose it if we don't know anything about the parameter?
 
 
Priors for EF
Three candidates:
1. Conjugate priors ... easy to handle
2. Uniform distributions ... principle of indifference
3. Noninformative priors ... keep the effect of the prior small
 
 
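Priors for EF – Conjugate priors
Distributions in the EF have a factor of the form g(η) exp(η^T u), so the conjugate prior is

$$p(\eta \mid \chi, \nu) = f(\chi, \nu) \left[ g(\eta) \exp\{\eta^{T} \chi\} \right]^{\nu} = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{T} \chi\},$$

where f(χ, ν) is the normalizing constant and the bracketed factor corresponds to g(η) exp(η^T u) in the likelihood.

It gives the following posterior.

$$\begin{aligned}
p(\eta \mid X, \chi, \nu) &\propto \prod_{n=1}^{N} h(x_n)\, g(\eta) \exp\{\eta^{T} u(x_n)\} \times g(\eta)^{\nu} \exp\{\nu\, \eta^{T} \chi\} \\
&\propto g(\eta)^{N + \nu} \exp\left\{ \eta^{T} \left( \sum_{n=1}^{N} u(x_n) + \nu \chi \right) \right\}
\end{aligned}$$

The posterior has the same functional form as the prior, which is what conjugacy means.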
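Matching the posterior to the prior's functional form gives a simple update rule: ν becomes ν + N, and νχ becomes νχ + Σ_n u(x_n). The sketch below is a hedged illustration of that bookkeeping in Python; the function name and the list-of-vectors representation of u(x_n) are assumptions made here, not something defined on the slides.

```python
def conjugate_update(chi, nu, u_values):
    """Return the posterior hyperparameters (chi_post, nu_post).

    chi      : prior hyperparameter vector (list of floats)
    nu       : prior strength (acts like a number of pseudo-observations)
    u_values : list of sufficient-statistic vectors u(x_n), one per observation
    """
    n = len(u_values)
    sum_u = [sum(u[d] for u in u_values) for d in range(len(chi))]
    nu_post = nu + n
    # nu_post * chi_post = nu * chi + sum_n u(x_n)
    chi_post = [(nu * c + s) / nu_post for c, s in zip(chi, sum_u)]
    return chi_post, nu_post

# Bernoulli example with u(x) = x: the prior acts like 2 pseudo-observations with mean 0.5.
chi_post, nu_post = conjugate_update([0.5], 2.0, [[1.0], [1.0], [0.0], [1.0]])
print(chi_post, nu_post)
```

In this example the posterior strength is 6 and the posterior χ is (2 × 0.5 + 3) / 6 = 2/3, a weighted average of the prior value and the observed mean of u(x).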
 
 
Priors for EF – Uniform distributions
The uniform distribution is a common choice for a discrete, bounded variable.
Cf. the principle of insufficient reason (or principle of indifference).

But two problems arise when it is applied to continuous variables:
1.  The normalization problem
2.  The transformation problem
 
 
Priors for EF – Uniform distributions
1. Normalization problem
If the parameter is unbounded, the prior cannot be normalized:

$$\int_{-\infty}^{\infty} p(\lambda)\, d\lambda = \int_{-\infty}^{\infty} \mathrm{const}\, d\lambda \to \infty$$

These priors are called “improper”.

Note that these priors can still give proper posteriors, because with a constant prior the posterior is proportional to the likelihood, which can often be normalized.
 
 
Priors for EF – Uniform distributions
2. Transformation problem
A non-linear transformation of the parameter gives a non-constant prior.

E.g.) With p(λ) = 1 and η = √λ (so λ = η²),

$$p(\eta) = p(\lambda) \left| \frac{d\lambda}{d\eta} \right| = 2\eta,$$

which is not constant in η. Think "constant for what?"

(Sometimes, the posteriors are not sensitive to the difference.)
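The change of variables can be verified numerically. The sketch below is an illustration only: it assumes λ is restricted to the interval (0, 1) so that p(λ) = 1 is a proper density, and it uses a simple midpoint rule; none of the function names come from the slides.

```python
def p_lambda(lam):
    """Uniform density on (0, 1); the bounded range is an assumption of this sketch."""
    return 1.0 if 0.0 < lam < 1.0 else 0.0

def p_eta(eta):
    """Density of eta = sqrt(lambda): p(lambda) * |d lambda / d eta| with lambda = eta^2."""
    return p_lambda(eta ** 2) * 2.0 * eta

def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print(p_eta(0.25), p_eta(0.75))     # the density is not flat in eta
print(integrate(p_eta, 0.0, 1.0))   # total mass, should be close to 1
print(integrate(p_eta, 0.0, 0.5))   # P(eta < 0.5), should match P(lambda < 0.25)
```

The transformed density still integrates to one, and the probability of any event is unchanged, but it is no longer uniform: a prior that is flat in λ favours larger values of η.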
 
Priors for EF – Uniform distributions
Keep these two problems in mind:
1.  The normalization problem
2.  The transformation problem
 
 
Priors for EF – Noninformative priors
Two examples of noninformative priors:
1. Priors for location parameters
2. Priors for scale parameters

These are constructed to affect the posterior as little as possible, so that the inference is as objective as possible.
 
 
Priors for EF – Noninformative priors
1. Priors for location parameters

If the density has the form

$$p(x \mid \mu) = f(x - \mu),$$

a constant shift $\hat{x} = x + c$ gives a density of the same form:

$$p(\hat{x} \mid \hat{\mu}) = f(\hat{x} - \hat{\mu}), \qquad \hat{\mu} = \mu + c.$$

This property is called “translation invariance”, and such a parameter µ is called a “location parameter”.
 
 
Priors for EF – Noninformative priors
1. Priors for location parameters

To reflect the translation invariance, the prior should satisfy

$$\int_{A}^{B} p(\mu)\, d\mu = \int_{A}^{B} p(\mu - c)\, d\mu \quad \text{for all } A, B$$

$$\Longleftrightarrow\ p(\mu) = p(\mu - c)$$

$$\Longleftrightarrow\ p(\mu) = \mathrm{constant}.$$

We obtained the uniform distribution after all.
But unlike before, we know when to use it.
 
 
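Priors for EF – Noninformative priors
1. Priors for location parameters
E.g.) The mean of a Gaussian

$$p(x \mid \mu) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}$$

This has the form f(x − µ).

This prior is also obtained as a limit of the conjugate prior:

$$p(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2) \xrightarrow{\ \sigma_0^2 \to \infty\ } \mathrm{const.},$$

$$\mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\, \mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\, \mu_{\mathrm{ML}} \to \mu_{\mathrm{ML}},$$

$$\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2} \to \frac{N}{\sigma^2}.$$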
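The limiting behaviour of the conjugate posterior can be seen by plugging in increasingly broad priors. This Python sketch is an illustration with made-up numbers and a hypothetical function name; it simply evaluates the two posterior formulas above for growing σ_0².

```python
def gaussian_mean_posterior(mu0, sigma0_sq, sigma_sq, mu_ml, n):
    """Posterior mean and variance of mu for a Gaussian with known variance sigma_sq
    and a conjugate prior N(mu | mu0, sigma0_sq)."""
    denom = n * sigma0_sq + sigma_sq
    mu_n = (sigma_sq / denom) * mu0 + (n * sigma0_sq / denom) * mu_ml
    precision_n = 1.0 / sigma0_sq + n / sigma_sq
    return mu_n, 1.0 / precision_n

# Illustrative values: prior mean 0, data variance 4, sample mean 2, N = 10.
for sigma0_sq in (1.0, 1e2, 1e4, 1e8):
    print(sigma0_sq, gaussian_mean_posterior(0.0, sigma0_sq, 4.0, 2.0, 10))
```

As σ_0² grows, the posterior mean should approach µ_ML = 2 and the posterior variance should approach σ² / N = 0.4, which is the result a flat prior would give.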
 
 
Priors for EF – Noninformative priors
2. Priors for scale parameters

If the density has the form

$$p(x \mid \sigma) = \frac{1}{\sigma} f\left( \frac{x}{\sigma} \right),$$

a constant scaling $\hat{x} = c x$ gives a density of the same form:

$$p(\hat{x} \mid \hat{\sigma}) = \frac{1}{\hat{\sigma}} f\left( \frac{\hat{x}}{\hat{\sigma}} \right), \qquad \hat{\sigma} = c \sigma.$$

This property is called “scale invariance”, and such a parameter σ is called a “scale parameter”.
 
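Priors for EF – Noninformative priors
2. Priors for scale parameters

To reflect the scale invariance, the prior should satisfy

$$\int_{A}^{B} p(\sigma)\, d\sigma = \int_{A}^{B} p\left( \frac{1}{c} \sigma \right) \frac{d\sigma}{d(c\sigma)}\, d\sigma \quad \text{for all } A, B$$

$$\Longleftrightarrow\ p(\sigma) = \frac{1}{c}\, p\left( \frac{1}{c} \sigma \right)$$

$$\Longleftrightarrow\ p(\sigma) \propto \frac{1}{\sigma}$$

$$\Longleftrightarrow\ p(\ln \sigma) = \mathrm{const.}$$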
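A quick numerical check shows why 1/σ is the scale-invariant choice while a flat prior is not. The sketch below is illustrative: the interval [1, 2], the scale factor c = 10 and the midpoint-rule integrator are arbitrary choices made here, and both priors are left unnormalized because they are improper.

```python
def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def scale_prior(sigma):
    """Unnormalized scale-invariant prior, p(sigma) proportional to 1 / sigma."""
    return 1.0 / sigma

def flat_prior(sigma):
    """Unnormalized flat prior, p(sigma) = const."""
    return 1.0

a, b, c = 1.0, 2.0, 10.0
for name, prior in (("1/sigma", scale_prior), ("flat", flat_prior)):
    # Mass assigned to [a, b] versus mass assigned to the rescaled interval [a/c, b/c].
    print(name, integrate(prior, a, b), integrate(prior, a / c, b / c))
```

Under the 1/σ prior both intervals should receive the same mass, ln(B/A), whereas under the flat prior the mass shrinks by the factor c.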
 
 
Priors for EF – Noninformative priors
2. Priors for scale parameters
E.g.) The standard deviation of a Gaussian (with zero mean)

$$p(x \mid \sigma) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} x^2 \right\}$$

This has the form (1/σ) f(x/σ).

This prior is also obtained as a limit of the conjugate prior on the precision λ = 1/σ²:

$$p(\lambda) = \mathrm{Gam}(\lambda \mid a_0, b_0) \xrightarrow{\ a_0, b_0 \to 0\ } \frac{\mathrm{const}}{\lambda},$$

$$a_N = a_0 + \frac{N}{2} \to \frac{N}{2},$$

$$b_N = b_0 + \frac{N}{2} \sigma^2_{\mathrm{ML}} \to \frac{N}{2} \sigma^2_{\mathrm{ML}}.$$
 
Priors for EF – Noninformative priors
Two examples of noninformative priors:

1. Priors for location parameters

$$p(x \mid \mu) = f(x - \mu) \ \Longrightarrow\ p(\mu) = \mathrm{const.}$$

2. Priors for scale parameters

$$p(x \mid \sigma) = \frac{1}{\sigma} f\left( \frac{x}{\sigma} \right) \ \Longrightarrow\ p(\sigma) \propto \frac{1}{\sigma}$$
 
 
Nonparametric methods
So far, we have learned the “parametric approach”.

vs.

Next, we will learn the “nonparametric approach”.

What is the difference?
 
 
Nonparametric methods

Parametric                                    | Nonparametric
Assumes a specific form of the distribution   | Makes few assumptions about the form of the distribution
Simple                                        | Complex (depends on data size)
Poor                                          | Rich / Flexible
Efficient                                     | Inefficient
 
 
Nonparametric methods
We will learn:
1. Histogram methods
2. Kernel density estimators
3. Nearest-neighbour methods
 
 
Nonparametric methods
1. Histogram methods
Split the space into grids (or bins), and count the data points in each bin:

$$p(x) = p_i = \frac{n_i}{N \Delta_i} \qquad (x \in i\text{-th bin}),$$

where
•  Δ_i = width of the i-th bin (usually the same for all i),
•  n_i = number of observations assigned to the i-th bin,
•  N = total number of observations.

This estimate is piecewise constant, hence discontinuous at the bin edges.
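The histogram estimator is short enough to write out directly. The following Python sketch is a minimal illustration, not the author's code: it assumes one-dimensional data lying in [low, high), equal bin widths Δ, and hypothetical function names.

```python
def histogram_density(data, delta, low=0.0, high=1.0):
    """Return the list of bin densities p_i = n_i / (N * delta).
    Assumes every data point lies in [low, high)."""
    n_bins = int(round((high - low) / delta))
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - low) / delta), n_bins - 1)  # index of the bin containing x
        counts[i] += 1
    n = len(data)
    return [c / (n * delta) for c in counts]

def evaluate(density, x, delta, low=0.0):
    """Piecewise-constant estimate p(x): look up the bin that contains x."""
    i = min(int((x - low) / delta), len(density) - 1)
    return density[i]

data = [0.1, 0.15, 0.3, 0.7, 0.72, 0.9]  # illustrative observations
delta = 0.25
density = histogram_density(data, delta)
print(density)
print(sum(p * delta for p in density))   # total probability mass
print(evaluate(density, 0.71, delta))
```

Dividing the counts by NΔ is what makes the bin heights integrate to one; rerunning with a smaller or larger `delta` changes how spiky or smooth the estimate is.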
 
 
Nonparametric methods
1. Histogram methods – Example

[Figure: histogram density estimates of the same data set for three bin widths, Δ = 0.04, Δ = 0.08 and Δ = 0.25; horizontal axis from 0 to 1, vertical axis from 0 to 5.]

Δ is...
•  Δ = 0.04: too narrow to catch enough points, so the estimate is too spiky (noisy).
•  Δ = 0.08: a good intermediate value.
•  Δ = 0.25: too wide to express the data, so the estimate is too smooth (less information).

Finding a good value of Δ is very important!

# of bins = M^D for M bins per dimension in D dimensions (curse of dimensionality)
 
Nonparametric methods	
 
Lessons from histogram methods	

June 11, 2014
 PRML 2.4-2.5
 Shinichi TAMURA
Estimate density at a particular point
from data points of small local region.
NONPARAMETRIC METHODS	
 THE EXPONENTIAL FAMILY	
 
Nonparametric methods	
 
Lessons from histogram methods	

June 11, 2014
 PRML 2.4-2.5
 Shinichi TAMURA
Estimate density at a particular point
from data points of small local region.	

	

The regions are defined by “smoothing
parameter”, which control the
complexity in relation with data size.
Nonparametric methods
Lessons from histogram methods

Estimate the density at a particular point from the data points in a small local region.

The regions are defined by a “smoothing parameter”, which controls the complexity in relation to the data size.

Other problems
•  Discontinuity
•  Not scalable (curse of dimensionality)

Nonparametric methods
Lessons from histogram methods

Let's consider a small local region $\mathcal{R}$, then

$$\Pr(K \text{ out of } N \text{ data} \in \mathcal{R}) = \frac{N!}{K!\,(N-K)!}\, P^{K} (1-P)^{N-K},$$

where $P = \int_{\mathcal{R}} p(\mathbf{x})\,\mathrm{d}\mathbf{x}$.

If
1.  K is large enough (smoother not too small), so that $K \simeq NP$, and
2.  p(x) is roughly constant over $\mathcal{R}$ (smoother small enough), so that $P \simeq p(\mathbf{x})V$ with V the volume of $\mathcal{R}$,

$$\Rightarrow\ p(\mathbf{x}) = \frac{K}{NV}.$$

The two conditions are contradictory, and the best compromise depends on the data size.

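As a quick numerical check of p(x) = K/(NV), the following sketch counts the points that fall in a small interval around x and compares the estimate with a known density. The standard normal sample, the interval width and the name `local_density` are assumptions made for this illustration.

```python
import math
import random

def local_density(x, data, width):
    """Estimate p(x) = K / (N V) with R = [x - width/2, x + width/2], so V = width."""
    k = sum(1 for xn in data if abs(xn - x) <= width / 2)
    return k / (len(data) * width)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(20000)]  # hypothetical sample

true_p = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at x = 0
estimate = local_density(0.0, data, width=0.2)
print(f"estimate = {estimate:.3f}, true = {true_p:.3f}")
```

The region is small enough for p(x) to be nearly constant inside it, yet it still contains many points, which is exactly the compromise between the two conditions.
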
Kernel density estimators

Fix a region (e.g., a hypercube centred on x with side h) and count the data points with a kernel function k(u) (Parzen window):

$$k(\mathbf{u}) = \begin{cases} 1, & |u_i| \le 1/2 \quad (i = 1, \dots, D), \\ 0, & \text{otherwise.} \end{cases}$$

This k(u) is a cube centred on the origin with side 1. It is a discontinuous kernel.

$$K = \sum_{n=1}^{N} k\!\left(\frac{\mathbf{x} - \mathbf{x}_n}{h}\right), \qquad V = h^{D},$$

$$\therefore\ p(\mathbf{x}) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^{D}}\, k\!\left(\frac{\mathbf{x} - \mathbf{x}_n}{h}\right).$$

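The Parzen-window estimate can be written down almost literally. The sketch below is a plain-Python illustration; the function names and the four 2-D points are hypothetical.

```python
def cube_kernel(u):
    """Parzen window: 1 if every component satisfies |u_i| <= 1/2, else 0."""
    return 1.0 if all(abs(ui) <= 0.5 for ui in u) else 0.0

def parzen_density(x, data, h):
    """p(x) = (1/N) * sum_n h^(-D) * k((x - x_n) / h)."""
    d = len(x)
    k_count = sum(
        cube_kernel([(xi - xni) / h for xi, xni in zip(x, xn)]) for xn in data
    )
    return k_count / (len(data) * h ** d)

# Hypothetical 2-D data: two points fall inside the cube of side 0.5 around the origin.
data = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.4, -0.2)]
p = parzen_density((0.0, 0.0), data, h=0.5)
print(p)  # K = 2, N = 4, V = 0.5**2
```
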
Kernel density estimators

The symmetry of k(u) lets us re-interpret the result:
•  Before: count the data points (out of N) that lie inside the single cube centred on x.
•  After: place N cubes, one centred on each data point x_n, and count the cubes around x that contain it.

Kernel density estimators

Other choice of k(u): Gaussian

$$k(\mathbf{u}) = \frac{1}{(2\pi)^{D/2}} \exp\left(-\frac{\|\mathbf{u}\|^{2}}{2}\right).$$

This kernel gives a continuous density.

You can use any kernel as long as it satisfies

$$k(\mathbf{u}) \ge 0, \qquad \int k(\mathbf{u})\,\mathrm{d}\mathbf{u} = 1.$$

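A minimal sketch of the Gaussian-kernel estimator follows, assuming the same form p(x) = (1/N) Σ h^(-D) k((x − x_n)/h) as for the Parzen window. The names `gaussian_kernel` and `kde` and the sample data are hypothetical.

```python
import math
import random

def gaussian_kernel(u):
    """k(u) = (2*pi)^(-D/2) * exp(-||u||^2 / 2)."""
    d = len(u)
    return math.exp(-0.5 * sum(ui * ui for ui in u)) / (2.0 * math.pi) ** (d / 2)

def kde(x, data, h, kernel=gaussian_kernel):
    """p(x) = (1/N) * sum_n h^(-D) * k((x - x_n) / h) for any valid kernel."""
    d = len(x)
    total = sum(kernel([(xi - xni) / h for xi, xni in zip(x, xn)]) for xn in data)
    return total / (len(data) * h ** d)

random.seed(0)
data = [(random.gauss(0.5, 0.15),) for _ in range(50)]  # hypothetical 1-D sample

for h in (0.005, 0.07, 0.2):
    print(f"h={h}: p(0.5) = {kde((0.5,), data, h):.3f}")
```

Any other function that is non-negative and integrates to one can be passed as `kernel` without changing `kde`.
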
Kernel density estimators

Example

[Figure: kernel density estimates for h = 0.005, h = 0.07 and h = 0.2; horizontal axis 0 to 1, vertical axis 0 to 5.]

Again, we can see that the smoothing parameter h controls the outcome of the estimation.

Nearest-neighbour methods

As the region, use a sphere that is centred on x and contains a fixed number K of data points:

$$p(\mathbf{x}) = \frac{K}{N\,V(\mathbf{x})},$$

where V(x) denotes the volume of the sphere.

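The estimator can be sketched as below. The radius is taken as the distance to the K-th nearest data point, and the volume uses the standard formula for a D-dimensional ball, V = π^(D/2) / Γ(D/2 + 1) · r^D. The function name and the three data points are hypothetical.

```python
import math

def knn_density(x, data, k):
    """p(x) = K / (N * V(x)), with V(x) the volume of the smallest sphere
    centred on x that contains k data points."""
    d = len(x)
    r = sorted(math.dist(x, xn) for xn in data)[k - 1]  # k-th nearest distance
    volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d
    return k / (len(data) * volume)

# Hypothetical 1-D data: the sphere is the interval [0, 1], so V = 1.
p = knn_density((0.5,), [(0.0,), (1.0,), (3.0,)], k=2)
print(p)
```
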
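Nearest-neighbour methods

Note that this density cannot be normalized.

Beyond a point x* that is far away from all the data points, the radius r(x) of the sphere grows linearly with x, so the density is inversely proportional to x and the integral diverges. In one dimension, with x† the K-th nearest data point (which no longer changes for x ≥ x*):

$$\int_{-\infty}^{\infty} \frac{\mathrm{d}x}{r(x)} \ge \int_{x^{*}}^{\infty} \frac{\mathrm{d}x}{r(x)} = \int_{x^{*}}^{\infty} \frac{\mathrm{d}x}{x - x^{\dagger}} \to \infty.$$

$$\therefore\ \int_{\mathbb{R}^{D}} \frac{K}{N\,V(\mathbf{x})}\,\mathrm{d}\mathbf{x} \propto \int_{\mathbb{R}^{D}} \frac{\mathrm{d}\mathbf{x}}{r(\mathbf{x})^{D}} \to \infty.$$
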
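The divergence can also be observed numerically. The sketch below integrates a 1-D K-nearest-neighbour estimate (V(x) = 2 r(x)) over wider and wider intervals with a simple midpoint rule; the sample, K and the step size are assumptions chosen for illustration.

```python
import math
import random

def knn_density_1d(x, data, k):
    """1-D K-NN estimate p(x) = K / (N * 2 r(x)), r(x) = k-th nearest distance."""
    r = sorted(abs(x - xn) for xn in data)[k - 1]
    return k / (len(data) * 2.0 * r)

def mass(data, k, half_width, step=0.02):
    """Midpoint-rule integral of the estimate over [-half_width, half_width]."""
    n_steps = round(2 * half_width / step)
    return step * sum(
        knn_density_1d(-half_width + (i + 0.5) * step, data, k)
        for i in range(n_steps)
    )

random.seed(0)
data = [random.random() for _ in range(20)]  # hypothetical sample on [0, 1]
K = 3

masses = {L: mass(data, K, L) for L in (10, 100, 1000)}
for L, m in masses.items():
    print(f"integral over [-{L}, {L}] = {m:.3f}")
```

Each tenfold widening of the interval adds roughly (K/N) ln 10 to the integral, so the total mass grows logarithmically without bound instead of converging to 1.
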
Nearest-neighbour estimators

Example

[Figure: K-nearest-neighbour density estimates for K = 1, K = 5 and K = 30; horizontal axis 0 to 1, vertical axis 0 to 5.]

Here again, the smoothing parameter K controls the outcome of the estimation.

Furthermore, in the K = 1 case we can observe that $p(x) \to \infty$ (at each data point the radius of the sphere shrinks to zero).

Nonparametric methods

Another problem of kernels and NNs

These methods need all the observed data for estimation, so both time and space complexity are O(N). This is very inefficient.

On that point, parametric methods are quite efficient (cf. sufficient statistics).

Histograms are also efficient.

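To illustrate the contrast, here is a minimal sketch of a parametric fit that keeps only sufficient statistics. It assumes a 1-D Gaussian model, for which (N, Σx, Σx²) are enough to compute the maximum likelihood mean and variance; the class name `GaussianStats` is hypothetical.

```python
class GaussianStats:
    """Running sufficient statistics (N, sum x, sum x^2) of a 1-D Gaussian.

    Memory use is O(1): each data point can be discarded after update().
    """

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def ml_estimate(self):
        """Return the maximum likelihood mean and (biased) variance."""
        mean = self.sum_x / self.n
        var = self.sum_x2 / self.n - mean * mean
        return mean, var

stats = GaussianStats()
for x in (1.0, 2.0, 3.0, 4.0):
    stats.update(x)
mean, var = stats.ml_estimate()
print(mean, var)
```

A kernel or nearest-neighbour estimator, by comparison, has to store and scan all N points for every query.
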
Nonparametric methods

                  Histograms    Kernels       NNs
K                 Not fixed     Not fixed     Fixed
V                 Not fixed     Fixed         Not fixed
Smoother          ∆             h             K
Continuity        No            It depends    Yes*
Dimensionality    Suffer        Scalable      Scalable
Normalization     Proper        Proper        Improper
Data set          Discard       Keep          Keep

* If K=1, not continuous

Nearest-neighbour methods

Use NNs as classifier

To do this, use the sphere that contains K points irrespective of their class:

$$p(\mathbf{x} \mid C_k) = \frac{K_k}{N_k V}, \qquad p(\mathbf{x}) = \frac{K}{NV},$$

where $K_k$ is the number of points in the sphere that belong to class $C_k$, and $N_k$ is the total number of points in class $C_k$.
The class priors are $p(C_k) = N_k / N$, so

$$p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\,p(C_k)}{p(\mathbf{x})} = \frac{K_k}{K}.$$

Nearest-neighbour methods

Use NNs as classifier

Therefore, x is assigned to the class that holds the majority among its K nearest neighbours.

If K=1, this is called the “nearest-neighbour rule”.

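The posterior K_k / K and the majority vote can be sketched in a few lines. The function names and the toy data set below are hypothetical, and ties are broken arbitrarily (by the order in which classes are encountered).

```python
import math
from collections import Counter

def knn_posterior(x, points, labels, k):
    """Return {class: K_k / K} computed from the k nearest neighbours of x."""
    nearest = sorted(range(len(points)), key=lambda n: math.dist(x, points[n]))[:k]
    counts = Counter(labels[n] for n in nearest)
    return {c: kk / k for c, kk in counts.items()}

def knn_classify(x, points, labels, k):
    """Assign x to the class with the largest posterior K_k / K."""
    posterior = knn_posterior(x, points, labels, k)
    return max(posterior, key=posterior.get)

# Hypothetical 2-D training set with two classes.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "a", "b", "b"]

print(knn_posterior((4.0, 4.0), points, labels, k=3))
print(knn_classify((4.0, 4.0), points, labels, k=3))
```

With k=1 the same functions implement the nearest-neighbour rule.
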
Nearest-neighbour methods

Use NNs as classifier – Example

[Figure: K-nearest-neighbour classification for K = 1, K = 3 and K = 31; horizontal axis x6 (0 to 2), vertical axis x7 (0 to 2).]

As in the discussion so far, K acts here as the smoothing parameter.
PRML 2.4-2.5 Exponential Family & Nonparametric Methods

  • 1. PRML 2.4-2.5 The exponential family & Nonparametric methods June 11, 2014 by Shinichi TAMURA
  • 2. NONPARAMETRIC METHODS THE EXPONENTIAL FAMILY Today's topics 1. The exponential family 1.  What is exponential family? 2.  Maximum likelihood for EF 3.  How to decide priors for EF 2. Nonparametric methods 1.  What is the point of nonparametric methods ? 2.  Kernel density estimator 3.  Nearest-neighbour methods June 11, 2014 PRML 2.4-2.5 Shinichi TAMURA
  • 3. NONPARAMETRIC METHODS THE EXPONENTIAL FAMILY Today's topics 1. The exponential family 1.  What is exponential family? 2.  Maximum likelihood for EF 3.  How to decide priors for EF 2. Nonparametric methods 1.  What is the point of nonparametric methods ? 2.  Kernel density estimator 3.  Nearest-neighbour methods June 11, 2014 PRML 2.4-2.5 Shinichi TAMURA
  • 4. NONPARAMETRIC METHODS THE EXPONENTIAL FAMILY Today's topics 1. The exponential family 1.  What is exponential family? 2.  Maximum likelihood for EF 3.  How to decide priors for EF 2. Nonparametric methods 1.  What is the point of nonparametric methods ? 2.  Kernel density estimator 3.  Nearest-neighbour methods June 11, 2014 PRML 2.4-2.5 Shinichi TAMURA
  • 5. The Exponential Family: Almost all of the distributions we have studied so far belong to a single class, namely the exponential family.
  • 12. The Exponential Family: Among parametric distributions, the exponential family contains the Bernoulli, multinomial, Gaussian, beta, gamma, von Mises, etc.; the Gaussian mixture, etc., is parametric but lies outside the exponential family.
  • 16. The Exponential Family: The exponential family over $x$, given $\eta$, is the class of distributions of the form $p(x|\eta) = h(x)\,g(\eta)\exp\{\eta^{T}u(x)\}$. Here $\eta$ is the natural parameter, $g(\eta)$ is the normalizing constant, and $\exp\{\eta^{T}u(x)\}$ is the factor in which $x$ and $\eta$ meet.
  • 18. The Exponential Family: E.g. 1) The Bernoulli distribution: $p(x|\eta) = \mu^{x}(1-\mu)^{1-x} = \sigma(-\eta)\exp(\eta x)$, where $\eta = \ln\frac{\mu}{1-\mu}$, so that $u(x) = x$, $h(x) = 1$ and $g(\eta) = \sigma(-\eta)$.
  • 21. The Exponential Family: E.g. 2) The multinomial distribution: $p(x|\eta) = \prod_{k=1}^{M}\mu_k^{x_k} = \exp(\eta^{T}x)$, where $\eta = (\ln\mu_1, \dots, \ln\mu_M)^{T}$, so that $u(x) = x$, $h(x) = 1$ and $g(\eta) = 1$. The parameters are constrained by $\sum_{k=1}^{M}\exp(\eta_k) = \sum_{k=1}^{M}\mu_k = 1$. It's inconvenient!
  • 24. The Exponential Family: E.g. 2) The multinomial distribution: Remove the constraint by $\mu_M = 1-\sum_{k=1}^{M-1}\mu_k$ and $x_M = 1-\sum_{k=1}^{M-1}x_k$: $p(x|\mu) = \exp\left\{\sum_{k=1}^{M-1}x_k\ln\mu_k + \left(1-\sum_{k=1}^{M-1}x_k\right)\ln\left(1-\sum_{k=1}^{M-1}\mu_k\right)\right\} = \exp\left\{\sum_{k=1}^{M-1}x_k\ln\frac{\mu_k}{1-\sum_{j=1}^{M-1}\mu_j} + \ln\left(1-\sum_{k=1}^{M-1}\mu_k\right)\right\} = \left(1-\sum_{k=1}^{M-1}\mu_k\right)\exp\left\{\sum_{k=1}^{M-1}x_k\ln\frac{\mu_k}{1-\sum_{j=1}^{M-1}\mu_j}\right\}$. Therefore...
  • 25. The Exponential Family: E.g. 2') The multinomial distribution without the constraint: $p(x|\eta) = \prod_{k=1}^{M}\mu_k^{x_k} = \left(1+\sum_{k=1}^{M-1}\exp(\eta_k)\right)^{-1}\exp(\eta^{T}x)$, where $\eta = \left(\ln\frac{\mu_1}{1-\sum_{j=1}^{M-1}\mu_j}, \dots, \ln\frac{\mu_{M-1}}{1-\sum_{j=1}^{M-1}\mu_j}, 0\right)^{T}$, so that $u(x) = x$, $h(x) = 1$ and $g(\eta) = \left(1+\sum_{k=1}^{M-1}\exp(\eta_k)\right)^{-1}$.
  • 27. The Exponential Family: E.g. 3) The Gaussian distribution: $p(x|\eta) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} = (2\pi)^{-1/2}(-2\eta_2)^{1/2}\exp\left(\frac{\eta_1^2}{4\eta_2}\right)\exp\left\{\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix}^{T}\begin{pmatrix}x\\ x^2\end{pmatrix}\right\}$, where $\eta = \left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)^{T}$, so that $u(x) = (x, x^2)^{T}$, $h(x) = (2\pi)^{-1/2}$ and $g(\eta) = (-2\eta_2)^{1/2}\exp\left(\frac{\eta_1^2}{4\eta_2}\right)$.
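The Gaussian example can be checked numerically. The following is a minimal sketch, not part of the original slides: it evaluates $h(x)g(\eta)\exp\{\eta^{T}u(x)\}$ with $u(x) = (x, x^2)^{T}$ and compares it with the usual Gaussian density. The function names are hypothetical, and the sketch assumes the constant $(2\pi)^{-1/2}$ is placed in $h(x)$.

```python
import math

def gaussian_natural_params(mu, sigma2):
    """Map (mu, sigma^2) to eta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def gaussian_ef_density(x, eta):
    """Evaluate h(x) * g(eta) * exp(eta^T u(x)) with u(x) = (x, x^2)."""
    eta1, eta2 = eta
    h = (2.0 * math.pi) ** -0.5  # assumption: the 2*pi factor lives in h(x)
    g = math.sqrt(-2.0 * eta2) * math.exp(eta1 ** 2 / (4.0 * eta2))
    return h * g * math.exp(eta1 * x + eta2 * x * x)

def gaussian_density(x, mu, sigma2):
    """Standard Gaussian density in the (mu, sigma^2) parameterization."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# Illustrative values only.
mu, sigma2 = 1.5, 0.7
eta = gaussian_natural_params(mu, sigma2)
max_gap = max(abs(gaussian_ef_density(x, eta) - gaussian_density(x, mu, sigma2))
              for x in (-2.0, 0.0, 1.5, 3.0))
```

Both parameterizations should agree up to floating-point rounding, so `max_gap` is expected to be negligibly small.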
  • 30. Maximum likelihood for EF: OK, we know what the EF looks like. Then how do we estimate the parameter? Maximize the likelihood! This is the frequentist way.
  • 33. Maximum likelihood for EF: Suppose we have i.i.d. data $X = \{x_1, \dots, x_N\}$. The log-likelihood of $\eta$ is $\ln p(X|\eta) = \ln\prod_{n=1}^{N}p(x_n|\eta) = \ln\prod_{n=1}^{N}h(x_n)g(\eta)\exp\{\eta^{T}u(x_n)\} = \sum_{n=1}^{N}\ln h(x_n) + N\ln g(\eta) + \eta^{T}\sum_{n=1}^{N}u(x_n)$. $\therefore \nabla_\eta\ln p(X|\eta) = N\nabla_\eta\ln g(\eta) + \sum_{n=1}^{N}u(x_n)$. Set this gradient to zero.
  • 34. Maximum likelihood for EF: Therefore $-\nabla_\eta\ln g(\eta_{ML}) = \frac{1}{N}\sum_{n=1}^{N}u(x_n)$. Here $\eta_{ML}$ depends on the data only through $\sum_n u(x_n)$, which is therefore called the “sufficient statistic”. We need to store only $\sum_n u(x_n)$ for estimation.
  • 35. Maximum likelihood for EF: E.g.) Gaussian distribution: With $g(\eta) = (-2\eta_2)^{1/2}\exp\left(\frac{\eta_1^2}{4\eta_2}\right)$ and $u(x) = (x, x^2)^{T}$, $-\nabla\ln g(\eta) = \left(-\frac{\eta_1}{2\eta_2},\ -\frac{1}{2\eta_2} + \frac{\eta_1^2}{4\eta_2^2}\right)^{T} = \left(\mu,\ \sigma^2 + \mu^2\right)^{T}$. $\therefore \mu_{ML} = \frac{1}{N}\sum_n x_n$, $\sigma^2_{ML} = \frac{1}{N}\sum_n x_n^2 - \left(\frac{1}{N}\sum_n x_n\right)^2$. That's what we already know.
  • 36. Maximum likelihood for EF: By the way, we want to know the relation between $\eta$ and $\eta_{ML}$.
  • 38. Maximum likelihood for EF: Taking the gradient of $\int h(x)g(\eta)\exp\{\eta^{T}u(x)\}\,dx = 1$ with respect to $\eta$ gives $\nabla g(\eta)\int h(x)\exp\{\eta^{T}u(x)\}\,dx + \int h(x)g(\eta)\exp\{\eta^{T}u(x)\}u(x)\,dx = 0 \Leftrightarrow -\nabla\ln g(\eta) = E[u(x)]$. This is similar to $-\nabla_\eta\ln g(\eta_{ML}) = \frac{1}{N}\sum_{n=1}^{N}u(x_n)$.
  • 41. Maximum likelihood for EF: According to the law of large numbers (LLN), the sample mean converges to the expectation, so $\eta_{ML}$ converges to $\eta$: the right-hand side of $-\nabla_\eta\ln g(\eta_{ML}) = \frac{1}{N}\sum_{n=1}^{N}u(x_n)$ converges to the right-hand side of $-\nabla\ln g(\eta) = E[u(x)]$, and hence the left-hand sides converge as well.
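For the Gaussian, the maximum likelihood estimate can be computed from the two running sums $\sum_n x_n$ and $\sum_n x_n^2$ alone. The following is a minimal sketch with hypothetical function names and illustrative data (true mean 2, true standard deviation 3, chosen only for this example). It also illustrates the convergence argument: with many samples the estimates land close to the generating values.

```python
import random

def gaussian_ml_from_sufficient_stats(sum_x, sum_x2, n):
    """Gaussian ML estimates from the sufficient statistics sum(x) and sum(x^2)."""
    mean_x = sum_x / n      # sample mean of u_1(x) = x, estimates mu
    mean_x2 = sum_x2 / n    # sample mean of u_2(x) = x^2, estimates sigma^2 + mu^2
    mu_ml = mean_x
    sigma2_ml = mean_x2 - mean_x ** 2
    return mu_ml, sigma2_ml

# Illustrative data only: 20000 draws from a Gaussian with mean 2 and std 3.
rng = random.Random(0)
sum_x = 0.0
sum_x2 = 0.0
n = 20000
for _ in range(n):
    x = rng.gauss(2.0, 3.0)
    sum_x += x          # the raw samples never need to be stored
    sum_x2 += x * x

mu_ml, sigma2_ml = gaussian_ml_from_sufficient_stats(sum_x, sum_x2, n)
```

Only two numbers and the count are kept while streaming over the data, which is the practical meaning of "sufficient" here.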
  • 44. Priors for EF: If you want to use Bayesian inference, a prior distribution is needed. Then how do we choose one if we don't know anything about the parameter?
  • 48. Priors for EF: Three candidates: 1. Conjugate priors: easy to handle. 2. Uniform distributions: principle of indifference. 3. Noninformative priors: keep the effect of the prior small.
  • 53. Priors for EF – Conjugate priors: Distributions in the EF have factors of the form $g(\eta)\exp(\eta^{T}u)$, so the conjugate prior takes the corresponding form $p(\eta|\chi,\nu) = f(\chi,\nu)\left\{g(\eta)\exp(\eta^{T}\chi)\right\}^{\nu} = f(\chi,\nu)\,g(\eta)^{\nu}\exp\{\nu\eta^{T}\chi\}$, where $f(\chi,\nu)$ is the normalizing constant.
  • 55. Priors for EF – Conjugate priors: The conjugate prior $p(\eta|\chi,\nu) = f(\chi,\nu)\,g(\eta)^{\nu}\exp\{\nu\eta^{T}\chi\}$ gives the following posterior: $p(\eta|X,\chi,\nu) \propto \prod_{n=1}^{N}h(x_n)g(\eta)\exp\{\eta^{T}u(x_n)\} \times g(\eta)^{\nu}\exp\{\nu\eta^{T}\chi\} \propto g(\eta)^{N+\nu}\exp\left\{\eta^{T}\left(\sum_{n=1}^{N}u(x_n) + \nu\chi\right)\right\}$, which has the same functional form as the prior.
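Matching the posterior $g(\eta)^{N+\nu}\exp\{\eta^{T}(\sum_n u(x_n) + \nu\chi)\}$ against the prior form $g(\eta)^{\nu'}\exp\{\nu'\eta^{T}\chi'\}$ suggests the update $\nu' = \nu + N$ and $\chi' = (\nu\chi + \sum_n u(x_n))/(\nu + N)$. The following is a minimal sketch of that bookkeeping; the function name and the example numbers are hypothetical, and the slides do not state this update explicitly.

```python
def conjugate_update(chi, nu, u_values):
    """Update the conjugate-prior hyperparameters (chi, nu) with observed u(x_n).

    chi      -- list of floats, same length as each u(x_n)
    nu       -- effective number of prior pseudo-observations
    u_values -- list of sufficient-statistic vectors u(x_n), one per data point
    """
    n = len(u_values)
    dim = len(chi)
    sum_u = [sum(u[d] for u in u_values) for d in range(dim)]
    nu_post = nu + n
    # nu_post * chi_post must equal nu * chi + sum_n u(x_n)
    chi_post = [(nu * chi[d] + sum_u[d]) / nu_post for d in range(dim)]
    return chi_post, nu_post

# Illustrative Bernoulli-style example with u(x) = (x,): prior chi = 0.5 with
# weight nu = 2, followed by the observations 1, 1, 0, 1.
chi_post, nu_post = conjugate_update([0.5], 2.0, [(1,), (1,), (0,), (1,)])
```

Read this way, $\nu$ acts like a count of pseudo-observations and $\chi$ like their average sufficient statistic, so the posterior $\chi'$ is a weighted average of the prior value and the data.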
  • 58. Priors for EF – Uniform distributions: The uniform distribution is a common choice for a discrete, bounded variable (cf. the principle of insufficient reason, or principle of indifference). But two problems arise when it is applied to continuous variables: 1. the normalization problem; 2. the transformation problem.
  • 60. Priors for EF – Uniform distributions: 1. Normalization problem: If the parameter is unbounded, the prior cannot be normalized: $\int_{-\infty}^{\infty}p(\lambda)\,d\lambda = \int_{-\infty}^{\infty}\mathrm{const}\,d\lambda \to \infty$. Such priors are called “improper”. Note that they can still give proper posteriors: with a constant prior the posterior is proportional to the likelihood, which is usable whenever the likelihood can be normalized as a function of the parameter.
  • 63. Priors for EF – Uniform distributions: 2. Transformation problem: A nonlinear transformation turns a constant prior into a non-constant one. E.g., with $p(\lambda) = 1$ and $\eta = \sqrt{\lambda}$, $p(\eta) = p(\lambda)\left|\frac{d\lambda}{d\eta}\right| = 2\eta$, which is not constant in $\eta$. So always ask: constant with respect to what? (Sometimes the posterior is not sensitive to the difference.)
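The transformation problem can be seen in a small simulation. The following is a minimal sketch with hypothetical function names, assuming $\lambda$ is uniform on $(0, 1)$ so that $p(\lambda) = 1$. If $\eta = \sqrt{\lambda}$ has density $2\eta$, then $\Pr(\eta \le t) = t^2$, whereas a density that was uniform in $\eta$ would give $\Pr(\eta \le t) = t$.

```python
import random

def fraction_eta_below(threshold, n_samples=20000, seed=0):
    """Empirical Pr(eta <= threshold) when lambda ~ Uniform(0, 1) and eta = sqrt(lambda)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_samples):
        lam = rng.random()   # p(lambda) = 1 on (0, 1): an assumption of this sketch
        eta = lam ** 0.5     # nonlinear change of variable
        if eta <= threshold:
            count += 1
    return count / n_samples

def cdf_from_density_2eta(threshold):
    """Integral of p(eta) = 2 * eta from 0 to threshold."""
    return threshold ** 2

empirical = fraction_eta_below(0.5)
predicted = cdf_from_density_2eta(0.5)
```

With the threshold 0.5 the prediction from $p(\eta) = 2\eta$ is 0.25, well away from the value 0.5 that a prior constant in $\eta$ would imply.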
  • 64. Priors for EF – Uniform distributions: Keep these problems in mind: 1. the normalization problem; 2. the transformation problem.
  • 66. Priors for EF – Noninformative priors: Two examples of noninformative priors: 1. priors for location parameters; 2. priors for scale parameters. These are constructed to affect the posterior as little as possible, so that the inference is objective.
  • 69. Priors for EF – Noninformative priors: 1. Priors for location parameters: If the density has the form $p(x|\mu) = f(x-\mu)$, then a constant shift $\hat{x} = x + c$ gives a density of the same form, $p(\hat{x}|\hat{\mu}) = f(\hat{x}-\hat{\mu})$ with $\hat{\mu} = \mu + c$. This property is called “translation invariance”, and such a parameter is called a “location parameter”.
  • 71. Priors for EF – Noninformative priors: 1. Priors for location parameters: To reflect the translation invariance, the prior should satisfy $\int_{A}^{B}p(\mu)\,d\mu = \int_{A}^{B}p(\mu-c)\,d\mu$ for all $A, B$ $\iff p(\mu) = p(\mu-c)$ for every shift $c$ $\iff p(\mu) = \text{constant}$. We obtain the uniform distribution after all. But unlike before, we now know when to use it.
  • 74. Priors for EF – Noninformative priors: 1. Priors for location parameters: E.g.) The mean of a Gaussian: $p(x|\mu) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\}$ has the form $f(x-\mu)$. This prior is also obtained as a limit of the conjugate prior: $p(\mu) = N(\mu|\mu_0,\sigma_0^2) \to \text{const.}$ as $\sigma_0^2 \to \infty$, with $\mu_N = \frac{\sigma^2}{N\sigma_0^2+\sigma^2}\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2+\sigma^2}\mu_{ML} \to \mu_{ML}$ and $\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2} \to \frac{N}{\sigma^2}$.
  • 77. Priors for EF – Noninformative priors: 2. Priors for scale parameters: If the density has the form $p(x|\sigma) = \frac{1}{\sigma}f\left(\frac{x}{\sigma}\right)$, then a constant scaling $\hat{x} = cx$ gives a density of the same form, $p(\hat{x}|\hat{\sigma}) = \frac{1}{\hat{\sigma}}f\left(\frac{\hat{x}}{\hat{\sigma}}\right)$ with $\hat{\sigma} = c\sigma$. This property is called “scale invariance”, and such a parameter is called a “scale parameter”.
  • 78. Priors for EF – Noninformative priors: 2. Priors for scale parameters: To reflect the scale invariance, the prior should satisfy $\int_{A}^{B}p(\sigma)\,d\sigma = \int_{A}^{B}p\left(\frac{1}{c}\sigma\right)\frac{d\sigma}{d(c\sigma)}\,d\sigma$ for all $A, B$ $\iff p(\sigma) = \frac{1}{c}\,p\left(\frac{1}{c}\sigma\right) \iff p(\sigma) \propto \frac{1}{\sigma} \iff p(\ln\sigma) = \text{const.}$
  • 81. Priors for EF – Noninformative priors: 2. Priors for scale parameters: E.g.) The standard deviation of a Gaussian: $p(x|\sigma) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left\{-\frac{1}{2\sigma^2}x^2\right\}$ has the form $\frac{1}{\sigma}f\left(\frac{x}{\sigma}\right)$. This prior is also obtained as a limit of the conjugate prior on the precision $\lambda = 1/\sigma^2$: $p(\lambda) = \mathrm{Gam}(\lambda|a_0,b_0) \to \frac{\text{const}}{\lambda}$ as $a_0, b_0 \to 0$, with $a_N = a_0 + \frac{N}{2} \to \frac{N}{2}$ and $b_N = b_0 + \frac{N}{2}\sigma^2_{ML} \to \frac{N}{2}\sigma^2_{ML}$.
  • 82. Priors for EF – Noninformative priors: Two examples of noninformative priors: 1. Priors for location parameters: $p(x|\mu) = f(x-\mu) \Rightarrow p(\mu) = \text{const.}$ 2. Priors for scale parameters: $p(x|\sigma) = \frac{1}{\sigma}f\left(\frac{x}{\sigma}\right) \Rightarrow p(\sigma) \propto \frac{1}{\sigma}$.
  • 88. Nonparametric methods: So far we have learned the “parametric approach”; next we will learn the “nonparametric approach”. What is the difference?
  • 90. Nonparametric methods: Parametric: assumes a specific form of the distribution; simple; poor; efficient. Nonparametric: makes few assumptions about the form of the distribution; complex (depends on data size); rich / flexible; inefficient.
  • 92. Nonparametric methods: We will learn: 1. Histogram methods; 2. Kernel density estimators; 3. Nearest-neighbour methods.
  • 95. Nonparametric methods: 1. Histogram methods: Split the space into grids (or bins) and count the data points: $p(x) = p_i = \frac{n_i}{N\Delta_i}$ for $x$ in the $i$-th bin, where $\Delta_i$ is the width of the $i$-th bin (usually the same for all $i$), $n_i$ is the number of observations assigned to the $i$-th bin, and $N$ is the total number of observations. The estimate is piecewise constant, hence discontinuous.
  • 101. Nonparametric methods: 1. Histogram methods, example: [Figure: histogram density estimates for bin widths $\Delta = 0.04$, $\Delta = 0.08$ and $\Delta = 0.25$.] $\Delta = 0.04$: too narrow to catch enough points; too spiky (noisy). $\Delta = 0.08$: good intermediate value. $\Delta = 0.25$: too wide to express the data; too smooth (less information). Finding a good value of $\Delta$ is very important! Number of bins $= M^{D}$ (curse of dimensionality).
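The histogram estimate $p_i = n_i/(N\Delta_i)$ is short to write down in code. The following is a minimal one-dimensional sketch with equal-width bins; the function name, the `lower` bin origin and the toy data are hypothetical choices for illustration.

```python
import math

def histogram_density(data, x, delta, lower=0.0):
    """Histogram density estimate p(x) = n_i / (N * delta) for the bin containing x.

    Bins have equal width `delta` and start at `lower` (an assumption of this sketch).
    """
    bin_index = math.floor((x - lower) / delta)
    # n_i: number of observations that fall into the same bin as x
    n_i = sum(1 for xn in data if math.floor((xn - lower) / delta) == bin_index)
    return n_i / (len(data) * delta)

# Toy data, for illustration only.
data = [0.1, 0.15, 0.3, 0.8]
p_first_bin = histogram_density(data, 0.2, delta=0.25)   # bin [0, 0.25) holds 2 points
p_empty_bin = histogram_density(data, 0.6, delta=0.25)   # bin [0.5, 0.75) holds none
```

Changing `delta` in this sketch reproduces the trade-off described above: a small width gives a spiky estimate with many empty bins, a large width smooths the structure away.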
  • 104. Nonparametric methods: Lessons from histogram methods: Estimate the density at a particular point from the data points in a small local region. The regions are defined by a “smoothing parameter”, which controls the model complexity in relation to the data size. Other problems: discontinuity; poor scalability (curse of dimensionality).
  • 109. Nonparametric methods: Lessons from histogram methods: Consider a small local region $R$ and let $P = \int_{R}p(x)\,dx$. Then $\Pr(K \text{ out of } N \text{ data points} \in R) = \frac{N!}{K!(N-K)!}P^{K}(1-P)^{N-K}$. If (1) $K$ is large enough (smoothing region not too small), then $K \simeq NP$; if (2) $p(x)$ is roughly constant over $R$ (smoothing region small enough), then $P \simeq p(x)V$, where $V$ is the volume of $R$. The two conditions are contradictory, and the right balance depends on the data size. Together they give $p(x) = \frac{K}{NV}$.
  • 112. Kernel density estimators. Fix the region (e.g., a hypercube centred on x with side h) and count the data points inside it with a kernel function k(u) (Parzen window): $$k(\mathbf{u}) = \begin{cases} 1, & |u_i| \le 1/2 \quad (i = 1, \dots, D), \\ 0, & \text{otherwise.} \end{cases}$$
  • 113. Kernel density estimators. $k(\mathbf{u})$ is the indicator function of a hypercube centred on the origin with side 1.
  • 114. Kernel density estimators. Note that this kernel is discontinuous.
  • 115. Kernel density estimators. With the Parzen window k(u), the number of points in the hypercube and its volume are $$K = \sum_{n=1}^{N} k\!\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right), \qquad V = h^{D},$$ $$\therefore\; p(\mathbf{x}) = \frac{1}{N}\sum_{n=1}^{N} \frac{1}{h^{D}}\, k\!\left(\frac{\mathbf{x}-\mathbf{x}_n}{h}\right).$$
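The Parzen-window estimator can be sketched in a few lines. This is an illustrative implementation under stated assumptions: points are plain Python sequences of length D, and the names `parzen_kernel` and `parzen_density` are hypothetical, not taken from the slides.

```python
def parzen_kernel(u):
    """Unit hypercube indicator: 1 if every |u_i| <= 1/2, otherwise 0."""
    return 1.0 if all(abs(ui) <= 0.5 for ui in u) else 0.0


def parzen_density(x, data, h):
    """p(x) = (1/N) * sum_n (1/h^D) * k((x - x_n) / h)."""
    n = len(data)   # N: number of data points
    d = len(x)      # D: dimensionality
    count = 0.0     # K: number of points inside the cube of side h around x
    for xn in data:
        u = [(xi - xni) / h for xi, xni in zip(x, xn)]
        count += parzen_kernel(u)
    return count / (n * h ** d)


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0)]
    # Two of the three points lie in the cube of side 0.5 centred on the origin.
    print(parzen_density((0.0, 0.0), points, 0.5))
```

Because the kernel is an indicator function, the estimate jumps whenever a data point enters or leaves the cube, which is the discontinuity noted on the slides.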
  • 116. Kernel density estimators. The symmetry of k(u) lets us re-interpret the result. Original view: the data points are counted inside a single cube centred on x.
  • 117. Kernel density estimators. Equivalent view: N cubes, one centred on each data point x_n, are placed around x, and we count the cubes that contain x.
  • 118. Kernel density estimators. Another choice of k(u) is the Gaussian: $$k(\mathbf{u}) = \frac{1}{(2\pi)^{D/2}} \exp\!\left(-\frac{\|\mathbf{u}\|^{2}}{2}\right).$$
  • 119. Kernel density estimators. This kernel gives a continuous density.
  • 120. Kernel density estimators. Any kernel can be used as long as it satisfies $$k(\mathbf{u}) \ge 0, \qquad \int k(\mathbf{u})\,d\mathbf{u} = 1.$$
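A Gaussian-kernel estimator follows the same pattern as the general formula $p(\mathbf{x}) = \frac{1}{N}\sum_n h^{-D} k((\mathbf{x}-\mathbf{x}_n)/h)$. The sketch below is illustrative only; the function names `gaussian_kernel` and `gaussian_kde` are hypothetical, and points are assumed to be plain sequences of floats.

```python
import math


def gaussian_kernel(u):
    """k(u) = (2*pi)^(-D/2) * exp(-||u||^2 / 2)."""
    d = len(u)
    sq_norm = sum(ui * ui for ui in u)
    return math.exp(-0.5 * sq_norm) / (2.0 * math.pi) ** (d / 2.0)


def gaussian_kde(x, data, h):
    """p(x) = (1/N) * sum_n (1/h^D) * k((x - x_n) / h)."""
    n = len(data)
    d = len(x)
    total = 0.0
    for xn in data:
        u = [(xi - xni) / h for xi, xni in zip(x, xn)]
        total += gaussian_kernel(u)
    return total / (n * h ** d)


if __name__ == "__main__":
    data = [(0.0,), (1.0,)]
    # Crude Riemann sum over [-5, 6] to check that the estimate integrates to ~1.
    step = 0.01
    integral = sum(gaussian_kde((-5.0 + i * step,), data, 0.3) * step
                   for i in range(1100))
    print(integral)
```

Since the kernel is non-negative and integrates to one, the resulting estimate is itself a properly normalized density, which the Riemann sum in the usage example checks approximately.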
  • 121. Kernel density estimators: Example. Again, we can see that the smoothing parameter h controls the outcome of the estimation. [Figure: kernel density estimates for h = 0.005, h = 0.07 and h = 0.2.]
  • 124. Nearest-neighbour methods. As the region, use a sphere that is centred on x and contains a fixed number K of data points.
  • 125. Nearest-neighbour methods. The density estimate is then $$p(\mathbf{x}) = \frac{K}{N\,V(\mathbf{x})},$$ where $V(\mathbf{x})$ denotes the volume of the sphere.
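In one dimension the "sphere" is an interval of length $2r$, where $r$ is the distance from x to its K-th nearest data point, so the estimator reduces to a few lines. The following is a minimal sketch under that assumption; the name `knn_density_1d` is hypothetical.

```python
def knn_density_1d(x, data, k):
    """K-nearest-neighbour density estimate p(x) = K / (N * V(x)) in 1-D.

    V(x) = 2 * r, where r is the distance from x to its k-th nearest point.
    """
    n = len(data)
    distances = sorted(abs(x - xn) for xn in data)
    radius = distances[k - 1]          # distance to the k-th nearest neighbour
    if radius == 0.0:
        # x coincides with at least k data points: the volume is zero.
        return float("inf")
    return k / (n * 2.0 * radius)


if __name__ == "__main__":
    data = [0.0, 1.0, 3.0]
    print(knn_density_1d(0.5, data, 2))   # radius 0.5, volume 1.0
    print(knn_density_1d(0.0, data, 1))   # evaluated exactly at a data point
```

The zero-radius branch makes explicit what happens for K = 1 when x sits exactly on a data point: the volume vanishes and the estimate is unbounded.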
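  • 126. Nearest-neighbour methods. Note that this density cannot be normalized. In one dimension, for every x beyond a point $x^{*}$ that is far away from all data points, the radius of the sphere grows linearly with x, $r(x) = x - x^{\dagger}$ (here $x^{\dagger}$ is the fixed data point that is the K-th nearest neighbour of every such x), so the density is inversely proportional to $r(x)$ and the integral diverges: $$\int_{-\infty}^{\infty} \frac{dx}{r(x)} \ge \int_{x^{*}}^{\infty} \frac{dx}{r(x)} = \int_{x^{*}}^{\infty} \frac{dx}{x - x^{\dagger}} \to \infty.$$ $$\therefore\; \int_{\mathbb{R}^{D}} \frac{K}{N\,V(\mathbf{x})}\,d\mathbf{x} \propto \int_{\mathbb{R}^{D}} \frac{d\mathbf{x}}{r(\mathbf{x})^{D}} \to \infty.$$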
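The divergence can also be observed numerically. The sketch below integrates a one-dimensional K-nearest-neighbour estimate over $[-L, L]$ with a midpoint rule and shows that the result keeps growing with L. The data set, the function names and the step counts are illustrative assumptions; the code assumes distinct data points and K ≥ 2 so that the radius is never zero.

```python
def knn_density_1d(x, data, k):
    """p(x) = K / (N * 2r), with r the distance to the k-th nearest point."""
    distances = sorted(abs(x - xn) for xn in data)
    radius = distances[k - 1]   # assumed > 0 (distinct points, k >= 2)
    return k / (len(data) * 2.0 * radius)


def integrate_density(data, k, limit, steps):
    """Midpoint-rule approximation of the integral of p(x) over [-limit, limit]."""
    width = 2.0 * limit / steps
    total = 0.0
    for i in range(steps):
        x = -limit + (i + 0.5) * width
        total += knn_density_1d(x, data, k) * width
    return total


if __name__ == "__main__":
    data = [0.2, 0.5, 0.9]
    # The integral grows roughly like log(limit) instead of converging to 1.
    for limit in (10.0, 100.0, 1000.0):
        print(limit, integrate_density(data, 2, limit, 20000))
```

Each tenfold increase of the integration range adds a roughly constant amount, which is the logarithmic growth expected from integrating $1/(x - x^{\dagger})$ in the tails.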
  • 127. Nearest-neighbour estimators: Example. Here again, the smoothing parameter K controls the outcome of the estimation. [Figure: K-nearest-neighbour density estimates for K = 1, K = 5 and K = 30.]
  • 128. Nearest-neighbour estimators: Example. Furthermore, in the K = 1 case we can observe that $p(x) \to \infty$.
  • 129. Nonparametric methods: Another problem of kernels and NNs. These methods need all of the observed data for estimation, so both time and space complexity are O(N), which is very inefficient. In this respect, parametric methods are quite efficient (cf. sufficient statistics). Histograms are also efficient.
  • 130. Nonparametric methods: comparison of histograms, kernels and nearest neighbours (NNs).
      |                | Histograms | Kernels    | NNs       |
      | K              | Not fixed  | Not fixed  | Fixed     |
      | V              | Not fixed  | Fixed      | Not fixed |
      | Smoother       | Δ          | h          | K         |
      | Continuity     | No         | It depends | Yes*      |
      | Dimensionality | Suffer     | Scalable   | Scalable  |
      | Normalization  | Proper     | Proper     | Improper  |
      | Data set       | Discard    | Keep       | Keep      |
      * If K = 1, not continuous.
  • 134. Nearest-neighbour methods: Using NNs as a classifier. To do this, use a sphere centred on x that contains K points irrespective of their class.
  • 135. Nearest-neighbour methods: Using NNs as a classifier. Then $$p(\mathbf{x}\mid\mathcal{C}_k) = \frac{K_k}{N_k V}, \qquad p(\mathbf{x}) = \frac{K}{NV},$$ where $K_k$ is the number of points of class $\mathcal{C}_k$ inside the sphere, $N_k$ is the total number of points of class $\mathcal{C}_k$, and $V$ is the volume of the sphere.
  • 136. Nearest-neighbour methods: Using NNs as a classifier. The class priors are $p(\mathcal{C}_k) = N_k/N$, so $$p(\mathcal{C}_k\mid\mathbf{x}) = \frac{p(\mathbf{x}\mid\mathcal{C}_k)\,p(\mathcal{C}_k)}{p(\mathbf{x})} = \frac{K_k}{K}.$$
  • 137. Nearest-neighbour methods: Using NNs as a classifier. Therefore, x is assigned to the class that holds the majority among its K nearest neighbours.
  • 138. Nearest-neighbour methods: Using NNs as a classifier. If K = 1, this is called the “nearest-neighbour rule”.
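The posterior $p(\mathcal{C}_k \mid \mathbf{x}) = K_k / K$ and the resulting majority vote can be sketched as follows. This is an illustrative implementation, not the presenter's code: the function name `knn_classify` is hypothetical, distances are assumed to be Euclidean, and ties between classes are resolved in favour of the class whose member is encountered first among the nearest neighbours.

```python
import math
from collections import Counter


def knn_classify(x, points, labels, k):
    """Return (predicted class, posteriors) with p(C_k | x) = K_k / K."""
    # Indices of the training points sorted by Euclidean distance to x.
    order = sorted(range(len(points)), key=lambda i: math.dist(x, points[i]))
    nearest_labels = [labels[i] for i in order[:k]]
    counts = Counter(nearest_labels)                  # K_k for each class
    posteriors = {c: cnt / k for c, cnt in counts.items()}
    predicted = max(posteriors, key=posteriors.get)   # majority class
    return predicted, posteriors


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
    labels = ["a", "a", "a", "b", "b"]
    print(knn_classify((4, 4), points, labels, 3))
    print(knn_classify((4, 4), points, labels, 1))   # nearest-neighbour rule
```

Setting `k=1` gives the nearest-neighbour rule; larger values of `k` average over more neighbours and produce smoother decision regions.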
  • 139. Nearest-neighbour methods: Using NNs as a classifier, example. As in the discussion so far, K acts as the smoothing parameter here. [Figure: classification results in the (x6, x7) plane for K = 1, K = 3 and K = 31.]