Bayesian Core: A Practical Approach to Computational Bayesian Statistics

Image analysis
Computer Vision and Classification

The k-nearest-neighbor method

The k-nearest-neighbor (knn) procedure has been used in the data analysis and machine learning communities as a quick way to classify objects into predefined groups.
This approach requires a training dataset where both the class y
and the vector x of characteristics (or covariates) of each
observation are known.
The training dataset (y_i, x_i), 1 ≤ i ≤ n, is used by the k-nearest-neighbor procedure to predict the value of y_{n+1} given a new vector of covariates x_{n+1} in a very rudimentary manner: the predicted value of y_{n+1} is simply the most frequent class found amongst the k nearest neighbors of x_{n+1} in the set (x_i), 1 ≤ i ≤ n.
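As an illustration, this rudimentary rule can be sketched in a few lines of Python (the function name and the Euclidean metric are my choices, not prescribed by the slides):

```python
import numpy as np

def knn_predict(x_new, X_train, y_train, k):
    """Predict y_{n+1} as the most frequent class among the k nearest
    neighbors of x_{n+1} in the training set (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]                # indices of the k closest x_i
    classes, counts = np.unique(y_train[nearest], return_counts=True)
    return classes[np.argmax(counts)]              # majority vote
```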
The classical knn procedure does not involve much calibration and
requires no statistical modeling at all!
There exists a Bayesian reformulation.
vision dataset (1)
vision: 1373 color pictures, described by 200 variables rather than by the whole table of 500 × 375 pixels.
Four classes of images: class C1 for motorcycles, class C2 for bicycles, class C3 for humans, and class C4 for cars.
vision dataset (2)
Typical issue in computer vision problems: build a classifier to identify a picture pertaining to a specific topic without human intervention.
We use about half of the images (648 pictures) to construct the
training dataset and we save the 689 remaining images to test the
performance of our procedures.
A probabilistic version of the knn methodology (1)
Symmetrization of the neighborhood relation: if x_i belongs to the k-nearest-neighborhood of x_j and x_j does not belong to the k-nearest-neighborhood of x_i, the point x_j is added to the set of neighbors of x_i.
Notation: i ∼_k j.
The transformed set of neighbors is then called the symmetrized
k-nearest-neighbor system.
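A minimal sketch of this symmetrization step (the brute-force pairwise-distance construction is an illustrative assumption):

```python
import numpy as np

def symmetrized_knn(X, k):
    """Boolean adjacency matrix of the symmetrized k-nearest-neighbor system:
    i ~k j as soon as one of the two points is among the k nearest of the other."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                    # a point is not its own neighbor
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        A[i, np.argsort(d[i])[:k]] = True          # directed k-nn relation
    return A | A.T                                 # add x_j to n(x_i) if i in n(x_j)
```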
A probabilistic version of the knn methodology (3)
$$
P(y_i = C_j \mid y_{-i}, X, \beta, k) = \frac{\exp\Big(\beta \sum_{\ell \sim_k i} I_{C_j}(y_\ell) \big/ N_k\Big)}{\sum_{g=1}^{G} \exp\Big(\beta \sum_{\ell \sim_k i} I_{C_g}(y_\ell) \big/ N_k\Big)}\,,
$$
N_k being the average number of neighbours over all x_i's, β > 0, and
$$
y_{-i} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n) \quad\text{and}\quad X = \{x_1, \ldots, x_n\}\,.
$$
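This full conditional translates directly into code; in the sketch below, `neighbors[i]` is assumed to hold the indices ℓ with ℓ ∼_k i:

```python
import numpy as np

def knn_full_conditional(i, y, neighbors, beta, G, Nk):
    """Vector of P(y_i = C_g | y_{-i}, X, beta, k) for g = 0..G-1:
    softmax of beta * (#neighbors of x_i in class g) / N_k."""
    counts = np.bincount(y[neighbors[i]], minlength=G)  # class counts among neighbors
    logits = beta * counts / Nk
    w = np.exp(logits - logits.max())                   # numerically stable softmax
    return w / w.sum()
```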
A probabilistic version of the knn methodology (4)
β grades the influence of the prevalent neighborhood class;
the probabilistic knn model is conditional on the covariate matrix X;
the frequencies n_1/n, …, n_G/n of the classes within the training set are representative of the marginal probabilities p_1 = P(y_i = C_1), …, p_G = P(y_i = C_G).
If the marginal probabilities p_g are known and different from n_g/n, the various classes must be reweighted according to their true frequencies.
Bayesian analysis of the knn probabilistic model
From a Bayesian perspective, given a prior distribution π(β, k) with support [0, β_max] × {1, …, K}, the marginal predictive distribution of y_{n+1} is
$$
P(y_{n+1} = C_j \mid x_{n+1}, y, X) = \sum_{k=1}^{K} \int_0^{\beta_{\max}} P(y_{n+1} = C_j \mid x_{n+1}, y, X, \beta, k)\, \pi(\beta, k \mid y, X)\, d\beta\,,
$$
where π(β, k | y, X) is the posterior distribution given the training dataset (y, X).
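Given draws (β, k) from the posterior, this predictive is approximated by a plain Monte Carlo average; `cond_prob` is a hypothetical callable returning the conditional class probabilities:

```python
import numpy as np

def predictive_probs(x_new, posterior_draws, cond_prob):
    """Monte Carlo version of the marginal predictive: average the conditional
    class probabilities P(y_{n+1} = . | x_{n+1}, y, X, beta, k) over draws
    (beta, k) from the posterior pi(beta, k | y, X)."""
    return np.mean([cond_prob(x_new, beta, k) for beta, k in posterior_draws], axis=0)
```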
Unknown normalizing constant
Major difficulty: it is impossible to compute f(y|X, β, k).
Use instead the pseudo-likelihood made of the product of the full conditionals:
$$
\hat f(y \mid X, \beta, k) = \prod_{g=1}^{G} \prod_{i:\, y_i = C_g} P(y_i = C_g \mid y_{-i}, X, \beta, k)\,.
$$
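A sketch of the corresponding log pseudo-likelihood, under an assumed `neighbors` mapping from each index i to the indices of its symmetrized neighbors:

```python
import numpy as np

def log_pseudo_likelihood(y, neighbors, beta, G, Nk):
    """Log pseudo-likelihood: sum over all observations of the log full
    conditional of y_i under the probabilistic knn model."""
    total = 0.0
    for i in range(len(y)):
        counts = np.bincount(y[neighbors[i]], minlength=G)
        logits = beta * counts / Nk
        total += logits[y[i]] - np.logaddexp.reduce(logits)  # log-softmax term
    return total
```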
Pseudo-likelihood approximation
Pseudo-posterior distribution
$$
\hat\pi(\beta, k \mid y, X) \propto \hat f(y \mid X, \beta, k)\, \pi(\beta, k)
$$
Pseudo-predictive distribution
$$
\hat P(y_{n+1} = C_j \mid x_{n+1}, y, X) = \sum_{k=1}^{K} \int_0^{\beta_{\max}} P(y_{n+1} = C_j \mid x_{n+1}, y, X, \beta, k)\, \hat\pi(\beta, k \mid y, X)\, d\beta\,.
$$
MCMC implementation (1)
MCMC approximation is required.
Random walk Metropolis–Hastings algorithm: both β and k are updated using random walk proposals:
(i) for β, a logistic transformation
$$
\beta^{(t)} = \beta_{\max} \exp(\theta^{(t)}) \big/ \big(\exp(\theta^{(t)}) + 1\big)\,,
$$
in order to be able to simulate a normal random walk on the θ's,
$$
\tilde\theta \sim \mathcal{N}(\theta^{(t)}, \tau^2)\,;
$$
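One possible implementation of this update, with the Jacobian term of the logistic transformation included; `log_post` is a placeholder for the log pseudo-posterior of β, so this is a sketch rather than the slides' exact sampler:

```python
import numpy as np

def update_beta(beta, log_post, beta_max=15.0, tau2=0.05, rng=None):
    """One random-walk Metropolis-Hastings update of beta via the logistic
    transformation beta = beta_max * exp(theta) / (exp(theta) + 1)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.log(beta / (beta_max - beta))             # inverse transform
    theta_new = rng.normal(theta, np.sqrt(tau2))         # normal walk on theta
    beta_new = beta_max * np.exp(theta_new) / (np.exp(theta_new) + 1)
    # Jacobian d(beta)/d(theta) = beta * (beta_max - beta) / beta_max
    log_jac = np.log(beta_new * (beta_max - beta_new)) - np.log(beta * (beta_max - beta))
    if np.log(rng.uniform()) < log_post(beta_new) - log_post(beta) + log_jac:
        return beta_new
    return beta
```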
MCMC implementation (3)
[Figure: trace plots and histograms of the β and k sequences for the knn Metropolis–Hastings sampler, based on 20,000 iterations, with β_max = 15, K = 83, τ² = 0.05 and r = 1.]
Image Segmentation
The underlying structure of the "true" pixels is denoted by x, while the observed image is denoted by y. Both objects x and y are arrays, with each entry of x taking a finite number of values and each entry of y taking real values.
We are interested in the posterior distribution of x given y provided by Bayes' theorem, π(x|y) ∝ f(y|x)π(x). The likelihood, f(y|x), describes the link between the observed image and the underlying classification.
Menteith dataset (1)
The Menteith dataset is a 100 × 100 pixel satellite image of the lake of Menteith.
The lake of Menteith is located in Scotland, near Stirling, and offers the peculiarity of being called "lake" rather than the traditional Scottish "loch."
Menteith dataset (2)
[Figure: satellite image of the lake of Menteith.]
Random Fields
If we take a lattice I of sites or pixels in an image, we denote by i ∈ I a coordinate in the lattice. The neighborhood relation is then denoted by ∼.
A random field on I is a random structure indexed by the lattice I, a collection of random variables {x_i; i ∈ I} where each x_i takes values in a finite set χ.
Markov Random Fields
Let x_{n(i)} be the set of values taken by the neighbors of i.
A random field is a Markov random field (MRF) if the conditional distribution of any pixel given the other pixels only depends on the values of the neighbors of that pixel; i.e., for i ∈ I,
$$
\pi(x_i \mid x_{-i}) = \pi(x_i \mid x_{n(i)})\,.
$$
Ising Models
If the pixels of the underlying (true) image x can only take two colors (black and white, say), x is then binary, while y is a grey-level image.
We typically refer to each pixel x_i as being foreground if x_i = 1 (black) and background if x_i = 0 (white). We have
$$
\pi(x_i = j \mid x_{-i}) \propto \exp(\beta n_{i,j})\,, \qquad \beta > 0\,,
$$
where $n_{i,j} = \sum_{\ell \in n(i)} I_{x_\ell = j}$ is the number of neighbors of x_i with color j.
The Ising model is defined via its full conditionals
$$
\pi(x_i = 1 \mid x_{-i}) = \frac{\exp(\beta n_{i,1})}{\exp(\beta n_{i,0}) + \exp(\beta n_{i,1})}\,,
$$
and the joint distribution satisfies
$$
\pi(x) \propto \exp\Big(\beta \sum_{j \sim i} I_{x_j = x_i}\Big)\,,
$$
where the summation is taken over all pairs (i, j) of neighbors.
Simulating from the Ising Models
The normalizing constant of the Ising Model is intractable except
for very small lattices I...
Direct simulation of x is not possible!
Ising Gibbs Sampler
Algorithm (Ising Gibbs Sampler)
Initialization: For i ∈ I, generate independently
$$
x_i^{(0)} \sim \mathcal{B}(1/2)\,.
$$
Iteration t (t ≥ 1):
1. Generate u = (u_i)_{i ∈ I}, a random ordering of the elements of I.
2. For 1 ≤ ℓ ≤ |I|, update $n_{u_\ell,0}^{(t)}$ and $n_{u_\ell,1}^{(t)}$, and generate
$$
x_{u_\ell}^{(t)} \sim \mathcal{B}\left(\frac{\exp\big(\beta n_{u_\ell,1}^{(t)}\big)}{\exp\big(\beta n_{u_\ell,0}^{(t)}\big) + \exp\big(\beta n_{u_\ell,1}^{(t)}\big)}\right).
$$
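The two steps above can be sketched in plain Python (free boundaries on the lattice edges are assumed; the sweep modifies x in place):

```python
import numpy as np

def ising_gibbs_sweep(x, beta, rng):
    """One sweep of the Ising Gibbs sampler on a binary array x (entries 0/1)
    with a four-neighbor structure: visit the sites in a random order and
    redraw each pixel from its full conditional."""
    nrow, ncol = x.shape
    for site in rng.permutation(nrow * ncol):            # random ordering u
        r, c = divmod(site, ncol)
        nbrs = [x[rr, cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < nrow and 0 <= cc < ncol]
        n1 = sum(nbrs)                                   # neighbors colored 1
        n0 = len(nbrs) - n1                              # neighbors colored 0
        p1 = np.exp(beta * n1) / (np.exp(beta * n0) + np.exp(beta * n1))
        x[r, c] = int(rng.uniform() < p1)
    return x
```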
Ising Gibbs Sampler (2)
[Figure: simulations from the Ising model with a four-neighbor neighborhood structure on a 100 × 100 array after 1,000 iterations of the Gibbs sampler; β varies in steps of 0.1 from 0.3 to 1.2.]
Potts Models
If there are G colors and n_{i,g} denotes the number of neighbors of i ∈ I with color g (1 ≤ g ≤ G), that is, $n_{i,g} = \sum_{j \sim i} I_{x_j = g}$, then the full conditional distribution of x_i is chosen to satisfy π(x_i = g | x_{-i}) ∝ exp(β n_{i,g}).
This choice corresponds to the Potts model, whose joint density is given by
$$
\pi(x) \propto \exp\Big(\beta \sum_{j \sim i} I_{x_j = x_i}\Big)\,.
$$
Simulating from the Potts Models (1)
Algorithm (Potts Metropolis–Hastings Sampler)
Initialization: For i ∈ I, generate independently
$$
x_i^{(0)} \sim \mathcal{U}(\{1, \ldots, G\})\,.
$$
Iteration t (t ≥ 1):
1. Generate u = (u_i)_{i ∈ I}, a random ordering of the elements of I.
2. For 1 ≤ ℓ ≤ |I|, generate
$$
\tilde x_{u_\ell}^{(t)} \sim \mathcal{U}\big(\{1, \ldots, x_{u_\ell}^{(t-1)} - 1,\ x_{u_\ell}^{(t-1)} + 1, \ldots, G\}\big)\,,
$$
Simulating from the Potts Models (2)
Algorithm (continued)
compute the $n_{u_\ell,g}^{(t)}$ and
$$
\rho_\ell = \exp\big(\beta n_{u_\ell,\tilde x_{u_\ell}}^{(t)}\big) \big/ \exp\big(\beta n_{u_\ell,x_{u_\ell}}^{(t)}\big) \wedge 1\,,
$$
and set $x_{u_\ell}^{(t)}$ equal to $\tilde x_{u_\ell}$ with probability $\rho_\ell$.
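A sketch of one such Metropolis–Hastings sweep (colors coded 0..G−1 rather than 1..G, and free boundaries assumed):

```python
import numpy as np

def potts_mh_sweep(x, beta, G, rng):
    """One sweep of the Potts Metropolis-Hastings sampler on an array x with
    entries in {0, ..., G-1}: propose a uniformly chosen *different* color and
    accept it with probability exp(beta * (n_new - n_old)) ∧ 1."""
    nrow, ncol = x.shape
    for site in rng.permutation(nrow * ncol):            # random ordering u
        r, c = divmod(site, ncol)
        nbrs = [x[rr, cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < nrow and 0 <= cc < ncol]
        old = x[r, c]
        new = rng.integers(G - 1)                        # uniform over the other colors
        if new >= old:
            new += 1
        n_old = sum(int(v == old) for v in nbrs)
        n_new = sum(int(v == new) for v in nbrs)
        if np.log(rng.uniform()) < beta * (n_new - n_old):
            x[r, c] = new
    return x
```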
Simulating from the Potts Models (3)
[Figure: simulations from the Potts model with four grey levels and a four-neighbor neighborhood structure, based on 1000 iterations of the Metropolis–Hastings sampler; β varies in steps of 0.1 from 0.3 to 1.2.]
Posterior Inference (1)
The prior on x is a Potts model with G categories,
$$
\pi(x \mid \beta) = \frac{1}{Z(\beta)} \exp\Big(\beta \sum_{i \in I} \sum_{j \sim i} I_{x_j = x_i}\Big)\,,
$$
where Z(β) is the normalizing constant.
Given x, we assume that the observations in y are independent normal random variables,
$$
f(y \mid x, \sigma^2, \mu_1, \ldots, \mu_G) = \prod_{i \in I} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\Big(-\frac{1}{2\sigma^2}\,(y_i - \mu_{x_i})^2\Big)\,.
$$
Posterior Inference (2)
Priors
$$
\beta \sim \mathcal{U}([0, 2])\,,
$$
$$
\mu = (\mu_1, \ldots, \mu_G) \sim \mathcal{U}(\{\mu\,;\ 0 \le \mu_1 \le \cdots \le \mu_G \le 255\})\,,
$$
$$
\pi(\sigma^2) \propto \sigma^{-2}\, I_{]0,\infty[}(\sigma^2)\,,
$$
the last prior corresponding to a uniform prior on log σ.
Full conditionals (1)
$$
P(x_i = g \mid y, \beta, \sigma^2, \mu) \propto \exp\Big\{\beta \sum_{j \sim i} I_{x_j = g} - \frac{1}{2\sigma^2}\,(y_i - \mu_g)^2\Big\}\,,
$$
which can be simulated directly. Defining
$$
n_g = \sum_{i \in I} I_{x_i = g} \quad\text{and}\quad s_g = \sum_{i \in I} I_{x_i = g}\, y_i\,,
$$
the full conditional distribution of µ_g is a truncated normal distribution on [µ_{g−1}, µ_{g+1}] with mean s_g/n_g and variance σ²/n_g.
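A possible Gibbs update for µ_g, using simple rejection sampling from the truncated normal; the [0, 255] bounds for the extreme components are my assumption, taken from the uniform prior on µ:

```python
import numpy as np

def update_mu_g(y, x, g, mu, sigma2, rng, lo=0.0, hi=255.0):
    """Draw mu_g from its truncated-normal full conditional:
    N(s_g/n_g, sigma2/n_g) restricted to [mu_{g-1}, mu_{g+1}].
    Rejection sampling; fine when the interval is not too unlikely."""
    mask = (x == g)
    n_g = mask.sum()
    s_g = y[mask].sum()
    a = mu[g - 1] if g > 0 else lo                 # ordering constraint below
    b = mu[g + 1] if g < len(mu) - 1 else hi       # ordering constraint above
    mean, sd = s_g / n_g, np.sqrt(sigma2 / n_g)
    while True:                                    # rejection step
        draw = rng.normal(mean, sd)
        if a <= draw <= b:
            return draw
```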
Full conditionals (2)
The full conditional distribution of σ² is an inverse gamma distribution with parameters |I|/2 and $\sum_{i \in I} (y_i - \mu_{x_i})^2/2$.
The full conditional distribution of β is such that
$$
\pi(\beta \mid x) \propto \frac{1}{Z(\beta)} \exp\Big(\beta \sum_{i \in I} \sum_{j \sim i} I_{x_j = x_i}\Big)\,,
$$
and cannot be simulated directly because of Z(β).
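The σ² update then reduces to one inverse gamma draw, obtained as scale over a Gamma variate:

```python
import numpy as np

def update_sigma2(y, x, mu, rng):
    """Gibbs update of sigma^2 from its inverse gamma full conditional,
    IG(|I|/2, sum_i (y_i - mu_{x_i})^2 / 2): draw a Gamma(|I|/2, 1) variate
    and return scale / gamma (the standard inverse gamma construction)."""
    resid = y - mu[x]                      # residuals given the allocation x
    shape = y.size / 2.0
    scale = np.sum(resid ** 2) / 2.0
    return scale / rng.gamma(shape)
```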
Path sampling Approximation (3)
For a given value of β, E_β[S(X)] can be approximated from an MCMC sequence; since d log Z(β)/dβ = E_β[S(X)], integrating this expectation over β recovers log Z(β) up to a constant.
The integral itself can be approximated by computing the value of f(β) = E_β[S(X)] for a finite number of values of β and approximating f(β) by a piecewise-linear function f̂(β) for the intermediate values of β.
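A sketch of the resulting approximation: integrate the piecewise-linear interpolant of the MCMC estimates of f(β) with the trapezoid rule (the grid of β values and the estimates are inputs):

```python
import numpy as np

def log_Z_ratio(beta_grid, f_values, beta):
    """Path-sampling sketch: integrate the piecewise-linear interpolant
    f_hat of f(beta) = E_beta[S(X)] from beta_grid[0] up to beta, giving an
    approximation of log Z(beta) - log Z(beta_grid[0])."""
    fine = np.linspace(beta_grid[0], beta, 201)
    f_hat = np.interp(fine, beta_grid, f_values)      # piecewise-linear f_hat
    return np.sum((f_hat[1:] + f_hat[:-1]) / 2 * np.diff(fine))  # trapezoid rule
```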
[Figure: approximation of f(β) for the Potts model on a 100 × 100 image, a four-neighbor neighborhood, and G = 6, based on 1500 MCMC iterations after burn-in.]
Posterior Approximation (3)
[Figure: raw plots and histograms of the σ²'s and β's based on 2000 iterations of the hybrid Gibbs sampler.]
Image segmentation (1)
Based on (x^{(t)})_{1≤t≤T}, an estimator of x needs to be derived from an evaluation of the consequences of wrong allocations.
Two common ways, corresponding to two loss functions:
$$
L_1(x, \hat x) = \sum_{i \in I} I_{x_i \neq \hat x_i}\,, \qquad L_2(x, \hat x) = I_{x \neq \hat x}\,.
$$
Corresponding estimators:
$$
\hat x_i^{\mathrm{MPM}} = \arg\max_{1 \le g \le G} P^{\pi}(x_i = g \mid y)\,, \quad i \in I\,,
$$
$$
\hat x^{\mathrm{MAP}} = \arg\max_{x} \pi(x \mid y)\,.
$$
Image segmentation (2)
x̂^{MPM} and x̂^{MAP} are not available in closed form!
Approximation:
$$
\hat x_i^{\mathrm{MPM}} = \arg\max_{g \in \{1, \ldots, G\}} \sum_{j=1}^{N} I_{x_i^{(j)} = g}\,,
$$
based on a simulated sequence x^{(1)}, …, x^{(N)}.
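This approximation is a per-pixel majority vote over the simulated fields:

```python
import numpy as np

def mpm_estimate(samples, G):
    """MPM approximation: for every pixel, keep the color g that maximizes
    the count of x_i^{(j)} = g across the simulated fields x^{(1)}, ..., x^{(N)}."""
    samples = np.asarray(samples)                    # shape (N, nrow, ncol)
    counts = np.stack([(samples == g).sum(axis=0) for g in range(G)])
    return counts.argmax(axis=0)                     # per-pixel modal color
```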
Image segmentation (3)
[Figure: (top) segmented image based on the MPM estimate produced after 2000 iterations of the Gibbs sampler and (bottom) the observed image.]