Lec17 sparse signal processing & applications

Image Analysis & Retrieval
CS/EE 5590 Special Topics (Class Ids: 44873, 44874)
Fall 2016, M/W 4-5:15pm@Bloch 0012
Lec 17
Sparse Signal Processing & Applications
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346.
http://l.web.umkc.edu/lizhu
Z. Li, Image Analysis & Retrv, 2016 Fall

Outline
• Recap:
• Piece-wise Linear Models via Query Driven Solution
• Subspace Indexing on Grassmann Manifold
• Optimization of Subspace on Grassmann Manifold
• Sparse Signal Processing
• Sparse Representation and Robust PCA
• Sparse Signal Processing
• L1 norm and L1 Magic Solution
• Application in occluded face recognition
• Summary

Piece-wise Linear : Query Driven
• Query-Driven Piece-wise Linear Model
– No pre-determined structure on the training data
– Local neighborhood data patch identified from query point q,
– Local model built with local data, A(X, q)

DPC – Discriminant Power Coefficient
• The tradeoffs in local data support size

Face Recognition
• On ATT Data set: 40 subjects, 400 images:
Extra credit: 10pts
• Develop a query driven local Laplacianface model for HW-3

Subspace Indexing on Grassmann Manifold
• Subspace Clustering by Grassmann Metric:
– It is a VQ like process.
– Start with a data partition kd-tree, its leaf nodes and associated subspaces {A_k}, k = 1..2^h
– Repeat
» Find A_i and A_j such that d_arc(A_i, A_j) is the smallest among all pairs whose associated data patches are adjacent in the data space.
» Delete A_i and A_j, replace them with the merged new subspace, and update the associated data patch leaf node set.
» Compute the empirical identification accuracy for the merged subspace
» Add a parent pointer to the merged new subspace for A_i and A_j.
» Stop if only 1 subspace is left.

Simulation
• Face data set
– Mixed data set of 242 individuals, and 4840 face images
– Performance compared with PCA, LDA and LPP modeling

Newtonian Method in Optimization
• Recall that in optimizing a functional f(X) over vector variables X in R^n, Newton's method uses both the gradient and the Hessian of f.
Credit: Kerstin Johnsson, Lund Univ
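As a reminder of the Euclidean case, here is a minimal Python sketch of the Newton iteration, where each step solves H d = -g for the update d. The test function, the starting point and the names `gradient`, `hessian` and `newton_minimize` are illustrative assumptions, not taken from the slides, and the sketch works in plain R^2, not on the Grassmann manifold.

```python
# Newton's method on a small convex test function (an assumption for illustration):
#   f(x, y) = u^4 + u^2 + v^2 + u*v,  with u = x - 1, v = y + 2
# whose unique minimum is at (x, y) = (1, -2).

def gradient(x, y):
    u, v = x - 1.0, y + 2.0
    return (4.0 * u ** 3 + 2.0 * u + v, 2.0 * v + u)

def hessian(x, y):
    u = x - 1.0
    return ((12.0 * u * u + 2.0, 1.0),
            (1.0, 2.0))

def newton_minimize(x, y, tol=1e-12, max_iter=50):
    """Iterate X <- X - H^{-1} g until the gradient is (almost) zero."""
    for _ in range(max_iter):
        g1, g2 = gradient(x, y)
        if abs(g1) + abs(g2) < tol:
            break
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        # Solve the 2x2 system H * step = -g with the explicit inverse.
        dx = -(d * g1 - b * g2) / det
        dy = -(-c * g1 + a * g2) / det
        x, y = x + dx, y + dy
    return x, y

x_min, y_min = newton_minimize(3.0, 0.0)
print(x_min, y_min)
```

On the Grassmann manifold the same idea applies, but the gradient and Hessian are replaced by their manifold counterparts and the update moves along a geodesic.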

Gradient & Hessian on Grassmann Manifold
• Gradient on Grassmann manifold:

Hessian on Grassmann Manifold
• Hessian:
• $F_Y$: n x p, 1st order differentiation
• $F_{YY}$: 2nd order differentiation along Y

Newton’s Method on Grassmann Manifold
• Overall framework
Prof. A. Edelman’s MATLAB package:
• https://umkc.box.com/s/g2oyqvsb2lx2v9wzf0ju60wnspts4t9g

Outline
• Recap:
• Piece-wise Linear Models via Query Driven Solution
• Subspace Indexing on Grassmann Manifold
• Optimization of Subspace on Grassmann Manifold
• Sparse Representation and Robust PCA
• Sparse Signal Processing
• L1 norm and L1 Magic Solution
• Application in occluded face recognition
• Summary

Sparse representation
• Signals/images are sparse if they can be represented with very few non-zero coefficients in a certain subspace:
– E.g. cameraman image X represented by its 2-D DCT Y:
• How is this related to the classification problem?
– Intuitively, sparsity is good for classification, because it makes samples from different classes easier to separate
– Only when data points are dense and intertwined is classification hard
– How to characterize this mathematically?
[Figure: image x, its 2-D DCT y = dct2(x), and eigenface]
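To make "sparse in a certain subspace" concrete, the following Python sketch builds a 1-D signal that is dense in the sample domain but has a single non-zero DCT coefficient. It is a simplified 1-D stand-in for the 2-D dct2 example above; the signal, the length `N`, the index `K0` and the helper names are assumptions chosen for illustration.

```python
import math

def dct_ii(x):
    """Unnormalized 1-D DCT-II: c[k] = sum_t x[t] * cos(pi * (2t + 1) * k / (2n))."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n))
            for k in range(n)]

def count_nonzero(values, tol=1e-9):
    """Number of entries whose magnitude exceeds a small tolerance (a numerical L0 count)."""
    return sum(1 for v in values if abs(v) > tol)

N = 32   # signal length (assumption)
K0 = 5   # index of the single active DCT basis function (assumption)

# A signal made of one DCT basis function: every sample is non-zero.
signal = [math.cos(math.pi * (2 * t + 1) * K0 / (2 * N)) for t in range(N)]
coeffs = dct_ii(signal)

dense_count = count_nonzero(signal)    # non-zeros in the sample domain
sparse_count = count_nonzero(coeffs)   # non-zeros in the DCT domain
print(dense_count, sparse_count)
```

Natural images are only approximately sparse in the DCT domain, so in practice one counts coefficients above a threshold instead of exact non-zeros.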

Sparsity in Human Visual System

Sparse Signal Recovery
• If x is sparse, i.e. $\|x\|_0$ is small, we can recover x from a random projection measurement, y = Ax
• Basis pursuit de-noising:
• LASSO:
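The equations for these two problems did not survive on this slide. For reference, the standard textbook forms are sketched below; the tolerance $\epsilon$ and the weight $\lambda$ are the usual symbols and are assumptions here, not necessarily the notation used in the lecture.

```latex
% Basis pursuit de-noising: sparsest (in the L1 sense) x that explains y up to noise level epsilon
\min_x \|x\|_1 \quad \text{s.t.} \quad \|y - Ax\|_2 \le \epsilon

% LASSO (Lagrangian form): least-squares fit with an L1 penalty on x
\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1
```

With $\epsilon = 0$ the first problem reduces to $\min_x \|x\|_1$ s.t. $y = Ax$, the noiseless form used in the rest of the lecture.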

Sparse Face Model
Consider a face recognition system
• We have k = 1, 2, …, K subjects; each subject has $n_k$ training samples, {[$v_{1,1}$, .., $v_{1,n_1}$], [$v_{2,1}$, .., $v_{2,n_2}$], …, [$v_{K,1}$, .., $v_{K,n_K}$]}, each a thumbnail image with d = w x h pixels.
• Let us stack all training samples as a collection of column vectors, A, of size d x N, N = $n_1 + n_2 + … + n_K$.
• The problem is, for a given thumbnail image, y, with unknown class label, how to solve for its label?

Assume y belongs to class i, then,
Or,
Where only a small number of coefficients in x are non-zero, thus x is sparse.
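The two equations on this slide are missing from the extracted text. A reconstruction consistent with the definitions above (training samples $v_{i,j}$, dictionary A of size d x N) is sketched here; the coefficient names $\alpha_{i,j}$ are an assumption.

```latex
% y as a linear combination of the training samples of its own class i
y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \dots + \alpha_{i,n_i} v_{i,n_i}

% the same statement over the whole dictionary A, with zeros for every other class
y = A x, \qquad
x = [\,0, \dots, 0,\ \alpha_{i,1}, \dots, \alpha_{i,n_i},\ 0, \dots, 0\,]^T \in R^N
```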
Sparsity

Assume y belongs to class 1, then,
Most coefficients related to other classes are zero, leaving only a small number of non-zero coefficients in $\alpha_1$
Illustration of Sparsity

• So the problem is rather straightforward
– Given y = Ax, where
• y is the unknown face image in $R^d$,
• A is the d x N training data matrix, or dictionary, with N large
• x is the coefficient vector of y as a linear combination of training samples; it is sparse: out of N total coefficients, only a small number are non-zero
– Mathematically, we are looking for: $x_0 = \arg\min_x \|x\|_0 \ \text{s.t.}\ Ax = y$
• where $\|x\|_0$ is the L0 norm, which counts the number of non-zero coefficients in x.
Mathematical formulation

• The L0 minimization problem is basically a combinatorial optimization problem
• Not much structure to exploit for a fast algorithm
• Dumbest solution:
– Assuming that x has at most 3 non-zero coefficients, search a total of $\binom{N}{1} + \binom{N}{2} + \binom{N}{3}$ possible coefficient combinations and find the one that gives the best match
– It is a kNN search in effect!
L0 minimization is NP hard
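To get a feel for how fast this search space grows, the Python sketch below counts the candidate supports for a given dictionary size. The function name `l0_search_size` is hypothetical, and N = 1600 is used only because it matches the dictionary size in the sparse face demo later in the lecture.

```python
import math

def l0_search_size(n_atoms, max_nonzeros):
    """Number of supports with 1..max_nonzeros non-zero coefficients out of n_atoms."""
    return sum(math.comb(n_atoms, k) for k in range(1, max_nonzeros + 1))

# Search space for at most 3 non-zero coefficients in a dictionary of 1600 atoms.
size_1600 = l0_search_size(1600, 3)
print(size_1600)
```

Each candidate support also needs its own least-squares fit, so even this small sparsity level is costly, and the count grows combinatorially with the number of allowed non-zeros.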

L0 and L1 norm
• Lk norm (recall Minkowski distance)
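As a quick illustration of the Lk family, here is a small Python sketch. The function name `lk_norm` is hypothetical, and treating k = 0 as a count of non-zero entries follows the L0 definition used earlier in the lecture.

```python
def lk_norm(x, k):
    """Lk norm of a vector: (sum_i |x_i|^k)^(1/k); k = 0 counts the non-zero entries."""
    if k == 0:
        return sum(1 for v in x if v != 0)
    return sum(abs(v) ** k for v in x) ** (1.0 / k)

vec = [3.0, -4.0, 0.0]
l0 = lk_norm(vec, 0)   # number of non-zero entries
l1 = lk_norm(vec, 1)   # sum of absolute values
l2 = lk_norm(vec, 2)   # Euclidean length
print(l0, l1, l2)
```

The L0 "norm" is not a true norm (it is not homogeneous), which is one reason the L1 norm is used as its convex surrogate.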

L1 solution
L-2 ball

L1 based recognition

L1 solution for invalid input images
• For non-face images:
– Non sparse coefficients in x
• Can threshold on residual to return not found result

Occlusion and Disguise
• A big problem in biometrics is disguise and occlusion
• The magic of sparsity and L1 minimization can deal with
that effectively !
• Consider a face image with a small fraction p of its pixels
corrupted:

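• Let the occluded face image be y = Ax + e, where e is the sparse error from the corrupted pixels
• Then re-state the constraint as $y = [A, I]\,[x; e] = Bw$, with $B = [A, I]$ and $w = [x; e]$
• Then solve the L1 problem (P1) with y = Bw. Notice that sparsity in w is achieved through sparsity in both x and e.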
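The augmented system can be checked on a toy example. The Python sketch below builds B = [A, I] and w = [x; e] for a tiny hand-made dictionary and verifies that Bw reproduces the occluded observation y = Ax + e. All numbers and helper names are assumptions for illustration; no L1 solver is run here.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def augment_with_identity(A):
    """Build B = [A, I] by appending a d x d identity to the d x N matrix A."""
    d = len(A)
    return [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
            for i, row in enumerate(A)]

# Toy dictionary with d = 4 "pixels" and N = 3 training samples (assumed values).
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0],
     [2.0, 0.0, 1.0]]
x = [0.0, 3.0, 0.0]          # sparse coefficients: only one training sample is used
e = [0.0, 0.0, -5.0, 0.0]    # sparse occlusion: only one pixel is corrupted

y = [ax + ei for ax, ei in zip(matvec(A, x), e)]   # occluded observation y = Ax + e
B = augment_with_identity(A)
w = x + e                                          # list concatenation gives w = [x; e]

nonzeros_in_w = sum(1 for v in w if v != 0)
print(matvec(B, w), y, nonzeros_in_w)
```

Because w stacks x and e, its non-zero count is the sum of theirs, which is why one L1 problem over w handles both recognition and occlusion.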
Sparsity criteria takes care of occlusion

• Occlusion example
– Large L2 errors, not recoverable by Eigenface/Fisherface:
• Accuracy for sunglasses and scarves effects:
Occluded face recognition

L1 vs L2 minimization
• A natural question is why not solve y = Ax with L2 minimization?
– Typically, the number of training samples is smaller than the number of pixels in the training images, so why not do a pseudo-inverse like:
– Which looks for a Maximum Likelihood estimate of the true x, if the noise is Gaussian with covariance σI.
– However, the noise is non-Gaussian and can be unbounded, so the resulting L2 solution is pretty bad

L2 solution for Occlusion
• Example with occlusion:
– (a): Occluded face
– (b): x solved from L2 minimization, not sparse at all
– (c): error
– (d): reconstruction from x

L1 vs L2 minimization
L1 vs L2 in 2D space:
y=Ax
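The 2-D picture can be reproduced numerically. With a single measurement a·x = y, the minimum-L2 solution is the point of the constraint line closest to the origin, while a minimum-L1 solution puts all the weight on the coordinate with the largest |a_i|. The Python sketch below uses these closed forms; the function names and the numbers are assumptions for illustration.

```python
def min_l2_solution(a, y):
    """Minimum-energy solution of a.x = y: x = a * y / (a.a)."""
    s = sum(ai * ai for ai in a)
    return [ai * y / s for ai in a]

def min_l1_solution(a, y):
    """A minimum-L1 solution of a.x = y: use only the coordinate with the largest |a_i|."""
    j = max(range(len(a)), key=lambda i: abs(a[i]))
    x = [0.0] * len(a)
    x[j] = y / a[j]
    return x

a = [1.0, 2.0]   # one measurement row (assumed values)
y = 2.0          # observed value

x_l2 = min_l2_solution(a, y)   # both entries non-zero: not sparse
x_l1 = min_l1_solution(a, y)   # one entry non-zero: sparse
print(x_l2, x_l1)
```

Geometrically, the L2 ball touches the line y = Ax at a generic point, while the L1 ball touches it at a corner, which lies on a coordinate axis.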

Sparsity is bad news for L2
• Given training set A, the unknown image y is under-determined in A:
– R(A): the set of y that satisfy y = Ax:

Numeric solution for L1 minimization
Candès's group (of Caltech) provides the L1 Magic MATLAB toolbox
• Check out the manual on the course webpage
• Stephen Boyd:
• Boyd’s nice book on Optimization can be downloaded from his webpage at Stanford.
• Excellent book, with slides, homework and solutions.
• https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf

Numerical Tool from L1 Magic
• L1 Magic Toolbox:
% signal length
N = 512;
% number of spikes in the signal, must be sparse w.r.t N
T = 20;
% number of observations to make
K = 120;
% random +/- 1 signal
x = zeros(N,1);
q = randperm(N);
x(q(1:T)) = sign(randn(T,1));
subplot(3,1,1); plot(x); title('x(t)'); axis([1 500 -1.2 1.2]);
% measurement matrix: random measuring
fprintf('\n Creating random measurement matrix...');
A = randn(K,N);
% orthogonalize the rows of A
A = orth(A')';
% observations
y = A*x;
% initial guess = min energy
x0 = A'*y;
subplot(3,1,2); plot(x0); title('x_0(t)'); axis([1 500 -1.2 1.2]);
% solve with primal-dual method
xp = l1eq_pd(x0, A, [], y, 1e-3);
subplot(3,1,3); plot(xp); title('x(t) recovered by L1 magic'); axis([1 500 -1.2 1.2]);
% end of the l1magic test script

L1 Magic demo
[Figure: three stacked plots over sample index 1 to 500, amplitude -1 to 1. Top: x(t), the original sparse signal. Middle: x0(t), the pseudo-inverse (L2) recovery. Bottom: x(t) recovered by L1 magic, the recovered sparse signal.]

Sparse face
• Recover face as sparse signal
% create our measurement matrix A: face + nonface icons
% (faces, nfaces, w and h are assumed to be already loaded in the workspace)
A=zeros(1600, w*h);
A(1:400, :) = faces; A(401:1600, :) = nfaces;
[N, dim]=size(A);
% in col vec form
A = A';
% pick a face: offs in 1-400
figure(3); colormap('gray');
offs = 20; y = faces(offs, :)';
subplot(2,2,1); imagesc(reshape(y, h,w)); axis off; title('\fontsize{11}original');
% solve for xp = min |x|, s.t. y=Ax
% initial guess = min energy
x0 = A'*y;
% solve with primal-dual method
xp = l1eq_pd(x0, A, [], y, 1e-3);
% normalize
x0 = x0./norm(x0);
xp = xp./norm(xp);
% reconstructed face
yp = A*xp;
subplot(2,2,2); imagesc(reshape(yp, h,w)); axis off; title('\fontsize{11}sparse reconstruction');

L1Magic for Face Recognition

Super Resolution
Super-Resolution
• Super-resolves a lower resolution patch, say k x k, to 3k x 3k.
• Mathematically, learn a function: $f: x \to Y,\ x \in R^d,\ Y \in R^D$

Basic Framework
• Super-resolution is the inverse of down scaling:
• Low res patch y is the blurred and scaled high res patch x: $y = SHx$
• Assume the high res image is sparse on some dictionary (true, say DCT):
[Figure: input, training patches, and output ≈ original]
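The degradation model y = SHx can be sketched in 1-D, assuming H is a blur and S a down-sampling operator, as the "blurred and scaled" wording suggests. The moving-average blur, the factor of 3 and the function names below are illustrative assumptions, not the operators used in the lecture.

```python
def blur(x, radius=1):
    """H: moving-average blur with a (2 * radius + 1)-tap window, truncated at the borders."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def downsample(x, factor=3):
    """S: keep the centre sample of every block of `factor` samples."""
    return x[factor // 2::factor]

# High resolution 1-D "patch" with 3k = 9 samples (assumed values).
x_high = [0, 0, 0, 3, 3, 3, 6, 6, 6]
y_low = downsample(blur(x_high))   # y = S H x, with k = 3 samples
print(y_low)
```

Many different high resolution patches map to the same y, which is why super-resolution needs a prior such as sparsity to pick one of them.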

Coupled Dictionary Learning
• Pre-train a common set of coupled low and high resolution dictionaries
• Super-resolve by solving L1 minimization on the lower resolution patch, and use the same coefficients to reconstruct the higher resolution patch

• Learn two dictionaries, $D_h$ and $D_l$, that have common sparse coefficients for low and high resolution image patches, y and x:
• Reconstruction of low res patch with sparse coefficients: $\min \|\alpha\|_0 \ \text{s.t.}\ \|D_l \alpha - y\|_2 \le \epsilon$
• Furthermore, introduce a linear projection, F, to enforce perceptual metrics: $\min \|\alpha\|_0 \ \text{s.t.}\ \|F D_l \alpha - F y\|_2 \le \epsilon$
• Then the high res patch x can be constructed as $x = D_h \alpha$
J Yang, J Wright, TS Huang, Y Ma, Image super-resolution via sparse representation, IEEE Trans. Image Processing, vol.19 (11), 2861-2873, 2010

• Put together, super-resolution is to solve:
• Sparse reconstruction of lower resolution y
• Enforce local consistency with high res patches: extract adjacent overlapping stripes, via P, to be in agreement; w is the previously reconstructed patch pixels:
• Solution via Lagrangian relaxation:

Overall Algorithm
• Patch level super-resolution, complete with global image gradient search

Dictionary Training
Training data: low and high resolution image patches $Y_l = \{y_k\}$, $X_h = \{x_k\}$:
• Enforce the common sparse coefficients

Results
• Dictionary Training
• From flowers and animals data set, covering a variety of textures
• Training dictionary from more than 100,000 samples
[Figure: learned dictionaries $D_h$ and $D_l$]

Results
• 3x super-resolution
[Figure: comparison of low-resolution input, bicubic, neighbor embedding [Chang CVPR ‘04], coupled dictionary, and original]

Related Work
• Potential paper review project

Summary
• If a signal is sparse in some (unknown) domain, then from a random measurement we can reliably recover the signal via L1 minimization: $\min_x \|x\|_1 \ \text{s.t.}\ y = Ax$
• Applications: Robust PCA and Face Recognition with Occlusion
• Face images are sparse linear combinations from a face dictionary
• Recovery by solving the L1 problem; caveat: only additive noise can be dealt with.
• Applications: Coupled Dictionary for Image Super Resolution
• Coupled dictionary: high and low res image patches sharing the same coefficients.
