Feature Extraction for Universal Hypothesis Testing via
           Rank-Constrained Optimization

                          Dayu Huang and Sean Meyn

                    Department of Electrical and Computer Engineering
                          and Coordinated Science Laboratory
                        University of Illinois, Urbana-Champaign


                                   June 18, 2010




 Huang and Meyn (UIUC)               Feature Extraction                 June 2010   1 / 18
Introduction
Universal Hypothesis Testing

     Sequence of observations: Z_1^n := (Z_1, . . . , Z_n).
     i.i.d. π^0 under H0, π^1 under H1.
     π^0: known; π^1: not known.
     Observation space Z is finite.
     Task: Design a test to decide in favor of H0 or H1.

The Hoeffding test

     φ_n^H = 1{ D(Γ^n ‖ π^0) ≥ η }

Empirical distribution

     Γ^n(A) = (1/n) Σ_{k=1}^n 1{Z_k ∈ A},     A ⊂ Z.
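The test statistic above is easy to sketch in code. A minimal pure-Python illustration (our own, not from the talk; the function names are ours):

```python
import math
from collections import Counter

def empirical_distribution(samples, alphabet):
    """Gamma^n: the fraction of observations equal to each symbol in Z."""
    counts = Counter(samples)
    n = len(samples)
    return {z: counts[z] / n for z in alphabet}

def kl_divergence(mu, pi):
    """D(mu || pi) = sum_z mu(z) log(mu(z)/pi(z)); +inf if mu puts mass
    where pi has none (mu not absolutely continuous w.r.t. pi)."""
    total = 0.0
    for z, p in mu.items():
        if p > 0:
            if pi[z] == 0:
                return math.inf
            total += p * math.log(p / pi[z])
    return total

def hoeffding_test(samples, pi0, alphabet, eta):
    """phi_n^H = 1{ D(Gamma^n || pi^0) >= eta }."""
    gamma_n = empirical_distribution(samples, alphabet)
    return 1 if kl_divergence(gamma_n, pi0) >= eta else 0

pi0 = {"a": 0.5, "b": 0.5}
# A sample that matches pi^0 exactly gives D = 0, so the test accepts H0;
# a sample concentrated on one symbol gives a large divergence.
print(hoeffding_test(["a"] * 5 + ["b"] * 5, pi0, ["a", "b"], eta=0.1))  # 0
print(hoeffding_test(["a"] * 10, pi0, ["a", "b"], eta=0.1))             # 1
```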
The Hoeffding Test

Theorem
  1  The Hoeffding test achieves the optimal error exponent in the
     Neyman-Pearson criterion. [1]
  2  The asymptotic variance of the Hoeffding test depends on the size of
     the observation space: when Z_1^n has marginal π^0,

         lim_{n→∞} Var[ n D(Γ^n ‖ π^0) ] = (1/2) (|Z| − 1).

     The variance is therefore large when |Z| is large.

1. Hoeffding 1963; 2. Unnikrishnan, Huang, Meyn, Surana & Veeravalli; Wilks 1938;
Clarke & Barron 1990.
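The variance formula can be checked by simulation: under H0, 2nD(Γ^n ‖ π^0) is approximately chi-squared with |Z| − 1 degrees of freedom (Wilks), so nD(Γ^n ‖ π^0) has mean and variance both close to (|Z| − 1)/2. A small Monte Carlo sketch, our own and seeded for reproducibility:

```python
import math
import random

def kl(mu, pi):
    """D(mu || pi) for distributions given as lists over a common alphabet."""
    return sum(m * math.log(m / p) for m, p in zip(mu, pi) if m > 0)

def simulate_nD(pi0, n, trials, seed=0):
    """Draw i.i.d. samples from pi^0 and return n * D(Gamma^n || pi^0) per trial."""
    rng = random.Random(seed)
    k = len(pi0)
    vals = []
    for _ in range(trials):
        counts = [0] * k
        for _ in range(n):
            counts[rng.choices(range(k), weights=pi0)[0]] += 1
        gamma = [c / n for c in counts]
        vals.append(n * kl(gamma, pi0))
    return vals

pi0 = [1 / 10] * 10            # |Z| = 10, so (|Z| - 1) / 2 = 4.5
vals = simulate_nD(pi0, n=300, trials=1000)
mean = sum(vals) / len(vals)
var = sum((v - mean) ** 2 for v in vals) / len(vals)
print(mean, var)               # both should be close to 4.5
```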
Performance of the Hoeffding Test

[Figure: ROC curves, probability of detection Pr(φ = 1|H1) versus probability
of false alarm Pr(φ = 1|H0), for |Z| = 19 and |Z| = 39.
Red: better error exponent but larger variance.]
Mismatched Universal Test

Variational representation of KL divergence

     D(μ ‖ π) = sup_f { ⟨μ, f⟩ − log⟨π, e^f⟩ },     ⟨μ, f⟩ = Σ_z μ(z) f(z)

Mismatched divergence [1]

     D_F^MM(μ ‖ π) := sup_{f ∈ F} { ⟨μ, f⟩ − log⟨π, e^f⟩ }

Mismatched universal test [2]

     φ_n^MM = 1{ D_F^MM(Γ^n ‖ π^0) ≥ η }

1. Abbe, Médard, Meyn & Zheng 2007; 2. Unnikrishnan et al.
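For a linear function class F = span{ψ_1, . . . , ψ_d}, the supremum defining D_F^MM is a concave program in the coefficients r and can be solved by plain gradient ascent: the partial derivative in r_i is ⟨μ, ψ_i⟩ − ⟨π_r, ψ_i⟩, where π_r(z) ∝ π(z) e^{f_r(z)} is the tilted distribution. A pure-Python sketch (our own implementation, with our own step-size and iteration choices):

```python
import math

def mismatched_divergence(mu, pi, psis, steps=5000, lr=0.5):
    """D_F^MM(mu || pi) for F = span(psis), by gradient ascent on
    g(r) = <mu, f_r> - log <pi, exp(f_r)>, which is concave in r."""
    d, k = len(psis), len(pi)
    r = [0.0] * d
    for _ in range(steps):
        f = [sum(r[i] * psis[i][z] for i in range(d)) for z in range(k)]
        w = [pi[z] * math.exp(f[z]) for z in range(k)]
        s = sum(w)
        tilt = [x / s for x in w]                  # the tilted distribution pi_r
        grad = [sum((mu[z] - tilt[z]) * psis[i][z] for z in range(k))
                for i in range(d)]                 # d g / d r_i = <mu - pi_r, psi_i>
        r = [ri + lr * g for ri, g in zip(r, grad)]
    f = [sum(r[i] * psis[i][z] for i in range(d)) for z in range(k)]
    return (sum(mu[z] * f[z] for z in range(k))
            - math.log(sum(pi[z] * math.exp(f[z]) for z in range(k))))

mu, pi = [0.5, 0.3, 0.2], [1/3, 1/3, 1/3]
kl = sum(m * math.log(m / p) for m, p in zip(mu, pi))

d_full = mismatched_divergence(mu, pi, [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
d_restricted = mismatched_divergence(mu, pi, [[1, 0, 0]])
print(d_full, kl)               # nearly equal: this F contains the log-likelihood ratio
print(d_restricted <= kl)       # True: a restricted class can only lower the divergence
```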
Function Class and Performance

Consider a linear function class:

     F = { f_r := Σ_{i=1}^d r_i ψ_i }

The choice of function class F determines performance:
     The mismatched divergence approximates the KL divergence, and so
     determines the error exponent of the mismatched universal test. When d
     is smaller than |Z|, the test is optimal for a restricted set of
     alternative distributions.
     The dimension d determines the asymptotic variance [1]: under H0,

         lim_{n→∞} Var[ n D_F^MM(Γ^n ‖ π^0) ] = d/2

Problem: How to choose the function class F?

1. Unnikrishnan et al.
Our Contribution

 1   Even with a small dimension d, the mismatched test is optimal for a
     large set of alternative distributions.
 2   A framework for choosing F for the mismatched test.
How Powerful Is the Mismatched Test?

Example
[Figure: bar charts of 10 distributions on Z = {1, . . . , 9}, including π^0.]

10 distributions. d = ?
When Is MM Optimal?

When does D_F^MM(π^1 ‖ π^0) = D(π^1 ‖ π^0)?

Fact (1)
When F includes the log-likelihood ratio (LLR) log(π^1/π^0).

Exponential family: E(F) = { μ : μ(z) ∝ exp(f(z)), f ∈ F }.

Fact (2)
When π^0 and π^1 are in the same exponential family E(F).

How many distributions are there in a d-dimensional exponential family?
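Fact (1) can be verified directly: plugging f* = log(π^1/π^0) into the variational formula gives ⟨π^0, e^{f*}⟩ = Σ_z π^1(z) = 1, so the log term vanishes and the objective equals D(π^1 ‖ π^0). A small numerical check (the distributions are our own example):

```python
import math

pi0 = [0.5, 0.3, 0.2]
pi1 = [0.2, 0.5, 0.3]

llr = [math.log(p1 / p0) for p0, p1 in zip(pi0, pi1)]   # f* = log(pi^1 / pi^0)

# Variational objective evaluated at f = f*:
obj = (sum(p1 * f for p1, f in zip(pi1, llr))
       - math.log(sum(p0 * math.exp(f) for p0, f in zip(pi0, llr))))
kl = sum(p1 * math.log(p1 / p0) for p0, p1 in zip(pi0, pi1))

print(obj, kl)   # equal: the LLR attains the supremum
```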
ε-Extremal Distributions

π_θ(z) ∝ exp(θ f(z)) ∈ E(F)

Extremal distributions: the limits of π_θ as θ → ∞; distributions on the
boundary of E(F).

Example
F = span(ψ), ψ = [5, −1, −1], i.e. ψ(z1) = 5, ψ(z2) = ψ(z3) = −1.
What are the extremal distributions?
     [1, 0, 0]         : f = [5, −1, −1]   (θ → +∞)
     [0, 0.5, 0.5]     : f = [−5, 1, 1]    (θ → −∞)
     [1/3, 1/3, 1/3]   : f = [0, 0, 0]     (θ = 0)

F_ε(π) := { z : π(z) ≥ max_z π(z) − ε }

Definition
• π is called ε-extremal if π(F_ε(π)) ≥ 1 − ε.
Example: [0.004, 0.499, 0.497].
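The example above can be reproduced in a few lines: tilt π_θ(z) ∝ exp(θψ(z)) for large |θ| and check the ε-extremal condition (our own sketch):

```python
import math

def tilted(psi, theta):
    """pi_theta(z) proportional to exp(theta * psi(z))."""
    w = [math.exp(theta * p) for p in psi]
    s = sum(w)
    return [x / s for x in w]

def F_eps(pi, eps):
    """F_eps(pi) = { z : pi(z) >= max_z pi(z) - eps }."""
    m = max(pi)
    return {z for z, p in enumerate(pi) if p >= m - eps}

def is_eps_extremal(pi, eps):
    """pi is eps-extremal if pi(F_eps(pi)) >= 1 - eps."""
    return sum(pi[z] for z in F_eps(pi, eps)) >= 1 - eps

psi = [5, -1, -1]
print(tilted(psi, 10))    # ~[1, 0, 0]        (theta -> +inf)
print(tilted(psi, -10))   # ~[0, 0.5, 0.5]    (theta -> -inf)
print(tilted(psi, 0))     # [1/3, 1/3, 1/3]   (theta = 0)

print(is_eps_extremal([0.004, 0.499, 0.497], 0.01))   # True
```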
ε-Distinguishable Distributions

Distinguishable
D(π^1 ‖ π^0) = D(π^0 ‖ π^1) = ∞  ⇔  neither π^1 ≪ π^0 nor π^0 ≪ π^1
(neither is absolutely continuous with respect to the other).

Example
π^0(z1) = 0.5, π^0(z2) = 0.5, π^0(z3) = 0
π^1(z1) = 0, π^1(z2) = 0.5, π^1(z3) = 0.5

Approximately distinguishable
Example
π^0(z1) = 0.49999, π^0(z2) = 0.49999, π^0(z3) = 0.00002
π^1(z1) = 0.00002, π^1(z2) = 0.49999, π^1(z3) = 0.49999

Definition
π^1, π^2 are ε-distinguishable if F_ε(π^1) ⊄ F_ε(π^2) and F_ε(π^2) ⊄ F_ε(π^1).
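Reading the definition as mutual non-containment of the sets F_ε (the relation symbols are garbled in the source, so this reading is an assumption on our part), the slide's "approximately distinguishable" example checks out in code:

```python
def F_eps(pi, eps):
    """F_eps(pi) = { z : pi(z) >= max(pi) - eps }."""
    m = max(pi)
    return {z for z, p in enumerate(pi) if p >= m - eps}

def eps_distinguishable(pi1, pi2, eps):
    """Assumed reading of the definition: neither F_eps set contains the other."""
    a, b = F_eps(pi1, eps), F_eps(pi2, eps)
    return not a.issubset(b) and not b.issubset(a)

pi0 = [0.49999, 0.49999, 0.00002]
pi1 = [0.00002, 0.49999, 0.49999]
print(F_eps(pi0, 0.001))                       # {0, 1}
print(F_eps(pi1, 0.001))                       # {1, 2}
print(eps_distinguishable(pi0, pi1, 0.001))    # True
```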
The Number of ε-Distinguishable ε-Extremal Distributions

Definition
     N(E): the maximum N such that, for every small ε > 0, there exist N
     distributions in E that are ε-extremal and pairwise ε-distinguishable.

Proposition
Denote
     N̄(d) := max{ N(E) : E is d-dimensional }.
It admits the following lower and upper bounds:

     N̄(d) ≥ exp( (d/2) [ log(|Z|) − log(d/2) − 1 ] )
     N̄(d) ≤ exp( (d + 1) (1 + log(|Z|) − log(d + 1)) )

Many alternative distributions can be distinguished even with a small
dimension d.
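The bounds are easy to evaluate numerically; for example, with |Z| = 20 and d = 4 the lower bound already exceeds 13 distributions (our own evaluation of the proposition's formulas):

```python
import math

def lower_bound(d, z_size):
    """N_bar(d) >= exp((d/2) [log|Z| - log(d/2) - 1])."""
    return math.exp((d / 2) * (math.log(z_size) - math.log(d / 2) - 1))

def upper_bound(d, z_size):
    """N_bar(d) <= exp((d + 1) (1 + log|Z| - log(d + 1)))."""
    return math.exp((d + 1) * (1 + math.log(z_size) - math.log(d + 1)))

for d in (2, 4, 8):
    print(d, round(lower_bound(d, 20), 1), round(upper_bound(d, 20), 1))
```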
A Framework for Choosing the Function Class

Scenario: Alternative distributions lie in a set S (not known to the
algorithm). We observe p distributions from the set: π^1, . . . , π^p.

Objective to be maximized:

     max_F       (1/p) Σ_{i=1}^p γ^i D_F^MM(π^i ‖ π^0)
     subject to  dim(F) ≤ d

Rank-constrained optimization:

     max_X       (1/p) Σ_{i=1}^p γ^i [ ⟨π^i, X_i⟩ − log⟨π^0, e^{X_i}⟩ ]
     subject to  rank(X) ≤ d

                                          ⟨μ, f⟩ = Σ_z μ(z) f(z)
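The rank-constrained objective is straightforward to evaluate for a candidate X. As a sanity check (our own code, with X_i stored as rows of a list for simplicity), setting each X_i equal to the LLR log(π^i/π^0), a matrix whose rank may be as large as p, makes each term equal D(π^i ‖ π^0), the unconstrained optimum:

```python
import math

def objective(X, alts, pi0, gammas):
    """(1/p) sum_i gamma^i [ <pi^i, X_i> - log <pi^0, exp(X_i)> ],
    with X given as a list whose i-th entry is the function X_i on Z."""
    p = len(alts)
    total = 0.0
    for i in range(p):
        inner = sum(pi * x for pi, x in zip(alts[i], X[i]))
        logterm = math.log(sum(p0 * math.exp(x) for p0, x in zip(pi0, X[i])))
        total += gammas[i] * (inner - logterm)
    return total / p

pi0 = [0.5, 0.3, 0.2]
alts = [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
gammas = [1.0, 1.0]

X = [[math.log(a / b) for a, b in zip(alt, pi0)] for alt in alts]   # X_i = LLR_i
val = objective(X, alts, pi0, gammas)
kl_avg = sum(sum(a * math.log(a / b) for a, b in zip(alt, pi0))
             for alt in alts) / len(alts)
print(val, kl_avg)   # equal
```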
Algorithm

Iterative gradient projection:
  1   Y^{k+1} = X^k + α_k h(X^k).
  2   X^{k+1} = P_S(Y^{k+1}),  where  P_S(Y) = arg min{ ‖Y − X‖ : rank(X) ≤ d }.
Provable local convergence.
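The projection P_S truncates the SVD of Y to its top d singular components (Eckart-Young). For d = 1 it can be sketched without a linear-algebra library, using power iteration on YᵀY to find the top right singular vector (our own implementation; for general d one would deflate and repeat, or use a library SVD):

```python
import math

def rank1_projection(Y, iters=200):
    """Best rank-1 approximation of Y: power iteration on Y^T Y finds the
    top right singular vector v; the projection is then (Y v) v^T."""
    m, n = len(Y), len(Y[0])
    v = [1.0] * n                                  # initial guess
    for _ in range(iters):
        u = [sum(Y[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = Y v
        w = [sum(Y[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = Y^T u
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    u = [sum(Y[i][j] * v[j] for j in range(n)) for i in range(m)]      # Y v
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]

# A rank-1 matrix is its own projection; for diag(3, 1) the best rank-1
# approximation keeps only the larger singular value.
print(rank1_projection([[2.0, 4.0], [1.0, 2.0]]))   # ~[[2, 4], [1, 2]]
print(rank1_projection([[3.0, 0.0], [0.0, 1.0]]))   # ~[[3, 0], [0, 0]]
```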
Numerical Experiment

Distributions are drawn randomly from a set S:
     π^0,
     π^1, . . . , π^p for feature extraction,
     π^1 for testing.

Experiment steps:
     Feature extraction: extract a d-dimensional function class F based on
     π^0 and π^1, . . . , π^p.
     Test: the alternative distribution is π^1. Estimate the probability of
     error by simulation.
Numerical Experiment

S: 12-dimensional exponential family. |Z| = 20. n = 30.

[Figure: ROC curves, Pr(φ = 1|H1) versus Pr(φ = 1|H0).]
Conclusion and Future Work

Conclusions:
     Variance is as important as the error exponent.
     There is a balance between variance and error exponent.
     Feature extraction algorithm: exploit prior information to optimize the
     performance of the mismatched test.
Future Work:
     Bound the probability of error based on finer statistics.
     Extend to processes with long memory.
     Other heuristics (such as the nuclear norm) for algorithm design.

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 

Recently uploaded (20)

Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Feature Extraction for Universal Hypothesis Testing via Rank-Constrained Optimization (ISIT 2010)

  • 1. Feature Extraction for Universal Hypothesis Testing via Rank-Constrained Optimization Dayu Huang and Sean Meyn Department of Electrical and Computer Engineering and Coordinated Science Laboratory University of Illinois, Urbana-Champaign June 18, 2010 Huang and Meyn (UIUC) Feature Extraction June 2010 1 / 18
  • 3. Introduction: Universal Hypothesis Testing. Sequence of observations Z_1^n := (Z_1, . . . , Z_n), i.i.d. with marginal π^0 under H0 and π^1 under H1. π^0: known; π^1: not known. The observation space Z is finite. Task: design a test to decide in favor of H0 or H1. The Hoeffding test: φ_n^H = 1{D(Γ_n ∥ π^0) ≥ η}, where the empirical distribution is Γ_n(A) = (1/n) Σ_{k=1}^n 1{Z_k ∈ A}, A ⊂ Z.
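The Hoeffding test statistic is easy to compute on a finite alphabet: build the empirical distribution Γ_n from counts and compare D(Γ_n ∥ π^0) to the threshold η. A minimal numpy sketch (the distributions, sample sizes, and threshold below are illustrative, not from the paper):

```python
import numpy as np

def kl(mu, pi):
    """KL divergence D(mu || pi) on a finite alphabet (assumes pi > 0 where mu > 0)."""
    mask = mu > 0
    return float(np.sum(mu[mask] * np.log(mu[mask] / pi[mask])))

def hoeffding_test(z, pi0, eta):
    """Hoeffding test: declare H1 iff D(Gamma_n || pi0) >= eta.
    z: observed symbols in {0, ..., |Z|-1}."""
    gamma = np.bincount(z, minlength=len(pi0)) / len(z)  # empirical distribution
    return int(kl(gamma, pi0) >= eta)

pi0 = np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(0)
z_h0 = rng.choice(3, size=1000, p=pi0)               # data drawn from pi0
z_h1 = rng.choice(3, size=1000, p=[0.1, 0.2, 0.7])   # data from some alternative
print(hoeffding_test(z_h0, pi0, eta=0.05))   # typically 0: accept H0
print(hoeffding_test(z_h1, pi0, eta=0.05))   # typically 1: reject H0
```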
  • 5. The Hoeffding Test. Theorem. (1) The Hoeffding test achieves the optimal error exponents in the Neyman–Pearson criterion.¹ (2) The asymptotic variance of the Hoeffding test depends on the size of the observation space: when Z_1^n has marginal π^0, lim_{n→∞} Var[n D(Γ_n ∥ π^0)] = (1/2)(|Z| − 1).² Large variance when |Z| is large. — 1. Hoeffding 1963; 2. Unnikrishnan, Huang, Meyn, Surana & Veeravalli; Wilks 1938; Clarke & Barron 1990.
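Reading the limit as Var[n D(Γ_n ∥ π^0)] → (|Z| − 1)/2 (the classical χ² scaling: 2nD converges to a χ² with |Z| − 1 degrees of freedom), a quick Monte Carlo check is possible; alphabet size, sample size, and trial count below are illustrative choices:

```python
import numpy as np

def kl(mu, pi):
    mask = mu > 0
    return float(np.sum(mu[mask] * np.log(mu[mask] / pi[mask])))

rng = np.random.default_rng(1)
pi0 = np.full(5, 0.2)        # |Z| = 5, so the limiting variance is (5 - 1)/2 = 2
n, trials = 2000, 2000

stats = []
for _ in range(trials):
    z = rng.choice(5, size=n, p=pi0)
    gamma = np.bincount(z, minlength=5) / n
    stats.append(n * kl(gamma, pi0))    # the scaled Hoeffding statistic under H0

print(np.var(stats))   # close to 2 for large n
```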
  • 6. Performance of the Hoeffding Test. [ROC curves: probability of detection Pr(φ = 1|H1) vs. probability of false alarm Pr(φ = 1|H0), for |Z| = 19 and |Z| = 39. Red: better error exponent but larger variance.]
  • 8. Mismatched Universal Test. Variational representation of KL divergence: D(µ ∥ π) = sup_f { ⟨µ, f⟩ − log⟨π, e^f⟩ }, where ⟨µ, f⟩ = Σ_z µ(z)f(z). Mismatched divergence:¹ D_F^MM(µ ∥ π) := sup_{f ∈ F} { ⟨µ, f⟩ − log⟨π, e^f⟩ }. Mismatched universal test:² φ_n^MM = 1{D_F^MM(Γ_n ∥ π^0) ≥ η}. — 1. Abbe, Médard, Meyn & Zheng 2007; 2. Unnikrishnan et al.
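For a linear function class the supremum defining D_F^MM is a smooth concave maximization over the coefficient vector r, so a generic optimizer suffices. A sketch using scipy (the distributions and basis here are illustrative; when F contains the log-likelihood ratio, the mismatched divergence should recover the full KL divergence):

```python
import numpy as np
from scipy.optimize import minimize

def kl(mu, pi):
    mask = mu > 0
    return float(np.sum(mu[mask] * np.log(mu[mask] / pi[mask])))

def mismatched_divergence(mu, pi, psi):
    """D^MM_F(mu || pi) for the linear class F = {f_r = sum_i r_i psi_i}.
    psi: d x |Z| array of basis functions."""
    def neg_obj(r):
        f = r @ psi
        return -(mu @ f - np.log(pi @ np.exp(f)))
    res = minimize(neg_obj, np.zeros(psi.shape[0]))
    return -res.fun

pi0 = np.array([0.5, 0.3, 0.2])
pi1 = np.array([0.2, 0.3, 0.5])
llr = np.log(pi1 / pi0)                                # log-likelihood ratio
d_mm = mismatched_divergence(pi1, pi0, psi=llr[None, :])
print(d_mm, kl(pi1, pi0))   # equal when F contains the LLR
```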
  • 10. Function Class and Performance. Consider a linear function class F = { f_r := Σ_{i=1}^d r_i ψ_i }. The choice of F determines performance: the mismatched divergence approximates the KL divergence and determines the error exponent of the mismatched universal test; when d is smaller than |Z|, the test is optimal for a restricted set of alternative distributions.¹ The dimension d determines the asymptotic variance:¹ under H0, lim_{n→∞} Var[n D_F^MM(Γ_n ∥ π^0)] = d/2. Problem: how to choose the function class F? — 1. Unnikrishnan et al.
  • 11. Our Contribution. (1) The mismatched test, even with a small dimension d, is optimal for a large set of alternative distributions. (2) A framework for choosing F for the mismatched test.
  • 12. How powerful is the mismatched test? Example: [figure: π^0 and 10 alternative distributions on a 9-symbol alphabet]. 10 distributions; what dimension d suffices?
  • 14. When is MM optimal? When does D_F^MM(π^1 ∥ π^0) = D(π^1 ∥ π^0)? Fact (1): when F includes the log-likelihood ratio (LLR). Exponential family: E(F) = {µ : µ(z) ∝ exp(f(z)), f ∈ F}. Fact (2): when π^0, π^1 are in the same exponential family. How many distributions are there in a d-dimensional exponential family?
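Fact (2) can be verified directly: if π^0 ∝ exp(θ_0 ψ) and π^1 ∝ exp(θ_1 ψ) lie in the same family E(F) with F = span(ψ), then their LLR is (θ_1 − θ_0)ψ up to an additive constant, and constants do not affect the variational objective. Evaluating the objective at that f therefore attains the KL divergence, which certifies D_F^MM = D. A small sketch (the basis ψ and parameters θ below are illustrative):

```python
import numpy as np

def expfam(theta, psi):
    """Member of the one-dimensional exponential family E(span(psi))."""
    w = np.exp(theta * psi)
    return w / w.sum()

psi = np.array([1.0, 0.0, -1.0, 0.5])        # one basis function; |Z| = 4
theta0, theta1 = 0.3, -0.8
pi0, pi1 = expfam(theta0, psi), expfam(theta1, psi)

kl_val = float(np.sum(pi1 * np.log(pi1 / pi0)))

# The supremum over F = span(psi) is attained at f = (theta1 - theta0) * psi,
# the LLR of pi1 vs pi0 up to an additive constant.
f = (theta1 - theta0) * psi
d_mm_at_f = float(pi1 @ f - np.log(pi0 @ np.exp(f)))

print(d_mm_at_f, kl_val)   # the two values agree
```

Since the objective at any single f lower-bounds D_F^MM, and D_F^MM ≤ D always, agreement here pins down equality.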
  • 17. ε-Extremal Distributions. π_θ(z) ∝ exp(θ f(z)) ∈ E(F). Extremal distributions: the limits of π_θ as θ → ∞, i.e. distributions on the boundary of E(F). Example, F = span(ψ) with ψ = [5, −1, −1], i.e. ψ(z_1) = 5, ψ(z_2) = ψ(z_3) = −1. The extremal distributions: [1, 0, 0], via f = [5, −1, −1]; [0, 0.5, 0.5], via f = [−5, 1, 1]; [1/3, 1/3, 1/3], via f = [0, 0, 0]. Define F_ε(π) := {z : π(z) ≥ max_z π(z) − ε}. Definition: π is called ε-extremal if π(F_ε(π)) ≥ 1 − ε. Example: [0.004, 0.499, 0.497].
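The set F_ε(π) and the ε-extremality condition are a two-line computation; a sketch checking the slide's example [0.004, 0.499, 0.497] (the value ε = 0.01 is an illustrative choice, not from the slides):

```python
import numpy as np

def F_eps(pi, eps):
    """F_eps(pi) = {z : pi(z) >= max_z pi(z) - eps}."""
    return sorted(np.flatnonzero(pi >= pi.max() - eps))

def is_eps_extremal(pi, eps):
    """pi is eps-extremal if it puts mass >= 1 - eps on F_eps(pi)."""
    return float(pi[F_eps(pi, eps)].sum()) >= 1 - eps

eps = 0.01
print(is_eps_extremal(np.array([0.004, 0.499, 0.497]), eps))   # True
print(is_eps_extremal(np.array([0.5, 0.3, 0.2]), eps))         # False
```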
  • 20. ε-Distinguishable Distributions. Distinguishable: D(π^1 ∥ π^0) = D(π^0 ∥ π^1) = ∞ ⇔ π^1 is not absolutely continuous with respect to π^0, and vice versa. Example: π^0 = [0.5, 0.5, 0], π^1 = [0, 0.5, 0.5]. Approximately distinguishable, example: π^0 = [0.49999, 0.49999, 0.00002], π^1 = [0.00002, 0.49999, 0.49999]. Definition: π^1, π^2 are ε-distinguishable if F_ε(π^1) ⊄ F_ε(π^2) and F_ε(π^2) ⊄ F_ε(π^1).
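Under the reading that two distributions are ε-distinguishable when neither near-maximal support set F_ε contains the other (one plausible interpretation of this slide's definition; the extracted text is ambiguous), the "approximately distinguishable" example can be checked directly:

```python
import numpy as np

def F_eps(pi, eps):
    return set(np.flatnonzero(pi >= pi.max() - eps))

def eps_distinguishable(p1, p2, eps):
    """Assumed reading: neither F_eps set contains the other."""
    A, B = F_eps(p1, eps), F_eps(p2, eps)
    return not (A <= B) and not (B <= A)

eps = 0.01
pi0 = np.array([0.49999, 0.49999, 0.00002])   # F_eps = {z1, z2}
pi1 = np.array([0.00002, 0.49999, 0.49999])   # F_eps = {z2, z3}
print(eps_distinguishable(pi0, pi1, eps))     # True
```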
  • 22. The Number of ε-Distinguishable ε-Extremal Distributions. Definition: N(E) is the maximum N such that for every small ε > 0 there exist N distributions in E that are ε-extremal and pairwise ε-distinguishable. Proposition: denote N̄(d) := max{N(E) : E is d-dimensional}. It admits the following lower and upper bounds: N̄(d) ≥ exp( (d/2)[log(|Z|) − log(d/2) − 1] ), N̄(d) ≤ exp( (d + 1)(1 + log(|Z|) − log(d + 1)) ). Many alternative distributions can be distinguished even with small dimension d.
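A quick numeric evaluation of the two bounds on N̄(d) (as reconstructed above) gives a feel for the scaling; the choices |Z| = 20 (the alphabet size used later in the numerical experiment) and d = 4 are illustrative:

```python
import math

def lower_bound(d, Z):
    """exp( (d/2) * [log|Z| - log(d/2) - 1] )."""
    return math.exp((d / 2) * (math.log(Z) - math.log(d / 2) - 1))

def upper_bound(d, Z):
    """exp( (d+1) * (1 + log|Z| - log(d+1)) )."""
    return math.exp((d + 1) * (1 + math.log(Z) - math.log(d + 1)))

Z, d = 20, 4
print(lower_bound(d, Z), upper_bound(d, Z))   # already > 1 at small d
```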
  • 24. A Framework for Choosing the Function Class. Scenario: the alternative distributions lie in a set S (not known to the algorithm); we observe p distributions from this set, π^1, . . . , π^p. Objective to be maximized: max_F (1/p) Σ_{i=1}^p γ^i D_F^MM(π^i ∥ π^0) subject to dim(F) ≤ d. Equivalent rank-constrained optimization: max_X (1/p) Σ_{i=1}^p γ^i [ ⟨π^i, X_i⟩ − log⟨π^0, e^{X_i}⟩ ] subject to rank(X) ≤ d, where X_i denotes the i-th column of X and ⟨µ, f⟩ = Σ_z µ(z)f(z).
  • 25. Algorithm. Iterative gradient projection: (1) Y^{k+1} = X^k + α_k h(X^k); (2) X^{k+1} = P_S(Y^{k+1}), where P_S(Y) = arg min{ ∥Y − X∥ : rank(X) ≤ d }. Provable local convergence.
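The scheme above can be sketched in numpy: take a gradient step on the rank-constrained objective from slide 24 (h is its gradient), then project back onto the rank-≤ d matrices via truncated SVD, which is the Frobenius-norm-nearest point by the Eckart–Young theorem. The step size, iteration count, and random problem instance below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def project_rank(Y, d):
    """P_S(Y): nearest matrix in Frobenius norm with rank <= d (truncated SVD)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[d:] = 0.0
    return (U * s) @ Vt

def objective(X, pis, pi0, gamma):
    """(1/p) * sum_i gamma_i * ( <pi_i, X_i> - log <pi0, exp(X_i)> )."""
    vals = [g * (pi_i @ X[:, i] - np.log(pi0 @ np.exp(X[:, i])))
            for i, (g, pi_i) in enumerate(zip(gamma, pis))]
    return float(np.mean(vals))

def gradient(X, pis, pi0, gamma):
    G = np.empty_like(X)
    for i, (g, pi_i) in enumerate(zip(gamma, pis)):
        w = pi0 * np.exp(X[:, i])              # unnormalized tilted distribution
        G[:, i] = g * (pi_i - w / w.sum()) / len(pis)
    return G

rng = np.random.default_rng(2)
nZ, p, d = 8, 5, 2                             # |Z|, number of pi^i, rank bound
pi0 = rng.dirichlet(np.ones(nZ))
pis = rng.dirichlet(np.ones(nZ), size=p)
gamma = np.ones(p)

X = np.zeros((nZ, p))
for k in range(300):
    Y = X + 0.5 * gradient(X, pis, pi0, gamma)   # gradient ascent step
    X = project_rank(Y, d)                        # project onto rank <= d

print(np.linalg.matrix_rank(X), objective(X, pis, pi0, gamma))
```

The d nonzero left singular vectors of the final X span the extracted function class F.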
  • 26. Numerical Experiment. Draw distributions randomly from a set S: π^1, . . . , π^p for feature extraction, and π^1 for testing. Experiment steps: (1) Feature extraction: extract a d-dimensional function class F based on π^0 and π^1, . . . , π^p. (2) Test: the alternative distribution is π^1; estimate the probability of error by simulation.
  • 27. Numerical Experiment. S: 12-dimensional exponential family; |Z| = 20; n = 30. [ROC plots: Pr(φ = 1|H1) vs. Pr(φ = 1|H0).]
  • 32. Conclusion and Future Work. Conclusions: variance is as important as the error exponent, and the design balances the two; the feature-extraction algorithm exploits prior information to optimize the performance of the mismatched test. Future work: bound the probability of error using finer statistics; extend to processes with long memory; explore other heuristics (such as the nuclear norm) for algorithm design.