Statistical Methods for Handling Incomplete Data
Chapter 2: Likelihood-based approach
Jae-Kwang Kim
Department of Statistics, Iowa State University
Section 1: Introduction
Basic Setup (No missing data)
• y = (y1, y2, · · · , yn) is a realization of a random sample from an infinite population
with density f (y).
• Assume that the true density f (y) belongs to a parametric family of densities
P = {f (y; θ) ; θ ∈ Ω} indexed by θ ∈ Ω. That is, there exists θ0 ∈ Ω such that
f (y; θ0) = f (y) for all y.
Likelihood
Definitions for likelihood theory
• The likelihood function of θ is defined as
L(θ) = f (y; θ)
where f (y; θ) is the joint pdf of y.
• The estimator ˆθ is the maximum likelihood estimator (MLE) of θ0 if it satisfies
$$ L(\hat{\theta}) = \max_{\theta \in \Omega} L(\theta). $$
Kullback-Leibler information
Shannon-Kolmogorov information inequality
Let f0(y) = f (y; θ0) and f1(y) = f (y; θ1). The Kullback-Leibler information number, defined by
$$ K(f_0, f_1) = E_{\theta_0}\left\{ \ln \frac{f_0(Y)}{f_1(Y)} \right\}, $$
satisfies
$$ K(f_0, f_1) \ge 0, $$
with equality if and only if Pθ0 {f0(Y ) = f1(Y )} = 1.
Proof
Jensen’s inequality
If g(t) is a convex function, then for any random variable X, g{E(X)} ≤ E{g(X)}.
Furthermore, if g(t) is strictly convex, then E{g(X)} = g{E(X)} if and only if
P(X = c) = 1 for some constant c.
Since φ(x) = − ln(x) is a strictly convex function of x, we have, using Jensen’s
inequality,
$$ K(f_0, f_1) = E_{\theta_0}\left\{ -\ln \frac{f_1(Y)}{f_0(Y)} \right\} \ge -\ln E_{\theta_0}\left\{ \frac{f_1(Y)}{f_0(Y)} \right\} \ge 0, $$
with equality if and only if Pθ0 {f0(Y ) = f1(Y )} = 1. (The last inequality holds because
Eθ0 {f1(Y )/f0(Y )} = ∫ f1(y) dy ≤ 1.)
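As a quick numerical illustration (my own sketch, not part of the slides), the Monte Carlo check below estimates K(f0, f1) for two hypothetical normal densities, f0 = N(0,1) and f1 = N(1,1), by averaging the log density ratio over draws from f0; the estimate should be close to the true value 0.5 and, in line with the inequality above, nonnegative.

```python
# Minimal Monte Carlo sketch of K(f0, f1) >= 0 for two assumed normal densities.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=0.0, scale=1.0, size=100_000)     # draws from f0 = N(0, 1)

log_ratio = norm.logpdf(y, 0.0, 1.0) - norm.logpdf(y, 1.0, 1.0)
K_hat = log_ratio.mean()                             # estimate of E_{theta0}{ln f0(Y)/f1(Y)}
print(K_hat)  # close to 0.5 = (mu0 - mu1)^2 / 2, and nonnegative
```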
Identifiability
Definition
Let P = {Pθ; θ ∈ Ω} be a statistical model with parameter space Ω. Model P is
identifiable if the mapping θ → Pθ is one-to-one:
Pθ1 = Pθ2 ⇒ θ1 = θ2.
If the distributions are defined in terms of the probability density functions (pdfs), then
two pdfs should be considered distinct only if they differ on a set of non-zero measure
(for example two functions f1(x) = I(0 ≤ x < 1) and f2(x) = I(0 ≤ x ≤ 1) differ only at
a single point x = 1, a set of measure zero, and thus cannot be considered as distinct
pdfs).
Lemma 2.1 Properties of identifiable distribution
Lemma 2.1.
If P = {f (y; θ) ; θ ∈ Ω} is identifiable and E {|ln f (Y ; θ)|} < ∞ for all θ, then
$$ M(\theta) = E_{\theta_0}\left\{ \ln \frac{f(Y; \theta)}{f(Y; \theta_0)} \right\} $$
has a unique maximum at θ0.
Proof
1 By the Shannon-Kolmogorov information inequality, we have
M(θ) ≤ 0
for all θ ∈ Ω, with equality iff Pθ0 {f (Y ; θ) = f (Y ; θ0)} = 1.
2 Note that M(θ0) = 0, by definition of M(θ). Thus, M(θ) has a maximum at θ0.
3 To show uniqueness, note that if θ1 satisfies M(θ1) = 0, it should satisfy
Pθ0 {f (Y ; θ1) = f (Y ; θ0)} = 1,
which implies θ1 = θ0, by the identifiability assumption.
Remark
1 Lemma 2.1 simply says that, under identifiability, Q(θ) = Eθ0 {log f (Y ; θ)} takes
the (unique) maximum value at θ = θ0.
2 Define
$$ Q_n(\theta) = n^{-1} \sum_{i=1}^{n} \log f(y_i; \theta); $$
then the MLE ˆθ is the maximizer of Qn(θ). Since Qn(θ) converges in probability to
Q(θ), can we say that the maximizer of Qn(θ) converges to the maximizer of Q(θ)?
Theorem 2.1: (Weak) consistency of MLE
Theorem 2.1
Let
$$ Q_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log f(y_i; \theta) $$
and let ˆθ be any solution of
$$ Q_n(\hat{\theta}) = \max_{\theta \in \Omega} Q_n(\theta). $$
Assume the following two conditions:
1 Uniform weak convergence:
$$ \sup_{\theta \in \Omega} \left| Q_n(\theta) - Q(\theta) \right| \stackrel{p}{\longrightarrow} 0 $$
for some non-stochastic function Q(θ).
2 Identification: Q(θ) is uniquely maximized at θ0.
Then,
$$ \hat{\theta} \stackrel{p}{\longrightarrow} \theta_0. $$
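A small simulation sketch (my own, under an assumed exponential model, not from the slides) of this consistency result: the MLE of an exponential rate θ, namely 1/ȳ, moves toward θ0 as n grows.

```python
# Minimal consistency check: exponential-rate MLE 1 / ybar approaches theta0 as n grows.
import numpy as np

rng = np.random.default_rng(6)
theta0 = 2.0
for n in (50, 500, 5000, 50000):
    y = rng.exponential(scale=1.0 / theta0, size=n)
    print(n, 1.0 / y.mean())        # MLE; converges in probability to theta0
```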
Remark
1 Convergence in probability:
$$ \hat{\theta} \stackrel{p}{\longrightarrow} \theta_0 \iff P\left( |\hat{\theta} - \theta_0| > \epsilon \right) \to 0 \ \text{ as } n \to \infty, \text{ for any } \epsilon > 0. $$
2 If P = {f (y; θ) ; θ ∈ Ω} is not identifiable, then Q (θ) may not have a unique
maximum and ˆθ may not converge (in probability) to a single point.
3 If the true distribution f (y) does not belong to the class P = {f (y; θ) ; θ ∈ Ω},
which point does ˆθ converge to?
Other properties of MLE
• Asymptotic normality (Theorem 2.2)
• Asymptotic optimality: MLE achieves Cramer-Rao lower bound
• Wilks’ theorem:
$$ 2\{\ell(\hat{\theta}) - \ell(\theta_0)\} \stackrel{d}{\longrightarrow} \chi^2_p, $$
where ℓ(θ) = log L(θ) and p = dim(θ).
1 Introduction - Fisher information
Definition
1 Score function:
$$ S(\theta) = \frac{\partial \log L(\theta)}{\partial \theta} $$
2 Fisher information = curvature of the log-likelihood:
$$ I(\theta) = -\frac{\partial^2}{\partial \theta \partial \theta^T} \log L(\theta) = -\frac{\partial}{\partial \theta^T} S(\theta) $$
3 Observed (Fisher) information: I(ˆθn), where ˆθn is the MLE.
4 Expected (Fisher) information: ℐ(θ) = Eθ {I(θ)}
• The observed information is always positive. The observed information applies to a
single dataset.
• The expected information is meaningful as a function of θ across the admissible
values of θ. The expected information is an average quantity over all possible datasets.
• I(ˆθ) = ℐ(ˆθ) for the exponential family.
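A minimal numerical sketch (my own, assuming a Poisson(θ) sample, not from the slides) of the observed versus expected information. For this exponential-family model the two coincide at the MLE, which matches the last bullet above.

```python
# Observed vs expected Fisher information for an assumed Poisson(theta) sample.
import numpy as np

rng = np.random.default_rng(1)
theta_true = 3.0
y = rng.poisson(theta_true, size=500)
n = y.size

theta_hat = y.mean()                      # MLE of theta
I_obs = y.sum() / theta_hat**2            # observed information: -d^2 log L / d theta^2
I_exp = n / theta_hat                     # expected information evaluated at theta_hat

print(theta_hat, I_obs, I_exp)            # I_obs == I_exp here (exponential family)
```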
Properties of score function
Theorem 2.3. Bartlett identities
Under the regularity conditions allowing for the exchange of the order of integration and
differentiation,
Eθ {S(θ)} = 0 and Vθ {S(θ)} = ℐ(θ), where ℐ(θ) = Eθ {I(θ)} is the expected information.
Proof (sketch)
Differentiating the identity ∫ f (y; θ) dµ(y) = 1 with respect to θ, and exchanging differentiation
and integration, gives Eθ {S(θ)} = 0. Differentiating once more gives
Eθ {∂S(θ)/∂θT } + Eθ {S(θ)S(θ)T } = 0, that is, Vθ {S(θ)} = ℐ(θ).
1 Introduction - Fisher information
Remark
• Since ˆθ converges in probability to θ0, we can apply a Taylor linearization on S(ˆθ) = 0 to get
$$ \hat{\theta} - \theta_0 \cong \{\mathcal{I}(\theta_0)\}^{-1} S(\theta_0). $$
Here, we use the fact that I(θ) = −∂S(θ)/∂θT converges in probability to ℐ(θ).
• Thus, the (asymptotic) variance of the MLE is
$$ V(\hat{\theta}) \doteq \{\mathcal{I}(\theta_0)\}^{-1} V\{S(\theta_0)\} \{\mathcal{I}(\theta_0)\}^{-1} = \{\mathcal{I}(\theta_0)\}^{-1}, $$
where the last equality follows from Theorem 2.3.
Section 2: Observed likelihood
Basic Setup
1 Let y = (y1, y2, · · · , yn) be a realization of a random sample from an infinite
population with density f (y; θ0).
2 Let δi be an indicator function defined by
$$ \delta_i = \begin{cases} 1 & \text{if } y_i \text{ is observed} \\ 0 & \text{otherwise} \end{cases} $$
and we assume that
$$ \Pr(\delta_i = 1 \mid y_i) = \pi(y_i, \phi) $$
for some (unknown) parameter φ and a known function π(·).
3 Thus, instead of observing (δi , yi ) directly, we observe (δi , yi,obs ), where
$$ y_{i,obs} = \begin{cases} y_i & \text{if } \delta_i = 1 \\ * & \text{if } \delta_i = 0. \end{cases} $$
4 What is the (marginal) density of (yi,obs , δi )?
Motivation (Change of variable technique)
1 Suppose that z is a random variable with density f (z).
2 Instead of observing z directly, we observe only y = y (z), where the mapping
z → y (z) is known.
3 The density of y is
$$ g(y) = \int_{R(y)} f(z)\, dz, $$
where R (y) = {z; y (z) = y}.
Derivation
1 The joint density for z = (y, δ) is
$$ f(y, \delta) = f_1(y)\, f_2(\delta \mid y), $$
where f1 (y) is the density of y and f2 (δ | y) is the conditional density of δ given y,
given by f2 (δ | y) = {π(y)}^δ {1 − π(y)}^{1−δ} with π(y) = Pr (δ = 1 | y).
2 Instead of observing z = (y, δ) directly, we observe only (yobs, δ), where
yobs = yobs (y, δ) and the mapping (y, δ) → yobs is known.
3 The (marginal) density of (yobs, δ) is
$$ g(y_{obs}, \delta) = \int_{R(y_{obs}, \delta)} f(y, \delta)\, dy, $$
where R (yobs, δ) = {y; yobs (y, δ) = yobs}.
Likelihood under missing data (=Observed likelihood)
The observed likelihood is the likelihood obtained from the marginal density of
(yi,obs, δi ) , i = 1, 2, · · · , n, and can be written as, under the IID setup,
$$ \begin{aligned} L_{obs}(\theta, \phi) &= \prod_{\delta_i = 1} \left[ f_1(y_i; \theta)\, f_2(\delta_i \mid y_i; \phi) \right] \times \prod_{\delta_i = 0} \left[ \int f_1(y_i; \theta)\, f_2(\delta_i \mid y_i; \phi)\, dy_i \right] \\ &= \prod_{\delta_i = 1} \left[ f_1(y_i; \theta)\, \pi(y_i; \phi) \right] \times \prod_{\delta_i = 0} \left[ \int f_1(y; \theta) \{1 - \pi(y; \phi)\}\, dy \right], \end{aligned} $$
where π (y; φ) = P (δ = 1 | y; φ).
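The sketch below (my own construction, not from the slides) evaluates this scalar-y observed likelihood numerically under an assumed N(θ, 1) outcome model and a logistic response probability π(y; φ) = expit(φ0 + φ1 y); the factor for each nonrespondent is computed by numerical integration.

```python
# Numerical evaluation of the scalar-y observed log-likelihood under assumed models.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def log_Lobs(theta, phi, y, delta):
    phi0, phi1 = phi
    ll = 0.0
    for yi, di in zip(y, delta):
        if di == 1:
            # respondent contribution: f1(y_i; theta) * pi(y_i; phi)
            ll += norm.logpdf(yi, theta, 1.0) + np.log(expit(phi0 + phi1 * yi))
        else:
            # nonrespondent contribution: integral of f1(y; theta) {1 - pi(y; phi)} dy
            integrand = lambda u: norm.pdf(u, theta, 1.0) * (1.0 - expit(phi0 + phi1 * u))
            val, _ = quad(integrand, -np.inf, np.inf)
            ll += np.log(val)
    return ll

# tiny made-up dataset; values for delta = 0 are placeholders and are never used
y = np.array([0.2, 1.5, -0.3, 0.0, 0.0])
delta = np.array([1, 1, 1, 0, 0])
print(log_Lobs(0.5, (0.0, 1.0), y, delta))
```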
Example 2.2 (Censored regression model, or Tobit model)
$$ z_i = x_i \beta + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2), \qquad y_i = \begin{cases} z_i & \text{if } z_i \ge 0 \\ 0 & \text{if } z_i < 0. \end{cases} $$
The observed log-likelihood is
$$ l(\beta, \sigma^2) = -\frac{1}{2} \sum_{y_i > 0} \left\{ \ln 2\pi + \ln \sigma^2 + \frac{(y_i - x_i \beta)^2}{\sigma^2} \right\} + \sum_{y_i = 0} \ln \left\{ 1 - \Phi\left( \frac{x_i \beta}{\sigma} \right) \right\}, $$
where Φ (x) is the cdf of the standard normal distribution.
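A minimal sketch (my own, with simulated data, not from the slides) of the Tobit observed log-likelihood above; the function evaluates l(β, σ²) for candidate parameter values.

```python
# Evaluate the censored-regression (Tobit) observed log-likelihood on simulated data.
import numpy as np
from scipy.stats import norm

def tobit_loglik(beta, sigma, y, X):
    """Observed log-likelihood l(beta, sigma^2) of the Tobit model."""
    xb = X @ beta
    pos = y > 0
    # uncensored part: normal log-density of y given x
    ll_pos = -0.5 * (np.log(2 * np.pi) + 2 * np.log(sigma)
                     + (y[pos] - xb[pos]) ** 2 / sigma ** 2).sum()
    # censored part: log P(z < 0 | x) = log{1 - Phi(x beta / sigma)}
    ll_zero = np.log(1.0 - norm.cdf(xb[~pos] / sigma)).sum()
    return ll_pos + ll_zero

# hypothetical data for illustration only
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
z = X @ np.array([0.5, 1.0]) + rng.normal(scale=1.0, size=200)
y = np.where(z >= 0, z, 0.0)
print(tobit_loglik(np.array([0.5, 1.0]), 1.0, y, X))
```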
Example 2.3
Let t1, t2, · · · , tn be an IID sample from a distribution with density fθ (t) = θe^{−θt} I (t > 0).
Instead of observing ti , we observe (yi , δi ), where
$$ y_i = \begin{cases} t_i & \text{if } \delta_i = 1 \\ c & \text{if } \delta_i = 0 \end{cases} \qquad \text{and} \qquad \delta_i = \begin{cases} 1 & \text{if } t_i \le c \\ 0 & \text{if } t_i > c, \end{cases} $$
where c is a known censoring time. The observed likelihood for θ can be derived as
$$ L_{obs}(\theta) = \prod_{i=1}^{n} \{f_\theta(t_i)\}^{\delta_i} \{P(t_i > c)\}^{1-\delta_i} = \theta^{\sum_{i=1}^{n} \delta_i} \exp\left( -\theta \sum_{i=1}^{n} y_i \right). $$
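A short simulation sketch (my own, not from the slides) of Example 2.3: right-censored exponential data with known censoring time c, and the MLE that maximizes the observed likelihood above, θ̂ = Σδi / Σyi.

```python
# Simulate right-censored exponential data and compute the closed-form MLE of theta.
import numpy as np

rng = np.random.default_rng(3)
theta_true, c, n = 0.5, 3.0, 1000
t = rng.exponential(scale=1.0 / theta_true, size=n)
delta = (t <= c).astype(int)          # 1 if the event time is observed
y = np.where(delta == 1, t, c)        # observed value: t if uncensored, c otherwise

theta_hat = delta.sum() / y.sum()     # maximizes theta^sum(delta) * exp(-theta * sum(y))
print(theta_hat)                      # close to theta_true
```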
Multivariate extension
Basic Setup
• Let y = (y1, . . . , yp) be a p-dimensional random vector with probability density
function f (y; θ).
• Let y1, · · · , yn be n independent realizations of y from f (y; θ). (IID sample)
• Let δij be the response indicator of yij , with δij = 1 if yij is observed and δij = 0 otherwise.
• δi = (δi1, · · · , δip): p-dimensional random vector with density P(δ | y), assuming
P(δ|y) = P(δ|y; φ) for some φ.
• Let (yi,obs , yi,mis ) be the observed part and missing part of yi , respectively.
Observed Likelihood for multivariate Y case
• Under IID setup: The observed likelihood is
$$ L_{obs}(\theta, \phi) = \prod_{i=1}^{n} \int f(y_i; \theta)\, P(\delta_i \mid y_i; \phi)\, dy_{i,mis}, $$
where it is understood that, if yi = yi,obs and yi,mis is empty, then there is nothing to
integrate out.
• In the special case of scalar y, the observed likelihood is
$$ L_{obs}(\theta, \phi) = \prod_{\delta_i = 1} \left[ f(y_i; \theta)\, \pi(y_i; \phi) \right] \times \prod_{\delta_i = 0} \left[ \int f(y; \theta) \{1 - \pi(y; \phi)\}\, dy \right], $$
where π(y; φ) = P(δ = 1|y; φ).
Missing At Random
Definition: Missing At Random (MAR)
P(δ|y) is the density of the conditional distribution of δ given y. Let yobs = yobs (y, δ), where
$$ y_{i,obs} = \begin{cases} y_i & \text{if } \delta_i = 1 \\ * & \text{if } \delta_i = 0. \end{cases} $$
The response mechanism is MAR if P (δ|y1) = P (δ|y2) {or P(δ|y) = P(δ|yobs )} for all
y1 and y2 satisfying yobs (y1, δ) = yobs (y2, δ).
• MAR: the response mechanism P(δ|y) depends on y only through yobs .
• Let y = (yobs , ymis ). By Bayes’ theorem,
$$ P(y_{mis} \mid y_{obs}, \delta) = \frac{P(\delta \mid y_{mis}, y_{obs})}{P(\delta \mid y_{obs})}\, P(y_{mis} \mid y_{obs}). $$
• MAR: P(ymis |yobs , δ) = P(ymis |yobs ). That is, ymis ⊥ δ | yobs .
• MAR: the conditional independence of δ and ymis given yobs .
Remark
• MCAR (Missing Completely at random): P(δ | y) does not depend on y.
• MAR (Missing at random): P(δ | y) = P(δ | yobs )
• NMAR (Not Missing at random): P(δ | y) ≠ P(δ | yobs )
• Thus, MCAR is a special case of MAR.
Likelihood factorization theorem
Theorem 2.4 (Rubin, 1976)
Suppose that Pφ(δ|y) is the conditional density of δ given y and fθ(y) is the joint density of y.
Under the conditions that
1 the parameters θ and φ are distinct, and
2 the MAR condition holds,
the observed likelihood can be written as
$$ L_{obs}(\theta, \phi) = L_1(\theta) L_2(\phi), $$
and the MLE of θ can be obtained by maximizing L1(θ).
Thus, we do not have to specify a model for the response mechanism. The response
mechanism is called ignorable if the above likelihood factorization holds.
Example 2.4
• Bivariate data (xi , yi ) with pdf f (x, y) = f1(y | x)f2(x).
• xi is always observed and yi is subject to missingness.
• Assume that the response status variable δi of yi satisfies
P (δi = 1 | xi , yi ) = Λ1 (φ0 + φ1xi + φ2yi )
for some function Λ1 (·) of known form.
• Let θ be the parameter of interest in the regression model f1 (y | x; θ). Let α be
the parameter in the marginal distribution of x, denoted by f2 (xi ; α). Define
Λ0(x) = 1 − Λ1(x).
• Three parameters
• θ: parameter of interest
• α and φ: nuisance parameters
Example 2.4 (Cont’d)
• Observed likelihood
$$ \begin{aligned} L_{obs}(\theta, \alpha, \phi) = & \left[ \prod_{\delta_i = 1} f_1(y_i \mid x_i; \theta)\, f_2(x_i; \alpha)\, \Lambda_1(\phi_0 + \phi_1 x_i + \phi_2 y_i) \right] \\ & \times \left[ \prod_{\delta_i = 0} \int f_1(y \mid x_i; \theta)\, f_2(x_i; \alpha)\, \Lambda_0(\phi_0 + \phi_1 x_i + \phi_2 y)\, dy \right] \\ = & \ L_1(\theta, \phi) \times L_2(\alpha), \end{aligned} $$
where $L_2(\alpha) = \prod_{i=1}^{n} f_2(x_i; \alpha)$.
• Thus, we can safely ignore the marginal distribution of x if x is completely observed.
Example 2.4 (Cont’d)
• If φ2 = 0, then MAR holds and
$$ L_1(\theta, \phi) = L_{1a}(\theta) \times L_{1b}(\phi), $$
where
$$ L_{1a}(\theta) = \prod_{\delta_i = 1} f_1(y_i \mid x_i; \theta) \qquad \text{and} \qquad L_{1b}(\phi) = \prod_{\delta_i = 1} \Lambda_1(\phi_0 + \phi_1 x_i) \times \prod_{\delta_i = 0} \Lambda_0(\phi_0 + \phi_1 x_i). $$
• Thus, under MAR, the MLE of θ can be obtained by maximizing L1a (θ), which is
obtained by ignoring the missing part of the data.
Example 2.4 (Cont’d)
• Suppose instead that xi , rather than yi , is subject to missingness. Then the observed
likelihood becomes
$$ \begin{aligned} L_{obs}(\theta, \phi, \alpha) = & \left[ \prod_{\delta_i = 1} f_1(y_i \mid x_i; \theta)\, f_2(x_i; \alpha)\, \Lambda_1(\phi_0 + \phi_1 x_i + \phi_2 y_i) \right] \\ & \times \left[ \prod_{\delta_i = 0} \int f_1(y_i \mid x; \theta)\, f_2(x; \alpha)\, \Lambda_0(\phi_0 + \phi_1 x + \phi_2 y_i)\, dx \right] \\ \ne & \ L_1(\theta, \phi) \times L_2(\alpha). \end{aligned} $$
• If φ1 = 0 then
$$ L_{obs}(\theta, \alpha, \phi) = L_1(\theta, \alpha) \times L_2(\phi) $$
and MAR holds. Although we are not interested in the marginal distribution of x, we
still need to specify a model for the marginal distribution of x.
Section 3: Mean Score Approach
Motivation
• The observed likelihood is the marginal density of (yobs , δ).
• The observed likelihood is
$$ L_{obs}(\eta) = \int_{R(y_{obs}, \delta)} f(y; \theta)\, P(\delta \mid y; \phi)\, d\mu(y) = \int f(y; \theta)\, P(\delta \mid y; \phi)\, d\mu(y_{mis}), $$
where ymis is the missing part of y and η = (θ, φ).
• Observed score equation:
$$ S_{obs}(\eta) \equiv \frac{\partial}{\partial \eta} \log L_{obs}(\eta) = 0 $$
• Computing the observed score function can be computationally challenging because the
observed likelihood is in integral form.
3 Mean Score Approach
Theorem 2.5: Mean Score Theorem (Fisher, 1922)
Under some regularity conditions, the observed score function equals the mean score function.
That is,
$$ S_{obs}(\eta) = \bar{S}(\eta), $$
where
$$ \bar{S}(\eta) = E\{S_{com}(\eta) \mid y_{obs}, \delta\}, \qquad S_{com}(\eta) = \frac{\partial}{\partial \eta} \log f(y, \delta; \eta), \qquad f(y, \delta; \eta) = f(y; \theta)\, P(\delta \mid y; \phi). $$
• The mean score function is computed by taking the conditional expectation of the
complete-sample score function given the observed data.
• The mean score function is easier to compute than the observed score function.
3 Mean Score Approach
Proof of Theorem 2.5
Since Lobs (η) = f (y, δ; η) /f (y, δ | yobs, δ; η), we have
$$ \frac{\partial}{\partial \eta} \ln L_{obs}(\eta) = \frac{\partial}{\partial \eta} \ln f(y, \delta; \eta) - \frac{\partial}{\partial \eta} \ln f(y, \delta \mid y_{obs}, \delta; \eta). $$
Taking the conditional expectation of the above equation over the conditional distribution
of (y, δ) given (yobs, δ), we have
$$ \frac{\partial}{\partial \eta} \ln L_{obs}(\eta) = E\left\{ \frac{\partial}{\partial \eta} \ln L_{obs}(\eta) \,\Big|\, y_{obs}, \delta \right\} = E\{S_{com}(\eta) \mid y_{obs}, \delta\} - E\left\{ \frac{\partial}{\partial \eta} \ln f(y, \delta \mid y_{obs}, \delta; \eta) \,\Big|\, y_{obs}, \delta \right\}. $$
Here, the first equality holds because Lobs (η) is a function of (yobs, δ) only. The last
term is equal to zero by Theorem 2.3, which states that the expected value of the score
function is zero; the reference distribution in this case is the conditional distribution
of (y, δ) given (yobs, δ).
Example 2.6
1 Suppose that the study variable y follows a normal distribution with mean x β and
variance σ2. The score equations for β and σ2 under complete response are
$$ S_1(\beta, \sigma^2) = \sum_{i=1}^{n} (y_i - x_i \beta)\, x_i / \sigma^2 = 0 $$
and
$$ S_2(\beta, \sigma^2) = -\frac{n}{2\sigma^2} + \sum_{i=1}^{n} (y_i - x_i \beta)^2 / (2\sigma^4) = 0. $$
2 Assume that yi are observed only for the first r elements and the MAR assumption
holds. In this case, the mean score functions reduce to
$$ \bar{S}_1(\beta, \sigma^2) = \sum_{i=1}^{r} (y_i - x_i \beta)\, x_i / \sigma^2 $$
and
$$ \bar{S}_2(\beta, \sigma^2) = -\frac{n}{2\sigma^2} + \sum_{i=1}^{r} (y_i - x_i \beta)^2 / (2\sigma^4) + \frac{n - r}{2\sigma^2}. $$
Example 2.6 (Cont’d)
3 The maximum likelihood estimator obtained by solving the mean score equations is
$$ \hat{\beta} = \left( \sum_{i=1}^{r} x_i x_i' \right)^{-1} \sum_{i=1}^{r} x_i y_i \qquad \text{and} \qquad \hat{\sigma}^2 = \frac{1}{r} \sum_{i=1}^{r} \left( y_i - x_i \hat{\beta} \right)^2. $$
Thus, the resulting estimators can also be obtained by simply ignoring the missing
part of the sample, which is consistent with the result in Example 2.4 (for φ2 = 0).
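A small simulation sketch (my own, not from the slides) of Example 2.6: under MAR, solving the mean score equations amounts to complete-case estimation of β and σ².

```python
# Complete-case estimates of beta and sigma^2, which solve the mean score equations under MAR.
import numpy as np

rng = np.random.default_rng(4)
n, r = 500, 350                                   # first r units have y observed
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([1.0, 2.0]), 1.5
y = X @ beta_true + rng.normal(scale=sigma_true, size=n)

Xr, yr = X[:r], y[:r]                             # respondents only
beta_hat = np.linalg.solve(Xr.T @ Xr, Xr.T @ yr)  # (sum x_i x_i')^{-1} sum x_i y_i
sigma2_hat = np.mean((yr - Xr @ beta_hat) ** 2)   # (1/r) sum (y_i - x_i beta_hat)^2
print(beta_hat, sigma2_hat)
```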
Discussion of Example 2.6
• We are interested in estimating θ for the conditional density f (y | x; θ).
• Under MAR, the observed likelihood for θ is
$$ L_{obs}(\theta) = \prod_{i=1}^{r} f(y_i \mid x_i; \theta) \times \prod_{i=r+1}^{n} \int f(y \mid x_i; \theta)\, dy = \prod_{i=1}^{r} f(y_i \mid x_i; \theta). $$
• The same conclusion follows from the mean score theorem. Under MAR, the mean
score function is
$$ \bar{S}(\theta) = \sum_{i=1}^{r} S(\theta; x_i, y_i) + \sum_{i=r+1}^{n} E\{S(\theta; x_i, Y) \mid x_i\} = \sum_{i=1}^{r} S(\theta; x_i, y_i), $$
where S(θ; x, y) is the score function for θ and the second equality follows from
Theorem 2.3.
Example 2.5
1 Suppose that the study variable y follows a Bernoulli distribution with probability of
success pi , where
$$ p_i = p_i(\beta) = \frac{\exp(x_i \beta)}{1 + \exp(x_i \beta)} $$
for some unknown parameter β, and xi is the vector of covariates in the logistic
regression model for yi . We assume that 1 is in the column space of xi .
2 Under complete response, the score function for β is
$$ S_1(\beta) = \sum_{i=1}^{n} \{y_i - p_i(\beta)\}\, x_i. $$
Example 2.5 (Cont’d)
3 Let δi be the response indicator function for yi with distribution Bernoulli(πi ), where
$$ \pi_i = \frac{\exp(x_i \phi_0 + y_i \phi_1)}{1 + \exp(x_i \phi_0 + y_i \phi_1)}. $$
We assume that xi is always observed, but yi is missing if δi = 0.
4 Under missing data, the mean score function for β is
$$ \bar{S}_1(\beta, \phi) = \sum_{\delta_i = 1} \{y_i - p_i(\beta)\}\, x_i + \sum_{\delta_i = 0} \sum_{y=0}^{1} w_i(y; \beta, \phi) \{y - p_i(\beta)\}\, x_i, $$
where wi (y; β, φ) is the conditional probability of yi = y given xi and δi = 0:
$$ w_i(y; \beta, \phi) = \frac{P_\beta(y_i = y \mid x_i)\, P_\phi(\delta_i = 0 \mid y_i = y, x_i)}{\sum_{z=0}^{1} P_\beta(y_i = z \mid x_i)\, P_\phi(\delta_i = 0 \mid y_i = z, x_i)}. $$
Thus, ¯S1 (β, φ) is also a function of φ.
Example 2.5 (Cont’d)
5 If the response mechanism is MAR so that φ1 = 0, then
$$ w_i(y; \beta, \phi) = \frac{P_\beta(y_i = y \mid x_i)}{\sum_{z=0}^{1} P_\beta(y_i = z \mid x_i)} = P_\beta(y_i = y \mid x_i) $$
and so
$$ \bar{S}_1(\beta, \phi) = \sum_{\delta_i = 1} \{y_i - p_i(\beta)\}\, x_i = \bar{S}_1(\beta). $$
6 If MAR does not hold, then (ˆβ, ˆφ) can be obtained by solving ¯S1 (β, φ) = 0 and
¯S2 (β, φ) = 0 jointly, where
$$ \bar{S}_2(\beta, \phi) = \sum_{\delta_i = 1} \{\delta_i - \pi(\phi; x_i, y_i)\}\, (x_i, y_i) + \sum_{\delta_i = 0} \sum_{y=0}^{1} w_i(y; \beta, \phi) \{\delta_i - \pi_i(\phi; x_i, y)\}\, (x_i, y). $$
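A minimal sketch (my own, not from the slides) of the weights wi(y; β, φ) defined in step 4 above: for a nonrespondent with covariates xi, it returns the conditional probabilities of yi = 0 and yi = 1 given δi = 0 under the assumed logistic outcome and response models; the specific numbers passed in are hypothetical.

```python
# Conditional probabilities w_i(y; beta, phi) of y_i = y given x_i and delta_i = 0.
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def w_nonrespondent(x_i, beta, phi0, phi1):
    """Return (w_i(0), w_i(1)) for a nonrespondent with covariate vector x_i."""
    p1 = expit(x_i @ beta)                                   # P_beta(y_i = 1 | x_i)
    p = np.array([1.0 - p1, p1])                             # y = 0, 1
    pi_y = expit(x_i @ phi0 + np.array([0.0, 1.0]) * phi1)   # P(delta_i = 1 | x_i, y)
    num = p * (1.0 - pi_y)                                   # P(y | x_i) * P(delta_i = 0 | x_i, y)
    return num / num.sum()

print(w_nonrespondent(np.array([1.0, 0.5]), np.array([0.2, 1.0]),
                      np.array([0.1, 0.3]), -0.8))
```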
Section 4: Observed information
Definition
1 Observed score function: Sobs (η) = ∂ log Lobs (η)/∂η
2 Fisher information from the observed likelihood: Iobs (η) = −∂2 log Lobs (η)/∂η∂ηT
3 Expected (Fisher) information from the observed likelihood: ℐobs (η) = Eη {Iobs (η)}.
Theorem 2.6
Under regularity conditions,
$$ E\{S_{obs}(\eta)\} = 0 \quad \text{and} \quad V\{S_{obs}(\eta)\} = \mathcal{I}_{obs}(\eta), $$
where ℐobs (η) = Eη {Iobs (η)} is the expected information from the observed likelihood.
4 Observed information
• Under missing data, the MLE ˆη is the solution to ¯S(η) = 0.
• Under some regularity conditions, ˆη converges in probability to η0 and has asymptotic
variance {ℐobs (η0)}−1, with
$$ \mathcal{I}_{obs}(\eta) = E\left\{ -\frac{\partial}{\partial \eta^T} S_{obs}(\eta) \right\} = E\left\{ S_{obs}^{\otimes 2}(\eta) \right\} = E\left\{ \bar{S}^{\otimes 2}(\eta) \right\}, \qquad \text{where } B^{\otimes 2} = B B^T. $$
• For variance estimation of ˆη, one may use {ℐobs (ˆη)}−1.
• In general, however, the observed information Iobs (ˆη) is preferred to the expected
information ℐobs (ˆη) for variance estimation of ˆη.
Return to Example 2.3
• Observed log-likelihood:
$$ \ln L_{obs}(\theta) = \sum_{i=1}^{n} \delta_i \log \theta - \theta \sum_{i=1}^{n} y_i $$
• MLE for θ:
$$ \hat{\theta} = \frac{\sum_{i=1}^{n} \delta_i}{\sum_{i=1}^{n} y_i} $$
• Fisher information and expected information:
$$ I_{obs}(\theta) = \sum_{i=1}^{n} \delta_i / \theta^2, \qquad \mathcal{I}_{obs}(\theta) = \sum_{i=1}^{n} (1 - e^{-\theta c}) / \theta^2 = n(1 - e^{-\theta c}) / \theta^2. $$
Which one do you prefer?
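A short simulation sketch (my own, not from the slides) comparing the two quantities for the censored exponential model: with simulated data, the observed and expected information evaluated at θ̂ are typically close but not identical.

```python
# Observed vs expected information for the censored exponential model of Example 2.3.
import numpy as np

rng = np.random.default_rng(5)
theta_true, c, n = 0.5, 3.0, 1000
t = rng.exponential(scale=1.0 / theta_true, size=n)
delta = (t <= c).astype(int)
y = np.where(delta == 1, t, c)

theta_hat = delta.sum() / y.sum()
I_obs = delta.sum() / theta_hat**2                            # observed information
I_exp = n * (1.0 - np.exp(-theta_hat * c)) / theta_hat**2     # expected information at theta_hat
print(theta_hat, I_obs, I_exp)
```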
4 Observed information
Motivation
• Lcom(η) = f (y, δ; η): complete-sample likelihood with no missing data
• Fisher information associated with Lcom(η):
$$ I_{com}(\eta) = -\frac{\partial}{\partial \eta^T} S_{com}(\eta) = -\frac{\partial^2}{\partial \eta \partial \eta^T} \log L_{com}(\eta) $$
• Lobs (η): the observed likelihood
• Fisher information associated with Lobs (η):
$$ I_{obs}(\eta) = -\frac{\partial}{\partial \eta^T} S_{obs}(\eta) = -\frac{\partial^2}{\partial \eta \partial \eta^T} \log L_{obs}(\eta) $$
• How to express Iobs (η) in terms of Icom(η) and Scom(η)?
Louis’ formula
Theorem 2.7 (Louis, 1982; Oakes, 1999)
Under regularity conditions allowing the exchange of the order of integration and
differentiation,
$$ I_{obs}(\eta) = E\{I_{com}(\eta) \mid y_{obs}, \delta\} - \left[ E\{S_{com}^{\otimes 2}(\eta) \mid y_{obs}, \delta\} - \bar{S}(\eta)^{\otimes 2} \right] = E\{I_{com}(\eta) \mid y_{obs}, \delta\} - V\{S_{com}(\eta) \mid y_{obs}, \delta\}, $$
where ¯S(η) = E{Scom(η)|yobs , δ}.
Proof of Theorem 2.7
By Theorem 2.5, the observed information associated with Lobs(η) can be expressed as
$$ I_{obs}(\eta) = -\frac{\partial}{\partial \eta} \bar{S}(\eta), $$
where ¯S (η) = E {Scom(η) | yobs, δ; η}. Thus, we have
$$ \begin{aligned} \frac{\partial}{\partial \eta} \bar{S}(\eta) &= \frac{\partial}{\partial \eta} \int S_{com}(\eta; y)\, f(y, \delta \mid y_{obs}, \delta; \eta)\, d\mu(y) \\ &= \int \left\{ \frac{\partial}{\partial \eta} S_{com}(\eta; y) \right\} f(y, \delta \mid y_{obs}, \delta; \eta)\, d\mu(y) + \int S_{com}(\eta; y) \left\{ \frac{\partial}{\partial \eta} f(y, \delta \mid y_{obs}, \delta; \eta) \right\} d\mu(y) \\ &= E\left\{ \partial S_{com}(\eta) / \partial \eta \mid y_{obs}, \delta \right\} + \int S_{com}(\eta; y) \left\{ \frac{\partial}{\partial \eta} \log f(y, \delta \mid y_{obs}, \delta; \eta) \right\} f(y, \delta \mid y_{obs}, \delta; \eta)\, d\mu(y). \end{aligned} $$
The first term is equal to −E {Icom(η) | yobs, δ} and the second term is equal to
$$ E\{S_{com}(\eta)\, S_{mis}(\eta) \mid y_{obs}, \delta\} = E\left[ \{\bar{S}(\eta) + S_{mis}(\eta)\}\, S_{mis}(\eta) \mid y_{obs}, \delta \right] = E\{S_{mis}(\eta)\, S_{mis}(\eta) \mid y_{obs}, \delta\}, $$
because E{ ¯S(η)Smis(η) | yobs, δ} = 0, where Smis(η) = ∂ log f (y, δ | yobs, δ; η)/∂η. Combining
the two terms, and noting that E{Smis(η)Smis(η) | yobs, δ} = V{Scom(η) | yobs, δ}, gives the
stated identity.
Missing Information Principle
• Smis (η): the score function with the conditional density f (y, δ|yobs , δ).
• Expected missing information: ℐmis (η) = E{−∂Smis (η)/∂ηT }, satisfying ℐmis (η) = E{Smis (η)⊗2 }.
• Missing information principle (Orchard and Woodbury, 1972):
$$ \mathcal{I}_{mis}(\eta) = \mathcal{I}_{com}(\eta) - \mathcal{I}_{obs}(\eta), $$
where ℐcom(η) = E{−∂Scom(η)/∂ηT } is the expected information from the
complete-sample likelihood.
• An alternative expression of the missing information principle is
$$ V\{S_{mis}(\eta)\} = V\{S_{com}(\eta)\} - V\{\bar{S}(\eta)\}. $$
Note that V{Scom(η)} = ℐcom(η) and V{Sobs (η)} = ℐobs (η).
Example 2.7
1 Consider the following bivariate normal distribution:
$$ \begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N \left\{ \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right\}, $$
for i = 1, 2, · · · , n. Assume for simplicity that σ11, σ12 and σ22 are known constants
and µ = (µ1, µ2) is the parameter of interest.
2 The complete-sample score function for µ is
$$ S_{com}(\mu) = \sum_{i=1}^{n} S_{com}^{(i)}(\mu) = \sum_{i=1}^{n} \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} y_{1i} - \mu_1 \\ y_{2i} - \mu_2 \end{pmatrix}. $$
The information matrix of µ based on the complete sample is
$$ I_{com}(\mu) = n \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1}. $$
Example 2.7 (Cont’d)
3 Suppose that there are some missing values in y1i and y2i and the original sample
is partitioned into four sets:
H = both y1 and y2 respond
K = only y1 is observed
L = only y2 is observed
M = both y1 and y2 are missing.
Let nH , nK , nL, nM represent the sizes of H, K, L, M, respectively.
4 Assume that the response mechanism does not depend on the value of (y1, y2), so it
is MAR. In this case, the observed score function of µ based on a single observation
in set K is
$$ E\left\{ S_{com}^{(i)}(\mu) \mid y_{1i}, i \in K \right\} = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} y_{1i} - \mu_1 \\ E(y_{2i} \mid y_{1i}) - \mu_2 \end{pmatrix} = \begin{pmatrix} \sigma_{11}^{-1}(y_{1i} - \mu_1) \\ 0 \end{pmatrix}. $$
Example 2.7 (Cont’d)
5 Similarly, we have
$$ E\left\{ S_{com}^{(i)}(\mu) \mid y_{2i}, i \in L \right\} = \begin{pmatrix} 0 \\ \sigma_{22}^{-1}(y_{2i} - \mu_2) \end{pmatrix}. $$
6 Therefore, the observed information matrix of µ is
$$ I_{obs}(\mu) = n_H \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1} + n_K \begin{pmatrix} \sigma_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} + n_L \begin{pmatrix} 0 & 0 \\ 0 & \sigma_{22}^{-1} \end{pmatrix}, $$
and the asymptotic variance of the MLE of µ can be obtained as the inverse of Iobs (µ).
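A minimal numerical sketch (my own, with made-up pattern counts and covariance values, not from the slides) assembling Iobs(µ) from the three terms above and inverting it to obtain the asymptotic variance of µ̂.

```python
# Assemble the observed information for mu in Example 2.7 and invert it.
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])                 # (sigma11, sigma12; sigma12, sigma22), assumed known
nH, nK, nL, nM = 120, 40, 30, 10               # hypothetical counts for the four response patterns

I_obs = (nH * np.linalg.inv(Sigma)
         + nK * np.array([[1.0 / Sigma[0, 0], 0.0], [0.0, 0.0]])
         + nL * np.array([[0.0, 0.0], [0.0, 1.0 / Sigma[1, 1]]]))
V_mu_hat = np.linalg.inv(I_obs)                # asymptotic variance of the MLE of mu
print(V_mu_hat)
```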
REFERENCES
Fisher, R. A. (1922), ‘On the mathematical foundations of theoretical statistics’,
Philosophical Transactions of the Royal Society of London A 222, 309–368.
Louis, T. A. (1982), ‘Finding the observed information matrix when using the EM
algorithm’, Journal of the Royal Statistical Society: Series B 44, 226–233.
Oakes, D. (1999), ‘Direct calculation of the information matrix via the EM algorithm’,
Journal of the Royal Statistical Society: Series B 61, 479–482.
Orchard, T. and Woodbury, M. A. (1972), A missing information principle: theory and
applications, in ‘Proceedings of the 6th Berkeley Symposium on Mathematical
Statistics and Probability’, Vol. 1, University of California Press, Berkeley, California,
pp. 697–715.
Rubin, D. B. (1976), ‘Inference and missing data’, Biometrika 63, 581–592.
Kim (ISU) Ch 2 50 / 50

More Related Content

What's hot

Image Processing
Image ProcessingImage Processing
Image Processingtijeel
 
Image Processing: Spatial filters
Image Processing: Spatial filtersImage Processing: Spatial filters
Image Processing: Spatial filtersA B Shinde
 
RF transmitter & receiver
RF transmitter & receiverRF transmitter & receiver
RF transmitter & receivercodexdesign
 
Image Registration (Digital Image Processing)
Image Registration (Digital Image Processing)Image Registration (Digital Image Processing)
Image Registration (Digital Image Processing)VARUN KUMAR
 
Photodetector (Photodiode)
Photodetector (Photodiode)Photodetector (Photodiode)
Photodetector (Photodiode)Kalyan Acharjya
 
Lecture6 Signal and Systems
Lecture6 Signal and SystemsLecture6 Signal and Systems
Lecture6 Signal and Systemsbabak danyal
 
Frequency Domain FIltering.pdf
Frequency Domain FIltering.pdfFrequency Domain FIltering.pdf
Frequency Domain FIltering.pdfMuhammad_Ilham_21
 
application of fibre optics in communication
application of fibre optics in communicationapplication of fibre optics in communication
application of fibre optics in communicationRimmi07
 
Semiconductor Optical Amplifier
Semiconductor Optical AmplifierSemiconductor Optical Amplifier
Semiconductor Optical AmplifierNikhila Nazarudeen
 
Image Restoration And Reconstruction
Image Restoration And ReconstructionImage Restoration And Reconstruction
Image Restoration And ReconstructionAmnaakhaan
 
The relationship between bandwidth and rise time
The relationship between bandwidth and rise timeThe relationship between bandwidth and rise time
The relationship between bandwidth and rise timePei-Che Chang
 
Optical modulator (8,12,17,29)
Optical modulator (8,12,17,29)Optical modulator (8,12,17,29)
Optical modulator (8,12,17,29)boneychatterjee
 
HFSS 3D Layout Phi vs HFSS CAD Classic
HFSS 3D Layout Phi vs HFSS CAD ClassicHFSS 3D Layout Phi vs HFSS CAD Classic
HFSS 3D Layout Phi vs HFSS CAD ClassicAnsys
 
Lecture 1 for Digital Image Processing (2nd Edition)
Lecture 1 for Digital Image Processing (2nd Edition)Lecture 1 for Digital Image Processing (2nd Edition)
Lecture 1 for Digital Image Processing (2nd Edition)Moe Moe Myint
 
Microwave Photonics
Microwave PhotonicsMicrowave Photonics
Microwave PhotonicsRomil Shah
 

What's hot (20)

Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016
 
Image Processing
Image ProcessingImage Processing
Image Processing
 
Image Processing: Spatial filters
Image Processing: Spatial filtersImage Processing: Spatial filters
Image Processing: Spatial filters
 
RF transmitter & receiver
RF transmitter & receiverRF transmitter & receiver
RF transmitter & receiver
 
Image Registration (Digital Image Processing)
Image Registration (Digital Image Processing)Image Registration (Digital Image Processing)
Image Registration (Digital Image Processing)
 
Photodetector (Photodiode)
Photodetector (Photodiode)Photodetector (Photodiode)
Photodetector (Photodiode)
 
Lecture6 Signal and Systems
Lecture6 Signal and SystemsLecture6 Signal and Systems
Lecture6 Signal and Systems
 
Frequency Domain FIltering.pdf
Frequency Domain FIltering.pdfFrequency Domain FIltering.pdf
Frequency Domain FIltering.pdf
 
application of fibre optics in communication
application of fibre optics in communicationapplication of fibre optics in communication
application of fibre optics in communication
 
Frequency deviation
Frequency deviation Frequency deviation
Frequency deviation
 
Lecture5c
Lecture5cLecture5c
Lecture5c
 
Semiconductor Optical Amplifier
Semiconductor Optical AmplifierSemiconductor Optical Amplifier
Semiconductor Optical Amplifier
 
Image Restoration And Reconstruction
Image Restoration And ReconstructionImage Restoration And Reconstruction
Image Restoration And Reconstruction
 
The relationship between bandwidth and rise time
The relationship between bandwidth and rise timeThe relationship between bandwidth and rise time
The relationship between bandwidth and rise time
 
Optical modulator (8,12,17,29)
Optical modulator (8,12,17,29)Optical modulator (8,12,17,29)
Optical modulator (8,12,17,29)
 
HFSS 3D Layout Phi vs HFSS CAD Classic
HFSS 3D Layout Phi vs HFSS CAD ClassicHFSS 3D Layout Phi vs HFSS CAD Classic
HFSS 3D Layout Phi vs HFSS CAD Classic
 
Lecture 1 for Digital Image Processing (2nd Edition)
Lecture 1 for Digital Image Processing (2nd Edition)Lecture 1 for Digital Image Processing (2nd Edition)
Lecture 1 for Digital Image Processing (2nd Edition)
 
Microwave Photonics
Microwave PhotonicsMicrowave Photonics
Microwave Photonics
 
Wiener Filter
Wiener FilterWiener Filter
Wiener Filter
 
Sharpening spatial filters
Sharpening spatial filtersSharpening spatial filters
Sharpening spatial filters
 

Similar to Chapter2: Likelihood-based approach

The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon informationFrank Nielsen
 
Fixed point theorem of discontinuity and weak compatibility in non complete n...
Fixed point theorem of discontinuity and weak compatibility in non complete n...Fixed point theorem of discontinuity and weak compatibility in non complete n...
Fixed point theorem of discontinuity and weak compatibility in non complete n...Alexander Decker
 
11.fixed point theorem of discontinuity and weak compatibility in non complet...
11.fixed point theorem of discontinuity and weak compatibility in non complet...11.fixed point theorem of discontinuity and weak compatibility in non complet...
11.fixed point theorem of discontinuity and weak compatibility in non complet...Alexander Decker
 
Machine learning (9)
Machine learning (9)Machine learning (9)
Machine learning (9)NYversity
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Fisher_info_ppt and mathematical process to find time domain and frequency do...
Fisher_info_ppt and mathematical process to find time domain and frequency do...Fisher_info_ppt and mathematical process to find time domain and frequency do...
Fisher_info_ppt and mathematical process to find time domain and frequency do...praveenyadav2020
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...UniversitasGadjahMada
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Pierre Jacob
 
Introduction to inverse problems
Introduction to inverse problemsIntroduction to inverse problems
Introduction to inverse problemsDelta Pi Systems
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsVjekoslavKovac1
 
cmftJYeZhuanTalk.pdf
cmftJYeZhuanTalk.pdfcmftJYeZhuanTalk.pdf
cmftJYeZhuanTalk.pdfjyjyzr69t7
 

Similar to Chapter2: Likelihood-based approach (20)

CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
CLIM Fall 2017 Course: Statistics for Climate Research, Statistics of Climate...
 
The dual geometry of Shannon information
The dual geometry of Shannon informationThe dual geometry of Shannon information
The dual geometry of Shannon information
 
Fixed point theorem of discontinuity and weak compatibility in non complete n...
Fixed point theorem of discontinuity and weak compatibility in non complete n...Fixed point theorem of discontinuity and weak compatibility in non complete n...
Fixed point theorem of discontinuity and weak compatibility in non complete n...
 
11.fixed point theorem of discontinuity and weak compatibility in non complet...
11.fixed point theorem of discontinuity and weak compatibility in non complet...11.fixed point theorem of discontinuity and weak compatibility in non complet...
11.fixed point theorem of discontinuity and weak compatibility in non complet...
 
Imc2017 day2-solutions
Imc2017 day2-solutionsImc2017 day2-solutions
Imc2017 day2-solutions
 
Holographic Cotton Tensor
Holographic Cotton TensorHolographic Cotton Tensor
Holographic Cotton Tensor
 
Slides erasmus
Slides erasmusSlides erasmus
Slides erasmus
 
Machine learning (9)
Machine learning (9)Machine learning (9)
Machine learning (9)
 
MNAR
MNARMNAR
MNAR
 
Deep Learning Opening Workshop - Statistical and Computational Guarantees of ...
Deep Learning Opening Workshop - Statistical and Computational Guarantees of ...Deep Learning Opening Workshop - Statistical and Computational Guarantees of ...
Deep Learning Opening Workshop - Statistical and Computational Guarantees of ...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Fisher_info_ppt and mathematical process to find time domain and frequency do...
Fisher_info_ppt and mathematical process to find time domain and frequency do...Fisher_info_ppt and mathematical process to find time domain and frequency do...
Fisher_info_ppt and mathematical process to find time domain and frequency do...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Introduction to inverse problems
Introduction to inverse problemsIntroduction to inverse problems
Introduction to inverse problems
 
Ichimura 1993: Semiparametric Least Squares (non-technical)
Ichimura 1993: Semiparametric Least Squares (non-technical)Ichimura 1993: Semiparametric Least Squares (non-technical)
Ichimura 1993: Semiparametric Least Squares (non-technical)
 
Nokton theory-en
Nokton theory-enNokton theory-en
Nokton theory-en
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular Integrals
 
cmftJYeZhuanTalk.pdf
cmftJYeZhuanTalk.pdfcmftJYeZhuanTalk.pdf
cmftJYeZhuanTalk.pdf
 

Recently uploaded

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsNurulAfiqah307317
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 

Recently uploaded (20)

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 

Chapter2: Likelihood-based approach

  • 1. Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University
  • 2. Section 1: Introduction Basic Setup (No missing data) • y = (y1, y2, · · · yn) is a realization of the random sample from an infinite population with density f (y). • Assume that the true density f (y) belongs to a parametric family of densities P = {f (y; θ) ; θ ∈ Ω} indexed by θ ∈ Ω. That is, there exist θ0 ∈ Ω such that f (y; θ0) = f (y) for all y. Kim (ISU) Ch 2 2 / 50
  • 3. Likelihood Definitions for likelihood theory • The likelihood function of θ is defined as L(θ) = f (y; θ) where f (y; θ) is the joint pdf of y. • Let ˆθ be the maximum likelihood estimator (MLE) of θ0 if it satisfies L(ˆθ) = max θ∈Θ L(θ). Kim (ISU) Ch 2 3 / 50
  • 4. Kullback-Leibler information Shannon-Kolmogorov information inequality Let f0(y) = f (y; θ0) and f1(y) = f (y; θ1). Kullback-Leibler information number defined by K(f0, f1) = Eθ0 ln f0(Y ) f1(Y ) satisfies K(f0, f1) ≥ 0, with equality if and only if Pθ0 {f0(Y ) = f1(Y )} = 1. Kim (ISU) Ch 2 4 / 50
  • 5. Proof Jensen’s inequality If g(t) is a convex function, then for any random variable X, g{E(X)} ≤ E{g(X)}. Furthermore, if g(t) is strictly convex, then E{g(X)} = g{E(X)} if and only if P(X = c) = 1 for some constant c. Since φ(x) = − ln(x) is a strictly convex function of x, we have, using Jensen’s inequality, K(f0, f1) = Eθ0 − ln f1(Y ) f0(Y ) ≥ − ln Eθ0 f1(Y ) f0(Y ) ≥ 0 with equality if and only if Pθ0 {f0(Y ) = f1(Y )} = 1. Kim (ISU) Ch 2 5 / 50
  • 6. Identifiability Definition Let P = {Pθ; θ ∈ Ω} be a statistical model with parameter space Ω. Model P is identifiable if the mapping θ → Pθ is one-to-one: Pθ1 = Pθ2 ⇒ θ1 = θ2. If the distributions are defined in terms of the probability density functions (pdfs), then two pdfs should be considered distinct only if they differ on a set of non-zero measure (for example two functions f1(x) = I(0 ≤ x < 1) and f2(x) = I(0 ≤ x ≤ 1) differ only at a single point x = 1, a set of measure zero, and thus cannot be considered as distinct pdfs). Kim (ISU) Ch 2 6 / 50
  • 7. Lemma 2.1 Properties of identifiable distribution Lemma 2.1. If P = {f (y; θ) ; θ ∈ Ω} is identifiable and E {|ln f (Y ; θ)|} < ∞ for all θ, then M (θ) = Eθ0 ln f (Y ; θ) f (Y ; θ0) has a unique maximum at θ0. Kim (ISU) Ch 2 7 / 50
  • 8. Proof 1 By Shannon-Kolmogorov information inequality, we can obtain that M(θ) ≤ 0 for all θ ∈ Ω, with equality iff Pθ0 {f (Y ; θ) = f (Y ; θ0)} = 1. 2 Note that M(θ0) = 0, by definition of M(θ). Thus, M(θ) has a maximum at θ0. 3 To show uniqueness, note that if θ1 satisfies M(θ1) = 0, it should satisfy Pθ0 {f (Y ; θ1) = f (Y ; θ0)} = 1, which implies θ1 = θ0, by the identifiability assumption. Kim (ISU) Ch 2 8 / 50
  • 9. Remark 1 Lemma 2.1 simply says that, under identifiability, Q(θ) = Eθ0 {log f (Y ; θ)} takes the (unique) maximum value at θ = θ0. 2 Define Qn(θ) = n−1 n i=1 log f (yi ; θ) then the MLE ˆθ is the maximizer of Qn(θ). Since Qn(θ) converges in probability to Q(θ), can we say that the maximizer of Qn(θ) converges to the maximizer of Q(θ)? Kim (ISU) Ch 2 9 / 50
  • 10. Theorem 2.1: (Weak) consistency of MLE Theorem 2.1 Let Qn (θ) = 1 n n i=1 log f (yi , θ) and ˆθ be any solution of Qn(ˆθ) = max θ∈Ω Qn (θ) . Assume the following two conditions: 1 Uniform weak convergence: sup θ∈Ω |Qn (θ) − Q (θ)| p −→ 0 for some non-stochastic function Q (θ) 2 Identification: Q (θ) is uniquely maximized at θ0. Then, ˆθ p −→ θ0. Kim (ISU) Ch 2 10 / 50
  • 11. Remark 1 Convergence in probability: ˆθ p −→ θ0 ⇐⇒ P |ˆθ − θ0| > → 0 as n → ∞, for any > 0. 2 If P = {f (y; θ) ; θ ∈ Ω} is not identifiable, then Q (θ) may not have a unique maximum and ˆθ may not converge (in probability) to a single point. 3 If the true distribution f (y) does not belong to the class P = {f (y; θ) ; θ ∈ Ω}, which point does ˆθ converge to? Kim (ISU) Ch 2 11 / 50
  • 12. Other properties of MLE • Asymptotic normality (Theorem 2.2) • Asymptotic optimality: MLE achieves Cramer-Rao lower bound • Wilks’ theorem: 2{ln(ˆθ) − ln(θ0)} d → χ2 p Kim (ISU) Ch 2 12 / 50
  • 13. 1 Introduction - Fisher information Definition 1 Score function: S(θ) = ∂ log L(θ) ∂θ 2 Fisher information = curvature of the log-likelihood: I(θ) = − ∂2 ∂θ∂θT log L(θ) = − ∂ ∂θT S(θ) 3 Observed (Fisher) information: I(ˆθn) where ˆθn is the MLE. 4 Expected (Fisher) information: I(θ) = Eθ {I(θ)} • The observed information is always positive. The observed information applies to a single dataset. • The expected information is meaningful as a function of θ across the admissible values of θ. The expected information is an average quantity over all possible datasets. • I(ˆθ) = I(ˆθ) for exponential family. Kim (ISU) Ch 2 13 / 50
  • 14. Properties of score function Theorem 2.3. Bartlett identities Under the regularity conditions allowing for the exchange of the order of integration and differentiation, Eθ {S(θ)} = 0 and Vθ {S(θ)} = I(θ). Kim (ISU) Ch 2 14 / 50
  • 15. Proof Kim (ISU) Ch 2 15 / 50
  • 16. 1 Introduction - Fisher information Remark • Since ˆθ p −→ θ0, we can apply a Taylor linearization on S(ˆθ) = 0 to get ˆθ − θ0 ∼= {I(θ0)}−1 S(θ0). Here, we use the fact that I(θ) = −∂S(θ)/∂θT converges in probability to I(θ). • Thus, the (asymptotic) variance of MLE is V (ˆθ) . = {I(θ0)}−1 V {S(θ0)} {I(θ0)}−1 = {I(θ0)}−1 , where the last equality follows from Theorem 2.3. Kim (ISU) Ch 2 16 / 50
  • 17. Section 2: Observed likelihood Basic Setup 1 Let y = (y1, y2, · · · yn) be a realization of the random sample from an infinite population with density f (y; θ0). 2 Let δi be an indicator function defined by δi = 1 if yi is observed 0 otherwise and we assume that Pr (δi = 1 | yi ) = π (yi , φ) for some (unknown) parameter φ and π(·) is a known function. 3 Thus, instead of observing (δi , yi ) directly, we observe (δi , yi,obs ) yi,obs = yi if δi = 1 ∗ if δi = 0. 4 What is the (marginal) density of (yi,obs , δi )? Kim (ISU) Ch 2 17 / 50
  • 18. Motivation (Change of variable technique) 1 Suppose that z is a random variable with density f (z). 2 Instead of observing z directly, we observe only y = y (z), where the mapping z → y (z) is known. 3 The density of y is g (y) = R(y) f (z) dz where R (y) = {z; y (z) = y}. Kim (ISU) Ch 2 18 / 50
  • 19. Derivation 1 The joint density for z = (y, δ) is f (y, δ) = f1 (y) f2 (δ | y) where f1 (y) is the density of y and f2 (δ | y) is the conditional density of δ conditional on y and is given by f2 (δ | y) = {π(y)}δ {1 − π(y)}1−δ , where π(y) = Pr (δ = 1 | y). 2 Instead of observing z = (y, δ) directly, we observe only (yobs, δ) where yobs = yobs (y, δ) and the mapping (y, δ) → yobs is known. 3 The (marginal) density of (yobs, δ) is g (yobs, δ) = R(yobs,δ) f (y, δ) dy where R (yobs, δ) = {y; yobs (y, δ) = yobs}. Kim (ISU) Ch 2 19 / 50
  • 20. Likelihood under missing data (=Observed likelihood) The observed likelihood is the likelihood obtained from the marginal density of (yi,obs, δi ) , i = 1, 2, · · · , n, and can be written as, under the IID setup, Lobs (θ, φ) = δi =1 [f1 (yi ; θ) f2 (δi | yi ; φ)] × δi =0 f1 (yi ; θ) f2 (δi | yi ; φ) dyi = δi =1 [f1 (yi ; θ) π (yi ; φ)] × δi =0 f1 (y; θ) {1 − π (y; φ)} dy , where π (y; φ) = P (δ = 1 | y; φ). Kim (ISU) Ch 2 20 / 50
  • 21. Example 2.2 (Censored regression model, or Tobit model) zi = xi β + i i ∼ N 0, σ2 yi = zi if zi ≥ 0 0 if zi < 0. The observed log-likelihood is l β, σ2 = − 1 2 yi >0 ln 2π + ln σ2 + (yi − xi β) 2 σ2 + yi =0 ln 1 − Φ xi β σ where Φ (x) is the cdf of the standard normal distribution. Kim (ISU) Ch 2 21 / 50
  • 22. Example 2.3 Let t1, t2, · · · , tn be an IID sample from a distribution with density fθ (t) = θe−θt I (t > 0). Instead of observing ti , we observe (yi , δi ) where yi = ti if δi = 1 c if δi = 0 and δi = 1 if ti ≤ c 0 if ti > c, where c is a known censoring time. The observed likelihood for θ can be derived as Lobs (θ) = n i=1 {fθ(ti )}δi {P (ti > c)}1−δi = θ n i=1 δi exp(−θ n i=1 yi ). Kim (ISU) Ch 2 22 / 50
  • 23. Multivariate extension Basic Setup • Let y = (y1, . . . , yp) be a p-dimensional random vector with probability density function f (y; θ). • Let y1, · · · , yn be n independent realizations y from f (y; θ). (IID sample) • Let δij be the response indicator function of yij with δij = 1 if yij is observed 0 otherwise. • δi = (δi1, · · · , δip): p-dimensional random vector with density P(δ | y) assuming P(δ|y) = P(δ|y; φ) for some φ. • Let (yi,obs , yi,mis ) be the observed part and missing part of yi , respectively. Kim (ISU) Ch 2 23 / 50
  • 24. Observed Likelihood for multivariate Y case • Under IID setup: The observed likelihood is Lobs (θ, φ) = n i=1 f (yi ; θ)P(δi |yi ; φ)dyi,mis , where it is understood that, if yi = yi,obs and yi,mis is empty then there is nothing to integrate out. • In the special case of scalar y, the observed likelihood is Lobs (θ, φ) = δi =1 [f (yi ; θ)π(yi ; φ)] × δi =0 f (y; θ) {1 − π(y; φ)} dy , where π(y; φ) = P(δ = 1|y; φ). Kim (ISU) Ch 2 24 / 50
  • 25. Missing At Random Definition: Missing At Random (MAR) P(δ|y) is the density of the conditional distribution of δ given y. Let yobs = yobs (y, δ) where yi,obs = yi if δi = 1 ∗ if δi = 0. The response mechanism is MAR if P (δ|y1) = P (δ|y2) { or P(δ|y) = P(δ|yobs )} for all y1 and y2 satisfying yobs (y1, δ) = yobs (y2, δ). • MAR: the response mechanism P(δ|y) depends on y only through yobs . • Let y = (yobs , ymis ). By Bayes theorem, P (ymis |yobs , δ) = P(δ|ymis , yobs ) P(δ|yobs ) P (ymis |yobs ) . • MAR: P(ymis |yobs , δ) = P(ymis |yobs ). That is, ymis ⊥ δ | yobs . • MAR: the conditional independence of δ and ymis given yobs . Kim (ISU) Ch 2 25 / 50
  • 26. Remark • MCAR (Missing Completely at random): P(δ | y) does not depend on y. • MAR (Missing at random): P(δ | y) = P(δ | yobs ) • NMAR (Not Missing at random): P(δ | y) = P(δ | yobs ) • Thus, MCAR is a special case of MAR. Kim (ISU) Ch 2 26 / 50
  • 27. Likelihood factorization theorem Theorem 2.4 (Rubin, 1976) Pφ(δ|y) is the joint density of δ given y and fθ(y) is the joint density of y. Under conditions 1 the parameters θ and φ are distinct and 2 MAR condition holds, the observed likelihood can be written as Lobs (θ, φ) = L1(θ)L2(φ), and the MLE of θ can be obtained by maximizing L1(θ). Thus, we do not have to specify the model for response mechanism. The response mechanism is called ignorable if the above likelihood factorization holds. Kim (ISU) Ch 2 27 / 50
  • 28. Example 2.4 • Bivariate data (xi , yi ) with pdf f (x, y) = f1(y | x)f2(x). • xi is always observed and yi is subject to missingness. • Assume that the response status variable δi of yi satisfies P (δi = 1 | xi , yi ) = Λ1 (φ0 + φ1xi + φ2yi ) for some function Λ1 (·) of known form. • Let θ be the parameter of interest in the regression model f1 (y | x; θ). Let α be the parameter in the marginal distribution of x, denoted by f2 (xi ; α). Define Λ0(x) = 1 − Λ1(x). • Three parameters • θ: parameter of interest • α and φ: nuisance parameter Kim (ISU) Ch 2 28 / 50
  • 29. Example 2.4 (Cont’d) • Observed likelihood Lobs (θ, α, φ) =   δi =1 f1 (yi | xi ; θ) f2 (xi ; α) Λ1 (φ0 + φ1xi + φ2yi )   ×   δi =0 f1 (y | xi ; θ) f2 (xi ; α) Λ0 (φ0 + φ1xi + φ2y) dy   = L1 (θ, φ) × L2 (α) where L2 (α) = n i=1 f2 (xi ; α) . • Thus, we can safely ignore the marginal distribution of x if x is completely observed. Kim (ISU) Ch 2 29 / 50
  • 30. Example 2.4 (Cont’d) • If φ2 = 0, then MAR holds and L1 (θ, φ) = L1a (θ) × L1b (φ) where L1a (θ) = δi =1 f1 (yi | xi ; θ) and L1b (φ) = δi =1 Λ1 (φ0 + φ1xi ) × δi =0 Λ0 (φ0 + φ1xi ) . • Thus, under MAR, the MLE of θ can be obtained by maximizing L1a (θ), which is obtained by ignoring the missing part of the data. Kim (ISU) Ch 2 30 / 50
  • 31. Example 2.4 (Cont’d) • Instead of yi subject to missingness, if xi is subject to missingness, then the observed likelihood becomes Lobs (θ, φ, α) =   δi =1 f1 (yi | xi ; θ) f2 (xi ; α) Λ1 (φ0 + φ1xi + φ2yi )   ×   δi =0 f1 (yi | x; θ) f2 (x; α) Λ0 (φ0 + φ1x + φ2yi ) dx   = L1 (θ, φ) × L2 (α) . • If φ1 = 0 then Lobs (θ, α, φ) = L1 (θ, α) × L2 (φ) and MAR holds. Although we are not interested in the marginal distribution of x, we still need to specify the model for the marginal distribution of x. Kim (ISU) Ch 2 31 / 50
Section 3: Mean Score Approach
Motivation
• The observed likelihood is the marginal density of (yobs, δ).
• The observed likelihood is
  Lobs(η) = ∫_{R(yobs,δ)} f(y; θ) P(δ|y; φ) dµ(y) = ∫ f(y; θ) P(δ|y; φ) dµ(ymis),
  where ymis is the missing part of y and η = (θ, φ).
• Observed score equation:
  Sobs(η) ≡ ∂/∂η log Lobs(η) = 0.
• Computing the observed score function can be computationally challenging because the observed likelihood is in integral form.
Kim (ISU) Ch 2 32 / 50
Mean Score Approach
Theorem 2.5: Mean Score Theorem (Fisher, 1922)
Under some regularity conditions, the observed score function equals the mean score function. That is,
  Sobs(η) = S̄(η),
where
  S̄(η) = E{Scom(η) | yobs, δ},
  Scom(η) = ∂/∂η log f(y, δ; η),
  f(y, δ; η) = f(y; θ) P(δ|y; φ).
• The mean score function is computed by taking the conditional expectation of the complete-sample score function given the observed data.
• The mean score function is easier to compute than the observed score function.
Kim (ISU) Ch 2 33 / 50
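In practice the conditional expectation defining S̄(η) can be approximated by imputing the missing values from their conditional distribution given the observed data and averaging the complete-data score over the draws. The sketch below is a generic illustration under assumptions of my own (scalar y ~ N(θ, 1), MAR, hypothetical helper names); it is not a procedure taken from the slides.

```python
# Monte Carlo approximation of the mean score Sbar(eta) = E{ S_com(eta) | y_obs, delta }.
import numpy as np

def mean_score_mc(eta, y_obs, delta, score_com, draw_missing, n_draws=500, rng=None):
    """Average the complete-data score over imputed copies of the missing data."""
    rng = rng or np.random.default_rng()
    draws = []
    for _ in range(n_draws):
        y_full = draw_missing(eta, y_obs, delta, rng)   # y_mis ~ f(y_mis | y_obs, delta; eta)
        draws.append(score_com(eta, y_full, delta))     # complete-data score on imputed data
    return np.mean(draws, axis=0)

# Example: y_i ~ N(theta, 1) under MAR, so a missing y_i is drawn from N(theta, 1).
score_com = lambda theta, y, d: np.sum(y - theta)                      # complete-data score
draw_missing = lambda theta, y, d, rng: np.where(d == 1, y, rng.normal(theta, 1.0, size=y.size))

y_obs = np.array([0.4, -0.1, 1.2, 0.0, 0.0])   # placeholders where delta == 0
delta = np.array([1, 1, 1, 0, 0])
print(mean_score_mc(0.2, y_obs, delta, score_com, draw_missing))
# ~ sum over respondents of (y_i - 0.2); the imputed contributions average to ~0,
# anticipating the discussion of Example 2.6 on slide 37.
```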
Mean Score Approach
Proof of Theorem 2.5
Since Lobs(η) = f(y, δ; η) / f(y, δ | yobs, δ; η), we have
  ∂/∂η ln Lobs(η) = ∂/∂η ln f(y, δ; η) − ∂/∂η ln f(y, δ | yobs, δ; η).
Taking the conditional expectation of the above equation over the conditional distribution of (y, δ) given (yobs, δ), we have
  ∂/∂η ln Lobs(η) = E{∂/∂η ln Lobs(η) | yobs, δ}
                  = E{Scom(η) | yobs, δ} − E{∂/∂η ln f(y, δ | yobs, δ; η) | yobs, δ}.
Here, the first equality holds because Lobs(η) is a function of (yobs, δ) only. The last term is equal to zero by Theorem 2.3, which states that the expected value of a score function is zero; the reference distribution in this case is the conditional distribution of (y, δ) given (yobs, δ).
Kim (ISU) Ch 2 34 / 50
Example 2.6
1 Suppose that the study variable y follows a normal distribution with mean x′β and variance σ². The score equations for β and σ² under complete response are
  S1(β, σ²) = ∑_{i=1}^n (yi − xi′β) xi / σ² = 0
and
  S2(β, σ²) = −n/(2σ²) + ∑_{i=1}^n (yi − xi′β)² / (2σ⁴) = 0.
2 Assume that yi are observed only for the first r elements and that the MAR assumption holds. In this case, the mean score functions reduce to
  S̄1(β, σ²) = ∑_{i=1}^r (yi − xi′β) xi / σ²
and
  S̄2(β, σ²) = −n/(2σ²) + ∑_{i=1}^r (yi − xi′β)² / (2σ⁴) + (n − r)/(2σ²).
Kim (ISU) Ch 2 35 / 50
Example 2.6 (Cont’d)
3 The maximum likelihood estimator obtained by solving the mean score equations is
  β̂ = (∑_{i=1}^r xi xi′)^{-1} ∑_{i=1}^r xi yi
and
  σ̂² = (1/r) ∑_{i=1}^r (yi − xi′β̂)².
Thus, the resulting estimators can also be obtained by simply ignoring the missing part of the sample, which is consistent with the result in Example 2.4 (for φ2 = 0).
Kim (ISU) Ch 2 36 / 50
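A quick numerical sketch of item 3 follows; the data-generating values and sample sizes are illustrative assumptions, not part of the example in the slides.

```python
# Solving the mean score equations of Example 2.6: complete-case least squares,
# with sigma^2 estimated using divisor r.
import numpy as np

rng = np.random.default_rng(3)
n, r = 1000, 600                                   # first r units respond (MAR assumed)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma = np.array([2.0, -1.0]), 1.5
y = X @ beta + rng.normal(scale=sigma, size=n)

Xr, yr = X[:r], y[:r]
beta_hat = np.linalg.solve(Xr.T @ Xr, Xr.T @ yr)   # (sum x_i x_i')^{-1} sum x_i y_i
sigma2_hat = np.mean((yr - Xr @ beta_hat) ** 2)    # divisor r, as in the slide
print(beta_hat, sigma2_hat)                        # close to (2.0, -1.0) and 2.25
```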
Discussion of Example 2.6
• We are interested in estimating θ in the conditional density f(y | x; θ).
• Under MAR, the observed likelihood for θ is
  Lobs(θ) = ∏_{i=1}^r f(yi | xi; θ) × ∏_{i=r+1}^n ∫ f(y | xi; θ) dy = ∏_{i=1}^r f(yi | xi; θ).
• The same conclusion follows from the mean score theorem. Under MAR, the mean score function is
  S̄(θ) = ∑_{i=1}^r S(θ; xi, yi) + ∑_{i=r+1}^n E{S(θ; xi, Y) | xi} = ∑_{i=1}^r S(θ; xi, yi),
  where S(θ; x, y) is the score function for θ and the second equality follows from Theorem 2.3.
Kim (ISU) Ch 2 37 / 50
Example 2.5
1 Suppose that the study variable y follows a Bernoulli distribution with probability of success pi, where
  pi = pi(β) = exp(xi′β) / {1 + exp(xi′β)}
for some unknown parameter β, and xi is the vector of covariates in the logistic regression model for yi. We assume that 1 is in the column space of xi.
2 Under complete response, the score function for β is
  S1(β) = ∑_{i=1}^n {yi − pi(β)} xi.
Kim (ISU) Ch 2 38 / 50
Example 2.5 (Cont’d)
3 Let δi be the response indicator for yi with distribution Bernoulli(πi), where
  πi = exp(xi′φ0 + yi φ1) / {1 + exp(xi′φ0 + yi φ1)}.
We assume that xi is always observed, but yi is missing if δi = 0.
4 Under missing data, the mean score function for β is
  S̄1(β, φ) = ∑_{δi=1} {yi − pi(β)} xi + ∑_{δi=0} ∑_{y=0}^1 wi(y; β, φ) {y − pi(β)} xi,
where wi(y; β, φ) is the conditional probability of yi = y given xi and δi = 0:
  wi(y; β, φ) = Pβ(yi = y | xi) Pφ(δi = 0 | yi = y, xi) / ∑_{z=0}^1 Pβ(yi = z | xi) Pφ(δi = 0 | yi = z, xi).
Thus, S̄1(β, φ) is also a function of φ.
Kim (ISU) Ch 2 39 / 50
Example 2.5 (Cont’d)
5 If the response mechanism is MAR so that φ1 = 0, then
  wi(y; β, φ) = Pβ(yi = y | xi) / ∑_{z=0}^1 Pβ(yi = z | xi) = Pβ(yi = y | xi),
and so
  S̄1(β, φ) = ∑_{δi=1} {yi − pi(β)} xi ≡ S̄1(β).
6 If MAR does not hold, then (β̂, φ̂) can be obtained by solving S̄1(β, φ) = 0 and S̄2(β, φ) = 0 jointly, where
  S̄2(β, φ) = ∑_{δi=1} {δi − πi(φ; xi, yi)} (xi, yi) + ∑_{δi=0} ∑_{y=0}^1 wi(y; β, φ) {δi − πi(φ; xi, y)} (xi, y).
Kim (ISU) Ch 2 40 / 50
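The sketch below works through items 4 and 5 numerically. The data, parameter values, and the split of φ into a covariate part and a coefficient on y are illustrative assumptions of my own; the code verifies that with φ1 = 0 the nonrespondent terms drop out of S̄1, exactly as stated in item 5.

```python
# Mean score functions for Example 2.5 (illustrative parameterization: response model
# uses covariate vector x_i with coefficients phi_x and coefficient phi_y on y_i).
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def weights(X, beta, phi_x, phi_y):
    """w_i(y): P(y_i = y | x_i, delta_i = 0) for y in {0, 1}."""
    p1 = expit(X @ beta)                                    # P(y=1 | x; beta)
    miss0 = (1 - p1) * (1 - expit(X @ phi_x))               # y = 0 branch
    miss1 = p1 * (1 - expit(X @ phi_x + phi_y))             # y = 1 branch
    w1 = miss1 / (miss0 + miss1)
    return 1 - w1, w1

def mean_score_beta(X, y, delta, beta, phi_x, phi_y):
    """Sbar_1(beta, phi) from slide 39."""
    p1 = expit(X @ beta)
    S = ((y - p1) * delta)[:, None] * X                     # respondent contributions
    w0, w1 = weights(X, beta, phi_x, phi_y)
    S_mis = (w0 * (0 - p1) + w1 * (1 - p1))[:, None] * X    # sum over y in {0, 1}
    return (S + (1 - delta)[:, None] * S_mis).sum(axis=0)

# Numerical check of item 5: with phi_y = 0 (MAR) the nonrespondent contributions
# vanish and Sbar_1 equals the complete-case score.
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta = np.array([0.2, -0.5]); phi_x = np.array([0.4, 0.1])
y = rng.binomial(1, expit(X @ beta))
delta = rng.binomial(1, expit(X @ phi_x))                   # MAR response

cc_score = (((y - expit(X @ beta)) * delta)[:, None] * X).sum(axis=0)
print(np.allclose(mean_score_beta(X, y, delta, beta, phi_x, 0.0), cc_score))  # True
# In the NMAR case (phi_y != 0), Sbar_1 and the analogous Sbar_2 would be stacked and
# passed to a root finder such as scipy.optimize.root to solve for (beta, phi) jointly;
# nonignorable models of this kind can be weakly identified, so starting values matter.
```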
Section 4: Observed information
Definitions
1 Observed score function: Sobs(η) = ∂/∂η log Lobs(η).
2 Fisher information (observed information) from the observed likelihood: Iobs(η) = −∂²/∂η∂ηT log Lobs(η).
3 Expected (Fisher) information from the observed likelihood: ℐobs(η) = Eη{Iobs(η)}.
Theorem 2.6
Under regularity conditions, E{Sobs(η)} = 0 and V{Sobs(η)} = ℐobs(η), where ℐobs(η) = Eη{Iobs(η)} is the expected information from the observed likelihood.
Kim (ISU) Ch 2 41 / 50
Observed information
• Under missing data, the MLE η̂ is the solution to S̄(η) = 0.
• Under some regularity conditions, η̂ converges in probability to η0 and has asymptotic variance {ℐobs(η0)}^{-1}, where
  ℐobs(η) = E{−∂Sobs(η)/∂ηT} = E{Sobs(η)⊗2} = E{S̄(η)⊗2}   and   B⊗2 = BBT.
• For variance estimation of η̂, we may use {Iobs(η̂)}^{-1}.
• In general, the observed information Iobs(η̂) is preferred to the expected information ℐobs(η̂) for variance estimation of η̂.
Kim (ISU) Ch 2 42 / 50
Return to Example 2.3
• Observed log-likelihood:
  ln Lobs(θ) = ∑_{i=1}^n δi log(θ) − θ ∑_{i=1}^n yi.
• MLE for θ: θ̂ = ∑_{i=1}^n δi / ∑_{i=1}^n yi.
• Observed information: Iobs(θ) = ∑_{i=1}^n δi / θ².
• Expected information: ℐobs(θ) = ∑_{i=1}^n (1 − e^{−θc}) / θ² = n(1 − e^{−θc}) / θ².
Which one do you prefer?
Kim (ISU) Ch 2 43 / 50
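As a numerical illustration of the question (a sketch only: the censored-exponential setup of Example 2.3 is inferred from the log-likelihood above, with lifetimes ti ~ Exp(rate θ) observed as yi = min(ti, c) and δi = I(ti ≤ c); all numbers are illustrative):

```python
# Compare variance estimates from the observed and the expected information.
import numpy as np

rng = np.random.default_rng(5)
n, theta0, c = 200, 2.0, 0.5
t = rng.exponential(1 / theta0, size=n)
y, d = np.minimum(t, c), (t <= c).astype(float)

theta_hat = d.sum() / y.sum()                     # MLE from the slide
I_observed = d.sum() / theta_hat**2               # sum(delta_i) / theta^2
I_expected = n * (1 - np.exp(-theta_hat * c)) / theta_hat**2
print(theta_hat, 1 / I_observed, 1 / I_expected)  # the two variance estimates differ
# The observed information uses the realized number of uncensored cases; the expected
# information replaces it by its expectation n(1 - exp(-theta*c)).
```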
Observed information
Motivation
• Lcom(η) = f(y, δ; η): complete-sample likelihood with no missing data.
• Fisher information associated with Lcom(η):
  Icom(η) = −∂/∂ηT Scom(η) = −∂²/∂η∂ηT log Lcom(η).
• Lobs(η): the observed likelihood.
• Fisher information associated with Lobs(η):
  Iobs(η) = −∂/∂ηT Sobs(η) = −∂²/∂η∂ηT log Lobs(η).
• How can Iobs(η) be expressed in terms of Icom(η) and Scom(η)?
Kim (ISU) Ch 2 44 / 50
Louis’ formula
Theorem 2.7 (Louis, 1982; Oakes, 1999)
Under regularity conditions allowing the exchange of the order of integration and differentiation,
  Iobs(η) = E{Icom(η) | yobs, δ} − [E{Scom(η)⊗2 | yobs, δ} − S̄(η)⊗2]
          = E{Icom(η) | yobs, δ} − V{Scom(η) | yobs, δ},
where S̄(η) = E{Scom(η) | yobs, δ}.
Kim (ISU) Ch 2 45 / 50
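The following sketch checks Louis' formula numerically in the censored-exponential setting assumed above for Example 2.3 (an illustration of my own, not part of the slides); in this model the conditional variance of a censored lifetime is 1/θ², so both sides reduce to ∑δi/θ².

```python
# Numerical check of Louis' formula for the censored exponential.
import numpy as np

rng = np.random.default_rng(1)
n, theta, c = 500, 1.3, 1.0
t = rng.exponential(1 / theta, size=n)
y, d = np.minimum(t, c), (t <= c).astype(float)

eta = 1.1  # evaluate all information quantities at an arbitrary parameter value

# Direct observed information: -d^2/d eta^2 of log L_obs = sum(delta)/eta^2
I_obs_direct = d.sum() / eta**2

# Louis' formula: E{I_com | obs} - V{S_com | obs}.
# Complete-data score per unit is 1/eta - t_i, so I_com = n/eta^2, and the conditional
# variance of t_i given censoring (t_i > c) is 1/eta^2 (memoryless property).
E_I_com = n / eta**2
V_S_com = (n - d.sum()) / eta**2
I_obs_louis = E_I_com - V_S_com

print(I_obs_direct, I_obs_louis)   # the two agree: sum(delta)/eta^2
```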
Proof of Theorem 2.7
By Theorem 2.5, the observed information associated with Lobs(η) can be expressed as
  Iobs(η) = −∂/∂η S̄(η),
where S̄(η) = E{Scom(η) | yobs, δ; η}. Thus, we have
  ∂/∂η S̄(η) = ∂/∂η ∫ Scom(η; y) f(y, δ | yobs, δ; η) dµ(y)
             = ∫ {∂/∂η Scom(η; y)} f(y, δ | yobs, δ; η) dµ(y) + ∫ Scom(η; y) {∂/∂η f(y, δ | yobs, δ; η)} dµ(y)
             = E{∂Scom(η)/∂η | yobs, δ} + ∫ Scom(η; y) {∂/∂η log f(y, δ | yobs, δ; η)} f(y, δ | yobs, δ; η) dµ(y).
The first term is equal to −E{Icom(η) | yobs, δ}. Writing Smis(η) = ∂/∂η log f(y, δ | yobs, δ; η), the second term is
  E{Scom(η) Smis(η) | yobs, δ} = E{(S̄(η) + Smis(η)) Smis(η) | yobs, δ} = E{Smis(η) Smis(η) | yobs, δ},
because E{S̄(η) Smis(η) | yobs, δ} = 0. Combining the two terms gives Louis’ formula.
Kim (ISU) Ch 2 46 / 50
Missing Information Principle
• Smis(η): the score function associated with the conditional density f(y, δ | yobs, δ).
• Expected missing information:
  ℐmis(η) = E{−∂Smis(η)/∂ηT}, satisfying ℐmis(η) = E{Smis(η)⊗2}.
• Missing information principle (Orchard and Woodbury, 1972):
  ℐmis(η) = ℐcom(η) − ℐobs(η),
  where ℐcom(η) = E{−∂Scom(η)/∂ηT} is the expected information from the complete-sample likelihood.
• An alternative expression of the missing information principle is
  V{Smis(η)} = V{Scom(η)} − V{S̄(η)}.
  Note that V{Scom(η)} = ℐcom(η) and V{Sobs(η)} = ℐobs(η).
Kim (ISU) Ch 2 47 / 50
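As a minimal worked instance of the principle (my own illustration, not taken from the slides): suppose yi are iid N(µ, σ²) with σ² known and only r of the n values are observed under MAR. Then

```latex
\mathcal{I}_{\mathrm{com}}(\mu) = \frac{n}{\sigma^2}, \qquad
\mathcal{I}_{\mathrm{obs}}(\mu) = \frac{r}{\sigma^2}, \qquad
\mathcal{I}_{\mathrm{mis}}(\mu) = \mathcal{I}_{\mathrm{com}}(\mu) - \mathcal{I}_{\mathrm{obs}}(\mu) = \frac{n-r}{\sigma^2},
```

so the fraction of missing information equals the nonresponse rate (n − r)/n in this simple case.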
Example 2.7
1 Consider the following bivariate normal distribution:
  (y1i, y2i)′ ~ N( (µ1, µ2)′, [σ11 σ12; σ12 σ22] ),   i = 1, 2, · · · , n.
Assume for simplicity that σ11, σ12 and σ22 are known constants, and let µ = (µ1, µ2)′ be the parameter of interest.
2 The complete-sample score function for µ is
  Scom(µ) = ∑_{i=1}^n S(i)com(µ) = ∑_{i=1}^n [σ11 σ12; σ12 σ22]^{-1} (y1i − µ1, y2i − µ2)′.
The information matrix of µ based on the complete sample is
  Icom(µ) = n [σ11 σ12; σ12 σ22]^{-1}.
Kim (ISU) Ch 2 48 / 50
Example 2.7 (Cont’d)
3 Suppose that there are some missing values in y1i and y2i, and the original sample is partitioned into four sets:
  H: both y1 and y2 are observed
  K: only y1 is observed
  L: only y2 is observed
  M: both y1 and y2 are missing.
Let nH, nK, nL, nM denote the sizes of H, K, L, M, respectively.
4 Assume that the response mechanism does not depend on the value of (y1, y2), so that MAR holds. In this case, the observed (mean) score of µ based on a single observation in set K is
  E{S(i)com(µ) | y1i, i ∈ K} = [σ11 σ12; σ12 σ22]^{-1} (y1i − µ1, E(y2i | y1i) − µ2)′ = (σ11^{-1}(y1i − µ1), 0)′.
Kim (ISU) Ch 2 49 / 50
Example 2.7 (Cont’d)
5 Similarly, we have
  E{S(i)com(µ) | y2i, i ∈ L} = (0, σ22^{-1}(y2i − µ2))′.
6 Therefore, the observed information matrix of µ is
  Iobs(µ) = nH [σ11 σ12; σ12 σ22]^{-1} + nK [σ11^{-1} 0; 0 0] + nL [0 0; 0 σ22^{-1}],
and the asymptotic variance of the MLE of µ can be obtained as the inverse of Iobs(µ).
Kim (ISU) Ch 2 50 / 50
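A short sketch assembling Iobs(µ) and its inverse; all numerical values below (the known covariance parameters and the pattern counts) are assumptions made up for the illustration.

```python
# Observed information and asymptotic variance for Example 2.7.
import numpy as np

s11, s12, s22 = 1.0, 0.5, 2.0
Sigma_inv = np.linalg.inv(np.array([[s11, s12], [s12, s22]]))
nH, nK, nL = 120, 40, 30            # set M carries no information on mu and drops out

I_obs = (nH * Sigma_inv
         + nK * np.array([[1 / s11, 0.0], [0.0, 0.0]])
         + nL * np.array([[0.0, 0.0], [0.0, 1 / s22]]))
print(np.linalg.inv(I_obs))         # asymptotic variance of the MLE of mu
```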
REFERENCES
Fisher, R. A. (1922), ‘On the mathematical foundations of theoretical statistics’, Philosophical Transactions of the Royal Society of London A 222, 309–368.
Louis, T. A. (1982), ‘Finding the observed information matrix when using the EM algorithm’, Journal of the Royal Statistical Society: Series B 44, 226–233.
Oakes, D. (1999), ‘Direct calculation of the information matrix via the EM algorithm’, Journal of the Royal Statistical Society: Series B 61, 479–482.
Orchard, T. and Woodbury, M. A. (1972), ‘A missing information principle: theory and applications’, in Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, California, pp. 697–715.
Rubin, D. B. (1976), ‘Inference and missing data’, Biometrika 63, 581–592.
Kim (ISU) Ch 2 50 / 50