Statistical analysis of non-ignorable nonresponse:
Review of the existing methods
Jae-Kwang Kim 1
Department of Statistics, Iowa State University
June 8, 2017
Joint work with Kosuke Morikawa at Osaka University
1 Introduction
2 Full likelihood-based ML estimation
3 Partial Likelihood approach
4 GMM method
5 Exponential tilting method
6 Semiparametric efficient estimation
7 Concluding Remarks
Jae-Kwang Kim (ISU) Nonignorable missing June 8, 2017 2 / 56
Introduction
• (X, Y): random variables; y is subject to missingness
• Response indicator:
  δi = 1 if yi is observed; δi = 0 otherwise.
• Nonignorable nonresponse:
  f(y | x) ≠ f(y | x, δ = 1).
• In general,
  f(y | x, δ = 1) = {P(δ = 1 | x, y) / P(δ = 1 | x)} f(y | x).
Thus, P(δ = 1 | x, y) ≠ P(δ = 1 | x) implies nonignorable nonresponse.
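A small simulation (illustrative, not from the slides) makes the inequality concrete: when P(δ = 1 | y) increases with y, the respondents' distribution f(y | δ = 1) is tilted toward large y, so the respondents' mean overstates the full-data mean.

```python
import math
import random

rng = random.Random(0)
n = 100000
y = [rng.gauss(0, 1) for _ in range(n)]

# Response probability increasing in y: P(delta = 1 | y) = logistic(y).
delta = [1 if rng.random() < 1 / (1 + math.exp(-yi)) else 0 for yi in y]

full_mean = sum(y) / n
resp_mean = sum(yi for yi, d in zip(y, delta) if d) / sum(delta)
# f(y | delta = 1) is tilted toward large y, so resp_mean exceeds full_mean.
```

Ignoring the nonrespondents here would bias the estimated mean upward, which is exactly the nonignorable-nonresponse problem.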
Observed likelihood
• f(y | x; θ): model of y on x
• g(δ | x, y; φ): model of δ on (x, y)
• Observed likelihood:
  Lobs(θ, φ) = ∏_{δi=1} f(yi | xi; θ) g(δi | xi, yi; φ) × ∏_{δi=0} ∫ f(yi | xi; θ) g(δi | xi, yi; φ) dyi
• Under what conditions are the parameters identifiable (or estimable)?
Lemma 1
Suppose that we can decompose the covariate vector as x = (u, z) such that
  g(δ | y, x) = g(δ | y, u)   (1)
and, for any given u, there exist zu,1 and zu,2 such that
  f(y | u, z = zu,1) ≠ f(y | u, z = zu,2).   (2)
Under some other minor conditions, all the parameters in f and g are identifiable.
Remark
• Condition (1) means
  δ ⊥ z | (y, u).
• That is, given (y, u), z does not help in explaining δ.
Figure: A DAG for understanding the nonresponse instrumental variable Z (nodes Z, U, Y, δ; by condition (1) there is no direct edge from Z to δ).
• We may call z the nonresponse instrumental variable.
• Rigorous theory is developed in Wang et al. (2014).
Another condition for identification
• It is hard to find an instrumental variable Z using only the observed data.
• Also, if the dependence of Y on Z is weak, the resulting estimator can be unstable.
• A necessary and sufficient condition for model identification can instead be obtained by restricting the outcome model.
Lemma 2
• Let O(φ; x, y) = 1/P(δ = 1 | x, y; φ) − 1, and let f(y | x, δ = 1; γ) be a parametric model, known up to γ with true value γ0, that is identifiable. Also let E1(· | x) = E(· | x, δ = 1).
• If E1{O(φ; x, Y) | x; γ0} exists for almost all x and is identifiable for φ, then the observed likelihood is identifiable.
• Thus a model can be shown to be identifiable without using any instrumental variable.
• The proof is given in Morikawa and Kim (2017).
Identifiable Model
Example 1
• Suppose that
  yi | xi, δi = 1 ∼ N(τ(xi), σ²).
Assume that xi is always observed but yi is observed only when δi = 1, where δi ∼ Bernoulli[πi(φ)] and
  πi(φ) = exp(φ0 + φ1xi + φ2yi) / {1 + exp(φ0 + φ1xi + φ2yi)}.
• Then E1{O(φ; x, Y) | x; γ0} = exp(−φ0 − φ1x − φ2τ(x) + φ2²σ²/2).
• Therefore, the model is identifiable unless τ(x) is constant or linear in x.
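The conditional expectation in Example 1 follows from the normal moment-generating function, E{e^{tY}} = e^{tτ(x) + t²σ²/2} for Y | x, δ = 1 ∼ N(τ(x), σ²), applied with t = −φ2; a short derivation:

```latex
E_1\{O(\phi; x, Y) \mid x\}
  = e^{-\phi_0 - \phi_1 x}\, E_1\{e^{-\phi_2 Y} \mid x\}
  = \exp\!\Big(-\phi_0 - \phi_1 x - \phi_2 \tau(x) + \tfrac{1}{2}\phi_2^2 \sigma^2\Big).
```

If τ(x) = α0 + α1x, the parameters enter only through the combinations φ0 + φ2α0 and φ1 + φ2α1, so φ cannot be recovered; this is why identifiability fails exactly when τ(x) is constant or linear.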
Introduction
Remark
• MCAR (missing completely at random): P(δ | y) does not depend on y.
• MAR (missing at random): P(δ | y) = P(δ | yobs).
• NMAR (not missing at random): P(δ | y) ≠ P(δ | yobs).
• Thus, MCAR is a special case of MAR.
Parameter estimation under nonignorable nonresponse
• Full likelihood-based ML estimation
• Generalized method of moment (GMM) approach (Section 6.3 of KS)
• Conditional likelihood approach (Section 6.2 of KS)
• Pseudo likelihood approach (Section 6.4 of KS)
• Exponential tilting method (Section 6.5 of KS)
• Latent variable approach (Section 6.6 of KS)
Reference
Kim, J.K. and Shao, J. (2013). “Statistical Methods for Handling Incomplete Data”,
Chapman & Hall / CRC.
Full likelihood-based ML estimation
• We wish to find η̂ = (θ̂, φ̂) that maximizes the observed likelihood
  Lobs(η) = ∏_{δi=1} f(yi | xi; θ) g(δi | xi, yi; φ) × ∏_{δi=0} ∫ f(yi | xi; θ) g(δi | xi, yi; φ) dyi.
• Mean score theorem: Under some regularity conditions, finding the MLE by maximizing the observed likelihood is equivalent to finding the solution to
  S̄(η) ≡ E{S(η) | yobs, δ; η} = 0,
where yobs is the observed data. The conditional expectation of the score function is called the mean score function.
EM algorithm
• We are interested in finding η̂ that maximizes Lobs(η). The MLE can be obtained by solving Sobs(η) = 0, which is equivalent to solving S̄(η) = 0 by the mean score theorem.
• The EM algorithm provides an alternative method of solving S̄(η) = 0 by writing
  S̄(η) = E{S(η) | yobs, δ; η}
and using the iterative method
  η̂(t+1) ← solve E{S(η) | yobs, δ; η̂(t)} = 0.
EM algorithm
Definition
Let η(t) be the current value of the parameter estimate of η. The EM algorithm iterates the following E-step and M-step:
• E-step: Compute
  Q(η | η(t)) = E{ln f(y, δ; η) | yobs, δ; η(t)}.
• M-step: Find η(t+1) that maximizes Q(η | η(t)) with respect to η.
Monte Carlo EM
Motivation: Monte Carlo samples in the EM algorithm can be used as imputed values.
1 In the EM algorithm defined by
  • [E-step] Compute Q(η | η(t)) = E{ln f(y, δ; η) | yobs, δ; η(t)}
  • [M-step] Find η(t+1) that maximizes Q(η | η(t)),
the E-step is computationally cumbersome because it involves an integral.
2 Wei and Tanner (1990): In the E-step, first draw
  y*(1)_mis, · · · , y*(m)_mis ∼ f(ymis | yobs, δ; η(t))
and approximate
  Q(η | η(t)) ≅ (1/m) Σ_{j=1}^m ln f(yobs, y*(j)_mis, δ; η).
Monte Carlo EM
Example 2
• Suppose that
  yi ∼ f(yi | xi; θ).
Assume that xi is always observed but yi is observed only when δi = 1, where δi ∼ Bernoulli[πi(φ)] and
  πi(φ) = exp(φ0 + φ1xi + φ2yi) / {1 + exp(φ0 + φ1xi + φ2yi)}.
• To implement the MCEM method, in the E-step we need to generate samples from
  f(yi | xi, δi = 0; θ̂, φ̂) = f(yi | xi; θ̂){1 − πi(φ̂)} / ∫ f(yi | xi; θ̂){1 − πi(φ̂)} dyi.
Monte Carlo EM
Example 2 (Cont’d)
• We can use the following rejection method to generate samples from f(yi | xi, δi = 0; θ̂, φ̂):
1 Generate y*i from f(yi | xi; θ̂).
2 Using y*i, compute
  π*i(φ̂) = exp(φ̂0 + φ̂1xi + φ̂2y*i) / {1 + exp(φ̂0 + φ̂1xi + φ̂2y*i)}.
Accept y*i with probability 1 − π*i(φ̂).
3 If y*i is not accepted, go to Step 1.
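The rejection steps above can be sketched as follows, a minimal illustration assuming (hypothetically, not from the slides) a normal outcome model f(y | x; θ) = N(θ0 + θ1x, 1):

```python
import math
import random

def sample_nonrespondent(x, theta, phi, rng):
    """Draw one y from f(y | x, delta = 0) by rejection:
    propose y* from f(y | x; theta), accept with probability 1 - pi*(phi)."""
    b0, b1 = theta
    while True:
        y = rng.gauss(b0 + b1 * x, 1.0)          # Step 1: y* ~ f(y | x; theta)
        eta = phi[0] + phi[1] * x + phi[2] * y   # Step 2: compute pi*(phi)
        pi_star = 1.0 / (1.0 + math.exp(-eta))
        if rng.random() < 1.0 - pi_star:         # accept with prob 1 - pi*
            return y                             # Step 3: otherwise propose again

rng = random.Random(1)
theta, phi = (0.0, 1.0), (0.0, 0.0, 1.0)        # phi2 > 0: large y more likely observed
ys = [sample_nonrespondent(0.0, theta, phi, rng) for _ in range(5000)]
mean0 = sum(ys) / len(ys)
# The nonrespondent density is tilted toward small y, so mean0 lies below E(y | x) = 0.
```

The acceptance probability 1 − π* is exactly the factor by which f(y | x) must be reweighted to obtain f(y | x, δ = 0), which is why the accepted draws have the target distribution.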
Monte Carlo EM
Example 2 (Cont’d)
• Using the m imputed values of yi, denoted by y*(1)_i, · · · , y*(m)_i, the M-step can be implemented by solving
  Σ_{i=1}^n Σ_{j=1}^m S(θ; xi, y*(j)_i) = 0
and
  Σ_{i=1}^n Σ_{j=1}^m {δi − π(φ; xi, y*(j)_i)}(1, xi, y*(j)_i) = 0,
where S(θ; xi, yi) = ∂ log f(yi | xi; θ)/∂θ.
Remark
• An identifiability condition is needed to guarantee the convergence of the EM sequence.
• The fully parametric model approach is known to be sensitive to failure of the model assumptions: Little (1985), Kenward and Molenberghs (1998).
• Sensitivity analysis is often recommended: Scharfstein et al. (1999).
Partial Likelihood approach
• A classical likelihood-based approach for parameter estimation under nonignorable nonresponse is to maximize Lobs(θ, φ) with respect to (θ, φ), where
  Lobs(θ, φ) = ∏_{δi=1} f(yi | xi; θ) g(δi | xi, yi; φ) × ∏_{δi=0} ∫ f(yi | xi; θ) g(δi | xi, yi; φ) dyi.
• Such an approach can be called a full likelihood-based approach because it uses the full information available in the observed data.
• A partial likelihood-based approach (or conditional likelihood approach), on the other hand, uses only a subset of the sample.
Conditional Likelihood approach
Idea
• Since
  f(y | x) g(δ | x, y) = f1(y | x, δ) g1(δ | x)
for some f1 and g1, we can write
  Lobs(θ) = ∏_{δi=1} f1(yi | xi, δi = 1) g1(δi | xi) × ∏_{δi=0} ∫ f1(yi | xi, δi = 0) g1(δi | xi) dyi
          = ∏_{δi=1} f1(yi | xi, δi = 1) × ∏_{i=1}^n g1(δi | xi).
• The conditional likelihood is defined to be the first component:
  Lc(θ) = ∏_{δi=1} f1(yi | xi, δi = 1) = ∏_{δi=1} f(yi | xi; θ)π(xi, yi) / ∫ f(y | xi; θ)π(xi, y) dy,
where π(xi, yi) = Pr(δi = 1 | xi, yi). This idea is popular in the biased-sampling literature.
Pseudo Likelihood approach
Idea
• Consider bivariate (xi, yi) with density f(y | x; θ)h(x), where yi is subject to missingness.
• We are interested in estimating θ.
• Suppose that Pr(δ = 1 | x, y) depends only on y (i.e., x is a nonresponse instrument).
• Note that then f(x | y, δ) = f(x | y).
• Thus, we can consider the conditional likelihood
  Lc(θ) = ∏_{δi=1} f(xi | yi, δi = 1) = ∏_{δi=1} f(xi | yi).
• Equivalently, we can maximize the pseudo likelihood
  Lp(θ) = ∏_{δi=1} f(yi | xi; θ)ĥ(xi) / ∫ f(yi | x; θ)ĥ(x) dx,
where ĥ(x) is a consistent estimator of the marginal density of x.
Pseudo Likelihood approach
Idea
• We may use the empirical density for ĥ(x); that is, ĥ(x) = 1/n if x = xi. In this case,
  Lc(θ) = ∏_{δi=1} f(yi | xi; θ) / Σ_{k=1}^n f(yi | xk; θ).
• We can extend the idea to the case of x = (u, z), where z is a nonresponse instrument. In this case, the conditional likelihood becomes
  ∏_{i:δi=1} p(zi | yi, ui) = ∏_{i:δi=1} f(yi | ui, zi; θ)p(zi | ui) / ∫ f(yi | ui, z; θ)p(z | ui) dz.   (3)
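The empirical-ĥ version can be made concrete with a minimal sketch (assuming, hypothetically, a normal outcome model f(y | x; θ) = N(β0 + β1x, 1) and a response probability depending on y only, so x acts as the instrument). It evaluates log Lc(θ) = Σ_{δi=1}[log f(yi | xi; θ) − log Σ_k f(yi | xk; θ)]:

```python
import math
import random

def log_pseudo_lik(beta, x, y, delta):
    """Log of Lc(beta): respondents' conditional density of y given x,
    with the marginal of x replaced by its empirical distribution."""
    b0, b1 = beta

    def logf(yi, xk):  # log N(yi; b0 + b1*xk, 1)
        r = yi - b0 - b1 * xk
        return -0.5 * r * r - 0.5 * math.log(2 * math.pi)

    total = 0.0
    for xi, yi, di in zip(x, y, delta):
        if di == 1:
            terms = [logf(yi, xk) for xk in x]      # denominator sums over all n units
            m = max(terms)                          # log-sum-exp for stability
            log_denom = m + math.log(sum(math.exp(v - m) for v in terms))
            total += logf(yi, xi) - log_denom
    return total

rng = random.Random(7)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
# Response depends on y only, so x is the nonresponse instrument here.
delta = [1 if rng.random() < 1 / (1 + math.exp(-0.5 * yi)) else 0 for yi in y]
```

Evaluating `log_pseudo_lik` at the generating value (1, 2) gives a larger value than at a distant point such as (0, 0), as a maximizer would exploit.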
Pseudo Likelihood approach
• Let p̂(z | u) be an estimated conditional probability density of z given u. Substituting this estimate into the likelihood in (3), we obtain the pseudo likelihood
  ∏_{i:δi=1} f(yi | ui, zi; θ)p̂(zi | ui) / ∫ f(yi | ui, z; θ)p̂(z | ui) dz.   (4)
• The pseudo maximum likelihood estimator (PMLE) of θ, denoted by θ̂p, can be obtained by solving
  Sp(θ; α̂) ≡ Σ_{δi=1} [S(θ; xi, yi) − E{S(θ; ui, z, yi) | yi, ui; θ, α̂}] = 0
for θ, where S(θ; x, y) = S(θ; u, z, y) = ∂ log f(y | x; θ)/∂θ and
  E{S(θ; ui, z, yi) | yi, ui; θ, α̂} = ∫ S(θ; ui, z, yi)f(yi | ui, z; θ)p(z | ui; α̂) dz / ∫ f(yi | ui, z; θ)p(z | ui; α̂) dz.
Pseudo Likelihood approach
• The Fisher-scoring method for obtaining the PMLE is
  θ̂(t+1)_p = θ̂(t)_p + {Ip(θ̂(t)_p, α̂)}^{-1} Sp(θ̂(t)_p; α̂),
where
  Ip(θ, α̂) = Σ_{δi=1} [E{S(θ; ui, z, yi)⊗2 | yi, ui; θ, α̂} − E{S(θ; ui, z, yi) | yi, ui; θ, α̂}⊗2].
• First considered by Tang et al. (2003) and further developed by Zhao and Shao (2015).
• Model selection with a penalized validation criterion is proposed by Fang and Shao (2016).
Basic setup
• (X, Y): random variables
• θ: defined by solving
  E{U(θ; X, Y)} = 0.
• yi is subject to missingness:
  δi = 1 if yi responds; δi = 0 if yi is missing.
• We want to find wi such that the solution θ̂w to
  Σ_{i=1}^n δi wi U(θ; xi, yi) = 0
is consistent for θ.
Basic Setup
• Result 1: The choice of
  wi = 1/E(δi | xi, yi)   (5)
makes the resulting estimator θ̂w consistent.
• Result 2: If δi ∼ Bernoulli(πi), then using wi = 1/πi also makes the resulting estimator consistent, but it is less efficient than θ̂w using wi in (5).
Parameter estimation: GMM method
• Because z is a nonresponse instrumental variable, we may assume
  P(δ = 1 | x, y) = π(φ0 + φ1u + φ2y)
for some (φ0, φ1, φ2).
• Kott and Chang (2010): Construct a set of estimating equations such as
  Σ_{i=1}^n {δi / π(φ0 + φ1ui + φ2yi) − 1} (1, ui, zi) = 0
that are unbiased to zero.
• We may have an overidentified situation: use the generalized method of moments (GMM).
GMM method
Example 3
• Suppose that we are interested in estimating the parameters in the regression model
  yi = β0 + β1x1i + β2x2i + ei   (6)
where E(ei | xi) = 0.
• Assume that yi is subject to missingness and that
  P(δi = 1 | x1i, x2i, yi) = exp(φ0 + φ1x1i + φ2yi) / {1 + exp(φ0 + φ1x1i + φ2yi)}.
Thus, x2i is the nonresponse instrumental variable in this setup.
Example 3 (Cont’d)
• A consistent estimator of φ can be obtained by solving
  Û2(φ) ≡ Σ_{i=1}^n {δi / π(φ; x1i, yi) − 1} (1, x1i, x2i) = (0, 0, 0).   (7)
Roughly speaking, the solution to (7) exists almost surely if E{∂Û2(φ)/∂φᵀ} is of full rank in a neighborhood of the true value of φ. If x2 is a vector, then (7) is overidentified and an exact solution to (7) does not exist. In that case, the GMM algorithm can be used.
• Solving Û2(φ) = 0 can be replaced by finding the minimizer of Q(φ) = Û2(φ)ᵀÛ2(φ) or QW(φ) = Û2(φ)ᵀW Û2(φ), where W = {V(Û2)}^{-1}.
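Here is a sketch of solving (7) on simulated data. The assumptions are mine, not from the slides: a logistic response in (x1, y), a hypothetical linear model to generate y, and a damped Newton iteration with a numerical Jacobian standing in for a packaged GMM routine (the exactly identified case, 3 equations in 3 unknowns):

```python
import math
import random

def u2(phi, x1, x2, y, delta):
    """U2(phi) in (7): sum_i {delta_i/pi(phi; x1i, yi) - 1} * (1, x1i, x2i).
    For delta_i = 0 the bracket equals -1, so the missing y_i is never needed."""
    tot = [0.0, 0.0, 0.0]
    for a, b, yi, di in zip(x1, x2, y, delta):
        # For respondents, delta/pi - 1 = exp(-eta) under the logistic model.
        r = math.exp(-(phi[0] + phi[1] * a + phi[2] * yi)) if di else -1.0
        for j, g in enumerate((1.0, a, b)):
            tot[j] += r * g
    return tot

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def solve3(A, b):
    """Gaussian elimination with partial pivoting for the 3x3 Newton step."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[r][3] / M[r][r] for r in range(3)]

def solve_phi(x1, x2, y, delta, iters=50):
    """Damped Newton on U2(phi) = 0 with a forward-difference Jacobian."""
    phi, h = [0.0, 0.0, 0.0], 1e-6
    for _ in range(iters):
        f0 = u2(phi, x1, x2, y, delta)
        J = [[0.0] * 3 for _ in range(3)]
        for k in range(3):
            q = phi[:]
            q[k] += h
            fk = u2(q, x1, x2, y, delta)
            for j in range(3):
                J[j][k] = (fk[j] - f0[j]) / h
        d = solve3(J, f0)
        t = 1.0                       # backtracking keeps ||U2|| decreasing
        while t > 1e-8:
            trial = [phi[k] - t * d[k] for k in range(3)]
            if norm(u2(trial, x1, x2, y, delta)) < norm(f0):
                phi = trial
                break
            t /= 2
    return phi

rng = random.Random(3)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]   # instrument: in the outcome, not the response
y = [0.5 + x1i + x2i + rng.gauss(0, 1) for x1i, x2i in zip(x1, x2)]
phi0 = (0.2, 0.3, 0.5)
delta = [1 if rng.random() < 1 / (1 + math.exp(-(phi0[0] + phi0[1] * a + phi0[2] * yi))) else 0
         for a, yi in zip(x1, y)]
phi_hat = solve_phi(x1, x2, y, delta)
```

Calibrating against (1, x1, x2) rather than (1, x1, y) is what makes the system solvable without the missing y values, which is the point of the instrument.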
Example 3 (Cont’d)
• Once the solution φ̂ to (7) is obtained, a consistent estimator of β = (β0, β1, β2) can be obtained by solving
  Û1(β, φ̂) ≡ Σ_{i=1}^n (δi/π̂i){yi − β0 − β1x1i − β2x2i}(1, x1i, x2i) = (0, 0, 0)   (8)
for β.
Asymptotic Properties
• The asymptotic variance of β̂ obtained from (8), with φ̂ computed from the GMM, is
  V(θ̂) ≅ (ΓaᵀΣa^{-1}Γa)^{-1}   (9)
where Γa = E{∂Û(θ)/∂θᵀ}, Σa = V(Û), Û = (Û1, Û2), and θ = (β, φ).
• Rigorous theory is developed in Wang et al. (2014).
Exponential tilting method
Motivation
• Parameter θ is defined by E{U(θ; X, Y)} = 0.
• We are interested in estimating θ from the expected estimating equation
  Σ_{i=1}^n [δiU(θ; xi, yi) + (1 − δi)E{U(θ; xi, Y) | xi, δi = 0}] = 0.   (10)
• The conditional expectation in (10) can be evaluated using
  f(y | x, δ = 0) = f(y | x) P(δ = 0 | x, y) / E{P(δ = 0 | x, Y) | x},   (11)
which requires correct specification of f(y | x; θ); the result is known to be sensitive to the choice of f(y | x; θ).
Exponential tilting method
Idea
Instead of specifying a parametric model for f(y | x), consider specifying a parametric model for f(y | x, δ = 1), denoted by f1(y | x). In this case,
  f0(yi | xi) = f1(yi | xi) × O(xi, yi) / E{O(xi, Yi) | xi, δi = 1},   (12)
where fδ(yi | xi) = f(yi | xi, δi = δ) and
  O(xi, yi) = Pr(δi = 0 | xi, yi) / Pr(δi = 1 | xi, yi)   (13)
is the conditional odds of nonresponse.
Remark
• If the response probability follows a logistic regression model
  π(xi, yi) ≡ Pr(δi = 1 | xi, yi) = exp{g(xi) + φyi} / [1 + exp{g(xi) + φyi}],   (14)
where g(x) is completely unspecified, the expression (12) simplifies to
  f0(yi | xi) = f1(yi | xi) × exp(γyi) / E{exp(γY) | xi, δi = 1},   (15)
where γ = −φ and f1(y | x) is the conditional density of y given x and δ = 1.
• Model (15) states that the density for the nonrespondents is an exponential tilting of the density for the respondents. The parameter γ is the tilting parameter that determines the amount of departure from ignorability of the response mechanism. If γ = 0, then the response mechanism is ignorable and f0(y | x) = f1(y | x).
• Kim and Yu (2011) proposed a method to estimate γ with a validation sample.
• Shao and Wang (2016) developed a method to estimate γ without using a validation sample.
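As a quick numeric check of (15), a sketch with an assumed respondents' model f1(y | x) = N(0, σ²) (illustrative, not from the slides): tilting a normal density by exp(γy) shifts its mean to γσ², which can be verified by importance-weighting draws from f1.

```python
import math
import random

rng = random.Random(11)
gamma, sigma = -0.8, 1.0            # gamma = -phi; tilt toward smaller y when gamma < 0
y1 = [rng.gauss(0.0, sigma) for _ in range(200000)]   # draws from f1(y | x) = N(0, 1)

# (15): f0 = f1 * exp(gamma*y) / E1{exp(gamma*Y)}, so weight draws by exp(gamma*y).
w = [math.exp(gamma * yi) for yi in y1]
mean0 = sum(wi * yi for wi, yi in zip(w, y1)) / sum(w)
# For a normal f1, the tilted density f0 is N(gamma * sigma^2, sigma^2),
# so mean0 should be close to gamma * sigma**2 = -0.8.
```

This also illustrates why γ governs the departure from ignorability: at γ = 0 the weights are constant and the nonrespondents' mean coincides with the respondents' mean.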
Maximum Likelihood Estimation of the tilting parameter
• We wish to solve
  Σ_{i=1}^n [δiSi(φ) + (1 − δi)E{Si(φ) | xi, δi = 0}] = 0
with respect to φ, where Si(φ) is the score function of φ evaluated at (xi, yi).
• The conditional expectation can be written as
  E{Si(φ) | xi, δi = 0} = ∫ Si(φ)O(xi, y; φ)f1(y | xi) dy / ∫ O(xi, y; φ)f1(y | xi) dy,
where
  O(x, y; φ) = {1 − π(φ; x, y)} / π(φ; x, y).
• Thus,
  E{Si(φ) | xi, δi = 0} = E1{Si(φ)O(xi, Y; φ) | xi} / E1{O(xi, Y; φ) | xi},
where E1{Q(xi, Y) | xi} = ∫ Q(xi, y)f1(y | xi) dy.
Riddles et al. (2016) method
• In practice, f1(y | x) is unknown and is estimated by f̂1(y | x) = f1(y | x; γ̂).
• Thus, given γ̂, a fully efficient estimator of φ can be obtained by solving
  S2(φ, γ̂) ≡ Σ_{i=1}^n [δiS(φ; xi, yi) + (1 − δi)S̄0(φ | xi; γ̂, φ)] = 0,   (16)
where
  S̄0(φ | xi; γ̂, φ) = Σ_{δj=1} S(φ; xi, yj)O(φ; xi, yj)f1(yj | xi; γ̂)/f̂1(yj) / [Σ_{δj=1} O(φ; xi, yj)f1(yj | xi; γ̂)/f̂1(yj)]
and
  f̂1(y) = n_R^{-1} Σ_{i=1}^n δif1(y | xi; γ̂), with n_R = Σ_{i=1}^n δi.
• Use the EM algorithm to solve (16) for φ.
• Semiparametric extension (Morikawa et al., 2017): use a nonparametric density for f1(y | x).
A Toy Example: Categorical Data (All dichotomous)
Example (SRS, n = 10)
ID Weight x1 x2 y
1 0.1 1 0 1
2 0.1 1 1 1
3 0.1 0 1 M
4 0.1 1 0 0
5 0.1 0 1 1
6 0.1 1 0 M
7 0.1 0 1 M
8 0.1 1 0 0
9 0.1 0 0 0
10 0.1 1 1 0
M: Missing
A Toy Example (Cont’d)
Assume P(δ = 1 | x1, x2, y) = π(x1, y).
ID  Weight      x1  x2  y
 1  0.1          1   0  1
 2  0.1          1   1  1
 3  0.1·w3,0     0   1  0
    0.1·w3,1     0   1  1
 4  0.1          1   0  0
 5  0.1          0   1  1
w3,j = P̂(Y = j | X1 = 0, X2 = 1, δ = 0) ∝ P̂(Y = j | X1 = 0, X2 = 1, δ = 1) × {1 − π̂(0, j)}/π̂(0, j),
with w3,0 + w3,1 = 1.
A Toy Example (Cont’d)
ID  Weight      x1  x2  y
 6  0.1·w6,0     1   0  0
    0.1·w6,1     1   0  1
 7  0.1·w7,0     0   1  0
    0.1·w7,1     0   1  1
 8  0.1          1   0  0
 9  0.1          0   0  0
10  0.1          1   1  0
w6,j ∝ P̂(Y = j | X1 = 1, X2 = 0, δ = 1) × {1 − π̂(1, j)}/π̂(1, j),
w7,j ∝ P̂(Y = j | X1 = 0, X2 = 1, δ = 1) × {1 − π̂(0, j)}/π̂(0, j),
with w6,0 + w6,1 = w7,0 + w7,1 = 1.
Example (Cont’d)
• E-step: Compute the conditional probabilities (fractional weights) using the estimated response probability π̂ab.
• M-step: Update the response probability using the fractional weights. For the fully nonparametric model,
  π̂ab = Σ_{δi=1} I(x1i = a, yi = b) / [Σ_{δi=1} I(x1i = a, yi = b) + Σ_{δi=0} Σ_{j=0}^1 wi,j I(x1i = a, y*ij = b)].
• The solution from the proposed method is π̂11 = 1, π̂10 = 3/4, π̂01 = 1/3, π̂00 = 1.
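The fixed point above can be checked numerically; a minimal sketch of the E-step/M-step iteration on the ten toy records, using the fractional-weight formula from the previous slides:

```python
# Toy data: (x1, x2, y), with y = None for the missing cases (IDs 3, 6, 7).
data = [(1,0,1),(1,1,1),(0,1,None),(1,0,0),(0,1,1),
        (1,0,None),(0,1,None),(1,0,0),(0,0,0),(1,1,0)]

def p1(j, a, b):
    """P_hat(Y = j | X1 = a, X2 = b, delta = 1), from the respondents."""
    cell = [y for (x1, x2, y) in data if y is not None and x1 == a and x2 == b]
    return cell.count(j) / len(cell)

pi = {(a, j): 0.5 for a in (0, 1) for j in (0, 1)}   # pi_hat[a, j] = P(delta=1 | x1=a, y=j)
for _ in range(200):
    # E-step: fractional weights w_{i,j} proportional to P_hat(Y=j | x, delta=1)*(1-pi)/pi.
    imputed = {k: 0.0 for k in pi}                   # imputed mass per (x1, y) cell
    for (a, b, y) in data:
        if y is None:
            w = [p1(j, a, b) * (1 - pi[(a, j)]) / pi[(a, j)] for j in (0, 1)]
            s = w[0] + w[1]
            for j in (0, 1):
                imputed[(a, j)] += w[j] / s
    # M-step: pi_hat = observed count / (observed + imputed mass) per (x1, y) cell.
    for (a, j) in pi:
        obs = sum(1 for (x1, x2, y) in data if y == j and x1 == a)
        pi[(a, j)] = obs / (obs + imputed[(a, j)])
# Converges to the stated solution: pi11 = 1, pi10 = 3/4, pi01 = 1/3, pi00 = 1.
```

For instance, IDs 3 and 7 (x1 = 0, x2 = 1) have only one observed neighbor with y = 1, so all of their fractional mass goes to y = 1 and π̂01 = 1/(1 + 2) = 1/3 from the first iteration on.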
A Toy Example (Cont’d)
Example (Cont’d)
• The method can be viewed as a fractional imputation method of Kim (2011).
• The GMM method, on the other hand, is closer to a nonresponse weighting adjustment.
A Toy Example (Cont’d)
Example: GMM method
ID  Wgt1  Wgt2           x1  x2  y
 1  0.1   0.1·π̂11^{-1}    1   0  1
 2  0.1   0.1·π̂11^{-1}    1   1  1
 3  0.1   0.0             0   1  M
 4  0.1   0.1·π̂10^{-1}    1   0  0
 5  0.1   0.1·π̂01^{-1}    0   1  1
 6  0.1   0.0             1   0  M
 7  0.1   0.0             0   1  M
 8  0.1   0.1·π̂10^{-1}    1   0  0
 9  0.1   0.1·π̂00^{-1}    0   0  0
10  0.1   0.1·π̂10^{-1}    1   1  0
M: Missing
A Toy Example (Cont’d)
• GMM method: Calibration equation
  Σ_i (δi/π̂i) I(x1i = a, x2i = b) = Σ_i I(x1i = a, x2i = b).
1 X1 = 1, X2 = 1: π̂11^{-1} + π̂10^{-1} = 2
2 X1 = 1, X2 = 0: π̂11^{-1} + π̂10^{-1} + π̂10^{-1} = 4
3 X1 = 0, X2 = 1: π̂01^{-1} = 3
4 X1 = 0, X2 = 0: π̂00^{-1} = 1.
• The solution of the GMM method does not exist: subtracting the first equation from the second gives π̂10^{-1} = 2, which forces π̂11^{-1} = 0, impossible since π̂11^{-1} ≥ 1.
Setup
Motivation
• For simplicity, assume that θ = E(Y) is the parameter of interest.
• A response model P(δ = 1 | x, y; φ) = π(φ; x, y) is assumed.
• Therefore, the only restriction is on the response model (a semiparametric model).
• What is the best way to estimate (φ, θ) in terms of the asymptotic variance?
Semiparametric Efficient Estimation
Morikawa and Kim (2017): The solution of the following estimating equations achieves the semiparametric efficiency bound for (φ, θ):
  Σ_{i=1}^n {1 − δi/π(φ; xi, yi)} geff(φ; xi) = 0,
  Σ_{i=1}^n [δi(θ − yi)/π(φ; xi, yi) + {1 − δi/π(φ; xi, yi)}{θ − Ȳ(xi)}] = 0,   (17)
where geff(φ; x) = Ē[π̇(φ; x, Y)/{1 − π(φ; x, Y)} | x], π̇(φ) = ∂π(φ)/∂φ, Ȳ(x) = Ē(Y | x),
  Ē{g(x, Y) | x} = E{O(x, Y)g(x, Y) | x} / E{O(x, Y) | x},
and O(x, y) = 1/π(x, y) − 1.
Semiparametric Efficient Estimation
• The two functions geff(φ; x) and Ȳ(x) in (17) are to be estimated.
• By Bayes’ formula, for any function g(x, y), we have
  Ē{g(x, Y) | x} = E{O(x, Y)g(x, Y) | x} / E{O(x, Y) | x} = E1{π^{-1}(x, Y)O(x, Y)g(x, Y) | x} / E1{π^{-1}(x, Y)O(x, Y) | x},   (18)
where E1(· | x) := E(· | x, δ = 1).
• Hence, if the distribution f1(y | x) := f(y | x, δ = 1) is specified, (18) can be computed; thus, the estimating equations (17) can be solved.
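Identity (18) can be sanity-checked by simulation (a sketch with an assumed population Y ~ N(0, 1) and logistic π depending on y only): the O-weighted expectation Ē{Y | x}, computed from the full simulated population on the left and from respondents only via the right-hand side, should agree.

```python
import math
import random

rng = random.Random(5)
n = 200000
y = [rng.gauss(0, 1) for _ in range(n)]
pi = [1 / (1 + math.exp(-yi)) for yi in y]          # pi = logistic(y), so O = exp(-y)
delta = [1 if rng.random() < p else 0 for p in pi]
O = [1 / p - 1 for p in pi]

# Left of (18): E{O*Y | x} / E{O | x}, estimated from the full (simulated) population.
lhs = sum(o * yi for o, yi in zip(O, y)) / sum(O)

# Right of (18): respondents only, with the extra 1/pi factor restoring the weighting.
num = sum(o * yi / p for o, yi, p, d in zip(O, y, pi, delta) if d)
den = sum(o / p for o, p, d in zip(O, pi, delta) if d)
rhs = num / den

# Here O(y) = exp(-y), and exp(-y)*phi(y) is proportional to a N(-1, 1) density,
# so both sides should be close to -1.
```

The right-hand side uses only (xi, yi) pairs with δi = 1, which is the point of (18): the tilted expectation is computable from the respondents' model f1 alone.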
Semiparametric Efficient Estimation
• Morikawa and Kim (2017) proposed two types of semiparametric estimators.
• The first estimator uses a parametric model f1(y | x; γ) and solves (17) with the estimated γ̂. This estimator is consistent and asymptotically normal even if the specification of f1 is wrong; moreover, if the specification is correct, it attains the semiparametric efficiency bound.
• The second estimator uses a nonparametric model for f1 and solves (17). This estimator always attains the semiparametric efficiency bound, but it does not work well when the dimension of x is high.
Remark
• The semiparametric efficiency bound and semiparametric theory are rigorously developed in Bickel et al. (1998).
• Extension to parameters that restrict the data generation, such as regression parameters: Robins et al. (1999).
• The estimator for φ in (17) is a special case of Kott and Chang (2010): set geff(φ; xi) as the calibration condition.
Concluding remarks
• The methods reviewed here use a model for the response probability.
• Parameter estimation for the response model can be implemented using the maximum likelihood method or the method of moments.
• An instrumental variable or a model restriction on f1 is needed for identifiability of the response model.
• Fewer tools are available for model diagnostics or model validation.
REFERENCES
Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1998). Efficient and
Adaptive Estimation for Semiparametric Models. Springer.
Fang, F. and J. Shao (2016). Model selection with nonignorable nonresponse.
Biometrika 103(1), 1–14.
Kenward, M. G. and G. Molenberghs (1998). Likelihood based frequentist inference
when data are missing at random. Statistical Science 13, 236–247.
Kim, J. K. (2011). Parametric fractional imputation for missing data analysis.
Biometrika 98, 119–132.
Kim, J. K. and J. Shao (2013). Statistical Methods for Handling Incomplete Data.
Chapman & Hall / CRC.
Kim, J. K. and C. L. Yu (2011). A semi-parametric estimation of mean functionals with
non-ignorable missing data. Journal of the American Statistical Association 106,
157–165.
Kott, P. S. and T. Chang (2010). Using calibration weighting to adjust for nonignorable
unit nonresponse. Journal of the American Statistical Association 105, 1265–1275.
Little, R. (1985). A note about models for selectivity bias. Econometrica 53, 1469–1474.
Morikawa, K. and J. K. Kim (2017). Semiparametric adaptive estimation with
nonignorable nonresponse data. Submitted.
Morikawa, K., J. K. Kim, and Y. Kano (2017). Semiparametric maximum likelihood
estimation under nonignorable nonresponse. Submitted.
Riddles, M. K., J. K. Kim, and J. Im (2016). A propensity-score-adjustment method for
nonignorable nonresponse. J. Surv. Stat. Methodol. 4(2), 215–245.
Robins, J. M., A. Rotnitzky, and D. O. Scharfstein (1999). Sensitivity Analysis for
Selection Bias and Unmeasured Confounding in Missing Data and Causal Inference
Models, Volume 116 of Statistical Models in Epidemiology: The Environment and
Clinical Trials. NY: Springer-Verlag.
Scharfstein, D., A. Rotnitzky, and J. M. Robins (1999). Adjusting for nonignorable
dropout using semi-parametric models. Journal of the American Statistical
Association 94, 1096–1146.
Shao, J. and L. Wang (2016). Semiparametric inverse propensity weighting for
nonignorable missing data. Biometrika 103(1), 175–187.
Tang, G., R. J. A. Little, and T. E. Raghunathan (2003). Analysis of multivariate
missing data with nonignorable nonresponse. Biometrika 90, 747–764.
Wang, S., J. Shao, and J. K. Kim (2014). Identifiability and estimation in problems with
nonignorable nonresponse. Statistica Sinica 24, 1097 – 1116.
Wei, G. C. and M. A. Tanner (1990). A Monte Carlo implementation of the EM
algorithm and the poor man’s data augmentation algorithms. Journal of the
American Statistical Association 85, 699–704.
Zhao, J. and J. Shao (2015). Semiparametric pseudo-likelihoods in generalized linear
models with nonignorable missing data. Journal of the American Statistical
Association 110(512), 1577–1590.
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
PMED Opening Workshop - Inference on Individualized Treatment Rules from Obse...
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
 
Bayes Independence Test
Bayes Independence TestBayes Independence Test
Bayes Independence Test
 

Recently uploaded

Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
Scintica Instrumentation
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 

Recently uploaded (20)

Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 

MNAR

  • 7. Another condition for identification problem
  • It is hard to find an instrumental variable Z using only the observed data.
  • Also, if the dependence of Y on Z is weak, the resulting estimator can be unstable.
  • There is a necessary and sufficient condition for model identification that restricts only the outcome model.
  • 8. Lemma 2
  • Let O(φ; x, y) = 1/P(δ = 1 | x, y; φ) − 1, and let f(y | x, δ = 1; γ) be a parametric model, known up to γ and identifiable in γ, with true value γ0. Also let E1(· | x) = E(· | x, δ = 1).
  • If E1{O(φ; x, Y) | x; γ0} exists for almost all x and is identifiable in φ, then the observed likelihood is identifiable.
  • Hence a model can be shown to be identifiable without using any instrumental variable.
  • The proof is given in Morikawa and Kim (2017).
  • 9. Identifiable Model
  Example 1
  • Suppose that yi | xi, δi = 1 ∼ N(τ(xi), σ²). Assume that xi is always observed but yi is observed only when δi = 1, where δi ∼ Bernoulli[πi(φ)] and
    πi(φ) = exp(φ0 + φ1xi + φ2yi) / {1 + exp(φ0 + φ1xi + φ2yi)}.
  • Then, E1{O(φ; x, Y) | x; γ0} = exp(−φ0 − φ1x − φ2τ(x) + φ2²σ²/2).
  • Therefore, the model is identifiable unless τ(x) is constant or linear.
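As a numerical sanity check on Example 1's expression for E1{O(φ; x, Y) | x}, one can compare it with direct integration: it is just the normal moment-generating function, E{exp(−φ2Y)} = exp(−φ2τ + φ2²σ²/2). A minimal Python sketch; the values of φ, τ(x), σ, and x below are arbitrary choices for the demo, not from the slides:

```python
import numpy as np

phi0, phi1, phi2 = 0.2, 0.3, 0.4           # illustrative logistic coefficients
tau, sigma, x = 1.5, 1.0, 1.0              # y | x, delta = 1 ~ N(tau(x), sigma^2)
step = 1e-3
y = np.arange(tau - 10 * sigma, tau + 10 * sigma, step)
f1 = np.exp(-0.5 * ((y - tau) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
O = np.exp(-(phi0 + phi1 * x + phi2 * y))  # O = 1/pi - 1 under the logistic model
lhs = float((O * f1).sum() * step)         # E1{O(phi; x, Y) | x} by quadrature
rhs = float(np.exp(-phi0 - phi1 * x - phi2 * tau + 0.5 * phi2 ** 2 * sigma ** 2))
print(lhs, rhs)
```

The two values agree to the accuracy of the grid quadrature.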
  • 10. Introduction: Remark
  • MCAR (Missing Completely At Random): P(δ | y) does not depend on y.
  • MAR (Missing At Random): P(δ | y) = P(δ | yobs).
  • NMAR (Not Missing At Random): P(δ | y) ≠ P(δ | yobs).
  • Thus, MCAR is a special case of MAR.
  • 11. Parameter estimation under nonignorable nonresponse
  • Full likelihood-based ML estimation
  • Generalized method of moments (GMM) approach (Section 6.3 of KS)
  • Conditional likelihood approach (Section 6.2 of KS)
  • Pseudo likelihood approach (Section 6.4 of KS)
  • Exponential tilting method (Section 6.5 of KS)
  • Latent variable approach (Section 6.6 of KS)
  Reference: Kim, J. K. and Shao, J. (2013). Statistical Methods for Handling Incomplete Data. Chapman & Hall / CRC.
  • 13. Full likelihood-based ML estimation
  • Wish to find η̂ = (θ̂, φ̂) that maximizes the observed likelihood
    Lobs(η) = ∏_{δi=1} f(yi | xi; θ) g(δi | xi, yi; φ) × ∏_{δi=0} ∫ f(yi | xi; θ) g(δi | xi, yi; φ) dyi.
  • Mean score theorem: Under some regularity conditions, finding the MLE by maximizing the observed likelihood is equivalent to finding the solution to
    S̄(η) ≡ E{S(η) | yobs, δ; η} = 0,
    where yobs is the observed data. The conditional expectation of the score function is called the mean score function.
  • 14. EM algorithm
  • Interested in finding η̂ that maximizes Lobs(η). The MLE can be obtained by solving Sobs(η) = 0, which is equivalent to solving S̄(η) = 0 by the mean score theorem.
  • The EM algorithm provides an alternative method of solving S̄(η) = 0 by writing S̄(η) = E{S(η) | yobs, δ; η} and using the following iterative method:
    η̂(t+1) ← solve E{S(η) | yobs, δ; η̂(t)} = 0.
  • 15. EM algorithm
  Definition. Let η(t) be the current value of the parameter estimate of η. The EM algorithm iterates the following E-step and M-step:
  • E-step: Compute Q(η | η(t)) = E{ln f(y, δ; η) | yobs, δ; η(t)}.
  • M-step: Find η(t+1) that maximizes Q(η | η(t)) w.r.t. η.
  • 16. Monte Carlo EM
  Motivation: Monte Carlo samples in the EM algorithm can be used as imputed values.
  1. In the EM algorithm defined by
     • [E-step] Compute Q(η | η(t)) = E{ln f(y, δ; η) | yobs, δ; η(t)},
     • [M-step] Find η(t+1) that maximizes Q(η | η(t)),
     the E-step is computationally cumbersome because it involves an integral.
  2. Wei and Tanner (1990): In the E-step, first draw
     y∗(1)mis, · · · , y∗(m)mis ∼ f(ymis | yobs, δ; η(t))
     and approximate
     Q(η | η(t)) ≅ (1/m) Σ_{j=1}^m ln f(yobs, y∗(j)mis, δ; η).
  • 17. Monte Carlo EM
  Example 2
  • Suppose that yi ∼ f(yi | xi; θ). Assume that xi is always observed but yi is observed only when δi = 1, where δi ∼ Bernoulli[πi(φ)] and
    πi(φ) = exp(φ0 + φ1xi + φ2yi) / {1 + exp(φ0 + φ1xi + φ2yi)}.
  • To implement the MCEM method, in the E-step we need to generate samples from
    f(yi | xi, δi = 0; θ̂, φ̂) = f(yi | xi; θ̂){1 − πi(φ̂)} / ∫ f(yi | xi; θ̂){1 − πi(φ̂)} dyi.
  • 18. Monte Carlo EM
  Example 2 (Cont'd)
  • We can use the following rejection method to generate samples from f(yi | xi, δi = 0; θ̂, φ̂):
    1. Generate y∗i from f(yi | xi; θ̂).
    2. Using y∗i, compute
       π∗i(φ̂) = exp(φ̂0 + φ̂1xi + φ̂2y∗i) / {1 + exp(φ̂0 + φ̂1xi + φ̂2y∗i)}.
       Accept y∗i with probability 1 − π∗i(φ̂).
    3. If y∗i is not accepted, go to Step 1.
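The rejection steps above can be sketched in Python. This is a minimal illustration: the normal outcome model N(θx, 1) and the particular parameter values are assumptions for the demo, not from the slides:

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def draw_nonrespondent(x, theta, phi, rng):
    """Draw y* from f(y | x, delta = 0): propose from f(y | x; theta-hat),
    accept with probability 1 - pi*(phi-hat), otherwise retry."""
    while True:
        y_star = rng.normal(theta * x, 1.0)                            # Step 1
        accept_prob = 1.0 - expit(phi[0] + phi[1] * x + phi[2] * y_star)
        if rng.uniform() < accept_prob:                                # Step 2
            return y_star                                              # Step 3: else retry

rng = np.random.default_rng(1)
draws = np.array([draw_nonrespondent(0.0, 1.0, (0.0, 0.0, 2.0), rng)
                  for _ in range(2000)])
# with phi2 > 0, high-y units respond more often, so the accepted
# (nonrespondent) draws are shifted toward low y
print(round(float(draws.mean()), 2))
```

The downward shift of the accepted draws relative to f(y | x; θ̂) is exactly the nonignorability of the response mechanism.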
  • 19. Monte Carlo EM
  Example 2 (Cont'd)
  • Using the m imputed values of yi, denoted y∗(1)i, · · · , y∗(m)i, the M-step can be implemented by solving
    Σ_{i=1}^n Σ_{j=1}^m S(θ; xi, y∗(j)i) = 0
    and
    Σ_{i=1}^n Σ_{j=1}^m {δi − π(φ; xi, y∗(j)i)}(1, xi, y∗(j)i) = 0,
    where S(θ; xi, yi) = ∂ log f(yi | xi; θ)/∂θ.
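The second M-step equation is simply a logistic-regression score equation on the stacked (observed plus imputed) records, so it can be solved by Newton's method. A minimal Python sketch, with an illustrative complete-data sanity check (the data-generating values are assumptions for the demo; in a real MCEM run, Z would stack one row per imputed value y∗(j)i):

```python
import numpy as np

def mstep_phi(Z, d, n_iter=25):
    """Solve sum {d - pi(phi; z)} z = 0 (logistic score) by Newton's method.
    Z stacks one row (1, x_i, y*_ij) per pseudo-observation; d is the
    corresponding response indicator."""
    phi = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ phi)))
        H = (Z * (p * (1.0 - p))[:, None]).T @ Z        # observed information
        phi = phi + np.linalg.solve(H, Z.T @ (d - p))   # Newton step on the score
    return phi

# sanity check with m = 1 and y* = y (no missingness): recover a known phi
rng = np.random.default_rng(0)
n = 4000
x, yv = rng.normal(size=n), rng.normal(size=n)
Z = np.column_stack([np.ones(n), x, yv])
phi_true = np.array([0.5, 0.5, 0.5])
d = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(Z @ phi_true)))).astype(float)
phi_hat = mstep_phi(Z, d)
print(np.round(phi_hat, 2))
```
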
  • 20. Remark
  • An identifiability condition is needed to guarantee convergence of the EM sequence.
  • The fully parametric model approach is known to be sensitive to failure of the model assumptions: Little (1985), Kenward and Molenberghs (1998).
  • Sensitivity analysis is often recommended: Scharfstein et al. (1999).
  • 22. Partial Likelihood approach
  • A classical likelihood-based approach for parameter estimation under nonignorable nonresponse is to maximize Lobs(θ, φ) with respect to (θ, φ), where
    Lobs(θ, φ) = ∏_{δi=1} f(yi | xi; θ) g(δi | xi, yi; φ) × ∏_{δi=0} ∫ f(yi | xi; θ) g(δi | xi, yi; φ) dyi.
  • Such an approach can be called a full likelihood-based approach because it uses the full information available in the observed data.
  • A partial likelihood-based approach (or conditional likelihood approach), on the other hand, uses only a subset of the sample.
  • 23. Conditional Likelihood approach
  Idea
  • Since f(y | x)g(δ | x, y) = f1(y | x, δ)g1(δ | x) for some f1 and g1, we can write
    Lobs(θ) = ∏_{δi=1} f1(yi | xi, δi = 1) g1(δi | xi) × ∏_{δi=0} ∫ f1(yi | xi, δi = 0) g1(δi | xi) dyi
            = ∏_{δi=1} f1(yi | xi, δi = 1) × ∏_{i=1}^n g1(δi | xi).
  • The conditional likelihood is defined to be the first component:
    Lc(θ) = ∏_{δi=1} f1(yi | xi, δi = 1) = ∏_{δi=1} f(yi | xi; θ)π(xi, yi) / ∫ f(y | xi; θ)π(xi, y) dy,
    where π(xi, yi) = Pr(δi = 1 | xi, yi). This form is popular in the biased sampling literature.
  • 24. Pseudo Likelihood approach
  Idea
  • Consider bivariate (xi, yi) with density f(y | x; θ)h(x), where yi is subject to missingness.
  • We are interested in estimating θ.
  • Suppose that Pr(δ = 1 | x, y) depends only on y (i.e., x is a nonresponse instrument).
  • Note that f(x | y, δ) = f(x | y).
  • Thus, we can consider the conditional likelihood
    Lc(θ) = ∏_{δi=1} f(xi | yi, δi = 1) = ∏_{δi=1} f(xi | yi).
  • We can then maximize the pseudo likelihood
    Lp(θ) = ∏_{δi=1} f(yi | xi; θ)ĥ(xi) / ∫ f(yi | x; θ)ĥ(x) dx,
    where ĥ(x) is a consistent estimator of the marginal density of x.
  • 25. Pseudo Likelihood approach
  Idea
  • We may use the empirical density for ĥ(x); that is, ĥ(x) = 1/n if x = xi. In this case,
    Lc(θ) = ∏_{δi=1} f(yi | xi; θ) / Σ_{k=1}^n f(yi | xk; θ).
  • We can extend the idea to the case x = (u, z), where z is a nonresponse instrument. The conditional likelihood then becomes
    ∏_{i:δi=1} p(zi | yi, ui) = ∏_{i:δi=1} f(yi | ui, zi; θ)p(zi | ui) / ∫ f(yi | ui, z; θ)p(z | ui) dz. (3)
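The empirical-density version of Lc(θ) is easy to compute and maximize directly. A minimal Python sketch (the linear-normal outcome model, the response mechanism depending only on y, and all constants below are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
theta_true = 1.0
x = rng.normal(size=n)
y = theta_true * x + rng.normal(size=n)               # f(y | x; theta) = N(theta*x, 1)
delta = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(0.5 + y)))  # depends on y only

def log_Lc(theta):
    yr, xr = y[delta], x[delta]
    num = -0.5 * (yr - theta * xr) ** 2               # log f(y_i | x_i; theta) + const
    # log of sum_k f(y_i | x_k; theta): empirical h-hat puts mass 1/n on each x_k
    den = np.log(np.exp(-0.5 * (yr[:, None] - theta * x[None, :]) ** 2).sum(axis=1))
    return float((num - den).sum())

grid = np.linspace(0.0, 2.0, 81)
theta_hat = grid[np.argmax([log_Lc(t) for t in grid])]
print(theta_hat)
```

Even though the missingness depends on y, the estimator targets θ because it is based on f(x | y), which is unaffected by a response mechanism that ignores x.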
  • 26. Pseudo Likelihood approach
  • Let p̂(z | u) be an estimated conditional probability density of z given u. Substituting this estimate into the likelihood in (3), we obtain the pseudo likelihood
    ∏_{i:δi=1} f(yi | ui, zi; θ)p̂(zi | ui) / ∫ f(yi | ui, z; θ)p̂(z | ui) dz. (4)
  • The pseudo maximum likelihood estimator (PMLE) of θ, denoted θ̂p, can be obtained by solving
    Sp(θ; α̂) ≡ Σ_{δi=1} [S(θ; xi, yi) − E{S(θ; ui, z, yi) | yi, ui; θ, α̂}] = 0
    for θ, where S(θ; x, y) = S(θ; u, z, y) = ∂ log f(y | x; θ)/∂θ and
    E{S(θ; ui, z, yi) | yi, ui; θ, α̂} = ∫ S(θ; ui, z, yi)f(yi | ui, z; θ)p(z | ui; α̂) dz / ∫ f(yi | ui, z; θ)p(z | ui; α̂) dz.
  • 27. Pseudo Likelihood approach
  • The Fisher-scoring method for obtaining the PMLE is
    θ̂(t+1)p = θ̂(t)p + {Ip(θ̂(t), α̂)}⁻¹ Sp(θ̂(t), α̂),
    where
    Ip(θ, α̂) = Σ_{δi=1} [E{S(θ; ui, z, yi)⊗2 | yi, ui; θ, α̂} − E{S(θ; ui, z, yi) | yi, ui; θ, α̂}⊗2].
  • First considered by Tang et al. (2003) and further developed by Zhao and Shao (2015).
  • Model selection with a penalized validation criterion is proposed by Fang and Shao (2016).
  • 29. Basic setup
  • (X, Y): random variables.
  • θ: defined by solving E{U(θ; X, Y)} = 0.
  • yi is subject to missingness:
    δi = 1 if yi responds; δi = 0 if yi is missing.
  • Want to find wi such that the solution θ̂w to
    Σ_{i=1}^n δi wi U(θ; xi, yi) = 0
    is consistent for θ.
  • 30. Basic Setup
  • Result 1: The choice of
    wi = 1 / E(δi | xi, yi) (5)
    makes the resulting estimator θ̂w consistent.
  • Result 2: If δi ∼ Bernoulli(πi), then using wi = 1/πi also makes the resulting estimator consistent, but it is less efficient than θ̂w using wi in (5).
  • 31. Parameter estimation: GMM method
  • Because z is a nonresponse instrumental variable, we may assume
    P(δ = 1 | x, y) = π(φ0 + φ1u + φ2y)
    for some (φ0, φ1, φ2).
  • Kott and Chang (2010): Construct a set of estimating equations, such as
    Σ_{i=1}^n {δi / π(φ0 + φ1ui + φ2yi) − 1} (1, ui, zi) = 0,
    that are unbiased to zero.
  • The system may be overidentified: use the generalized method of moments (GMM).
  • 32. GMM method
  Example 3
  • Suppose that we are interested in estimating the parameters in the regression model
    yi = β0 + β1x1i + β2x2i + ei, (6)
    where E(ei | xi) = 0.
  • Assume that yi is subject to missingness, with
    P(δi = 1 | x1i, x2i, yi) = exp(φ0 + φ1x1i + φ2yi) / {1 + exp(φ0 + φ1x1i + φ2yi)}.
    Thus, x2i is the nonresponse instrument variable in this setup.
  • 33. Example 3 (Cont'd)
  • A consistent estimator of φ can be obtained by solving
    Û2(φ) ≡ Σ_{i=1}^n {δi / π(φ; x1i, yi) − 1} (1, x1i, x2i) = (0, 0, 0). (7)
    Roughly speaking, the solution to (7) exists almost surely if E{∂Û2(φ)/∂φ} is of full rank in the neighborhood of the true value of φ. If x2 is a vector, then (7) is overidentified and an exact solution to (7) does not exist. In that case, the GMM algorithm can be used.
  • A solution can be obtained by finding the minimizer of Q(φ) = Û2(φ)′Û2(φ) or QW(φ) = Û2(φ)′W Û2(φ), where W = {V(Û2)}⁻¹.
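A small Python sketch of Û2(φ) and Q(φ) from Example 3 (the simulation design for model (6) and the particular φ values are illustrative assumptions). At the true φ the moment conditions are unbiased, so Q is near zero there and grows when φ is perturbed; note that the nonrespondent contribution to Û2 is just −(1, x1i, x2i), so their missing yi is never actually needed:

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(2)
n = 5000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + x1 + x2 + rng.normal(size=n)                 # outcome model as in (6)
phi_true = np.array([0.5, 0.3, 0.4])
delta = rng.uniform(size=n) < expit(phi_true[0] + phi_true[1] * x1 + phi_true[2] * y)

def U2(phi):
    p = expit(phi[0] + phi[1] * x1 + phi[2] * y)
    r = np.where(delta, 1.0 / p, 0.0) - 1.0            # delta_i / pi_i - 1
    return np.array([r.mean(), (r * x1).mean(), (r * x2).mean()])

def Q(phi):
    return float(U2(phi) @ U2(phi))

Q_true, Q_wrong = Q(phi_true), Q(phi_true + np.array([1.0, 0.0, 0.0]))
print(Q_true, Q_wrong)
```

In practice Q (or QW) would be handed to a numerical optimizer to produce φ̂.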
  • 34. Example 3 (Cont'd)
  • Once the solution φ̂ to (7) is obtained, a consistent estimator of β = (β0, β1, β2) can be obtained by solving
    Û1(β, φ̂) ≡ Σ_{i=1}^n (δi / π̂i) {yi − β0 − β1x1i − β2x2i} (1, x1i, x2i) = (0, 0, 0) (8)
    for β.
  • 35. Asymptotic Properties
  • The asymptotic variance of β̂ obtained from (8), with φ̂ computed from the GMM, is
    V(θ̂) ≅ (Γa′ Σa⁻¹ Γa)⁻¹, (9)
    where Γa = E{∂Û(θ)/∂θ}, Σa = V(Û), Û = (Û1′, Û2′)′, and θ = (β′, φ′)′.
  • Rigorous theory developed by Wang et al. (2014).
  • 37. Exponential tilting method
  Motivation
  • Parameter θ defined by E{U(θ; X, Y)} = 0.
  • We are interested in estimating θ from an expected estimating equation:
    Σ_{i=1}^n [δi U(θ; xi, yi) + (1 − δi)E{U(θ; xi, Y) | xi, δi = 0}] = 0. (10)
  • The conditional expectation in (10) can be evaluated by using
    f(y | x, δ = 0) = f(y | x) P(δ = 0 | x, y) / E{P(δ = 0 | x, Y) | x}, (11)
    which requires correct specification of f(y | x; θ) and is known to be sensitive to the choice of f(y | x; θ).
  • 38. Exponential tilting method
  Idea. Instead of specifying a parametric model for f(y | x), consider specifying a parametric model for f(y | x, δ = 1), denoted by f1(y | x). In this case,
    f0(yi | xi) = f1(yi | xi) × O(xi, yi) / E{O(xi, Yi) | xi, δi = 1}, (12)
  where fδ(yi | xi) = f(yi | xi, δi = δ) and
    O(xi, yi) = Pr(δi = 0 | xi, yi) / Pr(δi = 1 | xi, yi) (13)
  is the conditional odds of nonresponse.
  • 39. Remark
  • If the response probability follows the logistic regression model
    π(xi, yi) ≡ Pr(δi = 1 | xi, yi) = exp{g(xi) + φyi} / [1 + exp{g(xi) + φyi}], (14)
    where g(x) is completely unspecified, expression (12) simplifies to
    f0(yi | xi) = f1(yi | xi) × exp(γyi) / E{exp(γY) | xi, δi = 1}, (15)
    where γ = −φ and f1(y | x) is the conditional density of y given x and δ = 1.
  • Model (15) states that the density for the nonrespondents is an exponential tilting of the density for the respondents. The parameter γ is the tilting parameter, which determines the amount of departure from ignorability of the response mechanism. If γ = 0, the response mechanism is ignorable and f0(y | x) = f1(y | x).
  • Kim and Yu (2011) proposed a method to estimate γ with a validation sample.
  • Shao and Wang (2016) developed a method to estimate γ without using a validation sample.
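For a concrete case of (15): if f1(y | x) is N(µ, σ²), the tilted density f0(y | x) ∝ f1(y | x) exp(γy) is again normal, N(µ + γσ², σ²), so nonrespondents differ from respondents by a mean shift of γσ². A small Python check on a grid (the values of µ, σ, γ are arbitrary choices for the demo):

```python
import numpy as np

mu, sigma, gamma = 0.0, 1.0, 0.5
step = 1e-3
ygrid = np.arange(mu - 10 * sigma, mu + 10 * sigma, step)
f1 = np.exp(-0.5 * ((ygrid - mu) / sigma) ** 2)   # respondents' density, unnormalized
f0 = f1 * np.exp(gamma * ygrid)                   # exponential tilting as in (15)
f0 = f0 / (f0.sum() * step)                       # normalize on the grid
mean0 = float((ygrid * f0).sum() * step)
print(round(mean0, 4))                            # theory: mu + gamma * sigma**2 = 0.5
```
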
  • 40. Maximum Likelihood Estimation of the tilting parameter
  • We wish to solve
    Σ_{i=1}^n [δi Si(φ) + (1 − δi)E{Si(φ) | xi, δi = 0}] = 0
    with respect to φ, where Si(φ) is the score function of φ evaluated at (xi, yi).
  • The conditional expectation can be written as
    E{Si(φ) | xi, δi = 0} = ∫ Si(φ)O(xi, y; φ)f1(y | xi) dy / ∫ O(xi, y; φ)f1(y | xi) dy,
    where O(x, y; φ) = {1 − π(φ; x, y)} / π(φ; x, y).
  • Thus,
    E{Si(φ) | xi, δi = 0} = E1{Si(φ)O(xi, Y; φ) | xi} / E1{O(xi, Y; φ) | xi},
    where E1{Q(xi, Y) | xi} = ∫ Q(xi, y)f1(y | xi) dy.
  • 41. Riddles et al. (2016) method
  • In practice, f1(y | x) is unknown and is estimated by f̂1(y | x) = f1(y | x; γ̂).
  • Thus, given γ̂, a fully efficient estimator of φ can be obtained by solving
    S2(φ, γ̂) ≡ Σ_{i=1}^n [δi S(φ; xi, yi) + (1 − δi) S̄0(φ | xi; γ̂, φ)] = 0, (16)
    where
    S̄0(φ | xi; γ̂, φ) = Σ_{δj=1} S(φ; xi, yj)O(φ; xi, yj)f1(yj | xi; γ̂)/f̂1(yj) / Σ_{δj=1} O(φ; xi, yj)f1(yj | xi; γ̂)/f̂1(yj)
    and f̂1(y) = (1/nR) Σ_{i=1}^n δi f1(y | xi; γ̂).
  • Use an EM algorithm to solve (16) for φ.
  • Semiparametric extension (Morikawa et al., 2017): use a nonparametric density for f1(y | x).
  • 42. A Toy Example: Categorical Data (all dichotomous)
  Example (SRS, n = 10)

  ID   Weight   x1   x2   y
   1    0.1      1    0    1
   2    0.1      1    1    1
   3    0.1      0    1    M
   4    0.1      1    0    0
   5    0.1      0    1    1
   6    0.1      1    0    M
   7    0.1      0    1    M
   8    0.1      1    0    0
   9    0.1      0    0    0
  10    0.1      1    1    0

  M: Missing
  • 43. A Toy Example (Cont'd)
  Assume P(δ = 1 | x1, x2, y) = π(x1, y).

  ID   Weight        x1   x2   y
   1    0.1           1    0    1
   2    0.1           1    1    1
   3    0.1 · w3,0    0    1    0
        0.1 · w3,1    0    1    1
   4    0.1           1    0    0
   5    0.1           0    1    1

  w3,j = P̂(Y = j | X1 = 0, X2 = 1, δ = 0) ∝ P̂(Y = j | X1 = 0, X2 = 1, δ = 1) {1 − π̂(0, j)} / π̂(0, j),
  with w3,0 + w3,1 = 1.
  • 44. A Toy Example (Cont'd)

  ID   Weight        x1   x2   y
   6    0.1 · w6,0    1    0    0
        0.1 · w6,1    1    0    1
   7    0.1 · w7,0    0    1    0
        0.1 · w7,1    0    1    1
   8    0.1           1    0    0
   9    0.1           0    0    0
  10    0.1           1    1    0

  w6,j ∝ P̂(Y = j | X1 = 1, X2 = 0, δ = 1) {1 − π̂(1, j)} / π̂(1, j),
  w7,j ∝ P̂(Y = j | X1 = 0, X2 = 1, δ = 1) {1 − π̂(0, j)} / π̂(0, j),
  with w6,0 + w6,1 = w7,0 + w7,1 = 1.
  • 45. Example (Cont'd)
  • E-step: Compute the conditional probability of each missing y using the estimated response probability π̂ab.
  • M-step: Update the response probability using the fractional weights. For a fully nonparametric model,
    π̂ab = Σ_{δi=1} I(x1i = a, yi = b) / [Σ_{δi=1} I(x1i = a, yi = b) + Σ_{δi=0} Σ_{j=0}^1 wi,j I(x1i = a, y∗ij = b)].
  • The solution from the proposed method is
    π̂11 = 1, π̂10 = 3/4, π̂01 = 1/3, π̂00 = 1.
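The E/M iteration on the toy data can be coded directly. This is a Python sketch of the fractional-weight EM described above, in which the respondent conditional probabilities P̂(Y = j | x1, x2, δ = 1) are estimated once from the respondents and held fixed. Starting from uniform fractional weights, it reproduces the stated solution π̂11 = 1, π̂10 = 3/4, π̂01 = 1/3, π̂00 = 1:

```python
import numpy as np

# Toy data from the slides: (x1, x2, y), with y = -1 marking the missing units 3, 6, 7
data = np.array([(1,0,1), (1,1,1), (0,1,-1), (1,0,0), (0,1,1),
                 (1,0,-1), (0,1,-1), (1,0,0), (0,0,0), (1,1,0)])
x1, x2, y = data[:, 0], data[:, 1], data[:, 2]
delta = y >= 0

def p1(a, b, j):
    """P-hat(Y = j | x1 = a, x2 = b, delta = 1), held fixed across iterations."""
    cell = delta & (x1 == a) & (x2 == b)
    return float(np.mean(y[cell] == j))

pi = np.full((2, 2), 0.5)                  # pi[a, b] = P(delta = 1 | x1 = a, y = b)
w = {i: np.array([0.5, 0.5]) for i in np.where(~delta)[0]}  # fractional weights

for _ in range(300):
    # E-step: w_{i,j} proportional to P-hat(Y = j | x_i, delta = 1) * O(x1_i, j)
    for i in w:
        odds = 1.0 / pi[x1[i], :] - 1.0    # O = 1/pi - 1, for y = 0 and y = 1
        probs = np.array([p1(x1[i], x2[i], j) for j in (0, 1)]) * odds
        w[i] = probs / probs.sum()
    # M-step: pi_ab = respondent count / (respondent + fractional count), per (x1, y)
    for a in (0, 1):
        for b in (0, 1):
            resp = np.sum(delta & (x1 == a) & (y == b))
            frac = sum(w[i][b] for i in w if x1[i] == a)
            pi[a, b] = resp / (resp + frac)

print(np.round(pi, 4))   # rows: x1 = 0, 1; columns: y = 0, 1
```
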
  • 46. A Toy Example (Cont'd)
  Example (Cont'd)
  • The method can be viewed as a fractional imputation method of Kim (2011).
  • The GMM method, on the other hand, is closer to nonresponse weighting adjustment.
  • 47. A Toy Example (Cont'd)
  Example: GMM method

  ID   Wgt1   Wgt2        x1   x2   y
   1    0.1    0.1/π̂11     1    0    1
   2    0.1    0.1/π̂11     1    1    1
   3    0.1    0.0          0    1    M
   4    0.1    0.1/π̂10     1    0    0
   5    0.1    0.1/π̂01     0    1    1
   6    0.1    0.0          1    0    M
   7    0.1    0.0          0    1    M
   8    0.1    0.1/π̂10     1    0    0
   9    0.1    0.1/π̂00     0    0    0
  10    0.1    0.1/π̂10     1    1    0

  M: Missing
  • 48. A Toy Example (Cont'd)
  • GMM method: Calibration equation
    Σi (δi/π̂i) I(x1i = a, x2i = b) = Σi I(x1i = a, x2i = b):
    1. X1 = 1, X2 = 1: 1/π̂11 + 1/π̂10 = 2
    2. X1 = 1, X2 = 0: 1/π̂11 + 1/π̂10 + 1/π̂10 = 4
    3. X1 = 0, X2 = 1: 1/π̂01 = 3
    4. X1 = 0, X2 = 0: 1/π̂00 = 1.
  • The solution to the GMM calibration equations does not exist.
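The incompatibility of the first two calibration equations is elementary: subtracting equation 1 from equation 2 forces 1/π̂10 = 2 and hence 1/π̂11 = 0, which no response probability in (0, 1] can satisfy. A one-line Python check:

```python
import numpy as np

# unknowns: a = 1/pi11, b = 1/pi10; equations 1 and 2 from the calibration list
A = np.array([[1.0, 1.0],    # a + b  = 2   (cell X1 = 1, X2 = 1)
              [1.0, 2.0]])   # a + 2b = 4   (cell X1 = 1, X2 = 0)
a, b = np.linalg.solve(A, np.array([2.0, 4.0]))
print(a, b)   # a = 0 is infeasible: it would require pi11 to be infinite
```
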
  • 50. Setup
  Motivation
  • For simplicity, assume that θ = E(Y) is of interest.
  • A response model is assumed: P(δ = 1 | x, y; φ) = π(φ; x, y).
  • Therefore, the only restriction is on the response model (a semiparametric model).
  • What is the best way to estimate (φ, θ) in terms of the asymptotic variance?
  • 51. Semiparametric Efficient Estimation
  Morikawa and Kim (2017): The solution to the following estimating equations achieves the semiparametric efficiency bound for (φ, θ):
    Σ_{i=1}^n {1 − δi/π(φ; xi, yi)} geff(φ; xi) = 0,
    Σ_{i=1}^n [δi(θ − yi)/π(φ; xi, yi) + {1 − δi/π(φ; xi, yi)}{θ − Ỹ(xi)}] = 0, (17)
  where geff(φ; x) = Ẽ{π̇(φ; x, Y)/{1 − π(φ; x, Y)} | x}, π̇(φ) = ∂π(φ)/∂φ, Ỹ(x) = Ẽ(Y | x),
    Ẽ{g(x, Y) | x} = E{O(x, Y)g(x, Y) | x} / E{O(x, Y) | x},
  and O(x, y) = 1/π(x, y) − 1.
  • 52. Semiparametric Efficient Estimation
  • The two functions geff(φ; x) and Ỹ(x) in (17) must be estimated.
  • By Bayes' formula, for any function g(x, y) we have
    Ẽ{g(x, Y) | x} = E{O(x, Y)g(x, Y) | x} / E{O(x, Y) | x} = E1{π⁻¹(x, Y)O(x, Y)g(x, Y) | x} / E1{π⁻¹(x, Y)O(x, Y) | x}, (18)
    where E1(· | x) := E(· | x, δ = 1).
  • Hence, if the distribution f1(y | x) := f(y | x, δ = 1) is specified, (18) can be computed and the estimating equations (17) can be solved.
  • 53. Semiparametric Efficient Estimation
  • Morikawa and Kim (2017) proposed two types of semiparametric estimators.
  • The first estimator uses a parametric model f1(y | x; γ) and solves (17) with the estimated γ̂. The estimator is consistent and asymptotically normal even if the specification of f1 is wrong; if the specification is correct, it attains the semiparametric efficiency bound.
  • The second estimator uses a nonparametric model for f1 and solves (17). This estimator always attains the semiparametric efficiency bound, but it does not work well when the dimension of x is high.
Remark
• The semiparametric efficiency bound and the supporting semiparametric theory are rigorously developed in Bickel et al. (1998).
• Extension to parameters that restrict the data generation, such as regression parameters: Robins et al. (1999).
• The estimator for φ in (17) is a special case of Kott and Chang (2010): take g_eff(φ; x_i) as the calibration condition.
Jae-Kwang Kim (ISU) Nonignorable missing June 8, 2017 54 / 56
1 Introduction
2 Full likelihood-based ML estimation
3 Partial Likelihood approach
4 GMM method
5 Exponential tilting method
6 Semiparametric efficient estimation
7 Concluding Remarks
Jae-Kwang Kim (ISU) Nonignorable missing June 8, 2017 55 / 56
Concluding remarks
• The reviewed approaches use a model for the response probability.
• Parameter estimation for the response model can be implemented by maximum likelihood or the method of moments.
• An instrumental variable or a model restriction on f1 is needed for identifiability of the response model.
• Fewer tools are available for model diagnostics or model validation.
Jae-Kwang Kim (ISU) Nonignorable missing June 8, 2017 56 / 56
REFERENCES
Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1998). Efficient and Adaptive Estimation for Semiparametric Models. Springer.
Fang, F. and J. Shao (2016). Model selection with nonignorable nonresponse. Biometrika 103(1), 1–14.
Kenward, M. G. and G. Molenberghs (1998). Likelihood based frequentist inference when data are missing at random. Statistical Science 13, 236–247.
Kim, J. K. (2011). Parametric fractional imputation for missing data analysis. Biometrika 98, 119–132.
Kim, J. K. and J. Shao (2013). Statistical Methods for Handling Incomplete Data. Chapman & Hall/CRC.
Kim, J. K. and C. L. Yu (2011). A semi-parametric estimation of mean functionals with non-ignorable missing data. Journal of the American Statistical Association 106, 157–165.
Kott, P. S. and T. Chang (2010). Using calibration weighting to adjust for nonignorable unit nonresponse. Journal of the American Statistical Association 105, 1265–1275.
Little, R. (1985). A note about models for selectivity bias. Econometrica 53, 1469–1474.
Morikawa, K. and J. K. Kim (2017). Semiparametric adaptive estimation with nonignorable nonresponse data. Submitted.
Morikawa, K., J. K. Kim, and Y. Kano (2017). Semiparametric maximum likelihood estimation under nonignorable nonresponse. Submitted.
Riddles, M. K., J. K. Kim, and J. Im (2016). A propensity-score-adjustment method for nonignorable nonresponse. Journal of Survey Statistics and Methodology 4(2), 215–245.
Robins, J. M., A. Rotnitzky, and D. O. Scharfstein (1999). Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical Models in Epidemiology, the Environment and Clinical Trials, Volume 116. New York: Springer-Verlag.
Scharfstein, D., A. Rotnitzky, and J. M. Robins (1999). Adjusting for nonignorable dropout using semi-parametric models. Journal of the American Statistical Association 94, 1096–1146.
Shao, J. and L. Wang (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika 103(1), 175–187.
Tang, G., R. J. A. Little, and T. E. Raghunathan (2003). Analysis of multivariate missing data with nonignorable nonresponse. Biometrika 90, 747–764.
Wang, S., J. Shao, and J. K. Kim (2014). Identifiability and estimation in problems with nonignorable nonresponse. Statistica Sinica 24, 1097–1116.
Wei, G. C. and M. A. Tanner (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association 85, 699–704.
Zhao, J. and J. Shao (2015). Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data. Journal of the American Statistical Association 110(512), 1577–1590.
Jae-Kwang Kim (ISU) Nonignorable missing June 8, 2017 56 / 56