Predictive mean matching imputation in survey
sampling
Shu Yang and Jae Kwang Kim
Iowa State University
June 14, 2017
Outline
1 Basic Setup
2 Main result
3 Nearest neighbor imputation
4 Variance estimation
5 Simulation study
Yang & Kim (ISU) Predictive Mean Matching Imputation June 14, 2017 2 / 31
Predictive mean matching: Basic Setup
$F_N = \{(x_i, y_i, \delta_i),\ i = 1, 2, \cdots, N\}$: finite population, where

$$\delta_i = \begin{cases} 1 & \text{if } y_i \text{ is observed} \\ 0 & \text{otherwise.} \end{cases}$$

Note that the $\delta_i$ are defined throughout the population. This is referred to
as the reverse approach (Fay, 1992; Shao and Steel, 1999; Kim, Navarro,
and Fuller, 2006; Berg, Kim, and Skinner, 2016).
Parameter of interest: $\mu = N^{-1}\sum_{i=1}^{N} y_i$.
Basic Setup (Cont’d)
Let A be the index set of the probability sample selected from FN.
The first-order inclusion probabilities $\pi_i$ are known in the sample.
We observe (xi , δi , δi yi ) for i ∈ A.
The imputation estimator of $\mu$ is

$$\hat{\mu} = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}\{\delta_i y_i + (1-\delta_i)y_i^*\},$$

where $y_i^*$ is an imputed value of $y_i$ for a unit $i$ with $\delta_i = 0$.
Assumptions
We assume

$$E(y_i \mid x_i) = m(x_i; \beta^*),$$

where $m(\cdot)$ is a function of $x$ known up to $\beta^*$.
We assume MAR (missing at random) in the sense that

$$P(\delta = 1 \mid x, y) = P(\delta = 1 \mid x).$$
Regression Imputation
Two-step imputation
1 Obtain a consistent estimator of $\beta^*$:

$$\sum_{i\in A}\frac{1}{\pi_i}\delta_i\{y_i - m(x_i;\beta)\}g(x_i;\beta) = 0$$

for some $g(x_i;\beta)$. That is, find $\hat{\beta}$ that satisfies $\hat{\beta} = \beta^* + o_p(1)$.
2 Compute $\hat{y}_i = m(x_i;\hat{\beta})$ and use $y_i^* = \hat{y}_i$ (deterministic imputation) or
$y_i^* = \hat{y}_i + \hat{e}_i^*$ (stochastic imputation).
More rigorous theory can be found in Shao and Steel (1999) and Kim and
Rao (2009).
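As a concrete sketch of the two-step procedure (not from the slides): for a linear working model $m(x;\beta)=\beta_0+\beta_1 x$ with $g(x;\beta)=(1,x)^{\top}$, step 1 reduces to design-weighted least squares over the respondents. The helper name and data layout below are assumptions of this example.

```python
import numpy as np

def regression_impute(x, y, delta, pi, rng=None):
    """Two-step regression imputation under a linear working model.
    Step 1: solve the weighted estimating equation (WLS on respondents).
    Step 2: impute y*_i = m(x_i; beta-hat) for nonrespondents
    (deterministic), or add a resampled respondent residual when an rng
    is given (stochastic)."""
    x, y, delta, pi = map(np.asarray, (x, y, delta, pi))
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    w = delta / pi                          # design weights, respondents only
    # Solves sum_i w_i {y_i - X_i b} X_i = 0
    beta = np.linalg.solve(X.T @ (w[:, None] * X),
                           X.T @ (w * np.where(delta == 1, y, 0.0)))
    y_hat = X @ beta
    y_star = np.where(delta == 1, y, y_hat)  # deterministic imputation
    if rng is not None:                      # stochastic imputation
        resid = (y - y_hat)[delta == 1]
        y_star = y_star.copy()
        y_star[delta == 0] += rng.choice(resid, size=(delta == 0).sum())
    return beta, y_star
```

With noiseless linear data the fitted coefficients and imputed values are exact, which makes the sketch easy to sanity-check.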
Predictive Mean Matching (PMM) Imputation
1 Obtain $\hat{\beta}$ satisfying $\hat{\beta} = \beta^* + o_p(1)$.
2 For each unit $i$ with $\delta_i = 0$, obtain a predicted value of $y_i$ as $\hat{y}_i = m(x_i;\hat{\beta})$.
3 Find the nearest neighbor of unit $i$ among the respondents, i.e., the one with the
minimum distance between $y_j$ and $\hat{y}_i$. Let $i(1)$ be the index of the
nearest neighbor of unit $i$, which satisfies

$$d(y_{i(1)}, \hat{y}_i) \le d(y_j, \hat{y}_i)$$

for any $j \in A_R$, where $d(y_i, y_j) = |y_i - y_j|$ and $A_R$ is the set of
respondents.
4 Use $y_i^* = y_{i(1)}$ for units with $\delta_i = 0$.
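Steps 2-4 can be sketched as follows, using the slide's scalar distance $d(y_j,\hat{y}_i)=|y_j-\hat{y}_i|$; the helper names and the known-$N$ estimator signature are assumptions of this example.

```python
import numpy as np

def pmm_impute(y, delta, y_hat):
    """Steps 2-4 of PMM: for each nonrespondent i, find the respondent
    i(1) whose observed y is closest to the prediction yhat_i, and
    donate that real observation (hot deck).  Ties go to the first
    nearest respondent; step 1 (estimating beta-hat) is assumed done."""
    y, delta, y_hat = map(lambda a: np.asarray(a, float), (y, delta, y_hat))
    resp = np.flatnonzero(delta == 1)
    y_star = y.copy()
    for i in np.flatnonzero(delta == 0):
        j = resp[np.argmin(np.abs(y[resp] - y_hat[i]))]  # nearest neighbor i(1)
        y_star[i] = y[j]                                 # impute a real observation
    return y_star

def pmm_estimator(y_star, pi, N):
    """Design-weighted PMM point estimator (1), with N known."""
    return np.sum(np.asarray(y_star) / np.asarray(pi)) / N
```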
PMM Imputation
PMM estimator:

$$\hat{\mu}_{pmm} = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}\left\{\delta_i y_i + (1-\delta_i)y_{i(1)}\right\}. \quad (1)$$

It is a hot-deck imputation method (real observations are used as the
imputed values).
Because it uses model information for $E(y \mid x)$, it can be efficient if
the model is good.
It is a popular imputation method, but its asymptotic properties have not
been investigated rigorously; asymptotic properties and variance
estimation are open research problems.
Remark
Express the PMM estimator (1) as a function of $\hat{\beta}$:

$$\hat{\mu}_{pmm}(\beta) = N^{-1}\sum_{i\in A}\pi_i^{-1}\{\delta_i y_i + (1-\delta_i)y_{i(1)}\}$$
$$= N^{-1}\left\{\sum_{i\in A}\pi_i^{-1}\delta_i y_i + \sum_{j\in A}\pi_j^{-1}(1-\delta_j)\sum_{i\in A}\delta_i d_{ij} y_i\right\}$$
$$= N^{-1}\sum_{i\in A}\pi_i^{-1}\delta_i(1+\kappa_{\beta,i})y_i,$$

where

$$\kappa_{\beta,i} = \sum_{j\in A}\pi_i\pi_j^{-1}(1-\delta_j)d_{ij} \quad (2)$$

and $d_{ij} = 1$ if $y_{j(1)} = y_i$ and $d_{ij} = 0$ otherwise.
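To make (2) concrete: with a single nearest-neighbor donor, $d_{ij}$ is nonzero only when respondent $i$ donated to nonrespondent $j$, so $\kappa_{\beta,i}$ simply accumulates $\pi_i/\pi_j$ over the nonrespondents served by donor $i$. A minimal sketch; the `donor` array encoding of $d_{ij}$ is an assumption of this example.

```python
import numpy as np

def kappa_weights(pi, delta, donor):
    """kappa_i of equation (2): the sum of pi_i / pi_j over the
    nonrespondents j with d_ij = 1, i.e. those whose nearest neighbor
    i(1) is respondent i.  donor[j] holds the donor index for
    nonrespondent j (its value for respondents is ignored)."""
    pi = np.asarray(pi, float)
    kappa = np.zeros_like(pi)
    for j in np.flatnonzero(np.asarray(delta) == 0):
        i = donor[j]
        kappa[i] += pi[i] / pi[j]
    return kappa
```

With these weights, the PMM estimator can be evaluated in its respondent-only form $N^{-1}\sum_{i\in A}\pi_i^{-1}\delta_i(1+\kappa_{\beta,i})y_i$, which the test below checks against the donor form.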
Remark (Cont’d)
If regression imputation were used, the imputation estimator would be a
smooth function of $\beta$, and the standard linearization method could be
used to investigate the asymptotic properties of $\hat{\mu}_{Reg}(\hat{\beta})$.
In PMM imputation, $\hat{\mu}_{PMM}(\beta)$ is not a smooth function of $\beta$, so
we cannot apply the standard linearization method.
2. Main result
Overview
Theorem 1: We will first establish the asymptotic normality of
n1/2{ˆµpmm(β∗) − µ} when β∗ is known.
Theorem 2: Next, we will establish the asymptotic normality of
n1/2{ˆµpmm(ˆβ) − µ}.
Theorem 3: In addition, we also discuss asymptotic properties of the
nearest neighbor imputation estimator.
Basic Idea
Express

$$\hat{\mu}_{pmm}(\beta) - \mu = D_n(\beta) + B_n(\beta), \quad (3)$$

where

$$D_n(\beta) = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}\left[m(x_i;\beta) + \delta_i(1+\kappa_{\beta,i})\{y_i - m(x_i;\beta)\}\right] - \frac{1}{N}\sum_{i=1}^{N}y_i$$

and

$$B_n(\beta) = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}(1-\delta_i)\{m(x_{i(1)};\beta) - m(x_i;\beta)\}.$$
The difference m(xi(1); β) − m(xi ; β) accounts for the matching
discrepancy, and Bn(β) contributes to the asymptotic bias of the
matching estimator.
Asymptotic bias (Abadie and Imbens, 2006)
If the matching for the nearest neighbor is based on $x$, then

$$d(x_{i(1)}, x_i) = O_p(n^{-1/p}),$$

where $p$ is the dimension of $x$. Thus, for the classical nearest
neighbor imputation using $x$, the asymptotic bias

$$B_n = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}(1-\delta_i)\{m(x_{i(1)}) - m(x_i)\}$$

satisfies $B_n = O_p(n^{-1/p})$, which is not negligible for $p \ge 2$.
For PMM, we use the scalar function $m(x)$ to find the
nearest neighbor. Thus, $B_n = O_p(n^{-1})$ and the PMM estimator is
asymptotically unbiased.
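The rate claim can be illustrated numerically: average nearest-neighbor distances among $n$ uniform points shrink like $n^{-1/p}$, so with a scalar matching variable ($p=1$) an 8-fold increase in $n$ cuts the discrepancy roughly 8-fold, while in $p=4$ dimensions it barely helps. The sample sizes and seed below are illustrative, not from the slides.

```python
import numpy as np

def mean_nn_distance(n, p, rng):
    """Average distance from each of n uniform points in [0,1]^p to its
    nearest neighbor (a stand-in for the matching discrepancy)."""
    pts = rng.uniform(size=(n, p))
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    return d.min(axis=1).mean()

rng = np.random.default_rng(0)
# 8x more points: distance drops ~8x for p=1, but only ~8^(1/4) ~ 1.7x for p=4
ratio_p1 = mean_nn_distance(100, 1, rng) / mean_nn_distance(800, 1, rng)
ratio_p4 = mean_nn_distance(100, 4, rng) / mean_nn_distance(800, 4, rng)
```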
Theorem 1
Suppose that $m(x) = E(y \mid x) = m(x;\beta^*)$ and $\sigma^2(x) = \mathrm{var}(y \mid x)$. Under
the regularity conditions (skipped), we have

$$n^{1/2}\{\hat{\mu}_{pmm}(\beta^*) - \mu\} \to N(0, V_1)$$

in distribution, as $n \to \infty$, where

$$V_1 = V^m + V^e \quad (4)$$

with

$$V^m = \frac{n}{N^2}E\left[V\left\{\sum_{i\in A}\pi_i^{-1}m(x_i)\ \Big|\ \mathcal{F}_N\right\}\right],$$

$$V^e = E\left[\frac{n}{N^2}\sum_{i\in A}\left\{\pi_i^{-1}\delta_i(1+\kappa_{\beta^*,i}) - 1\right\}^2\sigma^2(x_i)\right],$$

and $\kappa_{\beta,i}$ is defined in (2).
Theorem 2
Under some additional regularity conditions, we have

$$n^{1/2}\{\hat{\mu}_{pmm}(\hat{\beta}) - \mu\} \to N(0, V_2)$$

in distribution, as $n \to \infty$, with

$$V_2 = V_1 - \gamma_2^{\top} V_s^{-1}\gamma_2 + \gamma_1^{\top}\tau^{-1}V_s\tau^{-1}\gamma_1, \quad (5)$$

where $V_1$ is defined in (4), $\gamma_1 = E\{\dot{m}(x;\beta)\}$,

$$\gamma_2 = \operatorname*{p\,lim}\ \frac{1}{N}\sum_{i=1}^{N}\left\{\frac{n}{N}\frac{1}{\pi_i}(1+\kappa_{\beta^*,i}) - 1\right\}\delta_i\,\dot{m}(x_i;\beta^*),$$

and $\dot{m}(x;\beta) = \partial m(x;\beta)/\partial\beta$.
Remark
1 The correction term in (5),

$$V_2 - V_1 = -\gamma_2^{\top} V_s^{-1}\gamma_2 + \gamma_1^{\top}\tau^{-1}V_s\tau^{-1}\gamma_1,$$

reflects the effect of using $\hat{\beta}$ instead of $\beta^*$ in the PMM imputation.
2 If $n/N = o(1)$ and $m(x_i;\beta) = \beta_0 + \beta_1 x_i$ with scalar $x$, then $\gamma_1 = \gamma_2$.
Furthermore, under SRS with $g(x;\beta) = \dot{m}(x;\beta)/\sigma^2(x)$ in the
estimating function for $\hat{\beta}$, we have $V_s^{-1} = \tau^{-1}V_s\tau^{-1}$. In this case,
$V_2 = V_1$.
3 In general, $V_2$ differs from $V_1$, and the effect of the sampling error
of $\hat{\beta}$ should be reflected in the variance estimation of $\hat{\mu}_{PMM}(\hat{\beta})$.
3. Nearest neighbor imputation
Instead of using a distance function on $y$, one may use a distance
function on $x$. Thus, the nearest neighbor of unit $i$, denoted by $i(1)$,
can be determined to satisfy

$$d(x_{i(1)}, x_i) \le d(x_j, x_i)$$

for $j \in A_R$, where $d(x_i, x_j)$ is a distance function between $x_i$ and $x_j$.
The NNI estimator can be written as

$$\hat{\mu}_{NNI} = N^{-1}\sum_{i\in A}\pi_i^{-1}\left\{\delta_i y_i + (1-\delta_i)y_{i(1)}\right\}.$$

Here, the only difference is the matching variable used to identify $i(1)$.
Asymptotic properties
We can obtain a decomposition similar to (3):

$$\hat{\mu}_{NNI} - \mu = D_n + B_n,$$

where

$$D_n = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}\left[m(x_i) + \delta_i(1+\kappa_i)\{y_i - m(x_i)\}\right] - \frac{1}{N}\sum_{i=1}^{N}y_i$$

and

$$B_n = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}(1-\delta_i)\{m(x_{i(1)}) - m(x_i)\}.$$

The asymptotic bias $B_n = O_p(n^{-1/p})$ is not negligible for $p \ge 2$.
Bias-corrected NNI estimator
Let ˆm(x) be a (nonparametric) estimator of m(x) = E(y | x).
We can estimate $B_n$ by

$$\hat{B}_n = \frac{1}{N}\sum_{i\in A}\frac{1}{\pi_i}(1-\delta_i)\{\hat{m}(x_{i(1)}) - \hat{m}(x_i)\}.$$

A bias-corrected NNI estimator of $\mu$ is

$$\hat{\mu}_{NNI,bc} = N^{-1}\sum_{i\in A}\pi_i^{-1}\{\delta_i y_i + (1-\delta_i)y_i^*\},$$

where $y_i^* = \hat{m}(x_i) + y_{i(1)} - \hat{m}(x_{i(1)})$.
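The correction above can be sketched directly: match on $x$, then adjust each donated value by the estimated matching discrepancy. In the slides $\hat{m}$ is a nonparametric fit; below it is an arbitrary callable, and the helper name and signature are assumptions of this example.

```python
import numpy as np

def nni_bias_corrected(x, y, delta, pi, m_hat, N):
    """Bias-corrected NNI: find the nearest neighbor i(1) in x among the
    respondents, then impute y*_i = m_hat(x_i) + y_{i(1)} - m_hat(x_{i(1)}),
    removing the matching discrepancy m(x_{i(1)}) - m(x_i)."""
    y = np.asarray(y, float)
    x = np.asarray(x, float).reshape(len(y), -1)
    resp = np.flatnonzero(np.asarray(delta) == 1)
    y_star = y.copy()
    for i in np.flatnonzero(np.asarray(delta) == 0):
        j = resp[np.argmin(np.linalg.norm(x[resp] - x[i], axis=1))]  # i(1)
        y_star[i] = m_hat(x[i]) + y[j] - m_hat(x[j])  # bias-corrected donor
    return np.sum(y_star / np.asarray(pi)) / N
```

When $\hat{m}$ equals the true regression function and $y$ is noiseless, the correction recovers $y_i$ exactly, so the estimator matches the population mean.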
Theorem 3
Under some regularity conditions, the bias-corrected NNI estimator is
asymptotically equivalent to the PMM estimator with known $\beta^*$. That is,

$$n^{1/2}\{\hat{\mu}_{NNI,bc} - \hat{\mu}_{PMM}(\beta^*)\} = o_p(1).$$

Thus, we have

$$n^{1/2}(\hat{\mu}_{NNI,bc} - \mu) \to N(0, V_1)$$
in distribution, as n → ∞, where V1 is defined in (4).
4. Replication variance estimation
If there is no nonresponse, we can use

$$\hat{V}_{rep}(\hat{\mu}) = \sum_{k=1}^{L}c_k\left(\hat{\mu}^{(k)} - \hat{\mu}\right)^2$$

as a variance estimator of $\hat{\mu} = \sum_{i\in A}w_i y_i$, where $\hat{\mu}^{(k)} = \sum_{i\in A}w_i^{(k)}y_i$.
For example, in the delete-1 jackknife method under SRS, we have
$L = n$, $c_k = (n-1)/n$, and

$$w_i^{(k)} = \begin{cases} (n-1)^{-1} & \text{if } i \ne k \\ 0 & \text{if } i = k. \end{cases}$$
We are interested in estimating the variance of the PMM imputation
estimator.
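The complete-response delete-1 jackknife above can be sanity-checked in a few lines: applied to the sample mean under SRS it reproduces the textbook $s^2/n$. A minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def jackknife_variance(y):
    """Delete-1 jackknife variance of the sample mean under SRS:
    L = n replicates, c_k = (n-1)/n, and replicate weights
    w_i^(k) = 1/(n-1) for i != k and 0 for i = k."""
    y = np.asarray(y, float)
    n = len(y)
    mu_hat = y.mean()
    mu_k = (y.sum() - y) / (n - 1)   # replicate estimates, one per deleted unit
    return (n - 1) / n * np.sum((mu_k - mu_hat) ** 2)
```

Since $\hat{\mu}^{(k)} - \hat{\mu} = -(y_k - \bar{y})/(n-1)$, the formula collapses algebraically to $s^2/n$.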
4. Replication variance estimation
Approach 1
Idea: Apply bootstrap (or jackknife) method and repeat the same
imputation method.
Such an approach provides consistent variance estimation for regression
imputation:

$$\hat{\mu}_{reg,I}^{(k)} = \sum_{i\in A}w_i^{(k)}\left\{\delta_i y_i + (1-\delta_i)m(x_i;\hat{\beta}^{(k)})\right\}.$$

However, for stochastic regression imputation, such an approach does
not work because it does not correctly capture the random imputation
part.
4. Replication variance estimation
Note that

$$\hat{\mu}_{reg,I2} = \sum_{i\in A}w_i\{\delta_i y_i + (1-\delta_i)(\hat{y}_i + \hat{e}_i^*)\}
= \sum_{i\in A}w_i\{\hat{y}_i + \delta_i(1+\kappa_i)\hat{e}_i\},$$

where $\kappa_i$ is defined in (2).
Thus, we can write

$$\hat{\mu}_{reg,I2}(\beta) = \sum_{i\in A}w_i\{m(x_i;\beta) + \delta_i(1+\kappa_i)(y_i - m(x_i;\beta))\}
= \sum_{i\in A}w_i f(x_i, y_i, \delta_i, \kappa_i; \beta).$$
4. Replication variance estimation
Approach 2:
Idea: If the imputation estimator can be expressed as

$$\hat{\mu}_I = \sum_{i\in A}w_i f(x_i, y_i, \delta_i, \kappa_i; \hat{\beta})$$

for some function $f$ of known form, we can use

$$\hat{\mu}_I^{(k)} = \sum_{i\in A}w_i^{(k)}f(x_i, y_i, \delta_i, \kappa_i; \hat{\beta}^{(k)})$$

to construct a replication variance estimator

$$\hat{V}_{rep} = \sum_{k=1}^{L}c_k\left(\hat{\mu}_I^{(k)} - \hat{\mu}_I\right)^2.$$
Variance estimation for PMM estimator
1 Obtain the $k$-th replicate of $\hat{\beta}$, denoted by $\hat{\beta}^{(k)}$, by solving

$$\sum_{i\in A}w_i^{(k)}\delta_i\{y_i - m(x_i;\beta)\}g(x_i;\beta) = 0$$

for $\beta$.
2 Calculate the $k$-th replicate of $\hat{\mu}_{pmm}$ by

$$\hat{\mu}_{pmm}^{(k)} = \sum_{i\in A}w_i^{(k)}\left[m(x_i;\hat{\beta}^{(k)}) + \delta_i(1+\kappa_i)\{y_i - m(x_i;\hat{\beta}^{(k)})\}\right].$$

3 Compute

$$\hat{V}_{rep} = \sum_{k=1}^{L}c_k\left(\hat{\mu}_{pmm}^{(k)} - \hat{\mu}_{pmm}\right)^2.$$
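The three steps can be sketched for SRS with a delete-1 jackknife and a linear working model ($w_i = 1/n$, $L = n$, $c_k = (n-1)/n$); the helper names are hypothetical, and the donor weights $\kappa_i$ are held fixed across replicates, as in step 2.

```python
import numpy as np

def pmm_jackknife_variance(x, y, delta, kappa):
    """Replication variance for the PMM estimator under SRS:
    step 1 re-solves the estimating equation for beta on each delete-1
    replicate, step 2 forms mu_pmm^(k) with kappa_i fixed, and step 3
    combines the replicates with c_k = (n-1)/n."""
    x, y, delta, kappa = map(lambda a: np.asarray(a, float), (x, y, delta, kappa))
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    y0 = np.where(delta == 1, y, 0.0)    # y enters only through respondents

    def mu_rep(w):
        wr = w * delta                   # replicate weights, respondents only
        beta = np.linalg.solve(X.T @ (wr[:, None] * X), X.T @ (wr * y0))
        m = X @ beta
        return np.sum(w * (m + delta * (1.0 + kappa) * (y0 - m)))

    mu_full = mu_rep(np.full(n, 1.0 / n))
    reps = np.empty(n)
    for k in range(n):                   # delete-1 replicates
        w = np.full(n, 1.0 / (n - 1))
        w[k] = 0.0
        reps[k] = mu_rep(w)
    return (n - 1) / n * np.sum((reps - mu_full) ** 2)
```

With full response and $\kappa_i = 0$, the replicate estimator reduces to the replicate sample mean, so the routine reproduces the complete-response jackknife $s^2/n$; this gives a deterministic check.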
5. Simulation Study
We wish to answer the following questions from a simulation study:
1 Is the bias of the PMM imputation estimator asymptotically negligible?
2 Does the bias of the NNI estimator remain significant for p ≥ 2?
3 Is the PMM imputation estimator more robust than the regression
imputation estimator?
Other issues:
Efficiency comparison
Validity of variance estimation
Coverage properties of normal-based interval estimators
Simulation Setup 1
Population model (N = 50, 000)
1 P1: $y = -1 + x_1 + x_2 + e$
2 P2: $y = -1.167 + x_1 + x_2 + (x_1 - 0.5)^2 + (x_2 - 0.5)^2 + e$
3 P3: $y = -1.5 + x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + e$
where $x_1, x_2, x_3 \sim \mathrm{Uniform}(0,1)$ and $x_4, x_5, x_6, e \sim N(0,1)$.
Sampling design:
1 SRS of size n = 400
2 PPS of size n = 400 with size measure si = log(|yi + νi | + 4) where
νi ∼ N(0, 1).
Response probability: δi ∼ Bernoulli{p(xi )} where
logit{p(xi )} = 0.2 + x1i + x2i . The overall response rate is 75%.
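A sketch of the P2 population and response mechanism (the seed is illustrative; the intercept $-1.167$ centers $y$ near zero since $E(x_j) = 0.5$ and $E\{(x_j-0.5)^2\} = 1/12$):

```python
import numpy as np

def make_population_p2(N=50_000, rng=None):
    """Population P2 with the logistic response model of Setup 1:
    y = -1.167 + x1 + x2 + (x1-0.5)^2 + (x2-0.5)^2 + e,
    x1, x2 ~ Uniform(0,1), e ~ N(0,1),
    delta ~ Bernoulli(p(x)) with logit p(x) = 0.2 + x1 + x2."""
    rng = np.random.default_rng() if rng is None else rng
    x1, x2 = rng.uniform(size=(2, N))
    e = rng.normal(size=N)
    y = -1.167 + x1 + x2 + (x1 - 0.5) ** 2 + (x2 - 0.5) ** 2 + e
    p = 1.0 / (1.0 + np.exp(-(0.2 + x1 + x2)))   # inverse logit
    delta = (rng.uniform(size=N) < p).astype(int)
    return x1, x2, y, delta
```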
Simulation Setup 2
Imputation methods (for P1 and P2)
1 NNI using (x1, x2)
2 PMM using $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$
3 Stochastic regression imputation (SRI) using $y_i^* = \hat{y}_i + \hat{e}_i^*$, where
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$.
Imputation methods (for P3)
1 NNI using $(x_1, x_2, \cdots, x_6)$
2 PMM using $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_6 x_{6i}$
3 SRI using $y_i^* = \hat{y}_i + \hat{e}_i^*$, where $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_6 x_{6i}$.
Parameter of interest: θ = E(Y )
Simulation Result 1
Table: Simulation results: Bias (×10²) and S.E. (×10²) of the point estimator.
PMM NNI SRI
Bias S.E. Bias S.E. Bias S.E.
Simple Random Sampling
(P1) -0.15 6.46 -0.21 6.54 -0.23 6.44
(P2) -0.22 6.54 -0.25 6.55 -0.37 6.46
(P3) 1.90 11.85 18.59 11.06 0.11 11.17
Probability Proportional to Size Sampling
(P1) 0.05 6.46 0.13 6.37 0.18 6.53
(P2) 0.30 6.52 0.12 6.47 0.16 6.60
(P3) 1.33 10.99 17.53 10.70 0.40 11.10
PMM: predictive mean matching; NNI: nearest neighbor imputation; SRI:
stochastic regression imputation.
Simulation Result 2
Table: Simulation results: Relative Bias (×10²) of jackknife variance estimates
and Coverage Rate (%) of 95% confidence intervals.
PMM NNI SRI
RB CR RB CR RB CR
Simple Random Sampling
(P1) 5 95.1 3 95.1 5 95.8
(P2) 5 95.4 3 95.3 5 95.6
(P3) 4 95.2 4 63.8 4 95.5
Probability Proportional to Size Sampling
(P1) 2 95.5 3 94.8 2 94.9
(P2) 1 95.4 0 95.3 3 94.9
(P3) 7 95.8 3 65.5 -3 95.6
PMM: predictive mean matching; NNI: nearest neighbor imputation; SRI:
stochastic regression imputation.
6. Discussion
The non-smooth nature of the PMM estimator makes its asymptotic
properties difficult to investigate.
Some recent econometric papers (Andreou and Werker, 2012; Abadie
and Imbens, 2016) provide key ideas for solving this problem in
survey sampling.
The proof of Theorem 2 involves the martingale central limit theorem
and Le Cam's third lemma.
Replication variance estimation can be constructed to properly
capture the variability of the PMM estimator.
Fractional imputation can be developed to address this problem. This
is a topic for future research.
module for grade 9 for distance learning
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 

  • 1. Predictive mean matching imputation in survey sampling. Shu Yang and Jae Kwang Kim, Iowa State University, June 14, 2017.
  • 2. Outline: 1 Basic setup; 2 Main result; 3 Nearest neighbor imputation; 4 Variance estimation; 5 Simulation study.
  • 3. Predictive mean matching: Basic Setup. $F_N = \{(x_i, y_i, \delta_i),\ i = 1, 2, \cdots, N\}$: finite population, where $\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ otherwise. Note that the $\delta_i$ are defined throughout the population; this is referred to as the reverse approach (Fay, 1992; Shao and Steel, 1999; Kim, Navarro, and Fuller, 2006; Berg, Kim, and Skinner, 2016). Parameter of interest: $\mu = N^{-1}\sum_{i=1}^{N} y_i$.
  • 4. Basic Setup (Cont'd). Let $A$ be the index set of the probability sample selected from $F_N$. The first-order inclusion probabilities $\pi_i$ are known in the sample. We observe $(x_i, \delta_i, \delta_i y_i)$ for $i \in A$. The imputation estimator of $\mu$ is $\hat{\mu} = N^{-1} \sum_{i \in A} \pi_i^{-1} \{\delta_i y_i + (1 - \delta_i) y_i^*\}$, where $y_i^*$ is an imputed value of $y_i$ for each unit $i$ with $\delta_i = 0$.
  • 5. Assumptions. We assume $E(y_i \mid x_i) = m(x_i; \beta^*)$, where $m(\cdot)$ is a function of $x$ known up to $\beta^*$. We also assume MAR (missing at random) in the sense that $P(\delta = 1 \mid x, y) = P(\delta = 1 \mid x)$.
  • 6. Regression Imputation. Two-step imputation: 1 Obtain a consistent estimator of $\beta^*$ by solving $\sum_{i \in A} \pi_i^{-1} \delta_i \{y_i - m(x_i; \beta)\} g(x_i; \beta) = 0$ for some $g(x_i; \beta)$; that is, find $\hat{\beta}$ satisfying $\hat{\beta} = \beta^* + o_p(1)$. 2 Compute $\hat{y}_i = m(x_i; \hat{\beta})$ and use $y_i^* = \hat{y}_i$ (deterministic imputation) or $y_i^* = \hat{y}_i + \hat{e}_i^*$ (stochastic imputation). More rigorous theory can be found in Shao and Steel (1999) and Kim and Rao (2009).
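The first step above can be sketched numerically. This is an illustrative sketch, not the authors' code: it assumes a linear mean function $m(x; \beta) = \beta_0 + \beta_1 x$ with $g(x; \beta) = (1, x)^{\top}$, in which case the estimating equation reduces to design-weighted least squares on the respondents; all names are hypothetical.

```python
import numpy as np

def fit_beta(x, y, delta, pi):
    """Solve sum_{i in A} pi_i^{-1} delta_i {y_i - m(x_i; beta)} g(x_i; beta) = 0
    for the linear mean m(x; beta) = beta0 + beta1 * x with g(x; beta) = (1, x)',
    i.e. design-weighted least squares on the respondents only."""
    X = np.column_stack([np.ones_like(x), x])
    w = delta / pi                        # nonrespondents get zero weight
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)
```

With a noiseless linear population the estimating equation is solved exactly, so the fitted coefficients recover the true $\beta$.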
  • 7. Predictive Mean Matching (PMM) Imputation. 1 Obtain $\hat{\beta}$ satisfying $\hat{\beta} = \beta^* + o_p(1)$. 2 For each unit $i$ with $\delta_i = 0$, obtain a predicted value $\hat{y}_i = m(x_i; \hat{\beta})$. 3 Find the nearest neighbor of unit $i$ among the respondents, minimizing the distance between $y_j$ and $\hat{y}_i$. Let $i(1)$ be the index of the nearest neighbor of unit $i$, which satisfies $d(y_{i(1)}, \hat{y}_i) \le d(y_j, \hat{y}_i)$ for any $j \in A_R$, where $d(y_i, y_j) = |y_i - y_j|$ and $A_R$ is the set of respondents. 4 Use $y_i^* = y_{i(1)}$ for $\delta_i = 0$.
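Steps 2–4 can be sketched as follows; this is an illustrative sketch with hypothetical names, matching each predicted value $\hat{y}_i$ against the respondents' observed $y_j$ exactly as step 3 prescribes, and also computing the weighted estimator from the next slide.

```python
import numpy as np

def pmm_impute(y, yhat, delta):
    """For each nonrespondent i, take as donor the respondent j whose
    observed y_j is closest to the predicted value yhat_i (step 3),
    and impute y*_i = y_{i(1)} (step 4)."""
    resp = np.flatnonzero(delta == 1)
    y_star = np.where(delta == 1, y, 0.0).astype(float)
    for i in np.flatnonzero(delta == 0):
        j = resp[np.argmin(np.abs(y[resp] - yhat[i]))]   # nearest neighbor i(1)
        y_star[i] = y[j]                                  # donate a real observation
    return y_star

def pmm_estimator(y_star, pi, N):
    """Weighted PMM point estimator: N^{-1} sum_{i in A} pi_i^{-1} y*_i."""
    return np.sum(y_star / pi) / N
```

Because the imputed values are drawn from the observed $y_j$, every $y_i^*$ is a real observation, which is what makes PMM a hot-deck method.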
  • 8. PMM Imputation. PMM estimator: $\hat{\mu}_{pmm} = N^{-1} \sum_{i \in A} \pi_i^{-1} \{\delta_i y_i + (1 - \delta_i) y_{i(1)}\}$. (1) It is a hot-deck imputation (using real observations as the imputed values). Because it uses model information for $E(y \mid x)$, it can be efficient if the model is good. It is a popular imputation method, but its asymptotic properties have not been investigated rigorously; asymptotic properties and variance estimation are open research problems.
  • 9. Remark. Express the PMM estimator (1) as a function of $\hat{\beta}$: $\hat{\mu}_{pmm}(\hat{\beta}) = N^{-1} \sum_{i \in A} \pi_i^{-1} \{\delta_i y_i + (1 - \delta_i) y_{i(1)}\} = N^{-1} \{\sum_{i \in A} \pi_i^{-1} \delta_i y_i + \sum_{j \in A} \pi_j^{-1} (1 - \delta_j) \sum_{i \in A} \delta_i d_{ij} y_i\} = N^{-1} \sum_{i \in A} \pi_i^{-1} \delta_i (1 + \kappa_{\beta,i}) y_i$, where $\kappa_{\beta,i} = \sum_{j \in A} \pi_i \pi_j^{-1} (1 - \delta_j) d_{ij}$ (2) and $d_{ij} = 1$ if $y_{j(1)} = y_i$, $d_{ij} = 0$ otherwise.
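The weighted-respondent representation can be checked numerically. In this illustrative sketch (hypothetical names), `donor[j]` records the respondent index $i(1)$ whose value was donated to nonrespondent $j$, so $d_{ij} = 1$ iff `donor[j] == i`.

```python
import numpy as np

def kappa_weights(donor, delta, pi):
    """kappa_{beta,i} = sum_j pi_i pi_j^{-1} (1 - delta_j) d_ij,
    where d_ij = 1 iff respondent i was the donor for nonrespondent j."""
    kappa = np.zeros(len(delta))
    for j in np.flatnonzero(delta == 0):
        i = donor[j]
        kappa[i] += pi[i] / pi[j]       # pi_i / pi_j for each matched j
    return kappa
```

The identity asserted on the slide is that summing $\pi_i^{-1}\delta_i(1+\kappa_{\beta,i}) y_i$ over respondents reproduces the original imputation sum over all sampled units.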
  • 10. Remark (Cont'd). If regression imputation were used, the imputation estimator would be a smooth function of $\beta$, and the standard linearization method could be used to investigate the asymptotic properties of $\hat{\mu}_{reg}(\hat{\beta})$. Under PMM imputation, $\hat{\mu}_{pmm}(\beta)$ is not a smooth function of $\beta$, so the standard linearization method cannot be applied.
  • 11. 2. Main result: Overview. Theorem 1: We first establish the asymptotic normality of $n^{1/2}\{\hat{\mu}_{pmm}(\beta^*) - \mu\}$ when $\beta^*$ is known. Theorem 2: Next, we establish the asymptotic normality of $n^{1/2}\{\hat{\mu}_{pmm}(\hat{\beta}) - \mu\}$. Theorem 3: In addition, we discuss asymptotic properties of the nearest neighbor imputation estimator.
  • 12. Basic Idea. Express $\hat{\mu}_{pmm}(\beta) - \mu = D_n(\beta) + B_n(\beta)$, (3) where $D_n(\beta) = N^{-1} \sum_{i \in A} \pi_i^{-1} [m(x_i; \beta) + \delta_i (1 + \kappa_{\beta,i}) \{y_i - m(x_i; \beta)\}] - N^{-1} \sum_{i=1}^{N} y_i$ and $B_n(\beta) = N^{-1} \sum_{i \in A} \pi_i^{-1} (1 - \delta_i) \{m(x_{i(1)}; \beta) - m(x_i; \beta)\}$. The difference $m(x_{i(1)}; \beta) - m(x_i; \beta)$ accounts for the matching discrepancy, and $B_n(\beta)$ contributes to the asymptotic bias of the matching estimator.
  • 13. Asymptotic bias (Abadie and Imbens, 2006). If the matching for the nearest neighbor is based on $x$, then $d(x_{i(1)}, x_i) = O_p(n^{-1/p})$, where $p$ is the dimension of $x$. Thus, for the classical nearest neighbor imputation using $x$, the asymptotic bias $B_n = N^{-1} \sum_{i \in A} \pi_i^{-1} (1 - \delta_i) \{m(x_{i(1)}) - m(x_i)\}$ satisfies $B_n = O_p(n^{-1/p})$, which is not negligible for $p \ge 2$. For PMM, we use the scalar function $m(x)$ to find the nearest neighbor; thus $B_n = O_p(n^{-1})$ and the PMM estimator is asymptotically unbiased.
  • 14. Theorem 1. Suppose that $m(x) = E(y \mid x) = m(x; \beta^*)$ and $\sigma^2(x) = \mathrm{var}(y \mid x)$. Under the regularity conditions (skipped), we have $n^{1/2}\{\hat{\mu}_{pmm}(\beta^*) - \mu\} \to N(0, V_1)$ in distribution as $n \to \infty$, where $V_1 = V^m + V^e$ (4) with $V^m = n N^{-2}\, E\{V(\sum_{i \in A} \pi_i^{-1} m(x_i) \mid F_N)\}$, $V^e = E[\,n N^{-2} \sum_{i \in A} \{\pi_i^{-1} \delta_i (1 + \kappa_{\beta^*,i}) - 1\}^2 \sigma^2(x_i)]$, and $\kappa_{\beta,i}$ is defined in (2).
  • 15. Theorem 2. Under some additional regularity conditions, we have $n^{1/2}\{\hat{\mu}_{pmm}(\hat{\beta}) - \mu\} \to N(0, V_2)$ in distribution as $n \to \infty$, with $V_2 = V_1 - \gamma_2^{\top} V_s^{-1} \gamma_2 + \gamma_1^{\top} \tau^{-1} V_s \tau^{-1} \gamma_1$ (5), where $V_1$ is defined in (4), $\gamma_1 = E\{\dot{m}(x; \beta)\}$, $\gamma_2 = \mathrm{plim}\ N^{-1} \sum_{i=1}^{N} \{n N^{-1} \pi_i^{-1} (1 + \kappa_{\beta^*,i}) - 1\} \delta_i\, \dot{m}(x_i; \beta^*)$, and $\dot{m}(x; \beta) = \partial m(x; \beta)/\partial \beta$.
  • 16. Remark. 1 The second term in (5), $V_2 - V_1 = -\gamma_2^{\top} V_s^{-1} \gamma_2 + \gamma_1^{\top} \tau^{-1} V_s \tau^{-1} \gamma_1$, reflects the effect of using $\hat{\beta}$ instead of $\beta^*$ in the PMM imputation. 2 If $n/N = o(1)$ and $m(x_i; \beta) = \beta_0 + \beta_1 x_i$ with scalar $x$, then $\gamma_1 = \gamma_2$. Furthermore, under SRS with $g(x; \beta) = \dot{m}(x; \beta)/\sigma^2(x)$ in the estimating function for $\hat{\beta}$, we have $V_s^{-1} = \tau^{-1} V_s \tau^{-1}$; in this case, $V_2 = V_1$. 3 In general, $V_2$ differs from $V_1$, and the effect of the sampling error of $\hat{\beta}$ should be reflected in the variance estimation of $\hat{\mu}_{pmm}(\hat{\beta})$.
  • 17. 3. Nearest neighbor imputation. Instead of a distance function on $y$, one may use a distance function on $x$. The nearest neighbor of unit $i$, denoted $i(1)$, is determined to satisfy $d(x_{i(1)}, x_i) \le d(x_j, x_i)$ for $j \in A_R$, where $d(x_i, x_j)$ is the distance between $x_i$ and $x_j$. The NNI estimator can be written as $\hat{\mu}_{NNI} = N^{-1} \sum_{i \in A} \pi_i^{-1} \{\delta_i y_i + (1 - \delta_i) y_{i(1)}\}$. The only difference from PMM is the matching variable used to identify $i(1)$.
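The matching-on-$x$ variant above can be sketched in the same style as the PMM sketch; an illustrative version using Euclidean distance on the covariate vectors (function name hypothetical):

```python
import numpy as np

def nni_impute(x, y, delta):
    """Nearest neighbor imputation: match each nonrespondent on the
    covariates x (Euclidean distance) and donate the closest
    respondent's observed y."""
    resp = np.flatnonzero(delta == 1)
    y_star = np.where(delta == 1, y, 0.0).astype(float)
    for i in np.flatnonzero(delta == 0):
        dist = np.linalg.norm(x[resp] - x[i], axis=1)
        y_star[i] = y[resp[np.argmin(dist)]]
    return y_star
```

A KD-tree would replace the inner loop for large samples; the brute-force scan keeps the correspondence with the slide's definition transparent.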
  • 18. Asymptotic properties. We can obtain a decomposition similar to (3): $\hat{\mu}_{NNI} - \mu = D_n + B_n$, where $D_n = N^{-1} \sum_{i \in A} \pi_i^{-1} [m(x_i) + \delta_i (1 + \kappa_i) \{y_i - m(x_i)\}] - N^{-1} \sum_{i=1}^{N} y_i$ and $B_n = N^{-1} \sum_{i \in A} \pi_i^{-1} (1 - \delta_i) \{m(x_{i(1)}) - m(x_i)\}$. The asymptotic bias $B_n = O_p(n^{-1/p})$ is not negligible for $p \ge 2$.
  • 19. Bias-corrected NNI estimator. Let $\hat{m}(x)$ be a (nonparametric) estimator of $m(x) = E(y \mid x)$. We can estimate $B_n$ by $\hat{B}_n = N^{-1} \sum_{i \in A} \pi_i^{-1} (1 - \delta_i) \{\hat{m}(x_{i(1)}) - \hat{m}(x_i)\}$. A bias-corrected NNI estimator of $\mu$ is $\hat{\mu}_{NNI,bc} = N^{-1} \sum_{i \in A} \pi_i^{-1} \{\delta_i y_i + (1 - \delta_i) y_i^*\}$, where $y_i^* = \hat{m}(x_i) + y_{i(1)} - \hat{m}(x_{i(1)})$.
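The bias-corrected imputed value can be sketched as a small modification of the NNI sketch; `m_hat` stands in for any fitted estimate of $E(y \mid x)$ (illustrative names, not the authors' code):

```python
import numpy as np

def nni_bias_corrected_impute(x, y, delta, m_hat):
    """Bias-corrected NNI: y*_i = m_hat(x_i) + y_{i(1)} - m_hat(x_{i(1)}),
    where m_hat is any (non)parametric estimator of m(x) = E(y | x)."""
    resp = np.flatnonzero(delta == 1)
    y_star = np.where(delta == 1, y, 0.0).astype(float)
    for i in np.flatnonzero(delta == 0):
        dist = np.linalg.norm(x[resp] - x[i], axis=1)
        j = resp[np.argmin(dist)]                    # nearest neighbor i(1)
        y_star[i] = m_hat(x[i]) + y[j] - m_hat(x[j])  # matching-discrepancy correction
    return y_star
```

When $y = m(x)$ exactly and `m_hat` equals $m$, the correction removes the matching discrepancy entirely, which is the intuition behind Theorem 3.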
  • 20. Theorem 3. Under some regularity conditions, the bias-corrected NNI estimator is asymptotically equivalent to the PMM estimator with known $\beta^*$; that is, $n^{1/2}\{\hat{\mu}_{NNI,bc} - \hat{\mu}_{pmm}(\beta^*)\} = o_p(1)$. Thus, $n^{1/2}(\hat{\mu}_{NNI,bc} - \mu) \to N(0, V_1)$ in distribution as $n \to \infty$, where $V_1$ is defined in (4).
  • 21. 4. Replication variance estimation. If there is no nonresponse, we can use $\hat{V}_{rep}(\hat{\mu}) = \sum_{k=1}^{L} c_k (\hat{\mu}^{(k)} - \hat{\mu})^2$ as a variance estimator of $\hat{\mu} = \sum_{i \in A} w_i y_i$, where $\hat{\mu}^{(k)} = \sum_{i \in A} w_i^{(k)} y_i$. For example, in the delete-1 jackknife method under SRS, we have $L = n$, $c_k = (n-1)/n$, and $w_i^{(k)} = (n-1)^{-1}$ if $i \ne k$ and $w_i^{(k)} = 0$ if $i = k$. We are interested in estimating the variance of the PMM imputation estimator.
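For the sample mean under SRS, the delete-1 jackknife above reproduces the usual variance estimator $s^2/n$ exactly; a minimal sketch (illustrative, full-response case only):

```python
import numpy as np

def jackknife_variance(y):
    """Delete-1 jackknife for the sample mean under SRS:
    L = n, c_k = (n - 1)/n, and muhat^(k) drops unit k."""
    n = len(y)
    mu = y.mean()
    mu_k = (y.sum() - y) / (n - 1)          # leave-one-out means muhat^(k)
    return (n - 1) / n * np.sum((mu_k - mu) ** 2)
```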
  • 22. Replication variance estimation: Approach 1. Idea: apply the bootstrap (or jackknife) method and repeat the same imputation within each replicate. This approach provides consistent variance estimation for regression imputation: $\hat{\mu}_{reg,I}^{(k)} = \sum_{i \in A} w_i^{(k)} \{\delta_i y_i + (1 - \delta_i) m(x_i; \hat{\beta}^{(k)})\}$. However, for stochastic regression imputation this approach does not work, because it does not capture the random imputation part correctly.
  • 23. Replication variance estimation (Cont'd). Note that $\hat{\mu}_{reg,I2} = \sum_{i \in A} w_i \{\delta_i y_i + (1 - \delta_i)(\hat{y}_i + \hat{e}_i^*)\} = \sum_{i \in A} w_i \{\hat{y}_i + \delta_i (1 + \kappa_i) \hat{e}_i\}$, where $\kappa_i$ is defined in (2). Thus, we can write $\hat{\mu}_{reg,I2}(\beta) = \sum_{i \in A} w_i [m(x_i; \beta) + \delta_i (1 + \kappa_i) \{y_i - m(x_i; \beta)\}] = \sum_{i \in A} w_i f(x_i, y_i, \delta_i, \kappa_i; \beta)$.
  • 24. Replication variance estimation: Approach 2. Idea: if the imputation estimator can be expressed as $\hat{\mu}_I = \sum_{i \in A} w_i f(x_i, y_i, \delta_i, \kappa_i; \hat{\beta})$ for some function $f$ of known form, we can use $\hat{\mu}_I^{(k)} = \sum_{i \in A} w_i^{(k)} f(x_i, y_i, \delta_i, \kappa_i; \hat{\beta}^{(k)})$ to construct the replication variance estimator $\hat{V}_{rep} = \sum_{k=1}^{L} c_k (\hat{\mu}_I^{(k)} - \hat{\mu}_I)^2$.
  • 25. Variance estimation for the PMM estimator. 1 Obtain the $k$-th replicate of $\hat{\beta}$, denoted $\hat{\beta}^{(k)}$, by solving $\sum_{i \in A} w_i^{(k)} \delta_i \{y_i - m(x_i; \beta)\} g(x_i; \beta) = 0$ for $\beta$. 2 Calculate the $k$-th replicate of $\hat{\mu}_{pmm}$ as $\hat{\mu}_{pmm}^{(k)} = \sum_{i \in A} w_i^{(k)} [m(x_i; \hat{\beta}^{(k)}) + \delta_i (1 + \kappa_i) \{y_i - m(x_i; \hat{\beta}^{(k)})\}]$. 3 Compute $\hat{V}_{rep} = \sum_{k=1}^{L} c_k (\hat{\mu}_{pmm}^{(k)} - \hat{\mu}_{pmm})^2$.
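The three steps above can be sketched for the delete-1 jackknife under SRS. This is an illustrative sketch, not the authors' code: `fit` and `m` are caller-supplied hypothetical callables (replicate estimating-equation solver and mean function), and the matching weights $\kappa_i$ from (2) are held fixed across replicates, as in the replicated form of the estimator.

```python
import numpy as np

def pmm_jackknife(x, y, delta, kappa, fit, m):
    """Delete-1 jackknife replicates of the PMM estimator under SRS.
    fit(x, y, delta, w) returns the replicate beta-hat for weights w;
    m(x, beta) evaluates the mean function for every unit."""
    n = len(y)
    reps = np.empty(n)
    for k in range(n):
        w = np.full(n, 1.0 / (n - 1))
        w[k] = 0.0                                     # delete unit k
        beta_k = fit(x, y, delta, w)                   # step 1
        yhat = m(x, beta_k)
        reps[k] = np.sum(w * (yhat + delta * (1 + kappa) * (y - yhat)))  # step 2
    return reps

def pmm_jackknife_var(reps, mu_pmm):
    """Step 3: V_rep = (n-1)/n * sum_k (muhat^(k) - muhat)^2."""
    n = len(reps)
    return (n - 1) / n * np.sum((reps - mu_pmm) ** 2)
```

As a sanity check, with full response and $\kappa_i = 0$ the replicates collapse to leave-one-out means and the variance estimator reduces to $s^2/n$.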
  • 26. 5. Simulation Study. We wish to answer the following questions: 1 Is the bias of the PMM imputation estimator asymptotically negligible? 2 Does the bias of the NNI estimator remain significant for $p \ge 2$? 3 Is the PMM imputation estimator more robust than the regression imputation estimator? Other issues: efficiency comparison; validity of variance estimation; coverage of normal-based interval estimators.
  • 27. Simulation Setup 1. Population model ($N = 50{,}000$): 1 P1: $y = -1 + x_1 + x_2 + e$; 2 P2: $y = -1.167 + x_1 + x_2 + (x_1 - 0.5)^2 + (x_2 - 0.5)^2 + e$; 3 P3: $y = -1.5 + x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + e$, where $x_1, x_2, x_3 \sim \mathrm{Uniform}(0, 1)$ and $x_4, x_5, x_6, e \sim N(0, 1)$. Sampling design: 1 SRS of size $n = 400$; 2 PPS of size $n = 400$ with size measure $s_i = \log(|y_i + \nu_i| + 4)$, where $\nu_i \sim N(0, 1)$. Response probability: $\delta_i \sim \mathrm{Bernoulli}\{p(x_i)\}$, where $\mathrm{logit}\{p(x_i)\} = 0.2 + x_{1i} + x_{2i}$; the overall response rate is 75%.
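The P1 population, response mechanism, and SRS design above can be generated as follows; an illustrative sketch (the seed and variable names are ours, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50_000, 400

# Population P1: y = -1 + x1 + x2 + e, with x1, x2 ~ Uniform(0,1), e ~ N(0,1)
x1, x2 = rng.uniform(size=N), rng.uniform(size=N)
e = rng.standard_normal(N)
y = -1 + x1 + x2 + e

# Response indicator: logit{p(x)} = 0.2 + x1 + x2 (response rate near 75%)
p = 1.0 / (1.0 + np.exp(-(0.2 + x1 + x2)))
delta = rng.binomial(1, p)

# SRS of size n = 400: pi_i = n/N for every sampled unit
A = rng.choice(N, size=n, replace=False)
pi = np.full(n, n / N)
```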
  • 28. Simulation Setup 2. Imputation methods (for P1 and P2): 1 NNI using $(x_1, x_2)$; 2 PMM using $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$; 3 stochastic regression imputation (SRI) using $y_i^* = \hat{y}_i + \hat{e}_i^*$, where $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$. Imputation methods (for P3): 1 NNI using $(x_1, x_2, \cdots, x_6)$; 2 PMM using $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_6 x_{6i}$; 3 SRI using $y_i^* = \hat{y}_i + \hat{e}_i^*$ with the same $\hat{y}_i$. Parameter of interest: $\theta = E(Y)$.
  • 29. Simulation Result 1. Table: Bias (×10²) and S.E. (×10²) of the point estimators.

                              PMM              NNI              SRI
                          Bias    S.E.     Bias    S.E.     Bias    S.E.
    Simple Random Sampling
    (P1)                 -0.15    6.46    -0.21    6.54    -0.23    6.44
    (P2)                 -0.22    6.54    -0.25    6.55    -0.37    6.46
    (P3)                  1.90   11.85    18.59   11.06     0.11   11.17
    Probability Proportional to Size Sampling
    (P1)                  0.05    6.46     0.13    6.37     0.18    6.53
    (P2)                  0.30    6.52     0.12    6.47     0.16    6.60
    (P3)                  1.33   10.99    17.53   10.70     0.40   11.10

  PMM: predictive mean matching; NNI: nearest neighbor imputation; SRI: stochastic regression imputation.
  • 30. Simulation Result 2. Table: Relative bias (RB, ×10²) of jackknife variance estimates and coverage rate (CR, %) of 95% confidence intervals.

                              PMM              NNI              SRI
                           RB     CR       RB     CR       RB     CR
    Simple Random Sampling
    (P1)                    5    95.1       3    95.1       5    95.8
    (P2)                    5    95.4       3    95.3       5    95.6
    (P3)                    4    95.2       4    63.8       4    95.5
    Probability Proportional to Size Sampling
    (P1)                    2    95.5       3    94.8       2    94.9
    (P2)                    1    95.4       0    95.3       3    94.9
    (P3)                    7    95.8       3    65.5      -3    95.6

  PMM: predictive mean matching; NNI: nearest neighbor imputation; SRI: stochastic regression imputation.
  • 31. 6. Discussion. The non-smooth nature of the PMM estimator makes its asymptotic behavior difficult to investigate. Some recent econometric papers (Andreou and Werker, 2012; Abadie and Imbens, 2016) provide key ideas for solving this problem in survey sampling. The proof of Theorem 2 involves the martingale central limit theorem and Le Cam's third lemma. Replication variance estimation can be constructed to properly capture the variability of the PMM estimator. Fractional imputation can also be developed to address this problem; this is a topic for future research.