When data are missing at random (MAR), complete-case analysis with the full-data estimating equation is in general not valid. To correct the bias, we can employ the inverse probability weighting (IPW) technique on the complete cases. This requires modeling the missingness probability given the observed data (call it the $\pi$ model). The resulting IPW estimator, however, ignores the information contained in cases with missing components, and is thus statistically inefficient. Efficiency can be improved by modifying the estimating equation along the lines of the semiparametric efficiency theory of Bickel et al. (1993). This modification usually requires modeling the distribution of the missing components given the observed ones (call it the $\mu$ model). When both the $\pi$ and the $\mu$ models are correct, the modified estimator is valid and more efficient than the IPW one. Moreover, the modified estimator is "doubly robust" in the sense that it remains valid when either the $\pi$ model or the $\mu$ model is correct.
The essential material in these slides is drawn from the book Semiparametric Theory and Missing Data (Tsiatis, 2006). The slides were originally presented in Spring 2013 as a final project in the class BIOS 773, Statistical Analysis with Missing Data, at UNC Chapel Hill.
Double Robustness: Theory and Applications with Missing Data

Lu Mao
Department of Biostatistics
The University of North Carolina at Chapel Hill
Email: lmao@unc.edu
April 17, 2013
Table of Contents

Part I: A Semiparametric Perspective
  A motivating example
  Semiparametric approaches to coarsened data
  Constructing the estimating equation

Part II: Applications in Missing Data Problems
  Data with two levels of missingness
  Monotone coarsened data
A Motivating Example

- Given an iid sample $Y_1,\dots,Y_n$ from an arbitrary distribution, consider the estimation of the population mean $\mu = E(Y)$ by $\bar Y$, which solves
$$\mathbb{P}_n(Y - \mu) = 0,$$
where $\mathbb{P}_n Z \equiv n^{-1}\sum_{i=1}^n Z_i$.
- Suppose some of the $Y_i$'s are missing. Let $R_i = 1$ if $Y_i$ is observed and $R_i = 0$ otherwise. Let $\pi(Y) = P(R = 1\mid Y)$. Now consider estimating $\mu$ by solving
$$\mathbb{P}_n\{R(Y - \mu)\} = 0,$$
resulting in
$$\hat\mu_{CC} = \frac{\sum_i R_i Y_i}{\sum_i R_i} \;\to_p\; \frac{E[\pi(Y)\,Y]}{E[\pi(Y)]} \neq \mu.$$
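As a quick numerical illustration (a hypothetical setup, not from the slides: $Y$ normal with mean 1 and a logistic $\pi(Y)$), the complete-case bias shows up immediately. All code sketches in these notes are minimal Python/NumPy illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
Y = rng.normal(loc=1.0, scale=1.0, size=n)   # true mean mu = 1
pi_Y = 1 / (1 + np.exp(-(0.5 + 1.0 * Y)))    # P(R=1|Y) increases with Y (MNAR-style pi(Y))
R = rng.binomial(1, pi_Y)

mu_cc = Y[R == 1].mean()                     # complete-case estimate
print(mu_cc)   # noticeably above 1: converges to E[pi(Y)Y]/E[pi(Y)] != mu
```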
- Suppose that in addition to $Y_i$, an auxiliary variable $X_i$ is also collected, and $R \perp Y \mid X$. Assume $P(R = 1\mid Y, X) = \pi(X;\psi)$. To correct the bias, we apply the estimating equation ($\hat\psi_n$ is a consistent estimator for $\psi_0$)
$$\phi^{IPW}_n = \mathbb{P}_n\left\{\frac{R}{\pi(X;\hat\psi_n)}(Y - \mu)\right\},$$
resulting in
$$\hat\mu_{IPW} = \frac{\mathbb{P}_n[RY/\pi(X;\hat\psi_n)]}{\mathbb{P}_n[R/\pi(X;\hat\psi_n)]}
= \frac{\mathbb{P}_n[RY/\pi(X;\psi_0)]}{\mathbb{P}_n[R/\pi(X;\psi_0)]} + o_p(1)
\;\to_p\; \frac{E[RY/\pi(X;\psi_0)]}{E[R/\pi(X;\psi_0)]} = \mu. \tag{1}$$
- Assume a working model $\mu(X;\theta) = E(Y\mid X;\theta)$, and consider a new estimating equation as a modification of $\phi^{IPW}_n$:
$$\phi^{DR}_n = \mathbb{P}_n\left\{\frac{R}{\pi(X;\hat\psi_n)}(Y - \mu) - \frac{R - \pi(X;\hat\psi_n)}{\pi(X;\hat\psi_n)}\,\bigl(\mu(X;\hat\theta_n) - \mu\bigr)\right\}, \tag{2}$$
resulting in
$$\hat\mu_{DR} = \mathbb{P}_n\left\{\frac{R}{\pi(X;\hat\psi_n)}\,Y - \frac{R - \pi(X;\hat\psi_n)}{\pi(X;\hat\psi_n)}\,\mu(X;\hat\theta_n)\right\}. \tag{3}$$
Now let's study the consistency of $\hat\mu_{DR}$ under different assumptions.
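A minimal sketch of estimators (1) and (3), assuming a hypothetical design in which the true $\pi(X)$ and $\mu(X)$ are plugged in (estimated versions behave the same to first order):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = rng.normal(size=n)
Y = 1.0 + X + rng.normal(size=n)             # mu(X) = E[Y|X] = 1 + X, so mu = 1
pi_X = 1 / (1 + np.exp(-(0.2 + X)))          # P(R=1|Y,X) = pi(X): MAR given X
R = rng.binomial(1, pi_X)

# IPW estimator (1): ratio of inverse-probability-weighted sums over complete cases
mu_ipw = np.sum(R * Y / pi_X) / np.sum(R / pi_X)

# DR estimator (3): IPW term augmented with the regression prediction mu(X)
mu_X = 1.0 + X
mu_dr = np.mean(R * Y / pi_X - (R - pi_X) / pi_X * mu_X)

print(mu_ipw, mu_dr)                         # both close to 1
```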
- Scenario 1: $\pi(X;\psi)$ correct, $\mu(X;\theta)$ incorrect. So $\hat\psi_n \to_p \psi_0$, but $\hat\theta_n \to \theta^*$ with $\mu(X;\theta^*) \neq E(Y\mid X)$:
$$\begin{aligned}
\hat\mu_{DR} &= \mathbb{P}_n\left\{\frac{R}{\pi(X;\psi_0)}\,Y - \frac{R - \pi(X;\psi_0)}{\pi(X;\psi_0)}\,\mu(X;\theta^*)\right\} + o_p(1)\\
&\to_p E\left[\frac{R}{\pi(X;\psi_0)}\,Y\right] - E\left[\frac{R - \pi(X;\psi_0)}{\pi(X;\psi_0)}\,\mu(X;\theta^*)\right]\\
&= E(Y) - 0 = \mu, \tag{4}
\end{aligned}$$
since $E(R\mid Y, X) = \pi(X;\psi_0)$ makes the first expectation $E(Y)$ and annihilates the second.
- Scenario 2: $\mu(X;\theta)$ correct, $\pi(X;\psi)$ incorrect. So $\hat\theta_n \to_p \theta_0$, but $\hat\psi_n \to \psi^*$ with $\pi(X;\psi^*) \neq P(R = 1\mid Y, X)$:
$$\begin{aligned}
\hat\mu_{DR} &= \mathbb{P}_n\left\{\frac{R}{\pi(X;\psi^*)}\,Y - \frac{R - \pi(X;\psi^*)}{\pi(X;\psi^*)}\,\mu(X;\theta_0)\right\} + o_p(1)\\
&\to_p E\left[\frac{R}{\pi(X;\psi^*)}\,Y\right] - E\left[\frac{R - \pi(X;\psi^*)}{\pi(X;\psi^*)}\,\mu(X;\theta_0)\right]\\
&= E\left[\frac{R}{\pi(X;\psi^*)}\,E(Y\mid R, X)\right] - E\left[\frac{R - \pi(X;\psi^*)}{\pi(X;\psi^*)}\,\mu(X;\theta_0)\right]\\
&= E\left[\frac{R}{\pi(X;\psi^*)}\,\mu(X;\theta_0)\right] - E\left[\frac{R - \pi(X;\psi^*)}{\pi(X;\psi^*)}\,\mu(X;\theta_0)\right]\\
&= E[\mu(X;\theta_0)]\\
&= \mu, \tag{5}
\end{aligned}$$
using $E(Y\mid R, X) = E(Y\mid X) = \mu(X;\theta_0)$ under MAR.
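The two scenarios can be mimicked numerically by deliberately feeding the DR formula one wrong model at a time (hypothetical misspecifications; the IPW estimator is shown for contrast):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
X = rng.normal(size=n)
Y = 1.0 + X + rng.normal(size=n)              # mu = 1
pi0 = 1 / (1 + np.exp(-(0.2 + X)))            # true pi(X; psi0)
R = rng.binomial(1, pi0)

def mu_dr(pi, muX):
    """DR estimator (3) with supplied pi and mu models."""
    return np.mean(R * Y / pi - (R - pi) / pi * muX)

mu_true, pi_true = 1.0 + X, pi0
mu_bad = np.zeros(n)                          # wrong outcome model
pi_bad = np.full(n, 0.5)                      # wrong missingness model

print(mu_dr(pi_true, mu_bad))                 # Scenario 1: pi right, mu wrong -> ~1
print(mu_dr(pi_bad, mu_true))                 # Scenario 2: mu right, pi wrong -> ~1
print(np.sum(R * Y / pi_bad) / np.sum(R / pi_bad))   # IPW with wrong pi -> biased
```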
Result 1 (Double robustness)
$\hat\mu_{DR}$ is consistent if either the $\pi$ model or the $\mu$ model is correct, that is, under $\mathcal{M}_1 \cup \mathcal{M}_2$, where $\mathcal{M}_1 = \{p(r\mid y, x;\psi) : \psi \in \Psi_1\}$ and $\mathcal{M}_2 = \{p(y\mid x;\theta) : \theta \in \Theta_2\}$. In other words, $\hat\mu_{DR}$ is doubly robust.

- Now, let's consider a somewhat different question: efficiency under $\mathcal{M}_1 \cap \mathcal{M}_2$. For simplicity, we assume we know the true values $(\psi_0, \theta_0)$.
- Denote $\mathbb{G}_n g(Z) = n^{-1/2}\sum_{i=1}^n \{g(Z_i) - E g(Z)\}$. Algebraic manipulations yield
$$\sqrt{n}(\hat\mu_{IPW} - \mu) = \frac{1}{\mathbb{P}_n[R/\pi(X;\psi_0)]}\,\mathbb{G}_n\left\{\frac{R}{\pi(X;\psi_0)}(Y - \mu)\right\} \rightsquigarrow N(0, \sigma^2_{IPW}), \tag{6}$$
where
$$\sigma^2_{IPW} = E\left\{\left[\frac{R}{\pi(X;\psi_0)}(Y - \mu)\right]^2\right\} = E\left[\frac{(Y - \mu)^2}{\pi(X;\psi_0)}\right].$$
- Similarly,
$$\sqrt{n}(\hat\mu_{DR} - \mu) = \mathbb{G}_n\left\{\frac{R}{\pi(X;\psi_0)}(Y - \mu) - \frac{R - \pi(X;\psi_0)}{\pi(X;\psi_0)}\bigl(\mu(X;\theta_0) - \mu\bigr)\right\} \rightsquigarrow N(0, \sigma^2_{DR}), \tag{7}$$
where
$$\begin{aligned}
\sigma^2_{DR} ={}& E\left\{\left[\frac{R}{\pi(X;\psi_0)}\right]^2 (Y - \mu)^2\right\}
- 2E\left\{\frac{R}{\pi(X;\psi_0)}(Y - \mu)\cdot\frac{R - \pi(X;\psi_0)}{\pi(X;\psi_0)}\bigl(\mu(X;\theta_0) - \mu\bigr)\right\}\\
&+ E\left\{\left[\frac{R - \pi(X;\psi_0)}{\pi(X;\psi_0)}\right]^2 \bigl(\mu(X;\theta_0) - \mu\bigr)^2\right\}.
\end{aligned}$$
- Simplifying term by term (using $E(R\mid X) = \pi(X;\psi_0)$ and $R \perp Y \mid X$),
$$\begin{aligned}
\sigma^2_{DR} &= E\left[\frac{(Y - \mu)^2}{\pi(X;\psi_0)}\right]
- 2E\left[\frac{1 - \pi(X;\psi_0)}{\pi(X;\psi_0)}\bigl(\mu(X;\theta_0) - \mu\bigr)^2\right]
+ E\left[\frac{1 - \pi(X;\psi_0)}{\pi(X;\psi_0)}\bigl(\mu(X;\theta_0) - \mu\bigr)^2\right]\\
&= \sigma^2_{IPW} - E\left[\frac{1 - \pi(X;\psi_0)}{\pi(X;\psi_0)}\bigl(\mu(X;\theta_0) - \mu\bigr)^2\right].
\end{aligned}$$
- Let $\varphi^{IPW} = \frac{R}{\pi(X;\psi_0)}(Y - \mu)$, $A = -\frac{R - \pi(X;\psi_0)}{\pi(X;\psi_0)}\bigl(\mu(X;\theta_0) - \mu\bigr)$, and $\varphi^{DR} = \varphi^{IPW} + A$. Consider the Hilbert space $L_2(P)$. Since $\hat\mu_{IPW}$ and $\hat\mu_{DR}$ have influence functions $\varphi^{IPW}$ and $\varphi^{DR}$ respectively, their squared lengths ($\|\cdot\|^2 \equiv E(\cdot)^2$) are the asymptotic variances of $\hat\mu_{IPW}$ and $\hat\mu_{DR}$.
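A Monte Carlo check of the variance identity $\sigma^2_{DR} = \sigma^2_{IPW} - E[\{1-\pi(X;\psi_0)\}/\pi(X;\psi_0)\,(\mu(X;\theta_0)-\mu)^2]$, in the same hypothetical design as before:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
X = rng.normal(size=n)
Y = 1.0 + X + rng.normal(size=n)
pi = 1 / (1 + np.exp(-(0.2 + X)))
R = rng.binomial(1, pi)
mu, mu_X = 1.0, 1.0 + X

phi_ipw = R / pi * (Y - mu)                       # IPW influence function
phi_dr = phi_ipw - (R - pi) / pi * (mu_X - mu)    # DR influence function

gap = np.mean((1 - pi) / pi * (mu_X - mu) ** 2)   # the efficiency gain term
print(phi_ipw.var(), phi_dr.var(), phi_ipw.var() - gap)  # last two agree
```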
The following figure provides a geometric illustration.

[Figure: a geometric interpretation of the efficiency improvement by the DR estimator.]

Result 2 (Efficiency of DR)
$\hat\mu_{DR}$ is more efficient than $\hat\mu_{IPW}$ under $\mathcal{M}_1 \cap \mathcal{M}_2$.
Remark 1.1
The above example suggests that:
- For a full data problem, there is a natural extension, via the IPW (inverse probability weighting) method, to a corresponding missing data problem;
- By positing a working model $p(z_{mis}\mid z_{obs};\theta)$, the IPW estimating equation can be modified by adding a suitable augmentation term, resulting in an estimator that is still consistent even if the working model $p(z_{mis}\mid z_{obs};\theta)$ is not correct;
- If $p(z_{mis}\mid z_{obs};\theta)$ is correct, the new estimator is consistent even if the missingness mechanism is incorrectly modeled. In this sense, the new estimator is doubly robust;
- The doubly robust estimator has improved efficiency if both models are correct.
Semiparametric Approaches to Coarsened Data

- First we introduce the terminology of coarsening, which contains missing data as a special case:

Definition 1.2 (Coarsening)
Suppose the full data consist of iid observations of an $l$-dimensional random vector $Z$. Define a coarsening variable $C$ such that when $C = r$, we only observe $G_r(Z)$, where $G_r(\cdot)$ is a many-to-one function. Further denote $C = \infty$ if $Z$ is completely observed (no coarsening), that is, $G_\infty(Z) = Z$. Thus, the observed data consist of iid copies of $(C, G_C(Z))$.

Definition 1.3 (Coarsening at random)
The data are said to be coarsened at random (CAR) if $C \perp Z \mid G_C(Z)$.

Remark 1.4 (Assumption)
All problems considered are under the assumption of CAR.
Terminology
- $Z$: full data;
- $G_C(Z)$: observed data;
- $(C, G_C(Z))$: coarsened data.

- Semiparametric models arise naturally in coarsened data problems.
- Consider a full data regression model, $z = (y, x)'$:
$$p(z\mid\beta, F) = p(y\mid x;\beta)\,dF(x),$$
where $\beta$ is finite dimensional and the nuisance is infinite dimensional (e.g., an arbitrary cdf $F$ for $x$).
- Now suppose some components of $x$ are missing (at random); the likelihood then becomes $q(y, x_{obs}, r\mid \beta, F)$, in which the infinite dimensional nuisance cannot be ignored. Hence we have arrived at a semiparametric model.
- Let's review some basic theory of semiparametric inference.
Remark 1.10 (Z-estimation with estimated nuisance)
In the presence of a nuisance parameter $\eta$, the estimating equation generally involves $\eta$. A natural strategy is to insert a consistent estimator $\hat\eta_n$ and solve $\mathbb{P}_n\,\varphi(Z;\beta,\hat\eta_n) = 0$.

- Similarly, we have the following theorem about $\Lambda$:
Theorem 1.11 (Characterization of $\Lambda$)
The coarsened data tangent space for $\eta$ is characterized by
$$\Lambda = \{E[\alpha^F(Z)\mid C, G_C(Z)] : \alpha^F \in \Lambda^F\}. \tag{10}$$

- Remember that the important task is to characterize $\Lambda^\perp$, which will aid us in constructing coarsened data estimating equations for $\beta$.

Theorem 1.12 (Characterization of $\Lambda^\perp$)
The space $\Lambda^\perp$ consists of all elements $h(C, G_C(Z)) \in \mathcal{H}$ such that
$$E[h(C, G_C(Z))\mid Z] \in \Lambda^{F\perp}. \tag{11}$$
Proof.
By Theorem 1.11, the space $\Lambda^\perp$ consists of all elements $h(C, G_C(Z)) \in \mathcal{H}$ such that
$$E\{h(C, G_C(Z))\,E[\alpha^F(Z)\mid C, G_C(Z)]\} = 0, \quad \forall\,\alpha^F(Z) \in \Lambda^F.$$
This is equivalent to $E\{h(C, G_C(Z))\,\alpha^F(Z)\} = 0$, which is equivalent to $E\{\alpha^F(Z)\,E[h(C, G_C(Z))\mid Z]\} = 0$. $\square$

Remark 1.13 (A linear operator perspective)
Define the linear operator $\mathcal{K} : \mathcal{H} \to \mathcal{H}^F$ by $\mathcal{K}(\cdot) = E[\cdot\mid Z]$. Then
$$\Lambda^\perp = \mathcal{K}^{-1}(\Lambda^{F\perp}). \tag{12}$$
Definition 1.15 (Augmentation space)
We denote $\mathcal{A} = \mathcal{K}^{-1}(0)$, and call it the augmentation space.

Corollary 1.16
Assume $\pi(\infty, Z;\psi_0) = P(C = \infty\mid Z;\psi_0) \geq \epsilon > 0$ a.s. Then
$$\mathcal{K}^{-1}(\Lambda^{F\perp}) = \left\{\frac{I(C=\infty)}{\pi(\infty, Z;\psi_0)}\,\varphi^*(Z) + h : \varphi^* \in \Lambda^{F\perp},\ h \in \mathcal{A}\right\}.$$
- Suppose $\hat\psi_n$ is an efficient estimator of $\psi_0$. Take $h \equiv 0$, and we obtain the inverse probability weighted (IPW) estimating equation
$$\phi^{IPW}_n = \mathbb{P}_n\left\{\frac{I(C=\infty)}{\pi(\infty, Z;\psi_0)}\,\varphi^*(Z)\right\}.$$
In practice, the choice of $h \in \mathcal{A}$ will be based on efficiency considerations; the efficiency-improving choice leads to an estimating function with influence function of the form
$$\varphi^{DR}(C, G_C(Z)) = \frac{I(C=\infty)}{\pi(\infty, Z;\psi_0)}\,\varphi^*(Z) - \Pi\left[\frac{I(C=\infty)}{\pi(\infty, Z;\psi_0)}\,\varphi^*(Z)\,\middle|\,\mathcal{A}\right]. \tag{15}$$
- Typically, calculating the projection $\Pi[\cdot\mid\mathcal{A}]$ requires us to posit working parametric models $p(z\mid\theta)$. But the DR estimating equation will still be valid even if $p(z\mid\theta)$ does not contain the truth.
- We conclude this section with a theorem characterizing the augmentation space $\mathcal{A}$.

Theorem 1.20 (Characterization of $\mathcal{A}$)
The space $\mathcal{A}$ consists of all elements that can be written as
$$\sum_{r\neq\infty}\left\{\frac{I(C=\infty)\,\pi(r, G_r(Z))}{\pi(\infty, Z)} - I(C = r)\right\} h_r(G_r(Z)), \tag{16}$$
where $h_r(G_r(Z))$ is an arbitrary function of $G_r(Z)$.

Proof.
See Theorem 7.2 of Tsiatis (2006).
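The defining property of $\mathcal{A}$, namely $E[\cdot\mid Z] = 0$, can be checked numerically. Below is a minimal sketch (a hypothetical toy setup, not from the slides) with one coarsened level:

```python
import numpy as np

# Toy check of Theorem 1.20's defining property E[. | Z] = 0, with two
# coarsening levels: C = inf observes Z = (Z1, Z2); C = 2 observes G2(Z) = Z1.
rng = np.random.default_rng(4)
Z1, Z2 = 0.3, -1.2                              # one fixed full-data point
pi_inf = 1 / (1 + np.exp(-Z1))                  # pi(inf, Z): depends on Z1 only (CAR)
pi_2 = 1 - pi_inf                               # pi(2, G2(Z))
h2 = Z1 ** 2 + 1.0                              # arbitrary function h2(G2(Z))

full = rng.binomial(1, pi_inf, size=2_000_000)  # draws of I(C = inf) given Z
elem = full * pi_2 / pi_inf * h2 - (1 - full) * h2
print(elem.mean())                              # ~0: the element has E[. | Z] = 0
```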
Data with Two Levels of Missingness

- Suppose $Z = (Z_1, Z_2)$ and $Z_2$ is missing on some observations. Denote $R = 1$ if $Z_2$ is observed and $R = 0$ otherwise. Let $\pi(Z_1;\psi_0) = P(R = 1\mid Z;\psi_0)$. In this case the projection needed in (15) can be calculated explicitly:
$$\Pi\left[\frac{R}{\pi(Z_1;\psi_0)}\,\varphi^*(Z)\,\middle|\,\mathcal{A}\right] = \frac{R - \pi(Z_1;\psi_0)}{\pi(Z_1;\psi_0)}\,E[\varphi^*(Z)\mid Z_1],$$
exactly as in the motivating example.
- To compute $E[\varphi^*(Z)\mid Z_1]$, we need to posit a parametric model $p(z\mid\theta)$, or at least $p(Z_2\mid Z_1;\theta)$, and find a consistent estimator $\hat\theta_n$ for $\theta_0$. Then the projection can be computed via $E[\varphi^*(Z)\mid Z_1;\hat\theta_n]$. We should note that the parametric model needs to be consistent with the original semiparametric model. Similar to the motivating example, we can show that the resulting estimating equation is doubly robust to $p(r\mid z;\psi)$ and $p(z\mid\theta)$.
Example: Logistic Regression with a Missing Covariate

- Consider a logistic regression model whose linear predictor includes $\beta_2 X_2$, where $X_2$ is a real-valued continuous covariate that is missing on some subjects; $(Y, X_1^T)^T$ is always observed.
- The full data model is $p(y\mid x;\beta)\,\eta(x)$. Let $X = (1, X_1^T, X_2)^T$. The full data estimating equation is
$$\mathbb{P}_n\left\{X\left(Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}}\right)\right\} = 0.$$
- To use IPW, we posit a logistic regression for the missingness mechanism:
$$P(R = 1\mid Y, X;\psi) = \frac{e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}{1 + e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}.$$
The MLE $\hat\psi_n$ can be computed by solving $\mathbb{P}_n S_\psi = 0$, where
$$S_\psi = \begin{pmatrix}1\\ Y\\ X_1\end{pmatrix}\left(R - \frac{e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}{1 + e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}\right);
$$
a Newton-Raphson sketch for this score equation is given below.
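A minimal Newton-Raphson sketch for solving $\mathbb{P}_n S_\psi = 0$; the function name and interface are hypothetical:

```python
import numpy as np

def fit_logistic(W, R, n_iter=25):
    """Solve the score equation Pn S_psi = 0 for P(R=1|W) = expit(W @ psi)
    by Newton-Raphson. W should include a leading column of ones."""
    psi = np.zeros(W.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(W @ psi)))
        score = W.T @ (R - p)                       # n * (Pn S_psi)
        hess = (W * (p * (1 - p))[:, None]).T @ W   # observed information
        psi += np.linalg.solve(hess, score)
    return psi

# hypothetical usage, with W = [1, Y, X1] as in the score on the left:
# psi_hat = fit_logistic(np.column_stack([np.ones(n), Y, X1]), R)
```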
- To construct the DR estimating equation, we need to compute the conditional expectation
$$E\left[\,X\left(Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}}\right)\,\middle|\, Y, X_1\right].$$
- Therefore we need to posit a working model for $p(x\mid\theta)$, or at least for $p(z_2\mid y, z_1;\theta)$. If we do the latter, we should be aware that $p(z_2\mid y, z_1;\theta)$ must be compatible with the regression model $p(y\mid x;\beta)$. In fact, if the covariate distribution is multivariate normal, we can show that $x\mid y$ is multivariate normal. This motivates the following working model:
$$X_2\mid Y, X_1 \sim N(\theta_0 + \theta_1 Y + \theta_2^T X_1,\ \theta_3).$$
The MLE $\hat\theta_n$ is easily computed by least squares on the complete cases (see the sketch after this slide).
- Finally we need to compute
$$E\left[\,X\left(Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}}\right)\,\middle|\, Y, X_1;\hat\theta_n\right],$$
which under the normal working model reduces to one-dimensional integrals over $X_2$.
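A sketch of the complete-case least squares fit for the working model (hypothetical helper):

```python
import numpy as np

def fit_x2_given_y_x1(Y, X1, X2, R):
    """Complete-case least squares for the working model
    X2 | Y, X1 ~ N(theta0 + theta1*Y + theta2'X1, theta3)."""
    obs = R == 1
    D = np.column_stack([np.ones(int(obs.sum())), Y[obs], X1[obs]])
    coef, *_ = np.linalg.lstsq(D, X2[obs], rcond=None)   # (theta0, theta1, theta2)
    resid = X2[obs] - D @ coef
    sigma2 = resid @ resid / (obs.sum() - D.shape[1])    # theta3
    return coef, sigma2
```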
Monotone Coarsened Data

Definition 2.3 (Monotone coarsening)
If we can order the levels of coarsening in such a way that $G_r(Z)$ is a coarsened version of $G_{r+1}(Z)$, $r = 1, 2, \dots$, that is,
$$G_r(Z) = f_r(G_{r+1}(Z)),$$
where $f_r$ is a many-to-one function, then the coarsening is said to be monotone.

Example 2.4 (Monotone missingness in longitudinal data)
When a subject is followed over time, we observe $(Y_1,\dots,Y_k)$, where $Y_j$ is the measurement at the $j$th time point. Incomplete data arise if a subject is lost to follow-up at a certain point. In this case, if a measurement is missing at the $r$th time point, then all measurements after that will be missing.
    C = r       G_r(Z)
    1           $Y_1$
    2           $Y_1, Y_2$
    ...         ...
    k - 1       $Y_1, \dots, Y_{k-1}$
    $\infty$    $Y_1, \dots, Y_k$

For monotone coarsened data, it is natural and convenient to model missingness via the discrete hazard function
$$\lambda_r(G_r) = \begin{cases} P(C = r\mid C \geq r, Z), & r \neq \infty,\\ 1, & r = \infty.\end{cases}$$
Define
$$K_r(G_r) = P(C > r\mid Z) = \prod_{j=1}^{r}\{1 - \lambda_j(G_j)\}.$$
Then the $\pi$ function can be expressed as
$$\pi(r, G_r(Z)) = K_{r-1}(G_{r-1}(Z))\,\lambda_r(G_r), \qquad K_0 \equiv 1.$$
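A small sketch converting discrete hazards into $K_r$ and the level probabilities $\pi(r, G_r)$ (hypothetical helper; the hazard values are made up):

```python
import numpy as np

def coarsening_probs(lam):
    """Given hazards lam[r-1] = lambda_r = P(C = r | C >= r, Z) for
    r = 1..k-1, return K_r = P(C > r | Z), pi(r, G_r) = K_{r-1} * lambda_r,
    and pi(inf, Z) = K_{k-1}."""
    lam = np.asarray(lam)
    K = np.concatenate([[1.0], np.cumprod(1 - lam)])  # K_0 = 1, K_r = prod_{j<=r}(1-lam_j)
    pi = K[:-1] * lam                                  # P(C = r | Z), r = 1..k-1
    pi_complete = K[-1]                                # P(C = inf | Z)
    return K, pi, pi_complete

# example with three dropout levels and hazards 0.2, 0.1, 0.05:
K, pi, pc = coarsening_probs([0.2, 0.1, 0.05])
print(pi, pc, pi.sum() + pc)                           # probabilities sum to 1
```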
- As in the case with two levels of missingness, we first need to characterize the augmentation space $\mathcal{A}$ using Theorem 1.20, and then use the characterization to derive $\Pi(\varphi^{IPW}\mid\mathcal{A})$. We state the end result in the following theorem.

Theorem 2.5 ($\Pi(\varphi^{IPW}\mid\mathcal{A})$ in monotone coarsened data)
The projection of $\frac{I(C=\infty)}{\pi(\infty, Z)}\,\varphi^*(Z)$ onto $\mathcal{A}$ is
$$-\sum_{r\neq\infty}\frac{I(C = r) - \lambda_r(G_r)\,I(C \geq r)}{K_r(G_r)}\,E[\varphi^*(Z)\mid G_r(Z)],$$
so that the DR estimating function adds the corresponding augmentation sum to the IPW term.

- To compute $E[\varphi^*(Z)\mid G_r(Z)]$ we need to posit a parametric working model $p(z\mid\theta)$, or at least a series of conditional models $p(g_{r+1}\mid g_r;\theta_r)$.
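A sketch of one subject's DR estimating-function contribution assembled from Theorem 2.5 (hypothetical interface; the hazards and conditional expectations would come from the fitted models):

```python
import numpy as np

def dr_term(C, lam, cond_exp, phi_star, k):
    """One subject's DR contribution under monotone coarsening: the IPW term
    plus the augmentation from Theorem 2.5. C: dropout level in 1..k-1, or
    np.inf if complete; lam[r-1]: fitted hazard lambda_r at the subject's
    observed history; cond_exp[r-1]: E[phi*(Z)|G_r(Z)] from the working
    model; phi_star: phi*(Z), used only when C = inf. Arrays cover the
    levels the subject reached."""
    r_max = k - 1 if C == np.inf else int(C)
    K = np.cumprod(1.0 - np.asarray(lam[:r_max]))    # K_r = P(C > r | Z)
    val = phi_star / K[-1] if C == np.inf else 0.0   # I(C=inf)/pi(inf,Z) term
    for r in range(1, r_max + 1):                    # subject is at risk at each r
        val += (float(C == r) - lam[r - 1]) / K[r - 1] * cond_exp[r - 1]
    return val
```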
Remark 2.6 (Modeling the coarsening hazard)
Instead of modeling the coarsening probability directly, we model the discrete hazard
$$P(C = r\mid C \geq r, Z;\psi_r) = \lambda_r(G_r;\psi_r).$$
With monotone missing longitudinal data, for example, we may apply the logistic model
$$\lambda_r(G_r;\psi_r) = \frac{e^{\psi_{0r} + \psi_{1r}Y_1 + \cdots + \psi_{rr}Y_r}}{1 + e^{\psi_{0r} + \psi_{1r}Y_1 + \cdots + \psi_{rr}Y_r}}.$$
The likelihood for $C$ now has the following form:
$$\prod_r \prod_{i : C_i \geq r} \lambda_r(G_r(Z_i);\psi_r)^{I(C_i = r)}\,\{1 - \lambda_r(G_r(Z_i);\psi_r)\}^{I(C_i \geq r) - I(C_i = r)}.$$
Note that the likelihood factorizes over the $\psi_r$, so maximization can be done separately for each level (see the sketch below).
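A sketch of the separate per-level maximizations (hypothetical helper; it reuses the fit_logistic sketch from the covariate example):

```python
import numpy as np

def fit_hazards(Y, C, k, fit_logistic):
    """Fit the level-r dropout models separately, each by ordinary logistic
    regression on the at-risk set {i : C_i >= r}. Y is n x k; C holds
    dropout levels (np.inf for completers)."""
    psi_hat = []
    for r in range(1, k):
        at_risk = C >= r                            # completers have C = inf
        W = np.column_stack([np.ones(int(at_risk.sum())), Y[at_risk, :r]])
        d = (C[at_risk] == r).astype(float)         # I(C_i = r)
        psi_hat.append(fit_logistic(W, d))          # solves Pn S_{psi_r} = 0
    return psi_hat
```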
- If we use logistic regression for monotone missing longitudinal data, the likelihood is given by
$$\prod_{r=1}^{k-1}\prod_{i : C_i \geq r}\frac{e^{(\psi_{0r} + \psi_{1r}Y_{1i} + \cdots + \psi_{rr}Y_{ri})\,I(C_i = r)}}{1 + e^{\psi_{0r} + \psi_{1r}Y_{1i} + \cdots + \psi_{rr}Y_{ri}}}.$$
Each $\psi_r$ can be estimated by a logistic regression on the data $\{i : C_i \geq r\}$, and the stacked score is
$$S_\psi = (S_{\psi_1}^T, \dots, S_{\psi_{k-1}}^T)^T.$$
- Now we look at the problem of double robustness. Let $\hat\psi_n \to \psi^*$ and $\hat\theta_n \to \theta^*$.

Theorem 2.7 (Double robustness of $\phi^{DR}_n$)
The DR estimating function has mean zero at the truth, provided either the hazard model is correct ($\psi^* = \psi_0$) or the working model for the full data is correct ($\theta^* = \theta_0$).
Example: A Longitudinal RCT with Dropout

- Tsiatis (2006) describes a randomized clinical trial on a new drug for HIV/AIDS. The primary outcome is the CD4 count, denoted by $Y$. We also denote by $X$ the indicator variable for treatment. Measurements of $Y$ are taken at baseline $t_1 = 0$ and at $l - 1$ subsequent time points, denoted $t_2,\dots,t_l$. We want to model the mean CD4 count as a function of treatment and time through a linear model
$$E[Y_{ji}\mid X_i] = D_j(X_i)\,\beta,$$
where $D(X)$ is the design matrix built from treatment and time, so that the full-data (GEE-type) estimating function is $D^T(X)\{Y - D(X)\beta\}$.
- Now suppose there is random dropout, and the mechanism is MAR.
- First we use a logistic regression for the dropout hazard,
$$\lambda_r(G_r;\psi_r) = \frac{e^{\psi_{0r} + \psi_{1r}Y_1 + \cdots + \psi_{rr}Y_r + \psi_{r+1,r}X}}{1 + e^{\psi_{0r} + \psi_{1r}Y_1 + \cdots + \psi_{rr}Y_r + \psi_{r+1,r}X}},$$
and obtain the MLE $\hat\psi_n$.
- Denote $\bar Y_r = (Y_1,\dots,Y_r)^T$ and $\tilde Y_r = (Y_{r+1},\dots,Y_l)^T$. From Theorem 2.5, we need to compute the conditional expectation
$$E[D^T(X)\{Y - D(X)\beta\}\mid \bar Y_r, X].$$
- If we posit the working model
$$Y\mid (X = k) \sim N(\mu_k, \Sigma),$$
and denote by $\Sigma_{\bar r\bar r}$ the variance of $\bar Y_r$ and by $\Sigma_{\tilde r\bar r}$ the covariance between $\tilde Y_r$ and $\bar Y_r$, then the required conditional expectation follows from the conditional multivariate normal formula:
$$E[\tilde Y_r\mid \bar Y_r, X = k;\hat\theta_n] = \tilde\mu_k + \Sigma_{\tilde r\bar r}\,\Sigma_{\bar r\bar r}^{-1}(\bar Y_r - \bar\mu_k). \tag{24}$$
- The asymptotic variance of $\hat\beta_n$ can be estimated using the sandwich-type estimator described in (23).
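A sketch of the conditional-normal computation in (24) (hypothetical helper):

```python
import numpy as np

def cond_mean_mvn(y_obs, mu, Sigma):
    """E[Y_tilde_r | Y_bar_r = y_obs] for Y ~ N(mu, Sigma), where the first
    r components are observed, via the conditional normal formula
    mu_tilde + Sigma_{tilde,bar} Sigma_{bar,bar}^{-1} (y_obs - mu_bar)."""
    r = len(y_obs)
    mu_bar, mu_tilde = mu[:r], mu[r:]
    S_bb, S_tb = Sigma[:r, :r], Sigma[r:, :r]
    return mu_tilde + S_tb @ np.linalg.solve(S_bb, y_obs - mu_bar)

# hypothetical usage with the fitted arm-specific mean mu_hat[k] and common Sigma_hat:
# y_fill = cond_mean_mvn(np.array([310.0, 295.0]), mu_hat[k], Sigma_hat)
```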
References

Bang H, Robins JM (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962-973.
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993). Efficient and Adaptive Estimation for Semiparametric Models. Springer.
Kosorok MR (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer.
Lipsitz SR, Ibrahim JG, Zhao LP (1999). A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. Journal of the American Statistical Association 94, 1147-1160.
Robins JM, Rotnitzky A (2001). Comment on the Bickel and Kwon article, "Inference for semiparametric models: Some questions and an answer." Statistica Sinica 11, 920-936.
Scharfstein DO, Rotnitzky A, Robins JM (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association 94, 1135-1146.
Tsiatis AA (2006). Semiparametric Theory and Missing Data. Springer Series in Statistics.