Extrapolation of data from key population surveys and programs

Presented by Jessie K. Edwards at the 2016 IAS conference.

Published in: Health & Medicine

  1. Extrapolation of data from key population surveys and programs
     Jessie K. Edwards, University of North Carolina at Chapel Hill (jessedwards@unc.edu)
     2016 International AIDS Conference, Durban, South Africa
  2. Perspective
     To improve public health, we must make a series of urgent decisions.
     • Resource allocation
     • Intervention strategies
     • Prioritization
     Sometimes, we must act without perfect data.
  3. Perspective
     • But, we rarely act from a position of complete ignorance.
     • To make decisions, we (often implicitly) summarize any existing information and our remaining uncertainty.
     • This talk walks through some approaches to formalize this process in a specific setting.
  4. Our Setting
     Decision makers often need national estimates of indicators related to key populations
     - For advocacy purposes
     - To set national targets
     - To inform funding allocation by international bodies
  5. What do we want to know?
     - Size estimates of key populations
       - Female sex workers
       - Men who have sex with men
       - Transgender people
     - Prevalence of HIV and other diseases
     - Distribution of risk factors for disease transmission
     - Program coverage
  6. Challenges
     Usually, information to describe a national HIV epidemic has gaps.
     Subgroups of the population may be underrepresented.
     Geographic areas of the country may be excluded.
  7. We need a principled approach to come up with national estimates when existing data are incomplete.
  8. Ongoing efforts
     WHO / UNAIDS meeting in March on strategic indicators for key pops
     MeSH co-convened, UNC, LSHTM, JHSPH participants
  9. Illustrative example
     What is the size of the population of female sex workers in the Dominican Republic?
  10. Overview
      1. Need for a national estimate: Specifically sizes of key populations, including female sex workers
         • Funders request this information
         • National estimates guide MOH priorities
      2. Inventory of existing data: Size estimates from 37 of 154 municipalities collected in priority provinces using a venue-based sampling approach
      3. Statistical approaches to compute national estimate: The focus of this talk.
      4. Consensus building + decision making: Project is ongoing – due to data security concerns, this example is illustrative only (data are hypothetical)
  11. Parameter of interest
      $\psi = E(Y)$ = proportion of women who exchange sex for cash
      In simulations, I set $\psi = 2.0\%$.
  12. Missing data problem
      But we only have measurements in a subset of the geographic areas from programmatic data collection in 2014 (let's denote these regions by $S_j = 1$).
  13. Missing data problem
      Our challenge is to use data from the sampled areas to make inference to the entire country.
  14. Approach
      Original data: Because data were collected for programmatic purposes, $E(Y \mid S_j = 1) > E(Y)$.
      In simulated example, $E(Y \mid S_j = 1) = 4.1\%$.
  15. Approach
      Original data: Expect that $E(Y \mid W = w, S_j = 1) = E(Y \mid W = w)$, but did not collect data in all strata of $W$.
      In simulated example, $\sum_w E(Y \mid W = w, S_j = 1)\, P(W = w \mid S_j = 1) = 3.5\%$.
  16. Approach
      Original data + targeted additional data collection: Expect that $E(Y \mid W = w, S_j = 1) = E(Y \mid W = w)$, and $P(S_j = 1 \mid W = w) > 0$ for all $w$.
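
The two conditions on slide 16 are what allow the national parameter to be written in terms of quantities estimable from the sampled municipalities. A short derivation, offered as a sketch and assuming $W$ is discrete (the talk does not say how $W$ is coded):

```latex
\begin{align*}
\psi = E(Y) &= \sum_w E(Y \mid W = w)\, P(W = w)
  && \text{(law of total expectation)} \\
&= \sum_w E(Y \mid W = w, S_j = 1)\, P(W = w)
  && \text{(since } E(Y \mid W = w, S_j = 1) = E(Y \mid W = w)\text{)}
\end{align*}
```

The second line is only usable if every stratum $w$ with $P(W = w) > 0$ contains sampled areas, which is the positivity condition $P(S_j = 1 \mid W = w) > 0$. Note that the weights are the national distribution of $W$, not its distribution among sampled areas.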
  17. Obtaining information on $W$
      The programmatic data collection activities only collected data on $Y$.
      But contextual data ($W$) can be found in other places
      • Census information
      • Geographic databases
      • Other population health surveys
      • DHS
  18. Covariates in $W$ for DR example
      From national data (publicly available online):
      • Age distribution
      • Male/female ratio
      • Employment data
      • Population density
      • Poverty
      • Country of origin
      From DHS:
      • Average number of years of education among women
      • HIV prevalence
      • Adolescent pregnancy indicators
      From other sources (directly from stakeholders):
      • Indicator of tourist area
      • Indicator that the municipality was on a major transit corridor
  19. Challenges using population-based surveys
      Challenges with DHS data
      • Indicators are generalizable to regional level, but we needed municipal estimates
      • Random displacement of GPS coordinates for clusters
  20. Summarizing DHS data
      1. Create $W$ "surface" by interpolating values of $W$ between DHS clusters using a fine grid
      2. Summarize values of $W$ by municipality
  21. Summarizing DHS data
      1. Create $W$ "surface" by interpolating values of $W$ between DHS clusters using a fine grid
      2. Summarize values of $W$ by municipality
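
The two steps above can be sketched in a few lines. The talk does not say which interpolation method was used, so this example swaps in inverse-distance weighting as one simple possibility. The cluster locations, cluster values, the unit-square "country", and the rule assigning grid cells to two municipalities are all hypothetical, and the sketch ignores the random GPS displacement mentioned on slide 19.

```python
import numpy as np

def idw_surface(cluster_xy, cluster_vals, grid_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation of cluster values onto grid points."""
    # Pairwise distances: rows are grid points, columns are clusters.
    d = np.linalg.norm(grid_xy[:, None, :] - cluster_xy[None, :, :], axis=2)
    wts = 1.0 / (d ** power + eps)
    return (wts * cluster_vals).sum(axis=1) / wts.sum(axis=1)

# Step 1: build a fine grid over a hypothetical unit-square country.
xs = np.linspace(0.05, 0.95, 10)
gx, gy = np.meshgrid(xs, xs)
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])

# Hypothetical DHS clusters: location and mean years of education among women.
cluster_xy = np.array([[0.2, 0.5], [0.8, 0.5]])
cluster_vals = np.array([6.0, 10.0])
surface = idw_surface(cluster_xy, cluster_vals, grid_xy)

# Step 2: summarize the surface by municipality.
# Hypothetical assignment: municipality 0 is the left half, municipality 1 the right half.
muni = (grid_xy[:, 0] >= 0.5).astype(int)
muni_means = np.array([surface[muni == m].mean() for m in np.unique(muni)])
print(muni_means)
```

In practice the grid cells would be assigned to municipalities using boundary polygons, and a model-based interpolator could replace inverse-distance weighting.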
  22. Now a standard missing data problem
      We have data on $Y$ for 50 municipalities.
      We have data on $W$ for all 154 municipalities.
      Could use any computational approach for estimating a mean in the presence of missing data.
      • Up-weight existing data to represent distribution of $W$ in entire country
      • Predict estimates in municipalities without data from a regression model containing $W$
      Here, we use the augmented inverse probability weighted (AIPW) estimator for $\psi$.
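
The two bulleted approaches can be illustrated on a toy data set. This is a minimal sketch, not the analysis from the talk: the ten municipalities, the single binary covariate, and the outcome values are hypothetical, and municipalities are treated as equally sized for simplicity.

```python
import numpy as np

# Hypothetical toy data (not from the talk): 10 equally sized municipalities.
w = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # 1 = urban, 0 = rural
s = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])   # 1 = Y was measured here
y = np.array([0.05, 0.04, 0.03, np.nan, 0.01] + [np.nan] * 5)

obs = s == 1
naive = y[obs].mean()  # unadjusted mean of the sampled municipalities

# Approach 1: up-weight sampled municipalities by 1 / P(S = 1 | W).
# The sampling probability is estimated by the sampled fraction in each stratum of W.
pi_hat = np.array([s[w == level].mean() for level in w])
ipw = np.sum(y[obs] / pi_hat[obs]) / np.sum(1.0 / pi_hat[obs])

# Approach 2: fit a regression of Y on W in the sampled municipalities,
# predict Y for every municipality, and average the predictions.
X = np.column_stack([np.ones(len(w)), w]).astype(float)
beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
reg = (X @ beta).mean()

print(f"naive={naive:.4f}  ipw={ipw:.4f}  regression={reg:.4f}")
```

Because urban municipalities are oversampled and have higher values of $Y$ in this toy example, the naive mean is too high, while both adjusted estimates agree with each other. With a single binary covariate both models are saturated, so the agreement is exact here; with richer $W$ the two approaches generally differ.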
  23. The AIPW estimator
      The AIPW is a consistent, semiparametric efficient, and doubly robust estimator of $\psi$.
  24. AIPW estimator is consistent, semiparametric efficient, and doubly robust
      What does that mean and why should we care?
      • Takes covariates $W$ into account
      • Smaller variance than traditional weighting approaches
      • Two chances to correctly specify parametric models
        • Model for inclusion in data collection (like weighting approaches) AND model for the outcome of interest (like regression-prediction approaches)
        • In weighting only or regression model only approaches, if the parametric model is misspecified, results are biased.
        • Here, if either of these models is correct, estimate of $\psi$ will be correct
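
The "two chances" property can be seen in a small numerical sketch. It uses the standard AIPW form for a mean with missing outcomes, $\hat\psi = \frac{1}{n}\sum_i \left[\hat m(W_i) + S_i\,\{Y_i - \hat m(W_i)\}/\hat\pi(W_i)\right]$, where $\hat\pi$ is the model for inclusion and $\hat m$ is the outcome model; the slide's own formula is not in the transcript, so this form is an assumption. The data, the helper names `aipw_mean` and `stratum_means`, and the deliberately wrong models are all hypothetical.

```python
import numpy as np

def aipw_mean(y, s, pi_hat, m_hat):
    """AIPW estimate of E[Y] when Y is observed only where s == 1."""
    y_filled = np.where(s == 1, y, 0.0)  # unobserved outcomes never enter the sum
    return float(np.mean(m_hat + s * (y_filled - m_hat) / pi_hat))

def stratum_means(values, w, mask):
    """Mean of `values` within each level of w (using rows where mask is True)."""
    out = np.empty(len(w), dtype=float)
    for level in np.unique(w):
        out[w == level] = values[(w == level) & mask].mean()
    return out

# Hypothetical toy data (not from the talk): 8 equally sized municipalities.
w = np.array([0, 0, 0, 0, 1, 1, 1, 1])
s = np.array([1, 0, 0, 0, 1, 1, 1, 0])
y = np.array([0.01, np.nan, np.nan, np.nan, 0.03, 0.04, 0.05, np.nan])

pi_right = stratum_means(s.astype(float), w, np.ones(len(s), dtype=bool))
m_right = stratum_means(y, w, s == 1)
pi_wrong = np.full(len(s), 0.5)   # ignores W
m_wrong = np.zeros(len(s))        # ignores W and the data

print(aipw_mean(y, s, pi_right, m_wrong))  # inclusion model right, outcome model wrong
print(aipw_mean(y, s, pi_wrong, m_right))  # inclusion model wrong, outcome model right
print(aipw_mean(y, s, pi_wrong, m_wrong))  # both wrong
```

In this toy example the first two calls return the same adjusted estimate, while the third falls back to the unadjusted mean of the sampled municipalities, which is biased upward here.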
  25. Data use/consensus building
      In this example, stakeholders convened a workshop after the activity to understand the results and
      1. Suggest sensitivity analyses
         • What if a different set of variables were included in $W$?
         • What if variables in $W$ were measured differently?
         • What if variables in $W$ were modeled differently?
      2. Discuss areas of uncertainty
  26. Loose ends
      Quantifying uncertainty
      • How sure are we that assumptions hold?
        • How much faith do we have in existing estimates of $Y$?
        • How well is $W$ captured?
        • Are we sure that we have measured all predictors of $Y$ that differ between municipalities with and without data?
      • Crucial for making decisions (i.e., maximizing expected utility)
      • Best incorporated in a Bayesian framework
  27. Loose ends
      What if we had not been able to collect additional data?
      That is, what if we had had NO DATA within key strata of $W$?
      e.g., What if all data had been collected in urban areas?
      We could augment our existing knowledge/information using
      • Parametric extrapolation
      • Bayesian methods
  28. Discussion
      Decisions can’t always wait for perfect and complete data.
      We can use modern statistical tools to produce useful results from incomplete data.
      Inference may be updated as more information arises.
  29. This presentation was produced with partial support from the United States Agency for International Development (USAID) under the terms of MEASURE Evaluation cooperative agreement AID-OAA-L-14-00004. MEASURE Evaluation is implemented by the Carolina Population Center, University of North Carolina at Chapel Hill in partnership with ICF International; John Snow, Inc.; Management Sciences for Health; Palladium; and Tulane University. Views expressed are not necessarily those of USAID or the United States government. www.measureevaluation.org
