1. PROJECT IMPACT EVALUATION METHODS
AKHTER U. AHMED
INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE
BANGLADESH POLICY RESEARCH AND STRATEGY SUPPORT PROGRAM
KNOWLEDGE, TOOLS AND LESSONS FOR INFORMING THE DESIGN AND IMPLEMENTATION OF FOOD SECURITY STRATEGIES IN ASIA
14-16 NOVEMBER 2011
KATHMANDU
2. Storyline
1. What is impact evaluation?
2. How to do impact evaluation?
3. Difference-in-differences method of impact evaluation
4. How to construct a comparison group?
• Randomization
• Instrumental variables
• Matching
• Regression discontinuity design
3. What is an impact evaluation?
An impact evaluation assesses the changes in the well-being of families or individuals that can be attributed to a particular project, program, or policy.
Impact is the difference between outcomes (e.g., consumption, school enrollment, women's empowerment) with the program and without it.
The goal of impact evaluation is to measure this difference in a way that attributes it to the program, and only to the program.
4. Why is it important?
Government policymakers/implementing agencies/donors
want to know if the program had an impact and the average
size of that impact
Understand if policies work
Justification for program
Understand the net benefits of the program
Understand the distribution of gains and losses
5. What we need
Compare the same individual with and without the program at the same point in time.
Problem: Every individual is unique, and each individual has only one existence. So we never observe the same individual both with and without the program at the same point in time.
Hence, we have the problem of a missing counterfactual: what would have happened without the program.
6. Solving the evaluation problem
How about comparing impact indicators for individuals before and after the program?
This is called "reflexive" impact evaluation.
But the problem is that the rest of the world moves on, and we cannot be sure what was caused by the program and what by the rest of the world; we might pick up the effects of other factors that changed around the time of treatment.
So we need a control/comparison group that allows us to attribute any change in the "treatment" group to the program.
The difference between the treated observation and the counterfactual is the estimated impact.
7. Problems in constructing a comparison group
Two main problems:
Most social interventions are targeted: program areas differ from non-program areas in "observable" and "unobservable" ways because the program designers intended this.
Individual participation is usually voluntary: participants differ from non-participants in observable and unobservable ways.
Therefore, a comparison of participants with an arbitrary group of non-participants can lead to biased results. This is termed "selection bias".
8. Illustrating selection bias
The observed difference (G) between treated and comparison outcomes decomposes as G = ATT + SB, where:
Impact on the treated (ATT) = true effect of the program on its recipients
Selection bias (SB) = the part of the observed difference in outcomes due to initial differences between treatment and control observations
• SB = 0: G = ATT (no selection bias)
• SB > 0: G > ATT (selection of those "better-off" with respect to the outcome)
• SB < 0: G < ATT (selection of those "worse-off" with respect to the outcome)
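A tiny numeric sketch of this decomposition, with all numbers invented for illustration:

```python
# Hypothetical numbers illustrating G = ATT + SB (all values invented).
att = 5.0   # true impact on the treated
sb = 3.0    # selection bias: the treated were already better off by 3 units

# Mean outcomes implied by these assumptions
mean_control = 20.0
mean_treated = mean_control + sb + att   # 28.0

g = mean_treated - mean_control          # observed difference
print(g)        # 8.0: overstates the true impact of 5.0
print(g - sb)   # 5.0: removing selection bias recovers ATT
```

The naive comparison overstates the program's effect exactly by the amount of selection bias.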
9. Difference-in-differences
Difference-in-differences compares observed changes in the outcomes for program participants (treatment) and a non-participating comparison group (control), before and after a program.
Identification assumption: selection bias is time-invariant.
Counterfactual: changes over time for the comparison group.
Constraint: requires pre-program and post-program data for both treatment and control groups.
Difference-in-differences is also called double-difference or diff-in-diff.
10. Illustrating the difference-in-differences estimate of the average treatment effect
[Figure: outcomes at baseline (before) and follow-up (after) for the treatment group (TB, TA) and the control group (CB, CA), drawn with TB = CB at baseline]
Impact = (TA - CA) - (TB - CB)
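The double-difference formula can be sketched in a few lines of Python; the group means below are invented for illustration:

```python
def diff_in_diff(t_before, t_after, c_before, c_after):
    """Double-difference estimate: (TA - CA) - (TB - CB)."""
    return (t_after - c_after) - (t_before - c_before)

# Hypothetical mean outcomes (e.g., school enrollment rates in percent)
tb, ta = 60.0, 75.0   # treatment group: baseline, follow-up
cb, ca = 60.0, 68.0   # control group: baseline, follow-up

impact = diff_in_diff(tb, ta, cb, ca)
print(impact)  # 7.0
```

The control group's change (+8) stands in for what would have happened to the treatment group without the program, so only the extra +7 is attributed to it.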
12. What is a sound comparison group?
A comparison group that is as identical as possible to those receiving the program (the treatment group):
Identical in observable and unobservable characteristics.
Ideally, the only difference between the treatment group and the comparison group is that the comparison group does not participate.
A comparison group that will not get spillover benefits from the program.
13. How to construct a comparison group?
1. Randomization
2. Instrumental variables
3. Matching (e.g. propensity score matching)
4. Regression discontinuity design
14. Randomization
For a sound impact evaluation, the best way is to assign the program randomly to treatment and comparison groups.
Randomization is often termed the "gold standard" for impact evaluation.
If program assignment is random, then all individuals (or households, communities, schools, etc.) have the same chance of receiving the program.
Selection bias is zero.
Then there will be no difference between the two groups besides the fact that the treatment group got the program. (There can still be differences due to sampling error; the larger the treatment and comparison samples, the smaller the error.)
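A small simulation, with an invented population and effect size, illustrating why random assignment recovers the true effect up to sampling error:

```python
import random

random.seed(1)

# Hypothetical population: baseline outcomes vary across individuals
TRUE_EFFECT = 5.0
population = [random.gauss(50, 10) for _ in range(10_000)]

# Random assignment: every unit has the same chance of receiving the program
random.shuffle(population)
treated = [y + TRUE_EFFECT for y in population[:5_000]]  # program adds its effect
control = population[5_000:]

def mean(xs):
    return sum(xs) / len(xs)

# Simple difference in means is an unbiased impact estimate under randomization
estimate = mean(treated) - mean(control)
print(round(estimate, 2))  # close to 5.0, off only by sampling error
```

With 5,000 units per arm the sampling error here is a fraction of a unit; halving the sample sizes would roughly inflate it by a factor of 1.4.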
15. Advantages of a randomized design
Easy way to identify impact
Results can be easily explained and communicated
Ideal for pilot programs
16. Limitations of randomization
Ethical issues
Political constraints
People might not want to participate: Internal validity
(exogeneity) may not hold
Randomization is usually run as a small-scale pilot, so it may be difficult to extrapolate the results to a larger population: external validity (generalizability) may not hold
17. Instrumental Variables (IV)
Instrumental variables are variables that affect program participation, but not outcomes given participation.
If such variables exist, they identify a source of exogenous variation in outcomes attributable to the program, recognizing that its placement is not random but purposive.
The instrumental variables are first used to predict program participation; one then sees how the outcome indicator varies with the predicted values.
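A minimal simulated sketch of this idea with one binary instrument. All data are invented; with a single instrument and a binary treatment, the IV estimator reduces to the Wald ratio cov(z, y) / cov(z, d):

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: participation d is driven partly by unobserved ability u
# (which also raises the outcome) and partly by an exogenous instrument z.
TRUE_EFFECT = 4.0
n = 20_000
z = [random.choice([0, 1]) for _ in range(n)]   # instrument (e.g., random encouragement)
u = [random.gauss(0, 1) for _ in range(n)]      # unobserved confounder
d = [1 if 0.5 * zi + ui + random.gauss(0, 1) > 0.5 else 0
     for zi, ui in zip(z, u)]                   # voluntary participation
y = [TRUE_EFFECT * di + 2.0 * ui + random.gauss(0, 1)
     for di, ui in zip(d, u)]                   # outcome

# Naive OLS slope of y on d is biased upward: d is correlated with u
ols = cov(d, y) / cov(d, d)

# IV (Wald) estimator: z shifts participation but affects y only through d
iv = cov(z, y) / cov(z, d)

print(round(ols, 2), round(iv, 2))  # OLS overstates the effect; IV is close to 4.0
```

The instrument isolates the part of participation that is unrelated to ability, which is why the second ratio recovers the true effect while the first does not.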
18. Instrumental Variables …
Suppose we want to estimate a treatment effect using survey data.
The OLS estimator is biased and inconsistent (due to correlation between a regressor and the error term) if there is:
• omitted variable bias
• selection bias
• simultaneous causality
Instrumental variables regression offers an alternative way to obtain a consistent estimator.
19. Instrumental Variables …
Advantage:
Instrumental variables remove selection bias from the impact estimate by "instrumenting" participation. One needs to find exogenous variables that explain participation but do not affect the outcomes.
Disadvantages:
It can be difficult to find an instrument that is both relevant (not weak) and exogenous.
IV can be difficult to explain to those who are unfamiliar with it.
20. Matching
Match program participants with non-participants from a large survey.
Counterfactual: a matched comparison group of non-participants.
Each program participant is paired with a non-participant that is similar.
Similarity is determined on the basis of observable characteristics of participants and non-participants from survey data.
Matching assumes that, conditional on the set of observable characteristics, there is no selection bias based on unobserved heterogeneity.
21. Double difference with matching
[Figure: outcomes at baseline (before) and follow-up (after) for the program group (PB, PA) and the matched control group (CB, CA), with PB = CB at baseline]
Impact = (PA - CA) - (PB - CB)
22. Propensity score matching (PSM)
In most impact evaluations, data do not come from randomized treatment and comparison groups.
In a seminal work, Rosenbaum and Rubin (1983) proposed propensity score matching as a method to reduce the bias in the estimation of treatment effects with observational survey data sets.
This method has become increasingly popular in impact evaluation.
23. PSM …
PSM is used to pick an ideal comparison group from a larger survey.
The comparison group is matched to the treatment group using the "propensity score": the predicted probability of participation given observed pre-program characteristics of participants and non-participants.
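A sketch of the matching step, assuming the propensity scores have already been estimated (in practice via a logit or probit of participation on pre-program characteristics; the scores and outcomes below are invented):

```python
# Hypothetical data: each person is (propensity_score, outcome).
participants = [(0.81, 120.0), (0.62, 100.0), (0.44, 95.0)]
nonparticipants = [(0.80, 110.0), (0.63, 96.0), (0.45, 92.0), (0.10, 70.0)]

def nearest_match(score, pool):
    """Nearest-neighbor matching on the propensity score (with replacement)."""
    return min(pool, key=lambda person: abs(person[0] - score))

# ATT estimate: average outcome gap between each participant and their match
gaps = [y - nearest_match(p, nonparticipants)[1] for p, y in participants]
att = sum(gaps) / len(gaps)
print(round(att, 2))  # (10 + 4 + 3) / 3
```

Note that the non-participant with score 0.10 is never used: matching discards comparison units that look nothing like any participant, rather than letting them contaminate the estimate.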
24. PSM …
Advantage:
Does not require randomization
Disadvantages:
Strong identification assumptions
Requires very high-quality data: one needs to control for all factors that influence program placement and participation
25. Regression Discontinuity Design (RDD)
RDD utilizes a rule that assigns individuals to the program only below a given threshold (cut-off point).
Assumption: there is a discontinuity in participation at the threshold, but not in counterfactual outcomes.
Counterfactual: individuals just above the cut-off point who did not participate.
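A simulated sketch of the RDD estimate: fit a simple regression on each side of the cutoff within a bandwidth, and read the impact off the jump at the threshold. The assignment rule, effect size, and bandwidth are all invented:

```python
import random

random.seed(2)

# Hypothetical setup: units with score below the cutoff receive the program
CUTOFF = 50.0
TRUE_EFFECT = 8.0

def outcome(score):
    treated = score < CUTOFF                 # assignment rule
    base = 20.0 + 0.5 * score               # smooth relation to the score
    return base + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 1)

data = [(s, outcome(s)) for s in (random.uniform(20, 80) for _ in range(4000))]

def fitted_value_at_cutoff(points):
    """OLS fit of y = a + b*(score - CUTOFF); returns the intercept a."""
    xs = [s - CUTOFF for s, _ in points]
    ys = [y for _, y in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx

# Local linear fits within a bandwidth on each side of the cutoff
H = 10.0
left = [p for p in data if CUTOFF - H <= p[0] < CUTOFF]    # participants
right = [p for p in data if CUTOFF <= p[0] <= CUTOFF + H]  # non-participants
impact = fitted_value_at_cutoff(left) - fitted_value_at_cutoff(right)
print(round(impact, 2))  # close to 8.0
```

Only observations near the threshold are used, which is why RDD identifies the effect for individuals close to the cut-off rather than for the whole population.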
26. Regression Discontinuity: Illustration
[Figure: outcome plotted against the selection criterion; individuals below the selection threshold are participants, those above are non-participants, and the impact appears as a jump in the outcome at the threshold]
Individuals are selected into the program according to a clearly defined threshold on a characteristic that is not directly linked to the outcome.
Source: Bernard and Torero, IFPRI
27. [Figure: Regression Discontinuity Design, baseline. Outcome plotted against score (20-80); units below the cutoff are poor, those above are non-poor. Source: Gertler and Martinez, 2006, World Bank]
28. [Figure: Regression Discontinuity Design, post-intervention. Outcome plotted against score; the impact appears as a discontinuity at the cutoff. Source: Gertler and Martinez, 2006, World Bank]
29. RDD …
Advantage:
Identification is built into the program design.
Disadvantages:
The threshold has to be strictly applied in practice.
RDD can be difficult to explain to those who are unfamiliar with it.
30. References
Bernard, T., and M. Torero. Impact Evaluation. IFPRI (PowerPoint presentation).
Department of Government, Harvard University. 2005. Econometric Approaches to Causal Inference: Difference-in-Differences and Instrumental Variables (PowerPoint presentation from website).
Gertler, P. J., and S. Martinez. 2006. Module 3: Impact Evaluation for TTLs. World Bank (PowerPoint presentation).
Goldstein, Markus. An Introduction to Impact Evaluation. PRMPR, World Bank (PowerPoint presentation from website).
Ravallion, M. 2001. "The Mystery of Vanishing Benefits: An Introduction to Impact Evaluation." The World Bank Economic Review 15 (1).