RSS 2008 - meta-analysis when assumptions are violated
1. A comparison of Random Effects
meta-analysis methods when study
effects are non-normally distributed
Evan Kontopantelis &
David Reeves
NPCRDC
2. How meta-analysis works
• A search for papers relevant to
the research question is
conducted. Unsuitable papers
are filtered out
• In each paper…
– for each outcome measure that
is directly relevant to the RQ,
or a good enough proxy, we
calculate an effect (of
intervention vs control) and its
variance
– An overall effect and its variance
are selected
• Effects and their variances are
combined to calculate an
overall effect
[Figure: forest plot, "Chronic disease - Risk factors"; x-axis: effect (-.4 to .8); rows: Combined; Woolard(B), 1995; Woolard(A), 1995; Eckerlund, 1985; Moher, 2001; Cupples, 1994; Campbell, 1998; Van Ree, 1985]
3. Heterogeneity
• Heterogeneity can be attributed to clinical and/or
methodological diversity
• Clinical heterogeneity: variability that arises from
different populations, interventions, outcomes
and follow-up times
• Methodological heterogeneity: relates to
differences in trial design and quality
• Detecting (usually with Cochran’s Q test),
quantifying and dealing with heterogeneity can
be very hard
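As an illustration of the detection step, Cochran's Q is commonly computed as the weighted sum of squared deviations of the study effects from the inverse-variance pooled effect; under homogeneity it is approximately chi-square distributed with k-1 degrees of freedom. The sketch below is a minimal version under that standard definition; the function name and the three-study data are hypothetical.

```python
def cochran_q(effects, variances):
    """Return Cochran's Q and its degrees of freedom (k - 1)."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


# Hypothetical effects and within-study variances for three studies
effects = [0.10, 0.30, 0.20]
variances = [0.04, 0.04, 0.02]
q, df = cochran_q(effects, variances)
print(q, df)
```

Q would then be compared against a chi-square distribution with df degrees of freedom.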
4. Absence of heterogeneity
• Assumes that the true
effects of the studies
are all equal and
deviations occur
because of imprecision
of results
• Analysed with fixed-
effects method
$Y_i = \theta + e_i$
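A minimal sketch of fixed-effects pooling, assuming the usual inverse-variance weighting (each study weighted by the reciprocal of its within-study variance). The function name and the data are hypothetical, and the 1.96 multiplier assumes a normal-based 95% interval.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled effect and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, 1.0 / total


# Hypothetical effects and within-study variances for three studies
effects = [0.10, 0.30, 0.20]
variances = [0.04, 0.04, 0.02]
pooled, var = fixed_effect(effects, variances)
half_width = 1.96 * math.sqrt(var)   # normal-based 95% interval
print(pooled, (pooled - half_width, pooled + half_width))
```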
5. Presence of heterogeneity
• It is assumed that
there exists variation
in the size of the true
effect among studies
(in addition to the
imprecision in results)
• Analysed with
random-effects
methods
$Y_i = \theta_i + e_i$
6. Random-effect MA methods
• Estimate the between-study variance $\tau^2$ and
use it in estimating the overall effect
• Parametric:
– DerSimonian-Laird (1986)
– Maximum & Profile likelihood (1996)
• Non-parametric:
– Permutations method (1999)
– Non Parametric Maximum Likelihood (1999)
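To make "estimate the between-study variance and use it" concrete, here is a minimal sketch of the DerSimonian-Laird approach in its standard method-of-moments form: the between-study variance is estimated from Cochran's Q, truncated at zero, and then added to each within-study variance when weighting. The function name and the data are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Return (tau2, pooled effect, variance of pooled effect)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    s1 = sum(w)
    s2 = sum(wi ** 2 for wi in w)
    fe = sum(wi * y for wi, y in zip(w, effects)) / s1
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, effects))
    # Method-of-moments estimate, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / (s1 - s2 / s1))
    # Random-effects weights add tau2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return tau2, pooled, 1.0 / sum(w_re)


# Hypothetical, visibly heterogeneous studies
effects = [0.0, 0.5, 1.0]
variances = [0.04, 0.04, 0.04]
tau2, pooled, var = dersimonian_laird(effects, variances)
print(tau2, pooled, var)
```

Note that the pooled variance here treats the estimated between-study variance as if it were known.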
7. “Potential” problems?
• Heterogeneity is common & the FE model is
under fire
• Parametric RE models assume that both the
effects and errors are normally distributed
• Almost all RE models (except PL) do not
take account of uncertainty in $\hat{\tau}^2$
• DL is usually the preferred method of
analysis because it is easy to implement
and is available in all software packages
8. So far…
• The number of studies and the amount of
heterogeneity have been found to affect
method performance
• Performance comparisons usually focus on
coverage and ignore power or have not
included some important methods (e.g. PL,
PE)
• Evaluations were based on normal data:
method robustness has not been assessed
with non-normal data
10. In a nutshell
• Simulated various non-normal distributions
for the true effects: skew normal, bimodal,
beta, uniform, U and others
• Created datasets of 10000 meta-analyses
for various numbers of studies k and
different degrees of heterogeneity, for each
distributional assumption
• Compared FE, DL, ML, PL and PE methods
(along with a simple t-test) in terms of
coverage and power across all datasets
11. Generating the data
• For a single study we simulated the effect size estimate $Y_i$
and the within-study variance estimate $\hat{\sigma}_i^2$ of a binary
outcome
• The variance was assumed to be a realisation from a $\chi_1^2$
distribution, multiplied by .25 and restricted to the
(.009, .6) interval
• $Y_i$ involves two components: $Y_i = \theta_i + e_i$
– $\theta_i$, where $\theta_i \sim\ ?(0, \tau^2)$
– $e_i \sim N(0, \hat{\sigma}_i^2)$
• Four $\tau^2$ values were used: .01, .03, .07 & .1
• Number of studies $k$ (MA size) varied from 2 to 35
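The data-generating steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the restriction to (.009, .6) is implemented by redrawing (the slides do not say whether values were redrawn or clipped), only two true-effect shapes are shown (normal and a uniform scaled to the requested variance), and all function names are hypothetical.

```python
import math
import random


def draw_within_variance(rng):
    """0.25 * chi-square(1), redrawn until inside (.009, .6) (assumed rule)."""
    while True:
        v = 0.25 * rng.gauss(0.0, 1.0) ** 2   # squared N(0,1) is chi-square(1)
        if 0.009 < v < 0.6:
            return v


def draw_true_effect(rng, tau2, shape):
    """True study effect with mean 0 and variance tau2."""
    if shape == "normal":
        return rng.gauss(0.0, math.sqrt(tau2))
    if shape == "uniform":
        half = math.sqrt(3.0 * tau2)          # U(-a, a) has variance a^2 / 3
        return rng.uniform(-half, half)
    raise ValueError(f"unknown shape: {shape}")


def simulate_meta_analysis(k, tau2, shape="uniform", seed=None):
    """Return lists of k effect estimates and k within-study variances."""
    rng = random.Random(seed)
    effects, variances = [], []
    for _ in range(k):
        var_i = draw_within_variance(rng)
        theta_i = draw_true_effect(rng, tau2, shape)
        e_i = rng.gauss(0.0, math.sqrt(var_i))  # within-study error
        effects.append(theta_i + e_i)
        variances.append(var_i)
    return effects, variances


effects, variances = simulate_meta_analysis(k=10, tau2=0.03, seed=1)
print(len(effects), min(variances), max(variances))
```

Repeating the call 10000 times for each combination of k, between-study variance and shape would give datasets of the kind described.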
12. Details on the MA methods
• Fixed effects (FE)
• DerSimonian-Laird (DL)
• Q method (Q)
• Maximum Likelihood (ML)
• Profile Likelihood (PL)
• Permutations method (PE)
• T-test method (T)
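The slides do not spell out the T method, so the sketch below assumes one plausible reading: the k study effects are treated as an unweighted sample and a one-sample t interval is formed, ignoring the within-study variances. The function name and data are hypothetical; the critical value 2.776 is the 97.5th centile of the t distribution with 4 degrees of freedom and is passed in by hand to keep the example dependency-free.

```python
import math
import statistics


def t_interval(effects, t_crit):
    """Unweighted mean, its standard error, and a t-based interval."""
    k = len(effects)
    mean = statistics.mean(effects)
    se = statistics.stdev(effects) / math.sqrt(k)   # sample SD / sqrt(k)
    return mean, se, (mean - t_crit * se, mean + t_crit * se)


# Hypothetical effects from five studies (df = 4)
effects = [0.1, 0.2, 0.3, 0.4, 0.5]
mean, se, ci = t_interval(effects, t_crit=2.776)
print(mean, se, ci)
```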
13. Performance
• For each simulated meta-analysis case we
calculated confidence intervals for the overall
effect estimate, for all the methods
• Coverage: % of confidence intervals that
contain the true overall effect in a sample of
10000 meta-analyses
• Power: % of CIs that do not contain the 25th
centile of the population distribution of the
10000 effect sizes
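The two performance measures can be sketched as simple counts over a collection of confidence intervals. The function names, the four intervals and the reference values below are hypothetical and only illustrate the definitions given above.

```python
def coverage(intervals, true_effect):
    """Percentage of intervals that contain the true overall effect."""
    hits = sum(1 for lo, hi in intervals if lo <= true_effect <= hi)
    return 100.0 * hits / len(intervals)


def power(intervals, reference):
    """Percentage of intervals that exclude the reference value
    (in the slides: the 25th centile of the effect-size distribution)."""
    misses = sum(1 for lo, hi in intervals if not (lo <= reference <= hi))
    return 100.0 * misses / len(intervals)


# Four hypothetical confidence intervals
intervals = [(-0.1, 0.3), (0.05, 0.4), (-0.3, -0.02), (-0.2, 0.2)]
cov = coverage(intervals, true_effect=0.0)
pw = power(intervals, reference=-0.25)
print(cov, pw)
```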
21. Summary
• Within any given method, the results were
consistent across all types of distribution shape
• This can give researchers confidence that
methods are highly robust against even the most
severe violation of the assumption of normally
distributed effect sizes
• If it is reasonable to assume that the effect size
does not vary between studies, the FE, Q and
ML methods all provide accurate coverage
coupled with good power
22. In the presence of heterogeneity…
• However, zero between-study variance is the
exception rather than the norm and the
presence of even a moderate amount of $\tau^2$
alters the picture considerably
• FE, Q and ML quickly lose coverage as
heterogeneity increases
• DL rapidly goes from providing a coverage that
is overly high, to one that is overly low
• PE, and to a lesser extent PL, now provide the
best coverage, even with very small sample
sizes
23. Which method then?
• If priority is given to maintaining an accurate
Type I error rate then the simple t-test is the
best method. But its power is very low, making
it a poor choice when control of the Type II
error rate is also important
• PE gives accurate coverage in all situations
and has better power than T, but the method is
more difficult to implement and cannot be used
with fewer than 6 studies
• PL has ‘reasonable’ coverage in most
situations, giving it an edge over other methods
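One way to see where a lower limit of 6 studies for PE could come from, assuming PE is a sign-flipping permutation test (an assumption; the slides do not describe the mechanics): with k studies there are 2^k sign patterns, so the smallest attainable two-sided p-value is 2 / 2^k, and this only drops below 0.05 once k reaches 6. The function name is hypothetical.

```python
def min_two_sided_p(k):
    """Smallest two-sided p-value of a sign-flip permutation test on k values."""
    return 2.0 / 2 ** k


for k in range(2, 9):
    print(k, min_two_sided_p(k))

# First k at which significance at the 5% level is attainable at all
smallest_k = next(k for k in range(1, 50) if min_two_sided_p(k) <= 0.05)
print(smallest_k)
```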
24. Current & future work
• Created a freely available Excel add-in
that implements all the described MA
methods and various measures of
heterogeneity
• Working on a Stata module that will do
the same
• Investigate performance of heterogeneity
measures under non-normally distributed
data
25. Main references
• Brockwell SE, Gordon IR. A comparison of statistical methods for
meta-analysis. Stat.Med. 2001; 20(6):825-840
• Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and
statistical significance in meta-analysis: an empirical study of 125
meta-analyses. Stat.Med. 2000; 19(13):1707-1728
• Follmann DA, Proschan MA. Valid inference in random effects meta-
analysis. Biometrics 1999; 55(3):732-737
• Hardy RJ, Thompson SG. A likelihood approach to meta-analysis
with random effects. Stat.Med. 1996; 15(6):619-629
• Micceri T. The Unicorn, the Normal Curve, and Other Improbable
Creatures. Psychological Bulletin 1989; 105(1):156-166
• Ramberg JS, Dudewicz EJ, Tadikamalla PR, Mykytka EF. A
Probability Distribution and Its Uses in Fitting Data. Technometrics
1979; 21(2):201-214