Clinical prediction models: development, validation and beyond
Dr Maarten van Smeden, UMC Utrecht, M.vanSmeden@umcutrecht.nl
Dr Laure Wynants, Maastricht University, laure.wynants@maastrichtuniversity.nl
} Introduction: prediction models vs everything else
PART A: development
} biased estimators and Stein's paradox
} overfitting/sample size
PART B: validation and beyond
} Metrics
} Validation strategies
} Impact and implementation
PART C: open discussion on prediction and being an early career researcher
} Explanatory models
• Theory: interest in regression coefficients
• Testing and comparing existing causal theories
e.g. aetiology of illness, effect of treatment
} Predictive models
• Interest in (risk) predictions of future observations
• No concern about causality
• Concerns about overfitting and optimism
e.g. prognostic or diagnostic prediction model
} Descriptive models
• Capture the data structure
[Figure: causal diagram with exposure A, outcome Y and confounder L]
Van Smeden et al. Clinical prediction models: diagnosis versus prognosis, JCE, in press
} What is a prediction model?
◦ Mathematical formula (usually logistic or Cox regression,
sometimes machine learning methods)
◦ Combining multiple predictors (independent variables)
◦ The outcome (dependent variable) is usually a diagnosis
or prognosis
◦ Used to predict the outcome in new individuals
} What is the advantage?
◦ Uses multiple characteristics simultaneously
◦ Giving each of them appropriate weights
◦ Personalized evidence-based approach to healthcare
(hopefully)
} Medical guidelines are usually binary
(dichotomania!)
◦ Treatment X if: Age>40 OR BMI>30
◦ What about a patient of 39 years old, with a BMI of 29?
1. Before getting started
2. Study design
3. Modelling strategy
4. Model fitting
5. Model validation – quantify predictive performance
6. Presentation
7. Reporting
8. Model validation – external test
9. Impact studies
10. Implementation
} Phase 1: model development
} Phase 2: external validation
} Phase 3: impact evaluation
} Phase 4: implementation
} Point of intended use of the risk model
• Primary care (paper/computer/app)?
• Secondary care (bedside)?
• Low-resource setting?
} Complexity
• Number of predictors?
• Transparency of calculation?
• Should it be fast?
When one has three or more units (say, individuals), and for each unit one can calculate an average score (say, average blood pressure), then the best guess of future observations for each unit (say, blood pressure tomorrow) is NOT that unit's own average score.
James and Stein. Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 1961.
• Probably among the most surprising (and initially doubted) phenomena in statistics
• Now a large "family": shrinkage estimators reduce prediction variance to an extent that typically outweighs the bias that is introduced
• The bias/variance trade-off principle has motivated many statistical and machine learning developments

Expected prediction error = irreducible error + bias² + variance
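The gain from shrinkage is easy to see in a small simulation. Below is a minimal sketch (all numbers are made up for illustration) comparing the raw per-unit means with a James–Stein estimator that shrinks every observed mean toward zero: the shrunken estimates are each biased, yet on average closer to the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
K, sigma2, n_sims = 10, 1.0, 2000
theta = rng.normal(0.0, 1.0, K)  # hypothetical true means of K units

mse_raw = 0.0
mse_js = 0.0
for _ in range(n_sims):
    x = rng.normal(theta, np.sqrt(sigma2))               # one noisy observation per unit
    shrink = max(0.0, 1.0 - (K - 2) * sigma2 / (x @ x))  # James-Stein factor (positive part)
    js = shrink * x                                      # shrink all means toward zero
    mse_raw += np.mean((x - theta) ** 2)
    mse_js += np.mean((js - theta) ** 2)

mse_raw /= n_sims
mse_js /= n_sims
# Averaged over replications, the shrunken estimates have lower MSE than the
# raw means, even though each individual estimate is biased toward zero.
```

With three or more units the shrinkage estimator dominates the raw means in expected squared error, which is exactly Stein's paradox.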
• 5% reduction in MSPE just by using a shrinkage estimator
• Van Houwelingen and le Cessie's heuristic shrinkage factor
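The heuristic factor is simple to compute from a fitted model's likelihood-ratio statistic; here is a minimal sketch (the χ² value and coefficients below are hypothetical, chosen only to illustrate the arithmetic):

```python
def heuristic_shrinkage(model_chi2, df):
    """Van Houwelingen-le Cessie heuristic: s = (chi2 - df) / chi2, where
    chi2 is the model likelihood-ratio statistic and df the number of
    predictor degrees of freedom."""
    return (model_chi2 - df) / model_chi2

# Hypothetical fitted logistic model: LR chi-square 45.0 on 9 predictors
s = heuristic_shrinkage(45.0, 9)   # 0.8: multiply every slope by 0.8
betas = [0.9, -0.4, 1.2]           # hypothetical slope estimates
shrunken = [s * b for b in betas]  # re-estimate the intercept afterwards
```

After multiplying the slopes by s, the intercept should be re-estimated so that the average predicted risk matches the observed event rate.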
} To explain or to predict?
◦ Prediction often benefits from shrinkage; the consequence is that the regression coefficients are biased
◦ Explanatory analyses that focus on the coefficients may not benefit from the bias that is introduced!
} When is shrinkage needed?
◦ When the risk of overfitting is high
◦ The risk of overfitting is high when the sample size is small (particularly if modelling choices are data driven)
Events per variable (EPV) for logistic/survival models:
EPV = number of events (smallest outcome group) / number of candidate predictor variables
EPV = 10 is a commonly used minimal criterion
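As a sketch, the definition in code (the counts are hypothetical):

```python
def events_per_variable(n_events, n_non_events, n_candidate_predictors):
    """EPV = size of the smallest outcome group / number of candidate
    predictors. Candidate predictors are all variables considered for the
    model, not just the ones that end up being retained."""
    smallest_group = min(n_events, n_non_events)
    return smallest_group / n_candidate_predictors

# Hypothetical study: 80 events, 920 non-events, 12 candidate predictors
epv = events_per_variable(80, 920, 12)  # ~6.7, below the traditional EPV >= 10 rule
```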
"For EPV values of 10 or greater, no major problems occurred. For EPV values less than 10, however, the regression coefficients were biased in both positive and negative directions"
Citations based on Google Scholar, Oct 30 2020:
• "For EPV values of 10 or greater, no major problems" — citations: 5,736
• "a minimum of 10 EPV […] may be too conservative" — citations: 2,438
• "substantial problems even if the number of EPV exceeds 10" — citations: 216
• EPV values for reliable selection of predictors from a larger set of candidate predictors may need to be as large as 50
• Statistical simulation studies on minimal EPV rules are highly heterogeneous and have serious methodological problems
• But what if we just use shrinkage?
"We conclude that, despite improved performance on average, shrinkage often worked poorly in individual datasets, in particular when it was most needed. The results imply that shrinkage methods do not solve problems associated with small sample size or low number of events per variable."
} In short:
◦ Minimal sample size requirements for logistic, survival and continuous outcomes
◦ 4 or 5 criteria to meet
– Minimizing the risk of overfitting
– Ensuring sufficiently precise estimation of risk
} Software in R and Stata to simplify the calculations
} Sample size criteria for validation currently under review
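To illustrate the flavour of these criteria, here is a sketch of two of them for a binary outcome, assuming the published formulas (a target expected shrinkage S ≥ 0.9, and a precise estimate of the overall outcome proportion). The anticipated Cox–Snell R², prevalence and predictor count below are made-up inputs; in practice one would use the pmsampsize package in R or Stata rather than hand-rolled code.

```python
import math

def n_for_shrinkage(p, r2_cs, s_target=0.9):
    """Criterion: expected shrinkage of predictor effects >= s_target.
    n = p / ((S - 1) * ln(1 - R2_CS / S)), with R2_CS the anticipated
    Cox-Snell R-squared and p the number of predictor parameters."""
    return p / ((s_target - 1) * math.log(1 - r2_cs / s_target))

def n_for_overall_risk(prevalence, margin=0.05):
    """Criterion: estimate the overall outcome proportion phi to within
    +/- margin (95% CI half-width): n = (1.96/margin)^2 * phi * (1-phi)."""
    return (1.96 / margin) ** 2 * prevalence * (1 - prevalence)

# Hypothetical inputs: 10 predictor parameters, anticipated R2_CS = 0.2,
# outcome prevalence 0.1
n1 = n_for_shrinkage(10, 0.2)  # ~398
n2 = n_for_overall_risk(0.1)   # ~138
n_min = max(n1, n2)            # the largest n across all criteria is required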
} Apparent validation (usually too optimistic): predictions evaluated on the development data
} Internal validation (optimism-corrected): e.g. bootstrapping
} External validation
Review of n=232 models (doi.org/10.1136/bmj.m1328): 22%, 48%, 20% as part of the development study, 10% independent; only 5% assessed calibration
NICE Framingham: AUC 77.6, overestimated risk
vs. QRISK2-2011: AUC 77.1, well calibrated
At a treatment threshold of 20%: 206 per 1000 men vs. 110 per 1000 men
doi: 10.1016/j.jclinepi.2015.04.005
Optimism-corrected performance = apparent performance − optimism
1. Draw bootstrap sample*
2. Build model* in sample* (repeat every step, incl. variable selection, non-linearities)
• bootstrap performance = performance of model* on sample*
3. Apply model* to the original sample
• test performance = performance of model* on the original sample
4. Optimism = bootstrap performance − test performance
5. Repeat e.g. 100 times
The rms package in R automates this; it can be cumbersome for complex modelling strategies.
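The steps above can be sketched in code. This is a minimal self-contained illustration with simulated data, a plain logistic model, and the c-statistic (AUC) as the performance measure; a real analysis would also repeat any data-driven modelling choices (variable selection, non-linearities) inside each bootstrap loop, as step 2 stresses.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson logistic regression; X includes an intercept
    column. A tiny ridge term keeps the Hessian invertible."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = (X * w[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, grad)
    return beta

def auc(y, score):
    """c-statistic: probability that a random event outranks a random non-event."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def optimism_corrected_auc(X, y, n_boot=100, seed=1):
    rng = np.random.default_rng(seed)
    apparent = auc(y, X @ fit_logistic(X, y))  # performance on development data
    optimism = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # 1. draw bootstrap sample*
        b = fit_logistic(X[idx], y[idx])       # 2. build model* in sample*
        boot_perf = auc(y[idx], X[idx] @ b)    #    bootstrap performance
        test_perf = auc(y, X @ b)              # 3. model* applied to original sample
        optimism += boot_perf - test_perf      # 4. optimism in this replicate
    return apparent - optimism / n_boot        # corrected = apparent - mean optimism

# Simulated development data (hypothetical): 200 patients, 3 predictors
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0, 0.5, 0.0])))))
apparent_auc = auc(y, X @ fit_logistic(X, y))
corrected_auc = optimism_corrected_auc(X, y)  # a bit lower than the apparent AUC
```

For regression models, the `validate` function in the rms package implements this procedure directly.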
Ultrasound-based risk model for preoperative prediction of lymph-node metastases in women with endometrial cancer
DOI: 10.1002/uog.21950
[Figure: distributions of sensitivity (Se), specificity (Sp), prevalence (P) and Net Benefit (NB); within-setting model vs between-setting model]
Net Benefit = (true positives − w × false positives)/n = Se×P − w × (1−Sp) × (1−P)
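The formula translates directly into code. A minimal sketch with made-up operating characteristics, where the weight w = p_t/(1 − p_t) is the odds of the decision threshold p_t:

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """NB = Se*P - w*(1-Sp)*(1-P), with w = pt/(1-pt): how many false
    positives one true positive is worth at decision threshold pt."""
    w = threshold / (1.0 - threshold)
    return sensitivity * prevalence - w * (1.0 - specificity) * (1.0 - prevalence)

# Hypothetical setting: Se = 0.80, Sp = 0.70, prevalence 0.20, threshold 0.20
nb_model = net_benefit(0.80, 0.70, 0.20, 0.20)    # 0.16 - 0.25*0.30*0.80 = 0.10
nb_treat_all = net_benefit(1.0, 0.0, 0.20, 0.20)  # treat-all reference strategy
```

In this example the model (NB 0.10) beats both treat-all (NB 0, since the threshold equals the prevalence) and treat-none (NB 0 by definition).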
Suboptimal for improving clinical practice:
• Model not fit for purpose
• Not validated
• No impact
• Regulatory frameworks
• Not adopted in clinical practice
} Analysis of the impact of using the model in clinical practice
◦ Calculate it
– Cost-effectiveness analysis
◦ Run an experiment
– (Cluster-)randomized trials or pre-post intervention studies
– SPIRIT-AI / CONSORT-AI
} Before-after study
◦ Fewer adverse perinatal outcomes in nulliparous women (OR 0.56, 95% CI 0.32 to 0.94)
◦ Lower mean cost per pregnant woman (−€2766, 95% CI −€3700 to −€1825)
doi: 10.1016/j.ajog.2020.02.036
Figure 4: Adherence rates of discussing low-dose aspirin prophylaxis during the study period.
doi: 10.1016/j.ajog.2020.02.036
} 'No time'
} Black box / new approach / do not trust model predictions / do not believe it is applicable for a specific patient
} Not (yet) convinced of improvement
◦ Not (yet) aware of the current situation
} 'Aspirin is a medicine, thus potentially harmful'
◦ Difficulty in weighing risks