Clinical prediction models: development, validation and beyond
Dr Maarten van Smeden, UMC Utrecht, M.vanSmeden@umcutrecht.nl
Dr Laure Wynants, Maastricht University, laure.wynants@maastrichtuniversity.nl
} Introduction: prediction models vs everything else
PART A: development
} biased estimators and Stein's paradox
} overfitting/sample size
PART B: validation and beyond
} Metrics
} Validation strategies
} Impact and implementation
PART C: open discussion on prediction and being an early career researcher
} Explanatory models
• Theory: interest in regression coefficients
• Testing and comparing existing causal theories
e.g. aetiology of illness, effect of treatment
} Predictive models
• Interest in (risk) predictions of future observations
• No concern about causality
• Concerns about overfitting and optimism
e.g. prognostic or diagnostic prediction model
} Descriptive models
• Capture the data structure
[Figure: causal diagram with exposure A, outcome Y and confounder L]
Van Smeden et al. Clinical prediction models: diagnosis versus prognosis, JCE, in press
} What is a prediction model?
◦ Mathematical formula (usually logistic or Cox regression,
sometimes machine learning methods)
◦ Combining multiple predictors (independent variables)
◦ The outcome (dependent variable) is usually a diagnosis
or prognosis
◦ Used to predict the outcome in new individuals
} What is the advantage?
◦ Uses multiple characteristics simultaneously
◦ Giving each of them appropriate weights
◦ Personalized evidence-based approach to healthcare
(hopefully)
} Medical guidelines are usually binary
(dichotomania!)
◦ Treatment X if: Age>40 OR BMI>30
◦ What about a patient of 39 years old, with a BMI of 29?
1. Before getting started
2. Study design
3. Modelling strategy
4. Model fitting
5. Model validation – quantify predictive performance
6. Presentation
7. Reporting
8. Model validation – external test
9. Impact studies
10. Implementation
} Phase 1: model development
} Phase 2: external validation
} Phase 3: impact evaluation
} Phase 4: implementation
} Point of intended use of the risk model
• Primary care (paper/computer/app)?
• Secondary care (bedside)?
• Low-resource setting?
} Complexity
• Number of predictors?
• Transparency of calculation?
• Should it be fast?
When one has three or more units (say, individuals), and for each unit one can calculate an average score (say, average blood pressure), then the best guess of future observations for each unit (say, blood pressure tomorrow) is NOT that unit's own average score.
James and Stein. Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 1961.
• Probably among the most surprising (and initially doubted) phenomena in statistics
• Now a large "family": shrinkage estimators reduce prediction variance to an extent that typically outweighs the bias that is introduced
• The bias/variance trade-off principle has motivated many statistical and machine learning developments

Expected prediction error = irreducible error + bias² + variance
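The gain from shrinkage is easy to see in a small simulation. Below is a minimal sketch (all numbers are made up for illustration) comparing the raw per-unit means with a James–Stein estimator that shrinks every observed mean toward zero: the shrunken estimates are each biased, yet on average closer to the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
K, sigma2, n_sims = 10, 1.0, 2000
theta = rng.normal(0.0, 1.0, K)  # hypothetical true means of K units

mse_raw = 0.0
mse_js = 0.0
for _ in range(n_sims):
    x = rng.normal(theta, np.sqrt(sigma2))               # one noisy observation per unit
    shrink = max(0.0, 1.0 - (K - 2) * sigma2 / (x @ x))  # James-Stein factor (positive part)
    js = shrink * x                                      # shrink all means toward zero
    mse_raw += np.mean((x - theta) ** 2)
    mse_js += np.mean((js - theta) ** 2)

mse_raw /= n_sims
mse_js /= n_sims
# Averaged over replications, the shrunken estimates have lower MSE than the
# raw means, even though each individual estimate is biased toward zero.
```

With three or more units the shrinkage estimator dominates the raw means in expected squared error, which is exactly Stein's paradox.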
• 5% reduction in MSPE just by using a shrinkage estimator
• Van Houwelingen and le Cessie's heuristic shrinkage factor
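The heuristic factor is simple to compute from a fitted model's likelihood-ratio statistic; here is a minimal sketch (the χ² value and coefficients below are hypothetical, chosen only to illustrate the arithmetic):

```python
def heuristic_shrinkage(model_chi2, df):
    """Van Houwelingen-le Cessie heuristic: s = (chi2 - df) / chi2, where
    chi2 is the model likelihood-ratio statistic and df the number of
    predictor degrees of freedom."""
    return (model_chi2 - df) / model_chi2

# Hypothetical fitted logistic model: LR chi-square 45.0 on 9 predictors
s = heuristic_shrinkage(45.0, 9)   # 0.8: multiply every slope by 0.8
betas = [0.9, -0.4, 1.2]           # hypothetical slope estimates
shrunken = [s * b for b in betas]  # re-estimate the intercept afterwards
```

After multiplying the slopes by s, the intercept should be re-estimated so that the average predicted risk matches the observed event rate.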
} To explain or to predict?
◦ Prediction often benefits from shrinkage; the consequence is that the regression coefficients are biased
◦ Explanatory analyses that focus on the coefficients may not benefit from the bias that is introduced!
} When is shrinkage needed?
◦ When the risk of overfitting is high
◦ The risk of overfitting is high when the sample size is small (particularly if modelling choices are data driven)
Events per variable (EPV) for logistic/survival models:
EPV = number of events (smallest outcome group) / number of candidate predictor variables
EPV = 10 is a commonly used minimal criterion
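As a sketch, the definition in code (the counts are hypothetical):

```python
def events_per_variable(n_events, n_non_events, n_candidate_predictors):
    """EPV = size of the smallest outcome group / number of candidate
    predictors. Candidate predictors are all variables considered for the
    model, not just the ones that end up being retained."""
    smallest_group = min(n_events, n_non_events)
    return smallest_group / n_candidate_predictors

# Hypothetical study: 80 events, 920 non-events, 12 candidate predictors
epv = events_per_variable(80, 920, 12)  # ~6.7, below the traditional EPV >= 10 rule
```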
"For EPV values of 10 or greater, no major problems occurred. For EPV values less than 10, however, the regression coefficients were biased in both positive and negative directions"
Citations based on Google Scholar, Oct 30 2020:
• "For EPV values of 10 or greater, no major problems" — citations: 5,736
• "a minimum of 10 EPV […] may be too conservative" — citations: 2,438
• "substantial problems even if the number of EPV exceeds 10" — citations: 216
• EPV values for reliable selection of predictors from a larger set of candidate predictors may need to be as large as 50
• Statistical simulation studies on minimal EPV rules are highly heterogeneous and have serious methodological problems
• But what if we just use shrinkage?
"We conclude that, despite improved performance on average, shrinkage often worked poorly in individual datasets, in particular when it was most needed. The results imply that shrinkage methods do not solve problems associated with small sample size or low number of events per variable."
} In short:
◦ Minimal sample size requirements for logistic, survival and continuous outcomes
◦ 4 or 5 criteria to meet
– Minimizing the risk of overfitting
– Ensuring sufficiently precise estimation of risk
} Software in R and Stata to simplify the calculations
} Sample size criteria for validation currently under review
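To illustrate the flavour of these criteria, here is a sketch of two of them for a binary outcome, assuming the published formulas (a target expected shrinkage S ≥ 0.9, and a precise estimate of the overall outcome proportion). The anticipated Cox–Snell R², prevalence and predictor count below are made-up inputs; in practice one would use the pmsampsize package in R or Stata rather than hand-rolled code.

```python
import math

def n_for_shrinkage(p, r2_cs, s_target=0.9):
    """Criterion: expected shrinkage of predictor effects >= s_target.
    n = p / ((S - 1) * ln(1 - R2_CS / S)), with R2_CS the anticipated
    Cox-Snell R-squared and p the number of predictor parameters."""
    return p / ((s_target - 1) * math.log(1 - r2_cs / s_target))

def n_for_overall_risk(prevalence, margin=0.05):
    """Criterion: estimate the overall outcome proportion phi to within
    +/- margin (95% CI half-width): n = (1.96/margin)^2 * phi * (1-phi)."""
    return (1.96 / margin) ** 2 * prevalence * (1 - prevalence)

# Hypothetical inputs: 10 predictor parameters, anticipated R2_CS = 0.2,
# outcome prevalence 0.1
n1 = n_for_shrinkage(10, 0.2)  # ~398
n2 = n_for_overall_risk(0.1)   # ~138
n_min = max(n1, n2)            # the largest n across all criteria is required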
} Apparent validation (usually too optimistic): predictions evaluated on the development data
} Internal validation (optimism-corrected): e.g. bootstrapping
} External validation
Review of n=232 models (doi.org/10.1136/bmj.m1328): 22%, 48%, 20% as part of the development study, 10% independent; only 5% assessed calibration
NICE Framingham: AUC 77.6, overestimated risk
vs. QRISK2-2011: AUC 77.1, well calibrated
At a treatment threshold of 20%: 206 per 1000 men vs. 110 per 1000 men
doi: 10.1016/j.jclinepi.2015.04.005
Optimism-corrected performance = apparent performance − optimism
1. Draw bootstrap sample*
2. Build model* in sample* (repeat every step, incl. variable selection, non-linearities)
• bootstrap performance = performance of model* on sample*
3. Apply model* to the original sample
• test performance = performance of model* on the original sample
4. Optimism = bootstrap performance − test performance
5. Repeat e.g. 100 times
The rms package in R automates this; it can be cumbersome for complex modelling strategies.
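The steps above can be sketched in code. This is a minimal self-contained illustration with simulated data, a plain logistic model, and the c-statistic (AUC) as the performance measure; a real analysis would also repeat any data-driven modelling choices (variable selection, non-linearities) inside each bootstrap loop, as step 2 stresses.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson logistic regression; X includes an intercept
    column. A tiny ridge term keeps the Hessian invertible."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = (X * w[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, grad)
    return beta

def auc(y, score):
    """c-statistic: probability that a random event outranks a random non-event."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def optimism_corrected_auc(X, y, n_boot=100, seed=1):
    rng = np.random.default_rng(seed)
    apparent = auc(y, X @ fit_logistic(X, y))  # performance on development data
    optimism = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))  # 1. draw bootstrap sample*
        b = fit_logistic(X[idx], y[idx])       # 2. build model* in sample*
        boot_perf = auc(y[idx], X[idx] @ b)    #    bootstrap performance
        test_perf = auc(y, X @ b)              # 3. model* applied to original sample
        optimism += boot_perf - test_perf      # 4. optimism in this replicate
    return apparent - optimism / n_boot        # corrected = apparent - mean optimism

# Simulated development data (hypothetical): 200 patients, 3 predictors
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0, 0.5, 0.0])))))
apparent_auc = auc(y, X @ fit_logistic(X, y))
corrected_auc = optimism_corrected_auc(X, y)  # a bit lower than the apparent AUC
```

For regression models, the `validate` function in the rms package implements this procedure directly.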
Ultrasound-based risk model for preoperative prediction of lymph-node metastases in women with endometrial cancer
DOI: 10.1002/uog.21950
[Figure: distributions of sensitivity (Se), specificity (Sp), prevalence (P) and Net Benefit (NB); within-setting model vs between-setting model]
Net Benefit = (true positives − w × false positives)/n = Se×P − w × (1−Sp) × (1−P)
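The formula translates directly into code. A minimal sketch with made-up operating characteristics, where the weight w = p_t/(1 − p_t) is the odds of the decision threshold p_t:

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """NB = Se*P - w*(1-Sp)*(1-P), with w = pt/(1-pt): how many false
    positives one true positive is worth at decision threshold pt."""
    w = threshold / (1.0 - threshold)
    return sensitivity * prevalence - w * (1.0 - specificity) * (1.0 - prevalence)

# Hypothetical setting: Se = 0.80, Sp = 0.70, prevalence 0.20, threshold 0.20
nb_model = net_benefit(0.80, 0.70, 0.20, 0.20)    # 0.16 - 0.25*0.30*0.80 = 0.10
nb_treat_all = net_benefit(1.0, 0.0, 0.20, 0.20)  # treat-all reference strategy
```

In this example the model (NB 0.10) beats both treat-all (NB 0, since the threshold equals the prevalence) and treat-none (NB 0 by definition).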
Suboptimal for improving clinical practice:
• Model not fit for purpose
• Not validated
• No impact
• Regulatory frameworks
• Not adopted in clinical practice
} Analysis of the impact of using the model in clinical practice
◦ Calculate it
– Cost-effectiveness analysis
◦ Run an experiment
– (Cluster-)randomized trials or pre-post intervention studies
– SPIRIT-AI / CONSORT-AI
} Before-after study
◦ Fewer adverse perinatal outcomes in nulliparous women (OR 0.56, 95% CI 0.32 to 0.94)
◦ Lower mean cost per pregnant woman (−€2766, 95% CI −€3700 to −€1825)
doi: 10.1016/j.ajog.2020.02.036
Figure 4: Adherence rates of discussing low-dose aspirin prophylaxis during the study period.
doi: 10.1016/j.ajog.2020.02.036
} 'No time'
} Black box / new approach / do not trust model predictions / do not believe it is applicable for a specific patient
} Not (yet) convinced of improvement
◦ Not (yet) aware of the current situation
} 'Aspirin is a medicine, thus potentially harmful'
◦ Difficulty in weighing risks