Maarten van Smeden, PhD
Biometrisches Kolloquium
16 March 2021
Development and evaluation of prediction
models: pitfalls and solutions
Statistics in practice I
M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden
Statistics in practice I: development and evaluation of prediction models
Aims of the lectures
• To give our views on the state of the art of clinical prediction modelling
• To highlight common pitfalls and potential solutions when developing or
evaluating clinical prediction models
• To argue the need for less model development and more model evaluation
• To plead for increased attention to quantifying differences in model
performance over time and place
Agenda
• PART I (11:00 – 12:40)
• State of the medical prediction modeling art
• Just another prediction model
• Methods against overfitting
• Methods for deciding on appropriate sample size
• PART II (13:30 – 15:10)
• Model performance and validation
• Heterogeneity over time and place: there is no such thing as a validated
model
• Applied example: ADNEX model
• Future perspective: machine learning and AI
Source: https://bit.ly/2ODx6c2
State of the medical
prediction modeling art
What do we mean by a prediction model?
“… summarize the effects of predictors to provide individualized predictions of
the absolute risk of a diagnostic or prognostic outcome.”
Steyerberg, 2019
”Model development studies aim to derive a prediction model by selecting the
relevant predictors and combining them statistically into a multivariable model.”
TRIPOD statement, 2015
“… may be developed for scientific or clinical reasons, or both”
Altman & Royston, 2000
What do I mean by a prediction model?
• A mathematical rule or equation derived from empirical data, describing the
relation between a health outcome (Y) and inputs (X)
• The aim is to find a function f̂(X) that yields accurate individual-level
predictions ŷi
• Finding the optimal f̂(X) is usually not only a prediction-error minimization
problem, but also takes into account requirements regarding transparency,
transportability, costs, sparsity, etc.
• Interpretation and contribution of individual components of f̂(X) are not of
primary interest (exception: issues related to fairness)
van Smeden et al., JCE, in press
APGAR score: the oldest prediction model?
Apgar et al. JAMA, 1958
Diagnostic prediction model
Wells et al., Lancet, 1997
Prognostic prediction model
Conroy et al. EHJ, 2003.
Simple model
Complex model
Gulshan et al. JAMA, 2016
Diabetic retinopathy
Deep learning (= Neural network)
• 128,000 images
• Transfer learning (preinitialization)
• Sensitivity and specificity > 0.90
• Estimated from the training data
“Bedside” prediction models
Numerical example: 10 year death due to cardiovascular disease
Why bother?
“As of today, we have deployed the system in 16 hospitals, and it is
performing over 1,300 screenings per day”
MedRxiv pre-print only, 23 March 2020,
doi.org/10.1101/2020.03.19.20039354
• Published on 7 April 2020
• 18 days between idea and article acceptance (sprint)
• Invited by BMJ as the first ever living review (marathon)
Latest version (Update 3)
Wynants et al. BMJ, 2020
Prediction models related to prognosis and diagnosis
Three main types of models:
1. Patient X is infected / has COVID-19 pneumonia → diagnostic
2. Patient X will die from COVID-19 / will need respiratory support → prognostic
3. Currently healthy person X will get severe COVID-19 → general population
Data extraction
• 43 researchers, duplicate reviews
• Extraction form based on CHARMS checklist & PROBAST (127 questions)
• Assessment of each prediction model separately
if more than one was developed/validated
UPDATE 3
• 232 models reviewed
• Peer reviewed articles
in Pubmed and Embase
• Pre-prints (from bioRxiv, medRxiv, and arXiv) included only until update 2
• Search: up to 1 July 2020
Some key numbers
Origin of data
• Single country: N = 174, 75% (of which from China: 56%)
• Multiple countries: 18%
• Unknown origin: 7%
Target setting
• Patients admitted to hospital: N = 119, 51%
• Patients at a triage centre or fever clinic: 5%
• Patients in general practice: 1%
• Other/unclear: 42%
Some key numbers
Target population
• Confirmed COVID-19: N = 108, 47%
• Suspected COVID-19: 36%
• Other/unclear: 18%
Sample size
• Development sample size, median: 338 (IQR: 134–707)
• Number of events, median: 69 (37–160)
• External validation sample size, median: 189 (76–312)
• Number of events, median: 40 (24–122)
Models
• Logistic regression (including regularized): 35%
• Neural net (including deep learning): 33%
• Tree-based (including random forest): 7%
• Survival models (including Cox PH): 6%
• SVM: 4%
• Other/unclear: 15%
Reported performance – AUC range
• General population models: 0.71 to 0.99
• Diagnostic models: 0.65 to 0.99
• Diagnostic severity models: 0.80 to 0.99
• Diagnostic imaging models: 0.70 to 0.99
• Prognosis models: 0.54 to 0.99
prediction horizon varied from 1 to 37 days
Participants
• Inappropriate or unclear in/exclusion or study design
Predictors
• Scored “unknown” in imaging studies
Outcome
• Subjective or proxy outcomes
Analysis
• Small sample size
• Inappropriate or incomplete evaluation of performance
Conclusion update 3 living review
“…models are all at high or unclear risk of bias”
We do “not recommend any of the current prediction
models to be used in practice, but one diagnostic and
one prognostic model originated from higher quality
studies and should be (independently) validated in other
datasets”
https://twitter.com/EricTopol/status/1328086389024440321
Roberts et al. Nature ML, 2021
Gupta, ERJ, 2020
Landscape of clinical prediction models
• 37 models for treatment response in pulmonary TB (Peetluk, 2021)
• 35 models for in vitro fertilisation (Ratna, 2020)
• 34 models for stroke in type-2 diabetes (Chowdhury, 2019)
• 34 models for graft failure in kidney transplantation (Kabore, 2017)
• 31 models for length of stay in ICU (Verburg, 2016)
• 27 models for pediatric early warning systems (Trubey, 2019)
• 27 models for malaria prognosis (Njim, 2019)
• 26 models for postoperative outcomes colorectal cancer (Souwer, 2020)
• 26 models for childhood asthma (Kothalawa, 2020)
• 25 models for lung cancer risk (Gray, 2016)
• 25 models for re-admission after admitted for heart failure (Mahajan, 2018)
• 23 models for recovery after ischemic stroke (Jampathong, 2018)
• 23 models for delirium in older adults (Lindroth, 2018)
• 21 models for atrial fibrillation detection in community (Himmelreich, 2020)
• 19 models for survival after resectable pancreatic cancer (Stijker, 2019)
• 18 models for recurrence hep. carcinoma after liver transplantation (Al-Ameri, 2020)
• 18 models for future hypertension in children (Hamoen, 2018)
• 18 models for risk of falls after stroke (Walsh, 2016)
• 18 models for mortality in acute pancreatitis (Di, 2016)
• 17 models for bacterial meningitis (van Zeggeren, 2019)
• 17 models for cardiovascular disease in hypertensive population (Cai, 2020)
• 14 models for ICU delirium risk (Chen, 2020)
• 14 models for diabetic retinopathy progression (Haider, 2019)
• 408 models for COPD prognosis (Bellou, 2019)
• 363 models for cardiovascular disease general population (Damen, 2016)
• 263 prognosis models in obstetrics (Kleinrouweler, 2016)
• 258 models mortality after general trauma (Munter, 2017)
• 232 models related to COVID-19 (Wynants, 2020)
• 160 female-specific models for cardiovascular disease (Baart, 2019)
• 119 models for critical care prognosis in LMIC (Haniffa, 2018)
• 101 models for primary gastric cancer prognosis (Feng, 2019)
• 81 models for sudden cardiac arrest (Carrick, 2020)
• 74 models for contrast-induced acute kidney injury (Allen, 2017)
• 73 models for 28/30 day hospital readmission (Zhou, 2016)
• 68 models for preeclampsia (De Kat, 2019)
• 67 models for traumatic brain injury prognosis (Dijkland, 2019)
• 64 models for suicide / suicide attempt (Belsher, 2019)
• 61 models for dementia (Hou, 2019)
• 58 models for breast cancer prognosis (Phung, 2019)
• 52 models for pre‐eclampsia (Townsend, 2019)
• 52 models for colorectal cancer risk (Usher-Smith, 2016)
• 48 models for incident hypertension (Sun, 2017)
• 46 models for melanoma (Kaiser, 2020)
• 46 models for prognosis after carotid revascularisation (Volkers, 2017)
• 43 models for mortality in critically ill (Keuning, 2019)
• 42 models for kidney failure in chronic kidney disease (Ramspek, 2019)
• 40 models for incident heart failure (Sahle, 2017)
Original: https://xkcd.com/927/
Agenda
• PART I (11:00 – 12:40)
• State of the medical prediction modeling art
• Just another prediction model
• Methods against overfitting
• Methods for deciding on appropriate sample size
• PART II (13:30 – 15:10)
• Model performance and validation
• Heterogeneity over time and place: there is no such thing as a validated
model
• Applied example: ADNEX model
• Future perspective: machine learning and AI
Source: https://www.explainxkcd.com/wiki/index.php/1831:_Here_to_Help
Diagnosis: “starting point of all medical actions”
"Usually doctors are right, but conservatively about 15
percent of all people are misdiagnosed.
Some experts think it's as high as 20 to 25 percent"
“Patient trust was essential in the healing process.
It could be won by a punctilious bedside manner,
by meticulous explanation,
and by mastery of prognosis, an art demanding
experience, observation and logic”
Galen, 2nd century AD
PROGRESS framework
The PROGRESS (PROGnosis RESearch Strategy) framework classifies
prognosis research into four main types of study:
1. Overall prognosis research
2. Prognostic factor research
3. Prognostic model research
4. Predictors of treatment effect research
More details: https://www.prognosisresearch.com/progress-framework
Overall prognosis research
Adabag et al. JAMA, 2008
Prognostic factor research
Letellier et al. BJC, 2017
Predictors of treatment effect research
Bass et al. JCEM, 2010
PROGRESS framework
The PROGRESS (PROGnosis RESearch Strategy) framework classifies
prognosis research into four main types of study, each with a diagnostic
research counterpart (in parentheses):
1. Overall prognosis research (counterpart: prevalence studies)
2. Prognostic factor research (counterpart: diagnostic test (accuracy) studies)
3. Prognostic model research (counterpart: diagnostic model research)
4. Predictors of treatment effect research
More details: https://www.prognosisresearch.com/progress-framework
Focus of this lecture: prognostic model research (type 3) and its diagnostic
counterpart, diagnostic model research
The phases of prognostic/diagnostic model research
1. Development (including internal validation)
2. External validation(s) and updating
3. Software development and regulations
4. Impact assessment
5. Clinical implementation and scalability
• Most research focuses on development
• Few models are validated
• Fewer still reach implementation
Picture courtesy: Laure Wynants
Not fit for purpose
• Wrong target population
• Expensive predictors
• Discrepancies between
development and use
• Complexity/transparency
No validation/impact
• Poor development
• Insufficient reporting
• Incentives
• If done, usually small
studies
Regulation/implementation
• MDR
• Soft- and hardware
• Model updating
• Quality control
Not adopted
• User time
• No prediction needed
Books (focus on development/validation)
Typical regression models

Binary logistic model
Pr(Y = 1) = expit(β0 + β1X1 + … + βPXP)
          = exp(Xβ) / [1 + exp(Xβ)]

Multinomial logistic model
Pr(Y = j) = exp(Xβj) / [1 + Σh=1…J−1 exp(Xβh)]

Cox model
h(X, t) = h0(t) exp(β1X1 + … + βPXP)
        = h0(t) exp(Xβ)

Remarks
• Xβ is the linear predictor
• Most predictive performance metrics (discrimination/calibration) do not
directly generalize between outcome types
• Linearity/additivity assumptions can be relaxed
• Tree-based models (e.g. random forest), SVMs and neural networks are
upcoming alternatives
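To make the binary logistic model concrete, a minimal sketch in Python; the coefficient values and predictors are hypothetical, purely for illustration:

```python
import numpy as np

def expit(z):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients: intercept b0 and slopes b1..bP
beta = np.array([-3.0, 0.04, 0.7])   # intercept, age (per year), smoker (0/1)

def predict_risk(x):
    """Predicted risk Pr(Y = 1) = expit(X @ beta), X including a 1 for the intercept."""
    X = np.concatenate(([1.0], x))   # prepend the intercept term
    return expit(X @ beta)

risk = predict_risk(np.array([55.0, 1.0]))  # a 55-year-old smoker
print(round(risk, 3))  # 0.475: linear predictor -3 + 0.04*55 + 0.7 = -0.1
```

The same linear predictor Xβ reappears in the multinomial and Cox models; only the link between Xβ and the predicted quantity changes.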
Model users are often unaware of the underlying models and equations
http://www.tbi-impact.org/?p=impact/calc
Reporting
Baseline hazard?
Example: Systematic review by Bellou et al, 2019
Prognostic models for COPD: 228 articles
(408 prognostic models, 38 external validations)
Not reported:
• 12%: modelling method (e.g. type of regression model)
• 24%: evaluation of discrimination (e.g. area under the ROC curve)
• 64%: missing data handling (e.g. multiple imputation)
• 70%: full model (e.g. no regression coefficients, intercept / baseline hazard)
• 78%: evaluation of calibration (e.g. calibration plot)
Bellou et al. BMJ, 2019
Reporting guidelines & risk of bias tools
• TRIPOD: reporting development/validation prediction models
• PROBAST: risk of bias development/validation prediction models
• STARD: reporting diagnostic accuracy studies
• QUADAS-2: risk of bias diagnostic accuracy studies
• REMARK: reporting (tumour marker) prognostic factor studies
• QUIPS: risk of bias in prognostic factor studies
• Currently in development: TRIPOD-AI, TRIPOD-cluster, PROBAST-AI,
STARD-AI, QUADAS-AI, DECIDE-AI etc.
More info: equator-network.org
Dichotomania
“Dichotomania is an obsessive compulsive disorder to which medical
advisors in particular are prone… Show a medical advisor some continuous
measurements and he or she immediately wonders. ‘Hmm, how can I make
these clinically meaningful? Where can I cut them in two? What ludicrous
side conditions can I impose on this?’”
Stephen Senn
It’s all in the title
Source: Gary Collins, https://bit.ly/30Kr3W2
Unnecessary dichotomizations of predictors
• Biologically implausible step-functions in predicted risk
• A source of overfitting when the cut-off is chosen to maximize
predictive performance
• Loss of information (e.g. Ensor et al. showed that dichotomizing BMI was
equivalent to throwing away 1/3 of the data)
Dichotomization/categorization remains very prevalent
• Wynants et al. 2020 (COVID prediction models): 48%
• Collins et al. 2011 (Diabetes T2 prediction models): 63%
• Mallett et al. 2010 (Prognostic models in cancer): 70%
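The information loss can be illustrated with a small simulation (the coefficients and sample size are made up; only the direction of the effect matters): discrimination with the continuous predictor beats the median-dichotomized version.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000

# Simulate a continuous predictor and a binary outcome from a logistic model
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(-1.0 + 1.2 * x)))
y = rng.binomial(1, p)

def auc(score, y):
    """Concordance probability: P(score_case > score_control), ties count 1/2."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

x_dich = (x > np.median(x)).astype(float)   # dichotomize at the median
auc_cont, auc_dich = auc(x, y), auc(x_dich, y)
print(auc_cont > auc_dich)  # True: dichotomization discards information
```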
Unnecessary dichotomizations of predictions
Predictimands
Van Geloven et al. EJE, 2020
Agenda
• PART I (11:00 – 12:40)
• State of the medical prediction modeling art
• Just another prediction model
• Methods against overfitting
• Methods for deciding on appropriate sample size
• PART II (13:30 – 15:10)
• Model performance and validation
• Heterogeneity over time and place: there is no such thing as a validated
model
• Applied example: ADNEX model
• Future perspective: machine learning and AI
Overfitting =
image source: https://bit.ly/3eENofJ
Original by Dominic Wilcox: http://dominicwilcox.com/?portfolio=bed
Classical consequence of overfitting
Bell et al. BMJ, 2015
Shortest routes to overfitting
Steyerberg et al. JCE, 2017
Routes to preventing overfitting
• Sample size: gather data on a sufficient* number of individuals
• Use all of it for model development (avoid train–test splitting)
• Careful preselection of candidate predictors
• Especially if the dataset is small
• Avoid, and be aware of, the winner’s curse
• Data-driven variable selection, model selection
• Avoid unnecessary dichotomizations
• Regression shrinkage / penalization / regularization
* The topic of the 4th part of this talk
Shrinkage
Post-estimation shrinkage factor estimation
• Van Houwelingen & Le Cessie, 1990: uniform shrinkage factor S_VH
• Sauerbrei, 1999: parameterwise shrinkage factors
Regularized regression (shrinkage during estimation)
• Ridge regression: L2 penalty on the regression coefficients
• Lasso: L1 penalty
• Elastic net: both L1 and L2 penalties
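A sketch of how ridge shrinkage can be implemented for logistic regression: a hand-rolled Newton–Raphson with an L2 penalty, i.e. the α = 0 case of the elastic net. The data and the penalty value λ are made up; in practice λ is tuned, e.g. by cross-validation.

```python
import numpy as np

def ridge_logistic(X, y, lam, n_iter=50):
    """Ridge-penalized logistic regression via Newton-Raphson.
    Maximizes the log-likelihood minus lam/2 * ||beta||^2 (intercept unpenalized)."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta = np.zeros(p + 1)
    pen = lam * np.eye(p + 1)
    pen[0, 0] = 0.0                          # do not shrink the intercept
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - mu) - pen @ beta  # penalized score
        hess = (Xd.T * (mu * (1.0 - mu))) @ Xd + pen
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(7)
n = 200
X = rng.normal(size=(n, 4))
true_beta = np.array([0.8, -0.5, 0.0, 0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

b_ml = ridge_logistic(X, y, lam=0.0)     # plain maximum likelihood
b_ridge = ridge_logistic(X, y, lam=5.0)  # coefficients shrunk towards zero
print(np.abs(b_ridge[1:]).sum() < np.abs(b_ml[1:]).sum())  # True
```

The lasso (α = 1) needs a different optimizer (e.g. coordinate descent) because the L1 penalty is not differentiable at zero; that is omitted here.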
Shrinkage
Post-estimation shrinkage factor estimation
Pr(Y = 1) = expit[β0* + S_VH(β1X1 + … + βPXP)]
(the intercept β0* is re-estimated after applying the shrinkage factor;
parameterwise shrinkage applies a separate factor per coefficient)
Regularized regression (shrinkage during estimation)
ln L_p = ln L_ML − λ[(1 − α) Σp βp² + α Σp |βp|]
Ridge regression: α = 0; Lasso: α = 1; Elastic net: 0 < α < 1
Uniform shrinkage factor
Shrinkage < 1 means overfitting
lower values = more overfitting
Formula from Riley et al.
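One closed-form heuristic for the uniform shrinkage factor (due to Van Houwelingen & Le Cessie, and used in the sample size work of Riley et al.) is S = (χ²_LR − p) / χ²_LR, where χ²_LR is the likelihood ratio statistic of the fitted model and p the number of candidate predictor parameters. A sketch on simulated data (all numbers are made up):

```python
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Maximum likelihood logistic regression (Newton-Raphson).
    Returns the coefficients and the maximized log-likelihood."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-Xd @ beta))
        beta += np.linalg.solve((Xd.T * (mu * (1.0 - mu))) @ Xd, Xd.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-Xd @ beta))
    return beta, np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))

rng = np.random.default_rng(3)
n, p = 300, 6
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 1]))))

_, ll_model = fit_logistic(X, y)
_, ll_null = fit_logistic(np.empty((n, 0)), y)  # intercept-only model
lr_chi2 = 2.0 * (ll_model - ll_null)            # likelihood ratio chi-square
S_vh = (lr_chi2 - p) / lr_chi2                  # heuristic uniform shrinkage factor
print(S_vh < 1.0)  # True; values well below 1 signal overfitting
```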
Shrinkage/tuning/penalization hard to estimate
Riley et al. JCE, 2021, DOI: 10.1016/j.jclinepi.2020.12.005
Van Houwelingen, Statistica Neerlandica, 2001
“shrinkage works on the average but may fail in the particular unique
problem on which the statistician is working.”
Shrinkage works on the average
Simulation results, averaged over 4k scenarios and 20M datasets
van Smeden et al. SMMR, 2019
[Figure: calibration slope vs EPV, comparing ridge, lasso, and maximum likelihood]
Shrinkage may not work on a particular dataset
Van Calster et al. SMMR, 2020
RMSD log(slope): root mean squared distance of log calibration slope to the ideal slope (value = log(1))
Maximum likelihood, Ridge, Lasso
“We conclude that, despite improved performance
on average, shrinkage often worked poorly in
individual datasets, in particular when it was most
needed. The results imply that shrinkage methods
do not solve problems associated with small
sample size or low number of events per variable.”
What about the variance in estimated risk?
[Figure: root mean squared prediction error vs EPV, comparing ridge, lasso, and maximum likelihood]
van Smeden et al. SMMR, 2019
Sources of prediction error
Y = f(x) + ε   (with E[ε] = 0, var(ε) = σ²; the values of x are not random)
For a model k, the expected test prediction error is:
σ² + bias²[f̂k(x)] + var[f̂k(x)]
• σ²: the irreducible error ≈ what we don’t model
• bias² + var: the mean squared prediction error ≈ how we model
See equation 2.46 in Hastie et al., The Elements of Statistical Learning, https://stanford.io/2voWjra
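The decomposition can be checked empirically. A sketch with a simple regression setup (the true function, noise level, and polynomial degrees are arbitrary choices): an underfitted model shows bias, an overly flexible one shows variance.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
f = lambda x: np.sin(2 * np.pi * x)      # true signal
x_train = np.linspace(0.0, 1.0, 20)      # fixed design: x values are not random
x0 = 0.25                                # test point of interest

def fit_predict(degree):
    """Fit a polynomial to one noisy training sample; predict f-hat_k(x0)."""
    y = f(x_train) + rng.normal(scale=sigma, size=x_train.size)
    return np.polyval(np.polyfit(x_train, y, degree), x0)

results = {}
for degree in (1, 5):                    # model k: underfit vs flexible
    preds = np.array([fit_predict(degree) for _ in range(2000)])
    bias2 = (preds.mean() - f(x0)) ** 2
    results[degree] = (bias2, preds.var())

# Expected test error at x0 = sigma^2 + bias^2 + variance
print(results[1], results[5])  # degree 1: large bias; degree 5: larger variance
```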
Agenda
• PART I (11:00 – 12:40)
• State of the medical prediction modeling art
• Just another prediction model
• Methods against overfitting
• Methods for deciding on appropriate sample size
• PART II (13:30 – 15:10)
• Model performance and validation
• Heterogeneity over time and place: there is no such thing as a validated
model
• Applied example: ADNEX model
• Future perspective: machine learning and AI
You get what you pay for
Sample size considerations
• Sample size matters when designing a study of any kind
• When the design of a prediction study precedes data collection (prospective):
• How many units/individuals/events do I need to collect data on?
• When a prediction study is designed on existing data (retrospective):
• Is my dataset large enough to build a model?
• How much model complexity (e.g. the number of predictors considered) can I
afford?
The origin of the 1 in 10 rule
“For EPV values of 10 or greater, no major problems occurred. For EPV
values less than 10, however, the regression coefficients were biased in
both positive and negative directions”
Peduzzi et al. JCE, 1996
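The rule of thumb translates into a quick back-of-the-envelope check; the event count here is the review median reported earlier, the parameter count is hypothetical, and the 1-in-10 rule itself is only a heuristic, not a strict requirement:

```python
# 1-in-10 rule of thumb: events per candidate parameter (EPV) >= 10
events = 69                    # e.g. the median number of events in the review
candidate_parameters = 12      # hypothetical number of candidate predictor parameters
epv = events / candidate_parameters
print(epv, epv >= 10)          # 5.75 False: too many candidates for this dataset
```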
More simulation studies
Citations based on Google Scholar, Oct 30 2020
citations: 5,736
“a minimum of 10 EPV […] may be too conservative”
“substantial problems even if the number of EPV exceeds 10”
For EPV values of 10 or greater, no major problems
citations: 2,438
citations: 216
Log(odds) is consistent but finite sample biased
Schaefer, Stat Med, 1983
• Examine the reasons for substantial differences between the earlier EPV
simulation studies
• Evaluate a possible solution to reduce the finite sample bias
van Smeden et al. BMC MRM, 2016
M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden
Statistics in practice I: development and evaluation of prediction models
• Examine the reasons for substantial differences between the earlier EPV
simulation studies (simulation technicality: handling of “separation”)
• Evaluate a possible solution to reduce the finite sample bias
van Smeden et al. BMC MRM, 2016
• Firth’s “correction” aims to reduce finite-sample bias in maximum
likelihood estimates and is applicable to logistic regression
• It makes clever use of the Jeffreys prior (from the Bayesian literature) to
penalize the log-likelihood, which shrinks the estimated coefficients
• Nice theoretical justifications
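For intuition: in the simplest case of a single binary predictor (a 2×2 table), Firth’s penalized estimate coincides with adding 0.5 to each cell of the table, so the log odds ratio stays finite even under separation (a zero cell), where the standard MLE diverges. A minimal sketch of this special case (illustrative only, not the general Firth algorithm):

```python
import math

def log_or(a, b, c, d, firth=False):
    """Log odds ratio from a 2x2 table (a, b = events; c, d = non-events).
    With firth=True, 0.5 is added to each cell; for this saturated 2x2 case
    that equals Firth's penalized-likelihood estimate and remains finite
    under separation."""
    if firth:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    if min(a, b, c, d) == 0:
        return math.inf  # standard MLE does not exist (separation)
    return math.log((a * d) / (b * c))

# Separated data: no events among the unexposed
print(log_or(10, 0, 40, 50))              # inf: standard MLE diverges
print(log_or(10, 0, 40, 50, firth=True))  # finite, shrunken estimate (about 3.27)
```

Note how the penalty acts like a small amount of prior information, pulling the estimate away from infinity toward zero.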
[Figure: bias of standard maximum likelihood vs Firth’s correction, averaged over 465 simulation conditions with 10,000 replications each]
Firth’s correction and predictive performance
Van Calster, SMMR, 2020, DOI: 10.1177/0962280220921415
RMSD log(slope): root mean squared distance of the log calibration slope to the ideal slope (log(1)); results shown for maximum likelihood and Firth’s correction
Beyond events per variable
• Sample size has a big influence on the performance of prediction models
• Giving appropriate weights to predictors is not a simple task
• Challenge is to avoid overfitting and lack of precision in the predictions
• Requires adequate sample size
• How much data is sufficient? It depends on:
• Model complexity (e.g. # predictors, EPV)
• Signal:noise ratio (e.g. R2 or C-index)
• Performance required (e.g. high vs low stake medical decisions)
Riley et al., BMJ, 2020
Our proposal
• Calculate sample size that is needed to
• minimise potential overfitting
• estimate probability (risk) precisely
• Sample size formulas for
• Continuous outcomes
• Time-to-event outcomes
• Binary outcomes (focus today)
Riley et al., BMJ, 2020
Example
• COVID-19 prognosis in hospitalized patients
• Composite outcome: “deterioration”
(in-hospital death, ventilator support, ICU)
A priori expectations
• Event fraction at least 30%
• 40 candidate predictor parameters
• C-statistic of 0.71 (conservative estimate)
→ Cox-Snell R2 of 0.24
Gupta et al. Lancet Resp Med, 2020
Restricted cubic splines
with 4 knots: 3 degrees of
freedom
Note: the EPV rule should also
count degrees of freedom of
candidate predictors, not
variables!
Gupta et al. Lancet Resp Med, 2020
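The parameter bookkeeping can be made explicit: a continuous predictor modeled with a restricted cubic spline on k knots contributes k − 1 parameters, and a categorical predictor with c levels contributes c − 1. A small sketch with a hypothetical candidate predictor list (not the actual model of Gupta et al.):

```python
def spline_df(knots):
    # restricted cubic spline with k knots uses k - 1 parameters
    return knots - 1

def categorical_df(levels):
    # c categories use c - 1 dummy parameters
    return levels - 1

# Hypothetical candidate predictors, for illustration only
candidate_params = (
    spline_df(4)         # age, RCS with 4 knots: 3 df
    + spline_df(4)       # respiratory rate, RCS with 4 knots: 3 df
    + categorical_df(3)  # comorbidity grade, 3 levels: 2 df
    + 1                  # one binary predictor: 1 df
)
print(candidate_params)  # 9 candidate predictor parameters, from only 4 variables
```

Counting variables instead of parameters would understate the model complexity that the sample size calculation must account for.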
Calculate required sample size
Criterion 1. Shrinkage: expected heuristic shrinkage factor, S ≥ 0.9
(calibration slope, target < 10% overfitting)
Criterion 2. Optimism: Cox-Snell R2 apparent - Cox-Snell R2 validation < 0.05
(overfitting)
Criterion 3. Precision: a small margin of error in the overall risk estimate, < 0.05 absolute error
(precision of the estimated baseline risk)
(Criterion 4: a small margin of absolute error in the estimated risks)
Calculation
R code:
> require(pmsampsize)
> pmsampsize(type="b",rsquared=0.24,parameters=40,prevalence=0.3)
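The same calculation can be reproduced directly from the published formulas. A minimal Python sketch of the three binary-outcome criteria of Riley et al. (the function name is mine; this is a cross-check, not a replacement for pmsampsize):

```python
import math

def min_sample_size_binary(r2_cs, parameters, prevalence,
                           shrinkage=0.9, delta=0.05, error=0.05):
    """Minimum n for developing a binary-outcome prediction model,
    following the three criteria of Riley et al., BMJ 2020."""
    p, phi = parameters, prevalence
    # Criterion 1: expected heuristic shrinkage factor S >= 0.9
    n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    # Criterion 2: small optimism (< delta) in apparent Cox-Snell R2,
    # using the maximum possible Cox-Snell R2 given the event fraction
    max_r2 = 1 - math.exp(2 * (phi * math.log(phi) + (1 - phi) * math.log(1 - phi)))
    s2 = r2_cs / (r2_cs + delta * max_r2)
    n2 = p / ((s2 - 1) * math.log(1 - r2_cs / s2))
    # Criterion 3: overall risk estimated within +/- error (95% CI)
    n3 = (1.96 / error) ** 2 * phi * (1 - phi)
    return math.ceil(max(n1, n2, n3))

n = min_sample_size_binary(r2_cs=0.24, parameters=40, prevalence=0.3)
epv = n * 0.3 / 40
print(n)  # minimum n of 1290, i.e. an EPV of about 9.7
```

For the scenario above this reproduces the pmsampsize answer (EPV ≈ 9.7); plugging in the alternative scenarios on the next slide reproduces those EPV values as well.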
A few alternative scenarios
• rsquared=0.24,parameters=40,prevalence=0.3 -> EPV≥9.7
• rsquared=0.12,parameters=40,prevalence=0.3 -> EPV≥21.0
• rsquared=0.12,parameters=40,prevalence=0.5 -> EPV≥35.0
• rsquared=0.36,parameters=40,prevalence=0.2 -> EPV≥5
The sample size that meets all criteria is the MINIMUM required
• Why minimum? Other criteria may also matter, e.g. missing data,
clustering, variable selection
Future directions
• More evidence/guidance to choose values for the sample size criteria
• Sample size for validation (for continuous outcome just published)
• Sample size for different outcomes (e.g. multi-category)
• Sample size taking into account variable selection
• Simulation based approaches
• High-dimensional data?
Steyerberg, 2019
Work in collaboration with:
• Carl Moons
• Hans Reitsma
• Ben Van Calster (Leuven)
• Laure Wynants (Maastricht)
• Richard Riley (Keele, materials for this presentation)
• Gary Collins (Oxford, materials for this presentation)
• Ewout Steyerberg (Leiden)
• Many others
Contact: M.vanSmeden@umcutrecht.nl
Reference list
Adabag et al. JAMA, 2008, doi: 10.1001/jama.2008.553
Altman & Royston, Stat Med, 2000, doi: 10.1002/(SICI)1097-0258(20000229)19:4<453::AID-SIM350>3.0.CO;2-5
Apgar et al. JAMA, 1958, doi: 10.1001/jama.1958.03000150027007
Bass et al. JCEM, 2010, doi: 10.1210/jc.2010-0947
Bell et al. BMJ, 2015, doi: 10.1136/bmj.h5639
Bellou et al. BMJ, 2019, doi: 10.1136/bmj.l5358
Collins et al. BMC Med, 2011, doi: 10.1186/1741-7015-9-103
Conroy et al. EHJ, 2003, doi: 10.1016/S0195-668X(03)00114-3
Courvoisier et al. JCE, 2011, doi: 10.1016/j.jclinepi.2010.11.012
Gulshan et al. JAMA, 2016, doi: 10.1001/jama.2016.17216
Gupta et al. ERJ, doi: 10.1183/13993003.03498-2020
Gupta et al. Lancet Resp Med, 2020, doi: 10.1016/S2213-2600(20)30559-2
Letellier et al. BJC, 2017, doi: 10.1038/bjc.2017.352
Mallett et al. BMC Med, 2010, doi: 10.1186/1741-7015-8-21
Peduzzi et al. JCE, 1996, doi: 10.1016/s0895-4356(96)00236-3
Riley et al. Stat Med, 2019, doi: 10.1002/sim.7992
Riley et al. BMJ, 2020, doi: 10.1136/bmj.m441
Riley et al. JCE, 2021, doi: 10.1016/j.jclinepi.2020.12.005
Roberts et al. Nature ML, 2021, doi: 10.1038/s42256-021-00307-0
Sauerbrei & Royston, JRSS-A, 1999, doi: 10.1111/1467-985x.00122
Schaefer, Stat Med, 1983, doi: 10.1002/sim.4780020108
Steyerberg et al. JCE, 2017, doi: 10.1016/j.jclinepi.2017.11.013
Steyerberg, 2019, doi: 10.1007/978-3-030-16399-0
TRIPOD statement, 2015: Collins et al., Ann Intern Med, doi: 10.7326/M14-0697
Van Calster et al. SMMR, 2020, doi: 10.1177/0962280220921415
van Geloven et al. EJE, 2020, doi: 10.1007/s10654-020-00636-1
van Houwelingen, Statistica Neerlandica, 2001
van Houwelingen & Le Cessie, Stat Med, 1990, doi: 10.1002/sim.4780091109
van Smeden et al. BMC MRM, 2016, doi: 10.1186/s12874-016-0267-3
van Smeden et al. SMMR, 2019, doi: 10.1177/0962280218784726
van Smeden et al. Clinical prediction models: diagnosis versus prognosis, JCE, in press
Vittinghof & McCulloch, AJE, 2007, doi: 10.1093/aje/kwk052
Wells et al. Lancet, 1997, doi: 10.1016/S0140-6736(97)08140-3
Wynants et al. BMJ, 2020, doi: 10.1136/bmj.m1328
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified Gravity
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 

Development and evaluation of prediction models: pitfalls and solutions

  • 6. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models What do we mean by a prediction model? “… summarize the effects of predictors to provide individualized predictions of the absolute risk of a diagnostic or prognostic outcome.” Steyerberg, 2019 “Model development studies aim to derive a prediction model by selecting the relevant predictors and combining them statistically into a multivariable model.” TRIPOD statement, 2015 “… may be developed for scientific or clinical reasons, or both” Altman & Royston, 2000
  • 7. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models What do I mean by a prediction model? • A mathematical rule or equation derived from empirical data describing the relation between a health outcome (Y) and inputs (X) • The aim is to find a function f̂(X) that yields accurate individual-level predictions ŷᵢ • Finding the optimal f̂(X) is usually not only a prediction-error minimization problem, but also takes into account requirements regarding transparency, transportability, costs, sparsity etc. • Interpretation and contribution of individual components in f̂(X) are not of primary interest (exception: issues related to fairness)
  • 8. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models van Smeden et al., JCE, in press
  • 9. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models APGAR score: the oldest prediction model? Apgar et al. JAMA, 1958
  • 10. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Diagnostic prediction model Wells et al., Lancet, 1997
  • 11. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Prognostic prediction model Conroy et al. EHJ, 2003.
  • 12. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Simple model
  • 13. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Complex model Gulshan et al. JAMA, 2016 Diabetic retinopathy Deep learning (= Neural network) • 128,000 images • Transfer learning (preinitialization) • Sensitivity and specificity > .90 • Estimated from training data
  • 14. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models “Bedside” prediction models
  • 15. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models “Bedside” prediction models Numerical example: 10 year death due to cardiovascular disease
  • 16. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 17. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Why bother? “As of today, we have deployed the system in 16 hospitals, and it is performing over 1,300 screenings per day” MedRxiv pre-print only, 23 March 2020, doi.org/10.1101/2020.03.19.20039354
  • 18. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 19. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 20. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models • Published on 7 April 2020 • 18 days between idea and article acceptance (sprint) • Invited by BMJ as the first ever living review (marathon)
  • 21. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Latest version (Update 3) Wynants et al. BMJ, 2020
  • 22. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Prediction models related to prognosis and diagnosis: 3 main types of models 1. Patient X is infected / has COVID-19 pneumonia → diagnostic 2. Patient X will die from COVID-19 / will need respiratory support → prognostic 3. Currently healthy person X will get severe COVID-19 → general population
  • 23. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Data extraction • 43 researchers, duplicate reviews • Extraction form based on CHARMS checklist & PROBAST (127 questions) • Assessment of each prediction model separately if more than one was developed/validated
  • 24. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models UPDATE 3 • 232 models reviewed • Peer reviewed articles in Pubmed and Embase • Pre-prints only until update 2 from bioRxiv, medRxiv, and arXiv • Search: up to 1 July 2020
  • 25. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Some key numbers Origin of data • Single country: N = 174, 75% (of which from China: 56%) • Multiple countries: 18% • Unknown origin: 7% Target setting • Patients admitted to hospital: N = 119, 51% • Patients at triage centre or fever clinic: 5% • Patients in general practice: 1% • Other/unclear: 42%
  • 26. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Some key numbers Target population • Confirmed COVID-19: N = 108, 47% • Suspected COVID-19: 36% • Other/unclear: 18% Sample size • Sample size development, median: 338 (IQR: 134—707) • number of events, median: 69 (37—160) • Sample size external validation, median: 189 (76—312) • Number of events, median: 40 (24—122)
  • 27. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Models • Logistic regression (including regularized): 35% • Neural net (including deep learning): 33% • Tree-based (including random forest): 7% • Survival models (including Cox ph): 6% • SVM: 4% • Other/unclear: 15%
  • 28. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 29. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Reported performance – AUC range • General population models: 0.71 to 0.99 • Diagnostic models: 0.65 to 0.99 • Diagnostic severity models: 0.80 to 0.99 • Diagnostic imaging models: 0.70 to 0.99 • Prognosis models: 0.54 to 0.99 prediction horizon varied from 1 to 37 days
  • 30. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 31. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Participants • Inappropriate or unclear in/exclusion or study design Predictors • Scored “unknown” in imaging studies Outcome • Subjective or proxy outcomes Analysis • Small sample size • Inappropriate or incomplete evaluation of performance
  • 32. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Conclusion update 3 living review “…models are all at high or unclear risk of bias” We do “not recommend any of the current prediction models to be used in practice, but one diagnostic and one prognostic model originated from higher quality studies and should be (independently) validated in other datasets”
  • 33. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models https://twitter.com/EricTopol/status/1328086389024440321
  • 34. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Roberts et al. Nature ML, 2021
  • 35. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Gupta, ERJ, 2020
  • 36. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Gupta, ERJ, 2020
  • 37. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Landscape of clinical prediction models • 37 models for treatment response in pulmonary TB (Peetluk, 2021) • 35 models for in vitro fertilisation (Ratna, 2020) • 34 models for stroke in type-2 diabetes (Chowdhury, 2019) • 34 models for graft failure in kidney transplantation (Kabore, 2017) • 31 models for length of stay in ICU (Verburg, 2016) • 27 models for pediatric early warning systems (Trubey, 2019) • 27 models for malaria prognosis (Njim, 2019) • 26 models for postoperative outcomes colorectal cancer (Souwer, 2020) • 26 models for childhood asthma (Kothalawa, 2020) • 25 models for lung cancer risk (Gray, 2016) • 25 models for re-admission after admitted for heart failure (Mahajan, 2018) • 23 models for recovery after ischemic stroke (Jampathong, 2018) • 23 models for delirium in older adults (Lindroth, 2018) • 21 models for atrial fibrillation detection in community (Himmelreich, 2020) • 19 models for survival after resectable pancreatic cancer (Stijker, 2019) • 18 models for recurrence hep. 
carcinoma after liver transplantation (Al-Ameri, 2020) • 18 models for future hypertension in children (Hamoen, 2018) • 18 models for risk of falls after stroke (Walsh, 2016) • 18 models for mortality in acute pancreatitis (Di, 2016) • 17 models for bacterial meningitis (van Zeggeren, 2019) • 17 models for cardiovascular disease in hypertensive population (Cai, 2020) • 14 models for ICU delirium risk (Chen, 2020) • 14 models for diabetic retinopathy progression (Haider, 2019) • 408 models for COPD prognosis (Bellou, 2019) • 363 models for cardiovascular disease general population (Damen, 2016) • 263 prognosis models in obstetrics (Kleinrouweler, 2016) • 258 models mortality after general trauma (Munter, 2017) • 232 models related to COVID-19 (Wynants, 2020) • 160 female-specific models for cardiovascular disease (Baart, 2019) • 119 models for critical care prognosis in LMIC (Haniffa, 2018) • 101 models for primary gastric cancer prognosis (Feng, 2019) • 81 models for sudden cardiac arrest (Carrick, 2020) • 74 models for contrast-induced acute kidney injury (Allen, 2017) • 73 models for 28/30 day hospital readmission (Zhou, 2016) • 68 models for preeclampsia (De Kat, 2019) • 67 models for traumatic brain injury prognosis (Dijkland, 2019) • 64 models for suicide / suicide attempt (Belsher, 2019) • 61 models for dementia (Hou, 2019) • 58 models for breast cancer prognosis (Phung, 2019) • 52 models for pre‐eclampsia (Townsend, 2019) • 52 models for colorectal cancer risk (Usher-Smith, 2016) • 48 models for incident hypertension (Sun, 2017) • 46 models for melanoma (Kaiser, 2020) • 46 models for prognosis after carotid revascularisation (Volkers, 2017) • 43 models for mortality in critically ill (Keuning, 2019) • 42 models for kidney failure in chronic kidney disease (Ramspek, 2019) • 40 models for incident heart failure (Sahle, 2017)
  • 38. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Original: https://xkcd.com/927/ PREDICTION MODELS. PREDICTION MODELS MODEL PREDICTION MODELS.
  • 39. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Agenda • PART I (11:00 – 12:40) • State of the medical prediction modeling art • Just another prediction model • Methods against overfitting • Methods for deciding on appropriate sample size • PART II (13:30 – 15:10) • Model performance and validation • Heterogeneity over time and place: there is no such thing as a validated model • Applied example: ADNEX model • Future perspective: machine learning and AI
  • 41. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Diagnosis: “starting point of all medical actions”
  • 42. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models "Usually doctors are right, but conservatively about 15 percent of all people are misdiagnosed. Some experts think it's as high as 20 to 25 percent"
  • 43. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models “Patient trust was essential in the healing process. It could be won by a punctilious bedside manner, by meticulous explanation, and by mastery of prognosis, an art of demanding experience, observation and logic” Galen, 2nd century AD
  • 44. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models PROGRESS framework The PROGRESS (PROGnosis RESearch Strategy) framework classifies prognosis research into four main types of study: 1. Overall prognosis research 2. Prognostic factor research 3. Prognostic model research 4. Predictors of treatment effect research More details: https://www.prognosisresearch.com/progress-framework
  • 45. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Overall prognosis research Adabag et al. JAMA, 2008
  • 46. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Prognostic factor research Letellier et al. BJC, 2017
  • 47. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Predictors of treatment effect research Bass et al. JCEM, 2010
  • 48. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models PROGRESS framework The PROGRESS (PROGnosis RESearch Strategy) framework classifies prognosis research into four main types of study: 1. Overall prognosis research Prevalence studies 2. Prognostic factor research Diagnostic test (accuracy) studies 3. Prognostic model research Diagnostic model research 4. Predictors of treatment effect research More details: https://www.prognosisresearch.com/progress-framework
  • 49. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Focus of this lecture The PROGRESS (PROGnosis RESearch Strategy) framework classifies prognosis research into four main types of study: 1. Overall prognosis research Prevalence studies 2. Prognostic factor research Diagnostic test (accuracy) studies 3. Prognostic model research Diagnostic model research 4. Predictors of treatment effect research More details: https://www.prognosisresearch.com/progress-framework
  • 50. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models The phases of prognostic/diagnostic model research 1. Development (including internal validation) 2. External validation(s) and updating 3. Software development and regulations 4. Impact assessment 5. Clinical implementation and scalability • Most research focuses on development • Few models are validated • Fewer still reach the implementation stage
  • 51. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Picture courtesy: Laure Wynants Not fit for purpose • Wrong target population • Expensive predictors • Discrepancies between development and use • Complexity/transparency No validation/impact • Poor development • Insufficient reporting • Incentives • If done, usually small studies Regulation/implementation • MDR • Soft- and hardware • Model updating • Quality control Not adopted • User time • No prediction needed
  • 52. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Books (focus on development/validation)
  • 53. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Typical regression models Binary logistic model Pr(Y = 1) = expit(β0 + β1X1 + … + βP XP) = exp(Xβ)/[1 + exp(Xβ)] Multinomial logistic model Pr(Y = j) = exp(Xβj)/[1 + Σ_{h=1}^{J−1} exp(Xβh)] Cox model h(X, t) = h0(t) exp(β1X1 + … + βP XP) = h0(t) exp(Xβ) Remarks • Xβ is the linear predictor • Most predictive performance metrics do not directly generalize between outcomes (discrimination/calibration) • Linearity/additivity assumptions can be relaxed • Tree-based models (e.g. random forest), SVM and neural networks seem to be upcoming
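As a concrete illustration of the binary logistic model on this slide, a minimal sketch of computing an individual predicted risk from a linear predictor. All coefficient values here are hypothetical (chosen for illustration, not taken from any published model):

```python
import numpy as np

def expit(z):
    """Inverse logit: maps the linear predictor Xb to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for illustration only:
# Pr(Y = 1) = expit(b0 + b1*age + b2*smoker)
b0, b1, b2 = -2.0, 0.03, 0.5

def predict_risk(age, smoker):
    lp = b0 + b1 * age + b2 * smoker   # the linear predictor Xb
    return expit(lp)

risk = predict_risk(60, 1)             # e.g. a 60-year-old smoker
```

The same structure (linear predictor plus an outcome-specific link) carries over to the multinomial and Cox models, except that the Cox model additionally needs the baseline hazard h0(t) to produce absolute risks.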
  • 54. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Model users often unaware of models and equations http://www.tbi-impact.org/?p=impact/calc
  • 55. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Reporting Baseline hazard?
  • 56. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Example: Systematic review by Bellou et al, 2019 Prognostic models for COPD: 228 articles (408 prognostic models, 38 external validations) Not reported: • 12%: modelling method (e.g. type of regression model) • 24%: evaluation of discrimination (e.g. area under the ROC curve) • 64%: missing data handling (e.g. multiple imputation) • 70%: full model (e.g. no regression coefficients, intercept / baseline hazard) • 78%: no evaluation of calibration (e.g. calibration plot) Bellou et al. BMJ, 2019
  • 57. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Reporting guidelines & risk of bias tools • TRIPOD: reporting development/validation prediction models • PROBAST: risk of bias development/validation prediction models • STARD: reporting diagnostic accuracy studies • QUADAS-2: risk of bias diagnostic accuracy studies • REMARK: reporting (tumour marker) prognostic factor studies • QUIPS: risk of bias in prognostic factor studies • Currently in development: TRIPOD-AI, TRIPOD-cluster, PROBAST-AI, STARD-AI, QUADAS-AI, DECIDE-AI etc. More info: equator-network.org
  • 58. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Dichotomania “Dichotomania is an obsessive compulsive disorder to which medical advisors in particular are prone… Show a medical advisor some continuous measurements and he or she immediately wonders. ‘Hmm, how can I make these clinically meaningful? Where can I cut them in two? What ludicrous side conditions can I impose on this?’” Stephen Senn
  • 59. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models It’s all in the title Source: Gary Collins, https://bit.ly/30Kr3W2
  • 60. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Unnecessary dichotomizations of predictors • Biologically implausible step-functions in predicted risk • Source of overfitting when the cut-off is chosen to maximize predictive performance • Loss of information (e.g. Ensor et al. show dichotomizing BMI was equivalent to throwing away 1/3 of the data) Dichotomization/categorization remains very prevalent • Wynants et al. 2020 (COVID prediction models): 48% • Collins et al. 2011 (Diabetes T2 prediction models): 63% • Mallett et al. 2010 (Prognostic models in cancer): 70%
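The information loss from dichotomizing a predictor can be seen in a toy simulation: compare the c-statistic (AUC) of a continuous predictor with that of the same predictor split at its median. The data-generating settings below are arbitrary, chosen only to make the effect visible:

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(42)
n = 200_000
x = rng.standard_normal(n)
y = rng.binomial(1, 1 / (1 + np.exp(-x)))     # risk rises smoothly with x

def auc(score, y):
    """c-statistic via the Mann-Whitney U statistic (midranks handle ties)."""
    r = rankdata(score)
    n1 = y.sum()
    return (r[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (len(y) - n1))

auc_continuous = auc(x, y)                              # uses all information in x
auc_dichotomized = auc((x > np.median(x)).astype(float), y)  # only "high" vs "low"
```

Discrimination of the dichotomized predictor is systematically lower, matching the slide's point that cutting a continuous predictor in two discards information.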
  • 61. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Unnecessary dichotomizations of predictions
  • 62. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Predictimands Van Geloven et al. EJE, 2020
  • 63. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Predictimands Van Geloven et al. EJE, 2020
  • 64. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Agenda • PART I (11:00 – 12:40) • State of the medical prediction modeling art • Just another prediction model • Methods against overfitting • Methods for deciding on appropriate sample size • PART II (13:30 – 15:10) • Model performance and validation • Heterogeneity over time and place: there is no such thing as a validated model • Applied example: ADNEX model • Future perspective: machine learning and AI
  • 65. Overfitting = image source: https://bit.ly/3eENofJ
  • 66. Original by Dominic Wilcox: http://dominicwilcox.com/?portfolio=bed
  • 67. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Classical consequence of overfitting Bell et al. BMJ, 2015
  • 68. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shortest routes to overfitting Steyerberg et al. JCE, 2017
  • 69. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shortest routes to overfitting Steyerberg et al. JCE, 2017
  • 70. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shortest routes to overfitting Steyerberg et al. JCE, 2017
  • 71. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Routes to preventing overfitting • Sample size: gather data on a sufficient* number of individuals • Use it all on model development (avoid training-test splitting) • Careful preselection of candidate predictors • Especially if data are small • Avoid and be aware of the winner’s curse • Data-driven variable selection, model selection • Avoid unnecessary dichotomizations • Regression shrinkage / penalization / regularization * The 4th topic of this talk
  • 72. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 73. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 74. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shrinkage Post-estimation shrinkage factor estimation • Van Houwelingen & Le Cessie, 1990: uniform shrinkage factor, S_VH • Sauerbrei 1999: parameterwise shrinkage factors Regularized regression (shrinkage during estimation) • Ridge regression: L2 penalty on the regression coefficients • Lasso: L1 penalty • Elastic net: L1 and L2 penalty
  • 75. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shrinkage Post-estimation shrinkage factor estimation Pr(Y = 1) = expit[β0* + S_VH(β1X1 + … + βP XP)] Regularized regression (shrinkage during estimation) ln L_p = ln L_ML − λ[(1 − α) Σ_{p=1}^{P} βp² + α Σ_{p=1}^{P} |βp|] Ridge regression: α = 0, Lasso: α = 1, Elastic net: 0 < α < 1
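The penalized likelihood above can be tried out with scikit-learn, where a stronger penalty λ corresponds (roughly) to a smaller C. A small sketch on simulated data, with all settings chosen purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 100, 10                          # few observations, many predictors
X = rng.standard_normal((n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # only the first predictor matters

# (Near-)maximum likelihood: an extremely weak L2 penalty
ml = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
# Ridge (L2 penalty); a stronger penalty means a smaller C
ridge = LogisticRegression(C=0.1, max_iter=5000).fit(X, y)
# Lasso (L1 penalty) requires a solver that supports it
lasso = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

# Penalization shrinks the coefficients towards zero relative to ML
coef_size = {m: np.abs(est.coef_).sum()
             for m, est in [("ml", ml), ("ridge", ridge), ("lasso", lasso)]}
```

In this sketch both penalized fits have smaller total coefficient size than the near-ML fit; the lasso additionally sets some coefficients exactly to zero.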
  • 76. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Uniform shrinkage factor Shrinkage < 1 means overfitting (lower values = more overfitting). Formula from Riley et al.
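The degree of overfitting can be made visible in a small simulation: develop a model by (near-)maximum likelihood on a small sample, then estimate the calibration slope, which plays the role of the shrinkage that would have been needed, on a large validation sample. A sketch with arbitrary simulation settings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

def simulate(n, p=15):
    """Binary outcome driven by two of p candidate predictors."""
    X = rng.standard_normal((n, p))
    lp = 0.8 * X[:, 0] + 0.5 * X[:, 1]
    y = rng.binomial(1, 1 / (1 + np.exp(-lp)))
    return X, y

X_dev, y_dev = simulate(60)            # tiny development set: very low EPV
X_val, y_val = simulate(50_000)        # large validation set

model = LogisticRegression(C=1e6, max_iter=5000).fit(X_dev, y_dev)  # ~ML fit
lp_val = model.decision_function(X_val)          # linear predictor on new data

# Calibration slope: coefficient of the linear predictor in a fresh logistic fit;
# a slope below 1 signals overfitting (predictions too extreme)
recal = LogisticRegression(C=1e6, max_iter=5000).fit(lp_val.reshape(-1, 1), y_val)
slope = recal.coef_[0, 0]
```

With only ~2 events per candidate predictor in the development set, the estimated slope typically falls well below 1, the pattern the slide describes.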
  • 77. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shrinkage/tuning/penalization hard to estimate Riley et al. JCE, 2021, DOI: 10.1016/j.jclinepi.2020.12.005
  • 78. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Van Houwelingen, Statistica Neerlandica, 2001 “shrinkage works on the average but may fail in the particular unique problem on which the statistician is working.”
  • 79. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shrinkage works on the average Simulation results, averaged over 4k scenarios and 20M datasets. van Smeden et al. SMMR, 2019 [Figure: calibration slope by EPV, for ridge, lasso and maximum likelihood]
  • 80. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shrinkage may not work on a particular dataset Van Calster et al. SMMR, 2020 RMSD log(slope): root mean squared distance of log calibration slope to the ideal slope (value = log(1)) Maximum likelihood, Ridge, Lasso
  • 81. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Shrinkage may not work on a particular dataset [Figure: RMSD log(slope), the root mean squared distance of the log calibration slope to the ideal slope (log(1) = 0), for maximum likelihood, ridge, and lasso] Van Calster et al. SMMR, 2020: “We conclude that, despite improved performance on average, shrinkage often worked poorly in individual datasets, in particular when it was most needed. The results imply that shrinkage methods do not solve problems associated with small sample size or low number of events per variable.”
  • 82. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models What about the variance in estimated risk? [Figure: root mean squared prediction error by EPV for ridge, lasso, and maximum likelihood] van Smeden et al. SMMR, 2019
  • 83. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Sources of prediction error Y = f(x) + ε, with E[ε] = 0, var(ε) = σ², values in x not random. For a model k, the expected test (mean squared) prediction error is: σ² + bias²[f̂_k(x)] + var[f̂_k(x)], i.e. irreducible error + what we don’t model + how we model. See equation 2.46 in Hastie et al., The Elements of Statistical Learning, https://stanford.io/2voWjra
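The decomposition can be verified numerically. A toy Monte Carlo (all numbers invented): a deliberately shrunken estimator of f(x) at a single point trades a little bias for a large variance reduction, and the simulated squared prediction error matches σ² + bias² + variance.

```python
import numpy as np

rng = np.random.default_rng(7)
f_x, sigma, n, reps = 2.0, 1.0, 10, 200_000

# estimator: 0.8 * mean of n noisy training observations at x,
# so E[f_hat] = 1.6 (biased) and var[f_hat] = 0.64 * sigma^2 / n
f_hat = 0.8 * rng.normal(f_x, sigma, size=(reps, n)).mean(axis=1)
y_new = rng.normal(f_x, sigma, size=reps)     # independent test outcomes

mc_error = np.mean((y_new - f_hat) ** 2)
theory = sigma**2 + (0.8 * f_x - f_x)**2 + 0.64 * sigma**2 / n  # 1.224
```

Here the irreducible error (1.0) dominates; the shrinkage-induced bias² (0.16) and the estimator variance (0.064) make up the rest.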
  • 84. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Agenda • PART I (11:00 – 12:40) • State of the medical prediction modeling art • Just another prediction model • Methods against overfitting • Methods for deciding on appropriate sample size • PART II (13:30 – 15:10) • Model performance and validation • Heterogeneity over time and place: there is no such thing as a validated model • Applied example: ADNEX model • Future perspective: machine learning and AI
  • 85. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models You get what you pay for
  • 86. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Sample size considerations • Sample size matters when designing a study of any kind • When the prediction study is designed before data collection (prospective): how many units/individuals/events do I need to collect data on? • When the prediction study uses existing data (retrospective): is my dataset large enough to build a model? How much model complexity (e.g. number of predictors considered) can I afford?
  • 87. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 88. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models The origin of the 1 in 10 rule “For EPV values of 10 or greater, no major problems occurred. For EPV values less than 10, however, the regression coefficients were biased in both positive and negative directions” Peduzzi et al. JCE, 1996
  • 89. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Peduzzi et al. JCE, 1996 ?
  • 91. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models More simulation studies (citation counts from Google Scholar, Oct 30 2020): Peduzzi et al. 1996: “For EPV values of 10 or greater, no major problems” (5,736 citations); Vittinghoff & McCulloch 2007: “a minimum of 10 EPV […] may be too conservative” (2,438 citations); Courvoisier et al. 2011: “substantial problems even if the number of EPV exceeds 10” (216 citations) !?!
  • 92. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models The log(odds) estimator is consistent but finite-sample biased
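That finite-sample bias is easy to demonstrate with a sketch (our toy numbers: true risk 0.3, 30 observations per dataset): the average estimated log(odds) lands further from zero than the true value.

```python
import numpy as np

rng = np.random.default_rng(42)
n, pi, reps = 30, 0.3, 300_000
true_logodds = np.log(pi / (1 - pi))          # about -0.847

events = rng.binomial(n, pi, size=reps)
ok = (events > 0) & (events < n)              # drop inestimable all-0/all-1 samples
p_hat = events[ok] / n
mean_logodds = np.mean(np.log(p_hat / (1 - p_hat)))
# mean_logodds sits below true_logodds: biased away from zero
```

As n grows the bias vanishes (consistency), which is exactly the distinction the slide title draws.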
  • 93. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Schaefer, Stat Med, 1983
  • 96. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models • Examine the reasons for substantial differences between the earlier EPV simulation studies (simulation technicality: handling of “separation”) • Evaluate a possible solution to reduce the finite sample bias van Smeden et al. BMC MRM, 2016
  • 97. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models • Firth’s “correction” aims to reduce finite sample bias in maximum likelihood estimates and is applicable to logistic regression • It makes clever use of the “Jeffreys prior” (from the Bayesian literature) to penalize the log-likelihood, which shrinks the estimated coefficients • Nice theoretical justifications
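A compact numpy sketch of Firth's correction (our illustration, not the original implementation): it solves the modified score equations U*(β) = Xᵀ(y − π + h(½ − π)) = 0, where h are the hat values of the weighted design. For an intercept-only model this reproduces the known shrunken estimate (events + ½) / (n + 1).

```python
import numpy as np

def firth_logistic(X, y, iters=50):
    """Firth-penalised logistic regression: maximises the log-likelihood
    plus 0.5 * log|I(beta)|, i.e. the Jeffreys-prior penalty."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        pi = 1 / (1 + np.exp(-(X @ beta)))
        W = pi * (1 - pi)
        info_inv = np.linalg.inv(X.T @ (X * W[:, None]))   # I(beta)^-1
        h = W * np.einsum('ij,jk,ik->i', X, info_inv, X)   # hat values
        score = X.T @ (y - pi + h * (0.5 - pi))            # modified score
        beta = beta + info_inv @ score
    return beta

# intercept-only sanity check: 3 events in 10 -> pi_hat = 3.5 / 11
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
b0 = firth_logistic(np.ones((10, 1)), y)[0]
pi_hat = 1 / (1 + np.exp(-b0))
```

In R the same estimator is available in, for example, the logistf package; unlike plain maximum likelihood it also gives finite estimates under separation.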
  • 99. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models [Figure: standard maximum likelihood vs Firth’s correction] Averaged over 465 simulation conditions with 10,000 replications each
  • 100. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Firth’s correction and predictive performance [Figure: RMSD log(slope), the root mean squared distance of the log calibration slope to the ideal slope (log(1) = 0), for maximum likelihood and Firth’s correction] Van Calster, SMMR, 2020, DOI: 10.1177/0962280220921415
  • 101. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Beyond events per variable • Sample size has a big influence on the performance of prediction models • Giving appropriate weights to predictors is not a simple task • The challenge is to avoid overfitting and lack of precision in the predictions, which requires an adequate sample size • How much data is sufficient? It depends on: • Model complexity (e.g. # predictors, EPV) • Signal:noise ratio (e.g. R2 or C-index) • Performance required (e.g. high- vs low-stakes medical decisions)
  • 102. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models
  • 103. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Riley et al., BMJ, 2020
  • 104. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Our proposal • Calculate the sample size needed to • minimise potential overfitting • estimate the overall probability (risk) precisely • Sample size formulas for • Continuous outcomes • Time-to-event outcomes • Binary outcomes (focus today) Riley et al., BMJ, 2020
  • 105. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Example Gupta et al. Lancet Resp Med, 2020
  • 106. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Example • COVID-19 prognosis in hospitalized patients • Composite outcome: “deterioration” (in-hospital death, ventilator support, ICU admission) A priori expectations • Event fraction at least 30% • 40 candidate predictor parameters • C-statistic of 0.71 (conservative estimate) -> Cox-Snell R2 of 0.24 Gupta et al. Lancet Resp Med, 2020
  • 107. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Restricted cubic splines with 4 knots: 3 degrees of freedom Note: the EPV rule should also count degrees of freedom (parameters) of candidate predictors, not variables! Gupta et al. Lancet Resp Med, 2020
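To make the parameters-versus-variables point concrete, here is a hypothetical candidate set (invented, not Gupta et al.'s actual predictor list): 25 binary predictors at 1 df each plus 5 continuous predictors modelled with 4-knot restricted cubic splines at 3 df each already give 40 parameters, and EPV must be computed against those 40.

```python
# hypothetical candidate predictors as (name, degrees of freedom) pairs
candidates = [("binary_%d" % i, 1) for i in range(25)] + \
             [("cont_%d_rcs4knots" % i, 3) for i in range(5)]

parameters = sum(df for _, df in candidates)   # 40 parameters from 30 variables
events = 390                                   # e.g. 30% of 1300 patients
epv = events / parameters                      # 9.75, not 390/30 = 13
```

Counting variables instead of parameters would overstate the EPV and understate the risk of overfitting.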
  • 108. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Calculate required sample size Criterion 1. Shrinkage: expected heuristic shrinkage factor S ≥ 0.9 (calibration slope, target < 10% overfitting) Criterion 2. Optimism: apparent Cox-Snell R2 − validation Cox-Snell R2 < 0.05 (overfitting) Criterion 3. A small margin of error in the overall risk estimate: < 0.05 absolute error (precision of the estimated baseline risk) (Criterion 4: a small margin of absolute error in the estimated individual risks)
  • 109. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Calculation R code: > require(pmsampsize) > pmsampsize(type="b",rsquared=0.24,parameters=40,prevalence=0.3)
  • 110. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models A few alternative scenarios • rsquared=0.24,parameters=40,prevalence=0.3 -> EPV≥9.7 • rsquared=0.12,parameters=40,prevalence=0.3 -> EPV≥21.0 • rsquared=0.12,parameters=40,prevalence=0.5 -> EPV≥35.0 • rsquared=0.36,parameters=40,prevalence=0.2 -> EPV≥5
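The EPV values above can be reproduced from the published criteria; the sketch below is our re-implementation of the three binary-outcome formulas (it mirrors, rather than calls, pmsampsize, and the function names are ours).

```python
import math

def max_r2cs(phi):
    # maximum possible Cox-Snell R^2 at outcome prevalence phi
    lnL0 = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    return 1 - math.exp(2 * lnL0)

def n_shrinkage(P, r2cs, S=0.9):
    # criterion 1: expected uniform shrinkage factor >= S
    return P / ((S - 1) * math.log(1 - r2cs / S))

def n_optimism(P, r2cs, phi, delta=0.05):
    # criterion 2: optimism in apparent R^2 <= delta, via the implied shrinkage
    S = r2cs / (r2cs + delta * max_r2cs(phi))
    return P / ((S - 1) * math.log(1 - r2cs / S))

def n_risk(phi, delta=0.05):
    # criterion 3: estimate the overall risk to within +/- delta
    return (1.96 / delta) ** 2 * phi * (1 - phi)

def min_epv(P, r2cs, phi):
    # minimum required n is the largest of the three criteria
    n = math.ceil(max(n_shrinkage(P, r2cs),
                      n_optimism(P, r2cs, phi),
                      n_risk(phi)))
    return n * phi / P
```

For the main scenario, min_epv(40, 0.24, 0.3) gives roughly 9.7; note that the fourth scenario (high R2, low prevalence) is driven by criterion 2 rather than criterion 1.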
  • 111. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models The sample size that meets all criteria is the MINIMUM required • Why minimum? Other issues may increase it, e.g. missing data, clustering, variable selection Future directions • More evidence/guidance for choosing values for the sample size criteria • Sample size for validation (for continuous outcomes just published) • Sample size for other outcome types (e.g. multi-category) • Sample size taking variable selection into account • Simulation-based approaches • High-dimensional data?
  • 112. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Steyerberg, 2019
  • 113. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Work in collaboration with: • Carl Moons • Hans Reitsma • Ben Van Calster (Leuven) • Laure Wynants (Maastricht) • Richard Riley (Keele, materials for this presentation) • Gary Collins (Oxford, materials for this presentation) • Ewout Steyerberg (Leiden) • Many others Contact: M.vanSmeden@umcutrecht.nl
  • 114. M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden Statistics in practice I: development and evaluation of prediction models Reference list
Adabag et al. JAMA, 2008, doi: 10.1001/jama.2008.553
Altman & Royston, Stat Med, 2000, doi: 10.1002/(SICI)1097-0258(20000229)19:4<453::AID-SIM350>3.0.CO;2-5
Apgar et al. JAMA, 1958, doi: 10.1001/jama.1958.03000150027007
Bass et al. JCEM, 2010, doi: 10.1210/jc.2010-0947
Bell et al. BMJ, 2015, doi: 10.1136/bmj.h5639
Bellou et al. BMJ, 2019, doi: 10.1136/bmj.l5358
Collins et al. BMC Med, 2011, doi: 10.1186/1741-7015-9-103
Conroy et al. EHJ, 2003, doi: 10.1016/S0195-668X(03)00114-3
Courvoisier et al. JCE, 2011, doi: 10.1016/j.jclinepi.2010.11.012
Gulshan et al. JAMA, 2016, doi: 10.1001/jama.2016.17216
Gupta et al. ERJ, doi: 10.1183/13993003.03498-2020
Gupta et al. Lancet Resp Med, 2020, doi: 10.1016/S2213-2600(20)30559-2
Letellier et al. BJC, 2017, doi: 10.1038/bjc.2017.352
Mallett et al. BMC Med, 2010, doi: 10.1186/1741-7015-8-21
Peduzzi et al. JCE, 1996, doi: 10.1016/s0895-4356(96)00236-3
Riley et al. Stat Med, 2019, doi: 10.1002/sim.7992
Riley et al. BMJ, 2020, doi: 10.1136/bmj.m441
Riley et al. JCE, 2021, doi: 10.1016/j.jclinepi.2020.12.005
Roberts et al. Nature ML, 2021, doi: 10.1038/s42256-021-00307-0
Sauerbrei & Royston, JRSS-A, 1999, doi: 10.1111/1467-985x.00122
Schaefer, Stat Med, 1983, doi: 10.1002/sim.4780020108
Steyerberg et al. JCE, 2017, doi: 10.1016/j.jclinepi.2017.11.013
Steyerberg, 2019, doi: 10.1007/978-3-030-16399-0
TRIPOD statement, 2015: Collins et al., Ann Intern Med, doi: 10.7326/M14-0697
Van Calster et al. SMMR, 2020, doi: 10.1177/0962280220921415
van Geloven et al. EJE, 2020, doi: 10.1007/s10654-020-00636-1
van Houwelingen, Statistica Neerlandica, 2001
van Houwelingen & Le Cessie, Stat Med, 1990, doi: 10.1002/sim.4780091109
van Smeden et al. BMC MRM, 2016, doi: 10.1186/s12874-016-0267-3
van Smeden et al. SMMR, 2019, doi: 10.1177/0962280218784726
van Smeden et al. Clinical prediction models: diagnosis versus prognosis, JCE, in press
Vittinghoff & McCulloch, AJE, 2007, doi: 10.1093/aje/kwk052
Wells et al. Lancet, 1997, doi: 10.1016/S0140-6736(97)08140-3
Wynants et al. BMJ, 2020, doi: 10.1136/bmj.m1328