This document discusses the differences between explanatory, predictive, and descriptive models. Explanatory models aim to understand causal relationships by examining regression coefficients and testing causal theories. Predictive models aim to predict future observations without regard to causality and must guard against overfitting. Descriptive models simply capture the structure of the data. Although the goals differ, the document notes that problems such as generalizability and model misspecification are common to all three. It gives epidemiological examples and an overview of medical prediction models, and emphasizes the need for external validation of predictive performance.
1. Is it causal, is it prediction or is it neither?
Maarten van Smeden, Department of Clinical Epidemiology,
Leiden University Medical Center, Leiden, Netherlands
Seminar Erasmus School of Health Policy & Management
June 24, 2019
4. Cookbook review
Schoenfeld & Ioannidis, Am J Clin Nutr 2013, DOI: 10.3945/ajcn.112.047142
“We selected 50 common ingredients from random recipes of a cookbook”
11. https://bit.ly/2KyLXxo (winner of the VWN publication prize for the best science journalism article of 2018)
Read 19 peer-reviewed articles that used data from Dutch cohort studies: 15 had serious limitations
15. To explain or to predict?
Explanatory models
• Theory: interest in regression coefficients
• Testing and comparing existing causal theories
• e.g. aetiology of illness, effect of treatment
Predictive models
• Interest in (risk) predictions of future observations
• No concern about causality
• Concerns about overfitting and optimism
• e.g. prognostic or diagnostic prediction model
Descriptive models
• Capture the data structure
Shmueli, Statistical Science 2010, DOI: 10.1214/10-STS330
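The contrast between interest in regression coefficients and concern about overfitting and optimism can be made concrete with a small simulation. The Python sketch below is a hypothetical illustration, not taken from the slides: a linear model with 20 candidate predictors, only one of which truly matters, is fitted on 50 observations, and its apparent R² on the development data is compared with its R² in a large fresh sample from the same simulated population. The gap is the optimism that validation aims to quantify.

```python
import numpy as np

def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def simulate(rng, n, p):
    """Hypothetical data: only the first of p candidate predictors affects y."""
    X = rng.normal(size=(n, p))
    y = 0.5 * X[:, 0] + rng.normal(size=n)
    return X, y

def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

rng = np.random.default_rng(2019)
X_dev, y_dev = simulate(rng, n=50, p=20)       # small development sample
X_new, y_new = simulate(rng, n=10_000, p=20)   # new observations, same population

beta = fit_ols(X_dev, y_dev)
apparent = r_squared(y_dev, predict(beta, X_dev))   # performance on the data used for fitting
new_data = r_squared(y_new, predict(beta, X_new))   # performance on unseen data
optimism = apparent - new_data

print(f"apparent R^2: {apparent:.2f}, R^2 in new data: {new_data:.2f}, optimism: {optimism:.2f}")
```

Exact numbers depend on the seed. The true R² in this simulated population is 0.2 (= 0.25 / 1.25); with 20 predictors and 50 observations the apparent R² is expected to lie well above it and the R² in new data below it.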
16. To explain or to predict?
[Figure: causal diagram relating exposure A to outcome Y, with confounder L]
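The role of the confounder in the diagram can be mimicked by simulation. In this hypothetical Python sketch (all probabilities are made up for illustration), L raises both the probability of exposure A and the risk of outcome Y, while A itself has no effect on Y. The crude risk ratio is then clearly above 1, whereas within each level of L it is close to 1.

```python
import numpy as np

def risk_ratio(y, exposed):
    """Risk of the outcome among the exposed divided by the risk among the unexposed."""
    return y[exposed].mean() / y[~exposed].mean()

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical data-generating process: L -> A, L -> Y, and no arrow from A to Y
L = rng.random(n) < 0.5                      # confounder present in half the population
A = rng.random(n) < np.where(L, 0.8, 0.2)    # exposure is more common when L is present
Y = rng.random(n) < np.where(L, 0.4, 0.1)    # outcome risk depends on L only, not on A

crude = risk_ratio(Y, A)
by_stratum = {"L=0": risk_ratio(Y[~L], A[~L]), "L=1": risk_ratio(Y[L], A[L])}

print(f"crude RR: {crude:.2f}")
for level, rr in by_stratum.items():
    print(f"RR within {level}: {rr:.2f}")
```

With these probabilities the expected crude risk ratio is 0.34 / 0.16 ≈ 2.1, even though exposure has no causal effect.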
17. Causal effect estimate
What would have happened to a group of individuals had they received some treatment or exposure rather than another?
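This question is usually formalized with potential (counterfactual) outcomes. The notation below is the common textbook convention, assumed here rather than taken from the slide, with A for exposure, Y for outcome and L for confounders as in the causal diagram. Each individual has two potential outcomes, Y^{a=1} and Y^{a=0}, but only one of them is ever observed.

```latex
% Y^{a}: outcome an individual would have had under exposure level a (1 = exposed, 0 = unexposed)
\text{causal risk ratio} = \frac{\Pr(Y^{a=1} = 1)}{\Pr(Y^{a=0} = 1)}
\qquad
\text{associational risk ratio} = \frac{\Pr(Y = 1 \mid A = 1)}{\Pr(Y = 1 \mid A = 0)}

% If L suffices to control confounding (conditional exchangeability),
% together with positivity and consistency, standardization over L recovers
\Pr(Y^{a} = 1) = \sum_{l} \Pr(Y = 1 \mid A = a, L = l)\,\Pr(L = l)
```

The two ratios coincide when exposed and unexposed groups are exchangeable, as in an ideal randomized trial; in observational data this holds at best conditionally on measured confounders.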
23. Observational study: diet -> diabetes, age
Age          Traditional diet         Exotic diet              RR (exotic vs traditional)
             No diabetes  Diabetes    No diabetes  Diabetes
< 50 years   19           1           37           3           1.50
≥ 50 years   28           12          12           8           1.33
Total        47           13          49           11          0.85
[Figure: diabetes risk (%) under traditional vs exotic diet, for < 50 years, ≥ 50 years, and total]
Numerical example adapted from Peter Tennant with permission: http://tiny.cc/ai6o8y
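The risk ratios in the table can be recomputed from the counts. This Python sketch also pools the two age strata with the Mantel-Haenszel risk ratio, one standard way to obtain an age-adjusted estimate; it is chosen here for illustration, and the slide does not report an adjusted estimate.

```python
# Counts from the table: (diabetes, no diabetes) per diet within each age stratum
strata = {
    "< 50 years": {"traditional": (1, 19), "exotic": (3, 37)},
    ">= 50 years": {"traditional": (12, 28), "exotic": (8, 12)},
}

def risk(cases, noncases):
    return cases / (cases + noncases)

def rr(cell):
    """Risk ratio of exotic vs traditional diet."""
    return risk(*cell["exotic"]) / risk(*cell["traditional"])

stratum_rr = {age: rr(cell) for age, cell in strata.items()}

# Crude analysis: collapse the strata by summing the counts
crude = {"traditional": [0, 0], "exotic": [0, 0]}
for cell in strata.values():
    for diet, (cases, noncases) in cell.items():
        crude[diet][0] += cases
        crude[diet][1] += noncases
crude_rr = rr(crude)

def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    num = den = 0.0
    for cell in strata.values():
        a, b = cell["exotic"]        # exposed: cases, non-cases
        c, d = cell["traditional"]   # unexposed: cases, non-cases
        n1, n0 = a + b, c + d
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den

mh_rr = mantel_haenszel_rr(strata)

for age, value in stratum_rr.items():
    print(f"{age}: RR = {value:.2f}")
print(f"crude RR = {crude_rr:.2f}, Mantel-Haenszel RR = {mh_rr:.2f}")
```

This gives stratum-specific risk ratios of 1.50 and 1.33, a crude risk ratio of 0.85 and a Mantel-Haenszel risk ratio of about 1.36. Slide 24 reuses exactly the same counts with weight change in place of age, so the arithmetic is identical there; whether the crude or the stratified estimate answers the causal question depends on the causal role of the third variable (for example, whether it influences diet or is influenced by it), not on the counts.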
24. Observational study: diet -> diabetes, weight loss
Weight change   Traditional diet         Exotic diet              RR (exotic vs traditional)
                No diabetes  Diabetes    No diabetes  Diabetes
Lost            19           1           37           3           1.50
Gained          28           12          12           8           1.33
Total           47           13          49           11          0.85
[Figure: diabetes risk (%) under traditional vs exotic diet, for lost weight, gained weight, and total]
43. Prediction model landscape
>110 models for prostate cancer (Shariat 2008)
>100 models for Traumatic Brain Injury (Perel 2006)
83 models for stroke (Counsell 2001)
54 models for breast cancer (Altman 2009)
43 models for type 2 diabetes (Collins 2011; Dieren 2012)
31 models for osteoporotic fracture (Steurer 2011)
29 models in reproductive medicine (Leushuis 2009)
26 models for hospital readmission (Kansagara 2011)
>25 models for length of stay in cardiac surgery (Ettema 2010)
>350 models for CVD outcomes (Damen 2016)
• Few prediction models are externally validated
• Predictive performance often poor
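External validation usually reports how well a model's predicted risks discriminate and how well they are calibrated in new data. The Python sketch below computes three common measures: the c-statistic, the observed/expected ratio and the calibration slope (fitted with a small Newton-Raphson logistic regression). The choice of metrics and all data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the higher predicted risk; ties count 1/2."""
    events, non_events = p[y == 1], p[y == 0]
    diff = events[:, None] - non_events[None, :]   # all pairwise differences (O(n^2) memory)
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def observed_expected(y, p):
    """Calibration-in-the-large: observed number of events over the sum of predicted risks."""
    return y.sum() / p.sum()

def calibration_slope(y, p, iters=25):
    """Slope of a logistic regression of y on logit(p), fitted by Newton-Raphson.
    A slope below 1 suggests predictions that are too extreme."""
    lp = np.log(p / (1.0 - p))
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.zeros(2)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta[1]

# Tiny hypothetical validation set
y_toy = np.array([0, 0, 1, 1])
p_toy = np.array([0.10, 0.40, 0.35, 0.80])
c_toy = c_statistic(y_toy, p_toy)          # 3 of 4 pairs concordant
oe_toy = observed_expected(y_toy, p_toy)   # 2 observed vs 1.65 expected events

# Hypothetical new setting where the model's predictions are too extreme:
# the true log-odds are half of the model's linear predictor.
rng = np.random.default_rng(3)
lp_model = rng.normal(-1.0, 1.5, size=20_000)
p_model = 1.0 / (1.0 + np.exp(-lp_model))
p_true = 1.0 / (1.0 + np.exp(-0.5 * lp_model))
y_new = (rng.random(lp_model.size) < p_true).astype(float)
slope = calibration_slope(y_new, p_model)   # expected to be close to 0.5

print(f"c-statistic {c_toy:.2f}, O/E {oe_toy:.2f}, calibration slope {slope:.2f}")
```

A calibration slope well below 1, as in the simulated setting, is a typical sign of an overfitted model whose predicted risks are too extreme for the new population.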
46. To explain or to predict?
Explanatory models
• Causality
• Understanding the role of elements in complex systems
• “What will happen if …?”
Predictive models
• Forecasting
• Focus is often on forecasting performance
• “What will happen …?”
Descriptive models
• “What happened?”
These goals require different research design and analysis choices:
• Confounding
• Stein’s paradox
• Estimators
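Stein's paradox: when three or more means are estimated simultaneously under squared-error loss, estimating each mean by its own observation (the maximum likelihood estimate) is beaten on average by the James-Stein estimator, which shrinks all estimates toward a common point. The Python sketch below uses the basic version that shrinks toward 0 with known unit variance; the true means are hypothetical.

```python
import numpy as np

def james_stein(x):
    """Shrink p >= 3 independent N(theta_i, 1) observations toward 0."""
    p = x.size
    return (1.0 - (p - 2) / np.sum(x ** 2)) * x

rng = np.random.default_rng(42)
theta = np.linspace(-1.0, 1.0, 10)   # hypothetical true means
reps = 10_000

mle_sse = js_sse = 0.0
for _ in range(reps):
    x = theta + rng.normal(size=theta.size)          # one unit-variance observation per mean
    mle_sse += np.sum((x - theta) ** 2)              # MLE: each mean estimated by its own observation
    js_sse += np.sum((james_stein(x) - theta) ** 2)  # James-Stein: all estimates shrunk jointly

mle_risk, js_risk = mle_sse / reps, js_sse / reps
print(f"mean total squared error: MLE {mle_risk:.2f}, James-Stein {js_risk:.2f}")
```

The shrunken estimates are individually biased but more accurate on average, the same bias-variance trade-off exploited by penalized (shrinkage) regression in prediction modelling.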
47. Problems in common (selection)
• Generalizability/transportability
• Missing values
• Model misspecification
• Measurement and misclassification error
Preprint: https://osf.io/msx8d/
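Measurement error is one of the shared problems listed. A classical illustration, assuming a single continuous predictor with additive, independent, non-differential error, is regression dilution: the least-squares slope on the error-prone measurement W = X + U is attenuated by the reliability ratio var(X) / (var(X) + var(U)). The Python simulation below uses hypothetical variances of 1 for both X and U, so the observed slope should be close to half the true slope of 1.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

x_true = rng.normal(size=n)                # true predictor, variance 1
y = 1.0 * x_true + rng.normal(size=n)      # true slope = 1
w = x_true + rng.normal(size=n)            # measured with classical error, error variance 1

slope_true = np.polyfit(x_true, y, 1)[0]   # slope using the true predictor
slope_observed = np.polyfit(w, y, 1)[0]    # slope using the error-prone measurement
reliability = 1.0 / (1.0 + 1.0)            # var(X) / (var(X) + var(U))

print(f"true-X slope {slope_true:.2f}, measured-W slope {slope_observed:.2f}, "
      f"expected attenuation to {reliability * 1.0:.2f}")
```

Other error structures (differential error, error in the outcome, misclassification of binary variables) can bias estimates in other directions, so this attenuation result should not be generalized beyond the stated assumptions.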