Presented at 15th International Conference on BioInformatics and BioEngineering (BIBE2014)
Prognostic modeling is central to medicine: it is used to predict patients' outcomes and responses to treatment and to identify important medical risk factors. Logistic regression is one of the most widely used approaches for clinical prediction modeling. Traumatic brain injury (TBI) is an important public health issue and a leading cause of death and disability worldwide. In this study, we adapt CPXR (Contrast Pattern Aided Regression, a recently introduced regression method) to develop a new logistic regression method, CPXR(Log), for general binary outcome prediction (including prognostic modeling), and we use the method to carry out prognostic modeling for TBI using admission-time data. The models produced by CPXR(Log) achieved AUC as high as 0.93 and specificity as high as 0.97, much better than those reported by previous studies. Our method produced interpretable prediction models for diverse patient groups, which show that different kinds of patients should be evaluated differently for TBI outcome prediction and that the odds ratios of some predictor variables differ significantly from those given by previous studies; such results can be valuable to physicians.
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling Results Using the Method on Traumatic Brain Injury
1. Ohio Center of Excellence in Knowledge-Enabled Computing
Vahid Taslimitehrani, Guozhu Dong
Kno.e.sis Center
Department of Computer Science and Engineering
Wright State University
Dayton, OH
2. Outline
• Motivation and background
• Preliminaries
– Contrast pattern mining
– Logistic regression
• CPXR(Log)
• TBI data
• Results of CPXR(Log) on TBI
• Conclusion
• References
3. Motivation and Background
• CPXR(Log): Accurate and informative prognostic models
Prognostic models are central to medicine. [Steyerberg, 2009]
Facilitate physicians' decision making on treatment plans, screening, etc.
Help understand disease behavior, including identifying new biomarkers.
The number of articles listed in PubMed with "prediction model" in the title in 2012 is 7 times that in 2000. [pubmed]
4. Motivation and Background
• CPXR(Log): A powerful new generic logistic regression method
Logistic regression is one of the most popular approaches for building
clinical prediction models. [Steyerberg, 2009]
Logistic regression models are desirable since
They are interpretable.
They are probability-based.
They are flexible in terms of predictor variables (categorical and numerical).
5. Motivation and Background
• Traumatic Brain Injury
One of the leading causes of death and disability worldwide.
About 1.5 million deaths worldwide annually. [Perel, 2006]
$76.5 billion in direct and indirect costs in the US in 2010. [www.cdc.gov]
Early, accurate prognostic models based only on admission-time data let physicians make time-critical clinical decisions.
6. Challenges in clinical modeling
• Accuracy of clinical prediction models
• Ease of interpreting clinical prediction models
• To explain medical decisions to the patient
• To identify important risk factors
• Avoiding overfitting, to make clinical prediction models more generalizable
• Early decision making
• Ability to capture heterogeneous patient-group behavior
7. CPXR works well by using several (pattern, local model) pairs
These are different subpopulations that need different prediction models. Using just one prediction function does not work well!
Not an extreme case! It happens very often …
8. How is CPXR(Log) different from other classifiers?
• CPXR introduced the idea of
– using patterns to logically characterize different subpopulations of the data,
– using local regression models to represent the predictor–response relationship of each subpopulation, and
– choosing a pattern only if its local model is very accurate. [Dong, 2014]
• CPXR(Log)
– can capture diversified/heterogeneous behavior.
– is more generalizable.
– overfits less than other classifiers.
• CPXR(Log) is more accurate than other classifiers such as SVM and Random Forest.
9. Traditional classification vs CPXR
Traditional classification: Training Data → Classification engine → Classifier (model).

CPXR: Training Data → Classification engine → Baseline model. The data is then split into large-error and small-error instances; CPXR builds and selects contrast patterns and local models, yielding the pairs (Pattern 1, Model 1), (Pattern 2, Model 2), …, (Pattern k, Model k).
10. CPXR(Log) – PXR concept
• Definition: Let 𝐷 = {(𝑋𝑖, 𝑌𝑖) | 1 ≤ 𝑖 ≤ 𝑛} be training data for regression. Let 𝑓 be a regression model built on 𝐷, which we will call the baseline model on 𝐷. A pattern aided regression (PXR) model is a tuple 𝑃𝑀 = ((𝑃1, 𝑓1, 𝑤1), …, (𝑃𝑘, 𝑓𝑘, 𝑤𝑘), 𝑓𝑑), where {𝑃1, …, 𝑃𝑘} is the pattern set of 𝑃𝑀, the 𝑓𝑖 are local regression models of the 𝑃𝑖, and 𝑓𝑑 is the default regression model. We define the regression model of 𝑃𝑀 as

$$f_{PM}(x) = \begin{cases} \dfrac{\sum_{P_i \in \pi(x)} w_i f_i(x)}{\sum_{P_i \in \pi(x)} w_i} & \text{if } \pi(x) \neq \emptyset \\[1ex] f_d(x) & \text{otherwise} \end{cases}$$

for each instance 𝑥, where 𝜋(𝑥) = {𝑃𝑖 | 1 ≤ 𝑖 ≤ 𝑘, 𝑥 satisfies 𝑃𝑖}.
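As a rough illustration of the PXR prediction rule above, the sketch below encodes patterns as boolean predicates and local models as callables; these encodings and names are illustrative stand-ins, not the paper's implementation.

```python
# Sketch of PXR prediction: patterns are boolean predicates, local models
# are callables; both are illustrative stand-ins for the formal definition.

def pxr_predict(x, pattern_models, f_default):
    """pattern_models: list of (pattern, local_model, weight) triples."""
    matched = [(f, w) for (p, f, w) in pattern_models if p(x)]
    if not matched:                      # pi(x) is empty: use the default model
        return f_default(x)
    # weighted average of the matching local models' predictions
    return sum(w * f(x) for f, w in matched) / sum(w for _, w in matched)

# toy usage: two overlapping patterns on a 1-d instance, equal weights
pm = [(lambda x: x > 0, lambda x: 2 * x, 1.0),
      (lambda x: x > 5, lambda x: x + 1, 1.0)]
print(pxr_predict(10.0, pm, lambda x: 0.0))  # (20 + 11) / 2 = 15.5
print(pxr_predict(-3.0, pm, lambda x: 0.0))  # no pattern matches -> 0.0
```

Note how an instance matching several patterns gets a weighted blend of their local models, while unmatched instances fall back to 𝑓𝑑.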
11. Preliminaries: Contrast Patterns
• A toy example:

TID | A1 | A2 | A3 | A4 | A5 | Class
t1 | b | d | e | g | i | C1
t2 | b | c | e | g | i | C1
t3 | a | c | e | g | j | C2
t4 | a | c | e | h | j | C2
t5 | b | d | f | g | i | C2

• 𝑃1 = (𝐴2 = 𝑐) & (𝐴3 = 𝑒); 𝑚𝑡(𝑃1, 𝐷) = {𝑡2, 𝑡3, 𝑡4}; 𝑠𝑢𝑝𝑝(𝑃1, 𝐷) = 3/5 = 60%.
• 𝑠𝑢𝑝𝑝𝑅𝑎𝑡𝑖𝑜 from 𝐶1 to 𝐶2 of 𝑃1 = 2/1 = 2.
• Given a threshold such as 2, 𝑃1 is a contrast pattern.
• Details: We only consider one minimal generator pattern for each "equivalence class" of contrast patterns.
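The toy example's counts can be re-derived in a few lines; the dict encoding of patterns below is just for illustration and is not the paper's representation.

```python
# The toy table, with class labels; a pattern is a dict attr -> required value.
D = [
    {"A1": "b", "A2": "d", "A3": "e", "A4": "g", "A5": "i", "cls": "C1"},
    {"A1": "b", "A2": "c", "A3": "e", "A4": "g", "A5": "i", "cls": "C1"},
    {"A1": "a", "A2": "c", "A3": "e", "A4": "g", "A5": "j", "cls": "C2"},
    {"A1": "a", "A2": "c", "A3": "e", "A4": "h", "A5": "j", "cls": "C2"},
    {"A1": "b", "A2": "d", "A3": "f", "A4": "g", "A5": "i", "cls": "C2"},
]

def mt(pattern, data):
    """Matching data: instances satisfying every condition of the pattern."""
    return [t for t in data if all(t[a] == v for a, v in pattern.items())]

P1 = {"A2": "c", "A3": "e"}
print(len(mt(P1, D)), len(D))   # 3 of 5 instances match: supp(P1, D) = 60%
in_C2 = len(mt(P1, [t for t in D if t["cls"] == "C2"]))
in_C1 = len(mt(P1, [t for t in D if t["cls"] == "C1"]))
print(in_C2, in_C1)             # 2 matches in C2 vs 1 in C1
```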
12. Quality measures
• CPXR(Log) needs to efficiently extract a desirable pattern set from a
huge search space of potential pattern sets.
• Definition: The average residual reduction (arr) of a pattern 𝑃 w.r.t. a model 𝑓 and a dataset 𝐷 is

$$arr(P) = \frac{\sum_{X \in mds(P)} r_X(f) - \sum_{X \in mds(P)} r_X(f_P)}{|mds(P)|}$$

• Definition: The total residual reduction (trr) of a pattern set 𝑃𝑆 = {𝑃1, …, 𝑃𝑘} w.r.t. a model 𝑓 and a dataset 𝐷 is

$$trr(PS) = \frac{\sum_{X \in mds(PS)} r_X(f) - \sum_{X \in mds(PS)} r_X(f_{PM})}{\sum_{X \in D} r_X(f)}$$

where $PM = ((P_1, f_{P_1}, w_1), \ldots, (P_k, f_{P_k}, w_k), f)$, $w_i = arr(P_i)$, and $mds(PS) = \bigcup_{P \in PS} mds(P)$.
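Given per-instance residuals, arr reduces to simple arithmetic. The sketch below assumes the residuals of 𝑓 and of 𝑓𝑃 over 𝑚𝑑𝑠(𝑃) are already available; the values are made up for illustration.

```python
# arr(P): average drop in residual over P's matching data when the local
# model f_P replaces the baseline f (residual values below are made up).

def arr(resid_f, resid_fP):
    """Aligned residual lists over mds(P): baseline vs local model."""
    return (sum(resid_f) - sum(resid_fP)) / len(resid_f)

r_f  = [0.4, 0.2, 0.6]   # baseline residuals on mds(P)
r_fP = [0.2, 0.1, 0.3]   # local-model residuals on the same instances
print(arr(r_f, r_fP))    # total reduction 0.6 over 3 instances: about 0.2
```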
13. CPXR(Log) algorithm – outline
• First step: split training dataset 𝐷 into two classes, 𝐿𝐸 and 𝑆𝐸.
• 𝐿𝐸: instances of 𝐷 where baseline model 𝑓 makes Large Error.
• 𝑆𝐸: instances of 𝐷 where baseline model 𝑓 makes Small Error.
• Second step: extract all contrast patterns on 𝐿𝐸 satisfying 𝑚𝑖𝑛𝑆𝑢𝑝.
• Third step: search for a small set of patterns that maximizes error reduction, and use that set to build a 𝑃𝑋𝑅 model.
• Note
Each pattern 𝑃 is associated with a local regression model 𝑓𝑃 built on 𝑃's matching data.
Using a pattern 𝑃 and its associated local regression model 𝑓𝑃 is a flexible way to represent one predictor–response relationship.
Different (𝑃, 𝑓𝑃) pairs can represent highly different predictor–response relationships.
14. CPXR(Log) – details (1)
• Inputs:
• Training data 𝐷 = {(𝑥𝑖, 𝑦𝑖) | 1 ≤ 𝑖 ≤ 𝑛}
• Baseline model 𝑓
• 𝜌, to partition 𝐷 into 𝐿𝐸 and 𝑆𝐸
• 𝑚𝑖𝑛𝑆𝑢𝑝 threshold on contrast patterns
• Output:
• A 𝑃𝑋𝑅 model

Let 𝑟1, …, 𝑟𝑛 denote 𝑓's errors on 𝑥1, …, 𝑥𝑛;
Determine 𝜅 to minimize $\left|\rho - \frac{\sum_{r_i > \kappa} r_i}{\sum_i r_i}\right|$;
Let 𝐿𝐸 = {𝑥𝑖 | 𝑟𝑖 > 𝜅}, 𝑆𝐸 = 𝐷 − 𝐿𝐸;
Discretize each numerical variable using entropy-based binning;
Extract all contrast patterns for 𝑚𝑖𝑛𝑆𝑢𝑝 in the 𝐿𝐸 class, yielding 𝐶𝑃𝑆;
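The κ-selection step can be sketched as a naive scan over observed residuals as candidate cutoffs, assuming ρ is the target fraction of total error carried by the large-error instances; the residual values are made up.

```python
# Naive sketch: choose kappa among the observed residuals so that instances
# with residual > kappa carry roughly a fraction rho of the total error.

def split_large_small(residuals, rho):
    total = sum(residuals)
    best_kappa = min(
        set(residuals),  # candidate cutoffs: the observed residual values
        key=lambda k: abs(rho - sum(r for r in residuals if r > k) / total),
    )
    LE = [i for i, r in enumerate(residuals) if r > best_kappa]
    SE = [i for i, r in enumerate(residuals) if r <= best_kappa]
    return best_kappa, LE, SE

resid = [0.05, 0.1, 0.9, 0.05, 0.8]      # baseline-model errors (made up)
kappa, LE, SE = split_large_small(resid, rho=0.8)
print(kappa, LE, SE)                     # the two largest errors form LE
```

A full implementation would search κ more carefully, but the objective being minimized is the same as in the slide.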
15. CPXR(Log) – details (2)
For each 𝑃 ∈ 𝐶𝑃𝑆, build the local regression model 𝑓𝑃 for the data in 𝑚𝑑𝑠(𝑃);
Let 𝑃𝑆 = {𝑃0}, where 𝑃0 is the pattern 𝑃 in 𝐶𝑃𝑆 with the highest 𝑎𝑟𝑟;
Let 𝑓𝑑 be the regression model trained on $D - \bigcup_{P \in PS} mds(P)$;
Return 𝑃𝑀(𝑃𝑆, 𝑓𝑑);
16. TBI data
• The TBI dataset is a collection of the International and US Tirilazad trials.
• 2159 instances. [Steyerberg, 2008]
• 15 numerical and categorical predictor variables.
• Missing values were treated using multiple imputation.
• The outcome variable is the Glasgow Outcome Scale: GOS 1 (dead), …, GOS 5 (good recovery).
• This study used two dichotomized versions of GOS: "Mortality" vs survival (GOS 1 vs GOS 2–5) and "Unfavorable" vs favorable (GOS 1–3 vs GOS 4–5).

Category | Predictor variables
Basic | Cause of injury, age, GCS motor score, pupil reactivity
Computed tomography (CT) | Hypoxia, hypotension, Marshall CT, tSAH, eDH, compressed cistern, midline shift more than 5 mm
Lab | Glucose, pH, sodium, Hb
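The two dichotomizations of GOS described above can be written down directly; this trivial sketch just makes the cut points explicit.

```python
# GOS runs from 1 (dead) to 5 (good recovery); the study uses two binary cuts.

def mortality(gos):
    """Mortality outcome: GOS 1 vs GOS 2-5."""
    return 1 if gos == 1 else 0

def unfavorable(gos):
    """Unfavorable outcome: GOS 1-3 vs GOS 4-5."""
    return 1 if gos <= 3 else 0

print([mortality(g) for g in range(1, 6)])    # [1, 0, 0, 0, 0]
print([unfavorable(g) for g in range(1, 6)])  # [1, 1, 1, 0, 0]
```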
17. Results – Performance of SLogR and CPXR(Log) on Mortality models
Model | SLogR: Specificity | Sensitivity | F1 | AUC | CPXR(Log): Specificity | Sensitivity | F1 | AUC
Basic | 0.95 | 0.18 | 0.27 | 0.77 | 0.96 | 0.18 | 0.28 | 0.80
Basic+CT | 0.95 | 0.32 | 0.42 | 0.80 | 0.96 | 0.42 | 0.53 | 0.88
Basic+CT+Lab | 0.94 | 0.36 | 0.46 | 0.80 | 0.97 | 0.46 | 0.58 | 0.92

CPXR(Log) is clearly more accurate than standard logistic regression.
18. Results – Performance of SLogR and CPXR(Log) on Unfavorable models
Model | SLogR: Specificity | Sensitivity | F1 | AUC | CPXR(Log): Specificity | Sensitivity | F1 | AUC
Basic | 0.85 | 0.52 | 0.59 | 0.76 | 0.89 | 0.54 | 0.63 | 0.82
Basic+CT | 0.85 | 0.60 | 0.66 | 0.80 | 0.87 | 0.65 | 0.70 | 0.87
Basic+CT+Lab | 0.84 | 0.61 | 0.66 | 0.81 | 0.91 | 0.72 | 0.76 | 0.93
19. Results – Impact of adding more variables on AUC
AUC improvement when more variables are used by CPXR(Log) and SLogR:

Variable set change | Mortality: CPXR(Log) | Mortality: SLogR | Unfavorable: CPXR(Log) | Unfavorable: SLogR
Basic → Basic+CT | 10% | 7.7% | 6% | 5.2%
Basic → Basic+CT+Lab | 15% | 11.1% | 13.4% | 6.6%

AUC improvement of CPXR(Log) over SLogR:

Outcome | Basic | Basic+CT | Basic+CT+Lab
Mortality | 11.1% | 12.8% | 15%
Unfavorable | 7.9% | 8.8% | 14.8%
20. Results – ROC curves of Basic models
21. Results – ROC curves of (Basic+CT) models
22. Results – ROC curves of (Basic+CT+Lab) models
23. Results – Performance comparison
(Figure: comparing CPXR(Log) performance with Logistic Regression, SVM, and Random Forest.)
24. Example: patterns used by CPXR(Log) & Mortality (Basic+CT+Lab)
Pattern | arr | cov
(CT classification = III) | 15% | 20%
(CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4) | 12% | 15%
(No compressed cistern) AND (No midline shift) AND (7.22 < pH <= 7.45) | 10% | 40%
(10.77 < glucose <= 21.98) AND (134 < sodium <= 144) | 18% | 18%
(No hypotension) AND (134 < sodium <= 144) AND (10.55 < Hb <= 14.57) AND (with tSAH) | 19% | 20%
(No tSAH) AND (134 < sodium <= 144) AND (10.77 < glucose <= 21.98) AND (No hypotension) AND (No midline shift) AND (One reactive pupil) | 19% | 20%
(No tSAH) AND (One reactive pupil) | 18% | 40%
25. Odds ratios
(Figure: odds ratios of predictor variables for the pattern (CT classification = V) AND (midline shift) AND (0.56 < glucose <= 10.4).)
26. Residual reduction and example patient
• Age = 15 years
• Cause of injury = motorbike accident
• GCS motor score = 5 (no eye response)
• No reactive pupil
• No hypoxia
• No hypotension
• CT scan classification = V (mass lesion)
• No tSAH
• With eDH
• Midline shift more than 5 mm
• Glucose = 9.06 mmol/l
• pH = 7.37
• Sodium = 141 mmol/l
• Hb = 14.4 g/dl

Standard logistic regression predicted a 0.78 risk of survival for this patient, who in fact died. The patient matches "pattern II", and CPXR(Log) predicted a 0.38 risk of survival.

(Figures: error distributions of the TBI dataset under SLogR and under CPXR(Log).)
27. Results – Box plot of RMSE reduction in CPXR
How much CPXR can reduce RMSE (root mean square error) on 50 datasets, compared to:
• Piecewise linear regression
• Support vector regression
• Bayesian additive regression trees
• Gradient boosting
28. Results – Noise sensitivity and impact of the number of patterns
The number of patterns is determined automatically by the method.
How much does dataset noise impact the performance of CPXR and other methods?
29. Conclusion
• We presented an effective new method, CPXR(Log), for logistic regression and for clinical predictive modeling.
• We showed CPXR(Log) is more accurate than standard logistic regression and some other classification algorithms.
• We also presented CPXR(Log) models, including patterns, local models, and new odds ratios of predictor variables.
30. References
• Guozhu Dong & Vahid Taslimitehrani. Pattern-Aided Regression
Modeling and Prediction Model Analysis. Tech Report, CSE, Wright State
Univ. 2014.
• E. Steyerberg: Clinical prediction models. Springer, 2009.
• P. Perel, P. Edwards, R. Wentz, and I. Roberts: Systematic review of
prognostic models in traumatic brain injury. BMC medical informatics
and decision making, 6(1): 1-10, 2006.
• G. Dong, J. Li: Efficient mining of emerging patterns: Discovering trends
and differences. In Proc. KDD, 43-52, 1999.
• E.W. Steyerberg, et al: Predicting outcome after traumatic brain injury:
development and international validation of prognostic scores based on
admission characteristics. PLoS medicine, 5(8): e165, 2008.
31. Preliminaries: Logistic Regression
• Regression modeling: predicting a response variable (output) from predictor variables (inputs).
• Logistic regression: the response variable is binary. For example,
• “having the disease” or “not”
• “mortal” or “not”
• Let X=(𝑥1, 𝑥2, … , 𝑥 𝑛) be a vector of predictor variables
• and Y be the response variable.
• The goal of logistic regression is to learn a function

$$lp(X) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$$

satisfying

$$\log \frac{P(Y=1)}{1 - P(Y=1)} = lp(X)$$

• Chi-square (𝜒²) is one of the goodness-of-fit measures for logistic regression.
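The relationship between lp(X) and P(Y=1) can be made concrete by inverting the log-odds; the coefficients below are illustrative, not fitted from any data.

```python
import math

def lp(x, beta0, beta):
    """Linear predictor: beta0 + sum_i beta_i * x_i."""
    return beta0 + sum(b * xi for b, xi in zip(beta, x))

def prob_y1(x, beta0, beta):
    """Invert log(P(Y=1) / (1 - P(Y=1))) = lp(X) to get P(Y=1)."""
    return 1.0 / (1.0 + math.exp(-lp(x, beta0, beta)))

# lp = 0 corresponds to even odds, i.e. probability 0.5
print(prob_y1([1.0, 2.0], beta0=0.0, beta=[0.0, 0.0]))  # 0.5
```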
33. Preliminaries: Contrast Patterns
• The matching data of pattern 𝑃 in dataset 𝐷, 𝑚𝑡(𝑃, 𝐷), is the set of all instances matching pattern 𝑃.
• The support of pattern 𝑃 in 𝐷 is $supp(P, D) = \frac{|mt(P, D)|}{|D|}$.
• Given two classes 𝐶1 and 𝐶2, the support ratio of pattern 𝑃 from 𝐶1 to 𝐶2 is $suppRatio_{C_1 \to C_2}(P) = \frac{supp(P, C_2)}{supp(P, C_1)}$.
• Given a threshold 𝛾, a contrast pattern (emerging pattern) of class 𝐶2 is a pattern 𝑃 satisfying $suppRatio_{C_1 \to C_2}(P) \geq \gamma$. [Dong, 1999]
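A minimal check of these definitions; the item-set encoding and helper names are illustrative, not from the paper.

```python
# Instances as frozensets of items; a pattern matches when it is a subset.

def supp(pattern, data):
    """supp(P, D) = |mt(P, D)| / |D|."""
    return sum(1 for t in data if pattern <= t) / len(data)

def is_contrast_pattern(pattern, c1, c2, gamma):
    """Contrast pattern of class c2: suppRatio from c1 to c2 >= gamma."""
    s1, s2 = supp(pattern, c1), supp(pattern, c2)
    if s1 == 0:
        return s2 > 0            # ratio is taken as infinite when s1 = 0
    return s2 / s1 >= gamma

C1 = [frozenset("bdegi"), frozenset("bcegi")]
C2 = [frozenset("acegj"), frozenset("acehj"), frozenset("bdfgi")]
# supp in C1 is 1/2, in C2 is 2/3, so the ratio is 4/3
print(is_contrast_pattern(frozenset("ce"), C1, C2, gamma=1.3))  # True
```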