What to Optimize?
The Heart of Every Analytics Problem	
Predictive Analytics World
New York - October, 2017
John F. Elder, Ph.D.
elder@elderresearch.com
@johnelder4
Charlottesville, VA
Washington, DC
Baltimore, MD
Raleigh, NC
434-973-7673
www.elderresearch.com
Outline
•  Squared error is convenient for the computer
but not for the client
•  Lift (cumulative response) charts are great,
but never optimize AUC (area under the curve)
•  You may need to design a custom metric
•  That may require a global search algorithm
•  Brainstorm about the Project goal
•  And what project to tackle in the first place
4 Series: (X,Y1) (X,Y2) (X,Y3) (X4,Y4)
rxy = 0.82
yLS = 3 + 0.5x
MSE = 1.25
R2 = 0.67
X    Y1     Y2     Y3     X4   Y4
10   8.04   9.14   7.46   8    6.58
8    6.95   8.14   6.77   8    5.76
13   7.58   8.74   12.74  8    7.71
9    8.81   8.77   7.11   8    8.84
11   8.33   9.26   7.81   8    8.47
14   9.96   8.10   8.84   8    7.04
6    7.24   6.13   6.08   8    5.25
4    4.26   3.10   5.39   19   12.50
12   10.84  9.13   8.15   8    5.56
7    4.82   7.26   6.42   8    7.91
5    5.68   4.74   5.73   8    6.89
Anscombe’s Quartet (1973, American Statistician)
[Figure: four scatter plots, Y1, Y2, Y3 (versus X) and Y4 (versus X4); horizontal axes run from 2 to 20, vertical axes from 2 to 14]
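The summary statistics quoted for the quartet can be recomputed directly from the table. The following is a minimal sketch in plain Python; the function and variable names (`summarize`, `QUARTET`, `SUMMARIES`) are illustrative, and MSE is taken here as the residual sum of squares divided by the number of points, which is an assumption about how the slide computed it.

```python
# Anscombe's quartet, copied from the table above.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def summarize(xs, ys):
    """Closed-form least-squares line plus the usual summary statistics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return {"intercept": intercept, "slope": slope, "r": r, "r2": r * r, "mse": sse / n}


QUARTET = {"Y1": (X, Y1), "Y2": (X, Y2), "Y3": (X, Y3), "Y4": (X4, Y4)}
SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four series print nearly identical numbers even though their scatter plots look nothing alike, which is the point of the quartet: the squared-error summary cannot tell them apart.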
Datasaurus Dozen (David Smith 5/2/17)
Carl Friedrich Gauss
1) If your model is linear and your error is squared, then there is a closed-form solution (regression)
2) If your model is linear and your error is absolute, then there is an iterative solution (linear programming)
3) Otherwise, you are groping in the dark: you need to perform global search (which has no guarantees)
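To make the contrast concrete, here is a small sketch on Anscombe's third series (ten collinear points plus one outlier). The squared-error fit uses the closed-form formula. For the absolute-error fit the slide points to linear programming; to stay within the standard library, this sketch swaps in pure random probing instead, which also illustrates what a search with no guarantees looks like. The search box, probe count, seed, and all function names are assumptions made for illustration.

```python
import random

# Anscombe's third series: ten points on a line plus one outlier.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def least_squares(xs, ys):
    """Linear model + squared error: closed-form (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def abs_error(params, xs, ys):
    """Total absolute error of the line y = a + b*x."""
    a, b = params
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))


def random_search(cost, bounds, n_probes, seed=0):
    """Probe a box uniformly at random and keep the best point seen.

    There is no optimality guarantee; more probes only make a good
    answer more likely.
    """
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_probes):
        candidate = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost


ls_line = least_squares(X, Y3)
# Assumed search box: intercept in [0, 6], slope in [0, 1].
lad_line, lad_cost = random_search(lambda p: abs_error(p, X, Y3),
                                   bounds=[(0.0, 6.0), (0.0, 1.0)],
                                   n_probes=20000)

print("least squares (intercept, slope):", ls_line)
print("random search on absolute error:", lad_line, "cost:", lad_cost)
print("absolute error of the least-squares line:", abs_error(ls_line, X, Y3))
```

The same `random_search` helper accepts any cost function, so it can be pointed at a custom metric once one has been designed.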
Simulated Annealing search path
Nelder-Mead (Amoeba) Search Path
Global R^d Optimization when Probes are
Expensive (GROPE)
•  Class of problems where goal is to get to the answer
with fewest probes (function evaluations)
•  Best algorithms are
–  SDO (Sequential Design for Optimization) by Cox & John
(1992, 1997)
–  GROPE-Canopy by Elder (1992, 1993)
Stock Market Prediction Thought Experiment
•  Say your model predicted a 10% price rise, from
$10 to $11 over the next quarter.
•  But the price later actually rises to $14.
•  How do you feel about it?
•  How does the model (under squared error) “feel”
about it?
•  14-11=3; 3*3=9.  Had it instead lost 10% to $9,
the error of 2 would’ve led to a squared error of
less than half as much (4).
•  So the model would have been “twice as happy”
if you’d lost 10% instead of won 40%.
•  Something is wrong with that metric!
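The arithmetic of the thought experiment can be checked in a few lines. The sketch below sets squared error beside one possible client-side metric, profit per share when you buy only if a rise is predicted. That profit function (`trade_profit`) is a hypothetical stand-in for "what the client cares about", not a metric taken from the talk.

```python
def squared_error(predicted, actual):
    """The model's view: distance from the forecast, squared."""
    return (actual - predicted) ** 2


def trade_profit(entry_price, predicted, actual):
    """Hypothetical client metric: profit per share, buying only on a predicted rise."""
    if predicted > entry_price:
        return actual - entry_price
    return 0.0


ENTRY, PREDICTED = 10.0, 11.0

for actual in (14.0, 9.0):
    print(f"actual ${actual:.0f}: "
          f"squared error = {squared_error(PREDICTED, actual):.0f}, "
          f"profit per share = {trade_profit(ENTRY, PREDICTED, actual):+.0f}")
```

Squared error ranks the $9 outcome as the better one, while the profit metric ranks the $14 outcome as better, so the two metrics pull the search in opposite directions on this example.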
Trading System Example
Gas Production Saved
Using Lift Charts
[Figure: lift chart; both axes run from 0% to 100%; horizontal axis: Prospects Ordered by Response Probability]
1a. Set investigation limit
1b. Note expected response
2a. Or, set desired response
2b. And note work requirements
[293-295]
Bound by Random and Perfect Models
A random model (no predictive power) would be a diagonal line.

A perfect model (right prediction every time) shoots up as fast as possible to 100%. The slope depends on event frequency.
Never Use AUC (Area Under the Curve)
•  The area between the lift curve and the random
line (or the baseline) is often maximized.
•  This is never the best thing to do.
•  Instead, figure out how deep into the list you
want to, or can, go.
•  You are either constrained by resources (#cases
you can investigate, for instance), or there is a
problem-dependent cost tradeoff between false
alarms and false dismissals (false positives and
negatives).
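Reading a lift chart at a fixed depth, rather than summarizing it by its area, can be sketched as follows. The two helper functions mirror the two ways of using the chart (fix the investigation limit and read the response, or fix the response and read the work). The data are made up for illustration, the function names are hypothetical, and the code assumes at least one responder in the list.

```python
def cumulative_gains(scores, labels):
    """Return [(fraction of list examined, fraction of responders captured), ...].

    Cases are ranked from highest to lowest score, as on a lift chart.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total = sum(labels)
    curve, captured = [], 0
    for i, (_, label) in enumerate(ranked, start=1):
        captured += label
        curve.append((i / len(ranked), captured / total))
    return curve


def response_at_depth(curve, depth):
    """Fix the investigation limit, read off the expected response."""
    reachable = [resp for frac, resp in curve if frac <= depth]
    return reachable[-1] if reachable else 0.0


def depth_for_response(curve, target):
    """Fix the desired response, read off the work required."""
    for frac, resp in curve:
        if resp >= target:
            return frac
    return 1.0


# Toy data (made up): 10 cases, 4 responders.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
curve = cumulative_gains(scores, labels)

print(response_at_depth(curve, 0.30))   # share of responders found in the top 30%
print(depth_for_response(curve, 0.75))  # share of the list needed to find 75%
```

Two models with the same area under the curve can differ at the depth you can actually afford to work, which is why the comparison is made at that depth.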
Truth Table (confusion matrix)
with 25% Threshold
                Actual OK   Actual BAD
Predicted OK    1,352       136
Predicted BAD   237         260
Truth table depends on threshold
Same model, different cutoff threshold results in different truth table (confusion matrix)
                Actual OK   Actual BAD
Predicted OK    1540        246
Predicted BAD   49          150

                Actual OK   Actual BAD
Predicted OK    846         47
Predicted BAD   743         349
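Which of the three truth tables is "best" depends on what each kind of error costs. The sketch below uses the counts shown above with a few hypothetical cost ratios; only the 25% threshold is stated on the slides, so the other two tables carry placeholder labels, and correct calls are assumed to cost nothing.

```python
# Counts copied from the three truth tables, keyed by (predicted, actual).
# "cutoff B" and "cutoff C" are placeholder labels; their thresholds are not given.
TABLES = {
    "25% threshold": {("OK", "OK"): 1352, ("OK", "BAD"): 136,
                      ("BAD", "OK"): 237, ("BAD", "BAD"): 260},
    "cutoff B": {("OK", "OK"): 1540, ("OK", "BAD"): 246,
                 ("BAD", "OK"): 49, ("BAD", "BAD"): 150},
    "cutoff C": {("OK", "OK"): 846, ("OK", "BAD"): 47,
                 ("BAD", "OK"): 743, ("BAD", "BAD"): 349},
}


def total_cost(table, false_alarm_cost, false_dismissal_cost):
    """Cost of the errors only; correct calls are assumed to cost nothing."""
    false_alarms = table[("BAD", "OK")]      # predicted BAD, actually OK
    false_dismissals = table[("OK", "BAD")]  # predicted OK, actually BAD
    return false_alarms * false_alarm_cost + false_dismissals * false_dismissal_cost


def cheapest(false_alarm_cost, false_dismissal_cost):
    """Name of the table with the lowest total error cost."""
    return min(TABLES, key=lambda name: total_cost(
        TABLES[name], false_alarm_cost, false_dismissal_cost))


# Hypothetical cost ratios (false alarm cost, false dismissal cost).
for costs in [(1, 1), (1, 5), (1, 20)]:
    print(costs, "->", cheapest(*costs))
```

Each of the three cost ratios selects a different table, so the cutoff is a business decision and not a property of the model.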
[Figure: HMEQ "Bads" Regression Model; Cumulative % Captured Response (0 to 100) versus Percentile (0 to 100); curves: Baseline, Model, Best]
[Figure labels: Gain; Cost, Predicted Return, Predicted Profit]
“Multiple Myeloma I have been diagnosed with
Multiple Myeloma (cancer of the bone marrow) and
am currently undergoing treatment to prepare me for
an autologous stem cell transplant. There has been a
brain tumor associated with this, for which I have
had....”
Social Security Administration
Disability Approval Prediction
Text information in “Allegation Field” proved most valuable
•  Draw from Bayesian statistics and smooth the raw count with an
empirical prior
–  Use baseline probability of the most probable classification
•  For SSA, roughly 33% of applications approved
–  Counts for each word are initialized with the baseline probability
•  Similar to Shrinkage, James-Stein Estimator, Ridge Regression, etc.
•  Hypothetical Example: Multiple Myeloma
–  Appears 5 times, approved 4 times = 80% predicted “yes”
–  Prior (given all data) is 33%. If we use an “initial mass” of 3 (2 “no” +
1 “yes”), then the total “yes” is 5/8 = 62.5%
•  With no data, results in prior
•  With lots of data, measurement provides probability
•  In between, compromises between measured and prior %
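The smoothing described above amounts to adding pseudo-counts before dividing. Here is a minimal sketch with the initial mass of 3 (1 "yes" + 2 "no") from the example; the function name and default arguments are illustrative.

```python
def smoothed_rate(yes_count, total_count, prior_yes=1.0, prior_no=2.0):
    """Approval rate with pseudo-counts; the defaults give a 1/3 prior with mass 3."""
    return (yes_count + prior_yes) / (total_count + prior_yes + prior_no)


print(smoothed_rate(4, 5))      # the "Multiple Myeloma" example: (4 + 1) / (5 + 3)
print(smoothed_rate(0, 0))      # no data: falls back to the prior
print(smoothed_rate(400, 500))  # lots of data: close to the measured rate
```

A larger initial mass pulls rare terms harder toward the prior; how much mass to use is a tuning choice that the example fixes at 3.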
Using a Prior: “non-zero initialization”
•  Common aggregations don’t match medical
domain requirements
– SUM: many symptoms increase the probability of
predicting approval
– MAX: ignores multiple serious symptoms
– AVG: minor symptoms water down major
symptoms
Combining Weights
Business Understanding:
Desired properties for joining evidence
•  Applicants with multiple severe diseases should be more
likely to be approved
•  A large number of mild ailments should not add up to a
high score that gets an applicant approved
•  Mild ailments should not detract from severe ones
•  Rare diseases should be included, but not with the same
confidence as those with more evidence
•  Calculation of disease severity must be self-adapting to
accommodate rapid changes in the medical field
We designed a joint probability function meeting these constraints
If (no data), then use prior
Else If (max(probability) < 0.5) then use that max.
Else:
i.  Ignore concepts with probability < 0.5
ii.  Combine the remaining ones with a log-likelihood
formula and use the resulting joint probability.
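The three-branch rule can be sketched as below. The slides do not give the log-likelihood formula, so the last branch uses a naive sum of log-odds as a stand-in; that choice, the 1/3 prior constant, and the function name are assumptions for illustration and should not be read as the formula the authors used.

```python
import math

PRIOR = 1 / 3  # roughly the baseline approval rate quoted for SSA


def combine_evidence(probabilities, prior=PRIOR):
    """Join per-concept approval probabilities with the three-branch rule.

    Assumption: the final branch sums log-odds (naive independence),
    standing in for the unspecified log-likelihood formula.
    """
    if not probabilities:
        return prior                      # no data: use the prior
    strongest = max(probabilities)
    if strongest < 0.5:
        return strongest                  # only mild evidence: take the max
    log_odds = 0.0
    for p in probabilities:
        if p < 0.5:
            continue                      # mild concepts must not dilute severe ones
        p = min(p, 1 - 1e-9)              # guard against log of infinite odds at p == 1
        log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))


print(combine_evidence([]))               # prior
print(combine_evidence([0.2, 0.3]))       # max of the mild concepts
print(combine_evidence([0.8, 0.7, 0.1]))  # two severe concepts reinforce; 0.1 is ignored
```

Under this stand-in, two severe concepts score higher than either alone, and adding mild ones changes nothing, which matches the desired properties listed for joining evidence.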
Our approach to combine evidence (SSA)
Higher Level Optimization Issue:
What is the Goal of the Project?
Aim at the right target
Example: Fraud Detection for international phone calls 
Daryl Pregibon and colleagues at Bell (Shannon) Labs: 
The normal approach would have been to attempt to
classify fraud/nonfraud for general calls
Instead they characterized normal behavior for each
account (phone), then flagged outliers.
Model had features like top 5 countries called, durations
of calls, times of day, days of week, “faxicity” of call, etc. 
All features slowly adapted if changes occurred.

-> A brilliant success.
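The idea of profiling each account and flagging departures from its own normal behavior can be illustrated with a single feature. The sketch below keeps an exponentially weighted mean and variance of, say, call duration, and adapts slowly as new calls arrive. It is a generic illustration, not the Bell Labs system: the class name, the adaptation rate, the threshold, and the warm-up length are all assumptions.

```python
class AccountProfile:
    """Exponentially weighted mean/variance of one feature for one account."""

    def __init__(self, alpha=0.05, threshold=4.0, warmup=10):
        self.alpha = alpha          # adaptation rate: small means slow drift
        self.threshold = threshold  # flag values this many std devs from normal
        self.warmup = warmup        # observations to collect before flagging
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def observe(self, value):
        """Return True if `value` is an outlier for this account, then adapt."""
        flagged = False
        if self.count == 0:
            self.mean = value
        else:
            deviation = value - self.mean
            std = self.var ** 0.5
            if self.count >= self.warmup and std > 0:
                flagged = abs(deviation) > self.threshold * std
            if not flagged:
                # Adapt slowly, and only on observations that look normal.
                self.mean += self.alpha * deviation
                self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.count += 1
        return flagged


profile = AccountProfile()
history = [profile.observe(minutes) for minutes in [4.0, 6.0] * 20]
print(any(history))            # were any of the routine calls flagged?
print(profile.observe(120.0))  # a call far outside this account's habits
```

A fuller version would track several features per account (countries called, time of day, and so on) and combine their deviations; this sketch only shows the "characterize normal, then flag outliers" framing.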
Even Higher-Level Optimization Issue:
What Project Should you Choose?
[Figure axes: ROI versus Cost (Disruption, Technical Effort)]
Cost factors include:
•  Time required
•  Disruption effect
•  Data availability
•  Data quality

Phantom inventory
Summary
•  Squared error gives undue power to outliers and is
symmetric, but is very hard to escape.
•  You can always do better than to optimize AUC (but it’s
correlated with success, so don’t throw away its results).
•  Think about what you’re asking the computer to search
for: to solve the hardest problems, you’ll need to design
a custom metric.
•  Get at least a random global search capability ready.
•  Work closely with the client and creative folk to
brainstorm project goals and priorities.
•  If your work isn’t implemented, you failed.
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 

What to Optimize? The Heart of Every Analytics Problem

  • 1. What to Optimize? The Heart of Every Analytics Problem Predictive Analytics World New York - October, 2017 John F. Elder, Ph.D. elder@elderresearch.com @johnelder4 Charlottesville, VA Washington, DC Baltimore, MD Raleigh, NC 434-973-7673 www.elderresearch.com
  • 2. Outline
      •  Squared error is convenient for the computer but not for the client
      •  Lift (cumulative response) charts are great, but never optimize AUC (area under the curve)
      •  You may need to design a custom metric
      •  That may require a global search algorithm
      •  Brainstorm about the project goal
      •  And what project to tackle in the first place
  • 3. 4 Series: (X,Y1) (X,Y2) (X,Y3) (X4,Y4)
      Summary statistics shared by all four series: r_xy = 0.82, y_LS = 3 + 0.5x, MSE = 1.25, R^2 = 0.67

       X      Y1      Y2      Y3     X4      Y4
      10    8.04    9.14    7.46      8    6.58
       8    6.95    8.14    6.77      8    5.76
      13    7.58    8.74   12.74      8    7.71
       9    8.81    8.77    7.11      8    8.84
      11    8.33    9.26    7.81      8    8.47
      14    9.96    8.10    8.84      8    7.04
       6    7.24    6.13    6.08      8    5.25
       4    4.26    3.10    5.39     19   12.50
      12   10.84    9.13    8.15      8    5.56
       7    4.82    7.26    6.42      8    7.91
       5    5.68    4.74    5.73      8    6.89
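A minimal sketch for checking the shared statistics against the table on this slide. It uses only the standard library; the function name `summarize` and the choice to divide the squared residuals by n (rather than n - 2) are assumptions made for illustration.

```python
from math import sqrt

# Data as listed on the slide (Anscombe, 1973).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

SERIES = {"(X,Y1)": (X, Y1), "(X,Y2)": (X, Y2), "(X,Y3)": (X, Y3), "(X4,Y4)": (X4, Y4)}


def summarize(x, y):
    """Least-squares line, correlation, and mean squared error for one series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    # Assumption: MSE is the sum of squared residuals divided by n.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return {"slope": slope, "intercept": intercept, "r": r, "r2": r * r, "mse": sse / n}


if __name__ == "__main__":
    for name, (x, y) in SERIES.items():
        s = summarize(x, y)
        print(name, {key: round(value, 2) for key, value in s.items()})
```

All four series produce nearly the same line and error, which is why the summary numbers alone cannot tell them apart.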
  • 4. Anscombe’s Quartet (1973, American Statistician) [Figure: four scatter plots, one per series (Y1, Y2, Y3, Y4); x-axes run from 2 to 20, y-axes from 2 to 14.]
  • 10.
  • 11.
  • 15. Stock Market Prediction Thought Experiment
      •  Say your model predicted a 10% price rise, from $10 to $11, over the next quarter.
      •  But the price later actually rises to $14.
      •  How do you feel about it?
      •  How does the model (under squared error) “feel” about it?
      •  14 - 11 = 3; 3 * 3 = 9. Had it instead lost 10% to $9, the error of 2 would’ve led to a squared error of less than half as much (4).
      •  So the model would have been “twice as happy” if you’d lost 10% instead of won 40%.
      •  Something is wrong with that metric!
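The arithmetic of the thought experiment can be sketched in a few lines. The function `trade_return` is a hypothetical alternative metric added only for contrast (it assumes you buy when the model predicts a rise); it is not from the slides.

```python
def squared_error(predicted, actual):
    """The metric the model is trained on: symmetric, and blind to profit."""
    return (actual - predicted) ** 2


def trade_return(start, predicted, actual):
    """Hypothetical client-side metric: return from trading in the predicted direction."""
    position = 1 if predicted > start else -1  # buy on a predicted rise, else short
    return position * (actual - start) / start


if __name__ == "__main__":
    start, predicted = 10.0, 11.0
    for actual in (14.0, 9.0):
        print(
            f"actual=${actual:.0f}",
            f"squared error={squared_error(predicted, actual):.0f}",
            f"return={trade_return(start, predicted, actual):+.0%}",
        )
```

Squared error prefers the outcome where the price falls to $9, while the return-based view prefers the rise to $14.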
  • 17.
  • 19. Using Lift Charts [Figure: lift chart; x-axis: Prospects Ordered by Response Probability, 0% to 100%; y-axis: 0% to 100%.]
      1a. Set investigation limit
      1b. Note expected response
      2a. Or, set desired response
      2b. And note work requirements
      [293-295]
  • 20. Bound by Random and Perfect Models
      A random model (no predictive power) would be a diagonal line. A perfect model (right prediction every time) shoots up as fast as possible to 100%. The slope depends on event frequency.
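A small sketch of the cumulative response curve and its two bounds. The function names and the toy scores are illustrative assumptions, not taken from the slides.

```python
def cumulative_gains(scores, labels):
    """Points (fraction of list worked, fraction of events captured), best scores first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_events = sum(labels)
    captured = 0
    curve = [(0.0, 0.0)]
    for rank, i in enumerate(order, start=1):
        captured += labels[i]
        curve.append((rank / len(scores), captured / total_events))
    return curve


def perfect_curve(labels):
    """Upper bound: rank every event ahead of every non-event."""
    return cumulative_gains(labels, labels)


def random_curve(n):
    """Lower bound: the diagonal expected from a model with no predictive power."""
    return [(k / n, k / n) for k in range(n + 1)]


if __name__ == "__main__":
    labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # 20% event frequency (toy data)
    scores = [0.9, 0.8, 0.1, 0.4, 0.3, 0.2, 0.6, 0.5, 0.05, 0.7]
    print("model  ", cumulative_gains(scores, labels))
    print("perfect", perfect_curve(labels))
    print("random ", random_curve(len(labels)))
```

With a 20% event frequency, the perfect curve reaches 100% after working 20% of the list, so its initial slope is 1 / 0.2 = 5.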
  • 21. Never Use AUC (Area Under the Curve)
      •  The area between the lift curve and the random line (or the baseline) is often maximized.
      •  This is never the best thing to do.
      •  Instead, figure out how deep into the list you want to, or can, go.
      •  You are either constrained by resources (the number of cases you can investigate, for instance), or there is a problem-dependent cost tradeoff between false alarms and false dismissals (false positives and negatives).
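Both ways of deciding how deep to go can be sketched as follows. The function names, the toy data, and the cost values are hypothetical; real costs are problem-dependent, as the slide says.

```python
def rank_labels(scores, labels):
    """Labels reordered so the highest-scoring prospect comes first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def captured_at_limit(scores, labels, limit):
    """Resource constraint: share of events caught if only `limit` cases are worked."""
    ranked = rank_labels(scores, labels)
    return sum(ranked[:limit]) / sum(labels)


def cheapest_depth(scores, labels, cost_false_alarm, cost_false_dismissal):
    """Cost tradeoff: list depth that minimizes total misclassification cost."""
    ranked = rank_labels(scores, labels)
    total_events = sum(ranked)
    hits = 0
    best_depth = 0
    best_cost = total_events * cost_false_dismissal  # depth 0: every event is missed
    for depth, label in enumerate(ranked, start=1):
        hits += label
        false_alarms = depth - hits
        false_dismissals = total_events - hits
        cost = false_alarms * cost_false_alarm + false_dismissals * cost_false_dismissal
        if cost < best_cost:
            best_depth, best_cost = depth, cost
    return best_depth, best_cost


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]  # toy model scores
    labels = [1, 1, 0, 1, 0, 0]              # 1 = event of interest
    print(captured_at_limit(scores, labels, limit=2))
    print(cheapest_depth(scores, labels, cost_false_alarm=1, cost_false_dismissal=5))
    print(cheapest_depth(scores, labels, cost_false_alarm=5, cost_false_dismissal=1))
```

The same ranking gives different best depths under different costs, which is information a single area number does not carry.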
  • 22.
  • 23. Truth Table (confusion matrix) with 25% Threshold

                        Actual OK   Actual BAD
      Predicted OK          1,352          136
      Predicted BAD           237          260
  • 24. Truth table depends on threshold
      Same model, different cutoff threshold results in different truth table (confusion matrix)

                        Actual OK   Actual BAD
      Predicted OK           1540          246
      Predicted BAD            49          150

                        Actual OK   Actual BAD
      Predicted OK            846           47
      Predicted BAD           743          349
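A sketch that compares the three truth tables from these slides under a cost assumption. The counts are the ones shown; the field names, the table labels, and the cost values are hypothetical and only illustrate how the preferred cutoff shifts with the cost of a miss.

```python
from collections import namedtuple

# Fields follow the slide layout: rows are predictions, columns are actuals.
TruthTable = namedtuple("TruthTable", "true_ok missed_bad false_alarm caught_bad")

TABLES = {
    "25% threshold": TruthTable(true_ok=1352, missed_bad=136, false_alarm=237, caught_bad=260),
    "second cutoff": TruthTable(true_ok=1540, missed_bad=246, false_alarm=49, caught_bad=150),
    "third cutoff": TruthTable(true_ok=846, missed_bad=47, false_alarm=743, caught_bad=349),
}


def capture_rate(t):
    """Share of actual BAD cases that the cutoff flags."""
    return t.caught_bad / (t.caught_bad + t.missed_bad)


def false_alarm_rate(t):
    """Share of actual OK cases that the cutoff flags as BAD."""
    return t.false_alarm / (t.false_alarm + t.true_ok)


def total_cost(t, cost_false_alarm, cost_miss):
    """Assumed linear cost: each false alarm and each missed BAD has a fixed price."""
    return t.false_alarm * cost_false_alarm + t.missed_bad * cost_miss


def cheapest_table(cost_false_alarm, cost_miss):
    return min(TABLES, key=lambda name: total_cost(TABLES[name], cost_false_alarm, cost_miss))


if __name__ == "__main__":
    for name, t in TABLES.items():
        print(name, f"capture={capture_rate(t):.1%}", f"false alarms={false_alarm_rate(t):.1%}")
    print(cheapest_table(cost_false_alarm=1, cost_miss=5))
    print(cheapest_table(cost_false_alarm=1, cost_miss=10))
```

Every table has the same column totals (1,589 OK and 396 BAD), as expected when one model is scored on one data set at different cutoffs.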
  • 25. HMEQ "Bads" [Figure: x-axis: Percentile (0 to 100); y-axis: Cumulative % Captured Response (0 to 100). Chart labels: Regression Model, Baseline Model, Best Gain, Cost, Predicted Return, Predicted Profit.]
  • 26. Social Security Administration Disability Approval Prediction
      Text information in the “Allegation Field” proved most valuable. Example:
      “Multiple Myeloma I have been diagnosed with Multiple Myeloma (cancer of the bone marrow) and am currently undergoing treatment to prepare me for an autologous stem cell transplant. There has been a brain tumor associated with this, for which I have had....”
  • 27. Using a Prior: “non-zero initialization”
      •  Draw from Bayesian statistics and smooth the raw count with an empirical prior
          –  Use baseline probability of the most probable classification
      •  For SSA, roughly 33% of applications are approved
          –  Counts for each word are initialized with the baseline probability
      •  Similar to Shrinkage, James-Stein Estimator, Ridge Regression, etc.
      •  Hypothetical Example: Multiple Myeloma
          –  Appears 5 times, 4 times was approved = 80% predicted “yes”
          –  Prior (given all data) is 33%. If we use an “initial mass” of 3 (2 “no” + 1 “yes”), then the total “yes” is (4 + 1)/(5 + 3) = 5/8 = 62.5%
      •  With no data, results in prior
      •  With lots of data, measurement provides probability
      •  In between, compromises between measured and prior %
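The smoothing on this slide can be written as a one-line formula. This is a minimal sketch; the function and parameter names are illustrative, and the prior of 1/3 with an initial mass of 3 mirrors the slide's hypothetical example.

```python
def smoothed_probability(yes_count, total_count, prior, prior_mass):
    """Blend observed counts with `prior_mass` pseudo-observations drawn at the prior rate."""
    return (yes_count + prior * prior_mass) / (total_count + prior_mass)


if __name__ == "__main__":
    prior, mass = 1 / 3, 3  # 1 "yes" + 2 "no" pseudo-counts
    print(smoothed_probability(4, 5, prior, mass))      # the slide's example
    print(smoothed_probability(0, 0, prior, mass))      # no data: falls back to the prior
    print(smoothed_probability(800, 1000, prior, mass)) # lots of data: close to measured 80%
```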
  • 28. Combining Weights
      •  Common aggregations don’t match medical domain requirements
          –  SUM: many symptoms increase the probability of predicting approval
          –  MAX: ignores multiple serious symptoms
          –  AVG: minor symptoms water down major symptoms
  • 29. Business Understanding: Desired properties for joining evidence
      •  Applicants with multiple severe diseases should be more likely to be approved
      •  A large number of mild ailments should not add up to a high score that gets an applicant approved
      •  Mild ailments should not detract from severe ones
      •  Rare diseases should be included, but not with the same confidence as those with more evidence
      •  Calculation of disease severity must be self-adapting to accommodate rapid changes in the medical field
      We designed a joint probability function meeting these constraints
  • 30. Our approach to combine evidence (SSA)
      If (no data), then use prior
      Else If (max(probability) < 0.5), then use that max.
      Else:
          i.  Ignore concepts with probability < 0.5
          ii.  Combine the remaining ones with a log-likelihood formula and use the resulting joint probability.
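The three-way rule above can be sketched as follows. The slide does not give the log-likelihood formula, so the combination step here (summing log-odds and mapping back to a probability) is an assumption chosen for illustration, as are the function name and the clamping constant `eps`.

```python
import math


def combine_evidence(probabilities, prior, eps=1e-6):
    """Join per-concept approval probabilities into one score.

    `probabilities` holds one smoothed probability per concept found in the text.
    """
    if not probabilities:
        return prior  # no data: use the prior
    strongest = max(probabilities)
    if strongest < 0.5:
        return strongest  # only mild evidence: use the max, so mild ailments never add up
    # Ignore concepts below 0.5 so mild ailments cannot detract from severe ones.
    strong = [min(max(p, eps), 1 - eps) for p in probabilities if p >= 0.5]
    # ASSUMED formula: sum the log-odds of the remaining concepts.
    log_odds = sum(math.log(p / (1 - p)) for p in strong)
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    prior = 0.33
    print(combine_evidence([], prior))               # prior
    print(combine_evidence([0.2, 0.4], prior))       # max of mild concepts
    print(combine_evidence([0.8, 0.1, 0.2], prior))  # mild concepts ignored
    print(combine_evidence([0.8, 0.8], prior))       # two severe concepts reinforce each other
```

Under this assumed formula, every retained concept has non-negative log-odds, so the joint score is never lower than the strongest single concept.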
  • 31. Higher Level Optimization Issue: What is the Goal of the Project?
      Aim at the right target
      Example: Fraud Detection for international phone calls
      Daryl Pregibon and colleagues at Bell (Shannon) Labs:
      The normal approach would have been to attempt to classify fraud/nonfraud for general calls.
      Instead they characterized normal behavior for each account (phone), then flagged outliers.
      Model had features like top 5 countries called, durations of calls, times of day, days of week, “faxicity” of call, etc. All features slowly adapted if changes occurred.
      Result: a brilliant success.
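The idea of a per-account profile that adapts slowly and flags outliers can be sketched with an exponentially weighted mean and variance. This is not the Bell Labs method; the class name, the single feature (call duration in minutes), and the `alpha`, `threshold`, and `warmup` values are all assumptions for illustration.

```python
from math import sqrt


class AccountProfile:
    """Running picture of one account's normal behavior for a single numeric feature."""

    def __init__(self, alpha=0.05, threshold=4.0, warmup=5):
        self.alpha = alpha          # adaptation rate: small means the profile changes slowly
        self.threshold = threshold  # flag values this many standard deviations from the mean
        self.warmup = warmup        # observations needed before anything is flagged
        self.mean = None
        self.var = 0.0
        self.count = 0

    def observe(self, value):
        """Return True if `value` is an outlier for this account, then adapt the profile."""
        if self.mean is None:
            self.mean, self.count = value, 1
            return False
        deviation = value - self.mean
        std = sqrt(self.var)
        is_outlier = (
            self.count >= self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        # Exponentially weighted updates of the mean and variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.count += 1
        return is_outlier


if __name__ == "__main__":
    profile = AccountProfile()
    durations = [5, 6, 5, 6, 6, 5, 6, 5, 6, 6, 120]  # toy call durations in minutes
    print([profile.observe(d) for d in durations])
```

Because each account is compared only with its own history, no labeled fraud examples are needed to raise a flag.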
  • 32. Even Higher-Level Optimization Issue: What Project Should you Choose?
      [Figure: chart labels: ROI; Cost (Disruption, Technical Effort); Phantom inventory.]
      Cost factors include:
      •  Time required
      •  Disruption effect
      •  Data availability
      •  Data quality
  • 33. Summary
      •  Squared error gives undue power to outliers and is symmetric, but is very hard to escape.
      •  You can always do better than to optimize AUC (but it’s correlated with success, so don’t throw away its results).
      •  Think about what you’re asking the computer to search for: to solve the hardest problems, you’ll need to design a custom metric.
      •  Get at least a random global search capability ready.
      •  Work closely with the client and creative folk to brainstorm project goals and priorities.
      •  If your work isn’t implemented, you failed.
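As one possible reading of "a random global search capability", here is a minimal sketch that minimizes any custom metric without gradients. The mix of uniform restarts and local perturbations, the bounds, the parameter values, and the toy absolute-error example are all assumptions; like any global search, it carries no guarantee of finding the optimum.

```python
import random


def random_search(loss, bounds, iterations=5000, restart_prob=0.2, step=0.1, seed=0):
    """Keep the best parameter vector seen while sampling the search space at random."""
    rng = random.Random(seed)
    best = [rng.uniform(low, high) for low, high in bounds]
    best_loss = loss(best)
    for _ in range(iterations):
        if rng.random() < restart_prob:
            # Global move: a fresh point anywhere inside the bounds.
            candidate = [rng.uniform(low, high) for low, high in bounds]
        else:
            # Local move: a small perturbation of the best point so far.
            candidate = [p + rng.gauss(0.0, step * (high - low))
                         for p, (low, high) in zip(best, bounds)]
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss


if __name__ == "__main__":
    # Toy custom metric: total absolute error of a line on points with one outlier.
    xs = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
    ys = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 20.0, 10.0]

    def absolute_error(params):
        intercept, slope = params
        return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

    params, err = random_search(absolute_error, bounds=[(-10.0, 10.0), (-5.0, 5.0)])
    print(params, err)
```

Swapping in a different metric only requires a different `loss` function, which is the point of having the search ready before the metric is settled.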