This talk walks through the stages of building data mining models, putting them into production, and eventually replacing them. A common theme throughout is three attributes of predictive models: accuracy, generalization, and description. I assert that you can have all three, and that having all three is important for managing the model lifecycle. A subtler point is that this is a step toward embedded, automated data mining systems that can determine on their own when they need to be updated.
7. Model Notebook
R package “caret”
Same parameter search wrapper over 217 algorithms
http://topepo.github.io/caret/index.html
A “section” of a model notebook
Still need to track the results of each section
Theme: Accurate
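A minimal sketch of caret's uniform train() wrapper, assuming a data frame df with outcome column y (both illustrative, not from the talk):

library(caret)

ctrl <- trainControl(method = "cv", number = 5)

# The same train() call wraps any of caret's ~217 algorithms;
# only `method` and the tuning settings change per section.
fit_rf  <- train(y ~ ., data = df, method = "rf",
                 trControl = ctrl, tuneLength = 3)
fit_gbm <- train(y ~ ., data = df, method = "gbm",
                 trControl = ctrl, tuneLength = 3, verbose = FALSE)

# Each fit is one "section" of the model notebook: log its
# resampled results so sections can be compared later.
print(fit_rf$results)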
8. 217 R Algorithms Covered
Bad vs. Good
Do you really want a one-off solution?
• Experimenting with Algorithms
• Experimenting with Algorithm Parameters
• Variable description → refine preprocessing
• …
• Deep Learning architectures have many parameters and network designs
Theme: Accurate
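A short sketch of enumerating those algorithms and their tunable parameters with caret's getModelInfo(), to show why a systematic search beats a one-off solution:

library(caret)

models <- getModelInfo()
length(models)   # roughly 217 algorithms, depending on caret version

# Tunable parameters for one candidate, e.g. a feed-forward net:
getModelInfo("nnet", regex = FALSE)$nnet$parameters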
10. Model Notebook
Bad vs. Good
Q) What is the best outcome metric? ROC, R2, Lift, MAD, …
A) A deployment simulation of cost-value-strategy
Does the business problem mirror the 80-20 rule?
Just act on the top 1% or top 5%?
Is the business deployment over the whole score range [0…1]?
Or just over the top 1% or 5% of the score? (then NOT ROC, R2, corr)
Are some records 5× or 20× more valuable?
→ Use cost-profit weighting, or a more complex system (sketched below)
Is this taught in mining competitions or classes?
Theme: Accurate in terms of business focus
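A minimal deployment-simulation sketch: act only on the top k% of scores and total the dollars, instead of reporting a global ROC or R2. The arguments scores and profit_if_acted are illustrative assumptions:

# Simulate a "top k%" deployment and measure $ rather than fit statistics.
simulate_topk <- function(scores, profit_if_acted, k = 0.05) {
  cutoff <- quantile(scores, probs = 1 - k)
  acted  <- scores >= cutoff          # records we would act on
  sum(profit_if_acted[acted])         # total $ from acting on the top k
}

# e.g. simulate_topk(predict(fit, newdata = holdout), holdout$profit, k = 0.01)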
11. Calculate $ of “Business Pain”
[Figure: business-pain ($) curve as a function of forecast error, with zero error at the center, overstock errors on one side and understock errors on the other]
Need to deeply understand business metrics
Theme: Accurate
12. Calculate $ of “Business Pain”
[Figure: the same pain curve with dollar weights attached: an overstock error costs ~1% in business pain while an equal-sized understock error costs ~15%]
← Equal mistakes → Unequal PAIN in $
Need to deeply understand business metrics
At least use Type I vs. Type II weighting
Theme: Accurate in terms of business focus
13. Calculate $ of “Business Pain”
No way – that could get you fired!
New progress in getting feedback
[Figure: the pain curve refined with business feedback: a 4-week oversupply of a SKU triggers a 30%-off sale, adding a ~30% pain step on the overstock side alongside the earlier ~1% and ~15% rates]
← Equal mistakes → Unequal PAIN in $
Theme: Accurate in terms of business focus
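A sketch of asymmetric business-pain scoring with Type I vs. Type II weighting; the 1% and 15% default rates echo the slides' example, and in practice would come from business feedback:

business_pain <- function(actual, forecast, unit_value,
                          over_rate = 0.01, under_rate = 0.15) {
  err   <- forecast - actual
  over  <- pmax(err, 0)    # overstock units: bought but not needed
  under <- pmax(-err, 0)   # understock units: demand not covered
  # Equal-sized errors, unequal pain in $:
  sum(over * unit_value * over_rate + under * unit_value * under_rate)
}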
14. Model Notebook: Outcome Details
• My heuristic design objectives (yours may be different):
– Accuracy in deployment
– Reliability and consistent behavior, a general solution
• Use one or more hold-out data sets to check consistency
• Penalize more as the forecast becomes less consistent
– No penalty for model complexity (if it validates consistently)
– Develop a "smooth, continuous metric" to sort and find models that perform "best" in future deployment
What would you do?
15. Model Notebook: Outcome Details
• Training = results on the training set
• Validation = results on the validation hold-out
• Gap = abs(Training - Validation)
A bigger gap (volatility) is a bigger concern for deployment, a symptom of overfitting
Minimize Senior VP heart attacks! (one penalty for volatility)
Set expectations & meet expectations
Regularization helps significantly
• Conservative Result = worst(Training, Validation) + Gap penalty
Corr / Lift / Profit → higher is better: Conservative Result = min(Trn, Val) - Gap
MAD / RMSE / Risk → lower is better: Conservative Result = max(Trn, Val) + Gap
Business value or pain ranking = function of (conservative result)
Theme: Generalization – you can't optimize something you don't measure
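The slide's conservative-result rule, written out directly so that volatile models sort lower:

conservative_result <- function(train_metric, valid_metric,
                                higher_is_better = TRUE) {
  gap <- abs(train_metric - valid_metric)   # volatility penalty
  if (higher_is_better) {
    min(train_metric, valid_metric) - gap   # Corr / Lift / Profit
  } else {
    max(train_metric, valid_metric) + gap   # MAD / RMSE / Risk
  }
}

conservative_result(0.82, 0.74)                            # lift-style: 0.66
conservative_result(12.1, 14.8, higher_is_better = FALSE)  # MAD-style: 17.5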
17. Model Notebook Process: Tracking Detail → Training the Data Miner
[Table: tracking grid of model runs (Regression, AutoNeural, Neural, …) against test outcomes on the top 5%, 10%, and 20% of scores; standout cells annotated "Yippeee!" and "More"]
Heuristic Strategy (sketched in code below):
• Try a few models of many algorithm types (seed the search)
• Opportunistically spend more effort on what is working (invest in top stocks)
• Still try a few trials on medium success (diversify, limited by the project time-box)
• Try ensemble methods, combining model forecasts & top source vars with a model
The Data Mining Battle Field
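A rough sketch of that heuristic search loop; evaluate() is a hypothetical wrapper that fits a model with the given effort budget and returns its conservative result:

# Seed the search with a few models of many algorithm types.
seed_methods <- c("glm", "rpart", "rf", "gbm", "nnet", "svmRadial")
scores <- sapply(seed_methods, function(m) evaluate(m, budget = 1))

# Reinvest the remaining time-box mostly in the leaders, with a few
# trials on mid-pack methods to keep the portfolio diversified.
leaders <- names(sort(scores, decreasing = TRUE))[1:2]
for (m in leaders) evaluate(m, budget = 5)   # deeper parameter search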