Information Theoretic Metrics for
Multi-Class Predictor Evaluation
Sam Steingold, Michal Laclavík
MLconf Seattle 2016-05-20
Slide 1 of 48
Who We Are: A Fast Growing Startup
One of the fastest growing tech startups – 50% YoY growth.
Marketing Tech Platform: Email + Site + Ads + Mobile
700+ customers in Retail, Auto, Travel, CPG, Finance
Four Engineering locations: NY, SF, London, Ann Arbor
275+ employees. We are hiring aggressively!
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Predictor
A predictor is a black box which spits out an estimate of an unknown parameter. E.g.:
Will it rain tomorrow?
Will this person buy this product?
Is this person a terrorist?
Is this stock a good investment?
Examples
Perfect – always right
Mislabeled – always the opposite
Random – independent of the actual
San Diego Weather Forecast:
  Actual: 3 days of rain per 365 days
  Predict: sunshine always!
Coin flip:
  Actual: true half the time
  Predict: true if the coin lands Heads
Why Evaluate Predictors?
Which one is better?
How much to pay for one?
You can always flip the coin yourself, so the random predictor is the least valuable!
When to use this one and not that one?
Confusion Matrix + Cost
Predictor's confusion matrix (rows = Actual, columns = Predicted):
             Sun   Rain   Hurricane
Sun          100     10           1
Rain           5     20           6
Hurricane      0      3           2
Costs (rows = Actual, columns = Predicted):
             Sun   Rain   Hurricane
Sun            0      1           3
Rain           2      0           2
Hurricane     10      5           0
Total cost (sum of counts × costs, i.e., the predictor's value) = 50
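The total-cost computation on this slide is just an elementwise product of the confusion matrix and the cost matrix; a minimal NumPy sketch:

```python
import numpy as np

# Confusion and cost matrices from this slide (rows = Actual,
# columns = Predicted); correct predictions sit on the zero-cost diagonal.
confusion = np.array([[100, 10, 1],
                      [  5, 20, 6],
                      [  0,  3, 2]])
cost = np.array([[ 0, 1, 3],
                 [ 2, 0, 2],
                 [10, 5, 0]])

# Weight every cell's count by its cost and sum.
total_cost = int((confusion * cost).sum())
print(total_cost)
```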
Confusion/Costs Matrix
Joint probability and per-customer profit in each cell (rows = Actual, columns = Predicted):
               Target        Non-target
Bought         1%   $1       9%   $0
Did not buy    9%  −$0.10    81%  $0
Profitable: expected value of one customer = 1% × $1 − 9% × $0.10 = $0.001 > 0.
Worthless: Predicted and Actual are independent (1% = 10% × 10%)!
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
What if the Cost is Unknown?
Confusion matrix (total = N; rows = Actual, columns = Predicted):
             True (PT)     False (PF)
True (AT)    TP            FN (type II)
False (AF)   FP (type I)   TN
Perfect: FN = FP = 0
Mislabeled: TP = TN = 0
Random (Predicted and Actual are independent):
TP/N = (PT/N) × (AT/N)    FN/N = (PF/N) × (AT/N)
FP/N = (PT/N) × (AF/N)    TN/N = (PF/N) × (AF/N)
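The independence identities for a random predictor can be checked numerically; the sample size and marginals below are illustrative, not from the slides:

```python
# Expected confusion-matrix cells for a random predictor: each joint
# frequency is the product of the marginal rates.
N = 200
PT, AT = 20, 20            # predicted-true and actual-true totals
PF, AF = N - PT, N - AT

tp = PT * AT / N           # TP/N = (PT/N) x (AT/N)
fn = PF * AT / N           # FN/N = (PF/N) x (AT/N)
fp = PT * AF / N           # FP/N = (PT/N) x (AF/N)
tn = PF * AF / N           # TN/N = (PF/N) x (AF/N)

print(tp, fn, fp, tn)      # the four cells still sum to N
```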
Metrics Based on the Confusion Matrix
8 partial measures
1. Positive predictive value (PPV, Precision): TP/PT
2. False discovery rate (FDR): FP/PT
3. False omission rate (FOR): FN/PF
4. Negative predictive value (NPV): TN/PF
5. True positive rate (TPR, Sensitivity, Recall): TP/AT
6. False positive rate (FPR, Fall-out): FP/AF
7. False negative rate (FNR): FN/AT
8. True negative rate (TNR, Specificity): TN/AF
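All eight partial measures are simple ratios of confusion-matrix entries; a sketch with an illustrative matrix (the counts are not from the slides):

```python
def partial_measures(tp, fn, fp, tn):
    """The 8 row/column ratios of a binary confusion matrix."""
    PT, PF = tp + fp, fn + tn    # predicted-true / predicted-false totals
    AT, AF = tp + fn, fp + tn    # actual-true / actual-false totals
    return {
        "PPV (Precision)":   tp / PT,
        "FDR":               fp / PT,
        "FOR":               fn / PF,
        "NPV":               tn / PF,
        "TPR (Recall)":      tp / AT,
        "FPR (Fall-out)":    fp / AF,
        "FNR":               fn / AT,
        "TNR (Specificity)": tn / AF,
    }

m = partial_measures(tp=15, fn=5, fp=45, tn=135)
print(m["PPV (Precision)"], m["TPR (Recall)"])  # 0.25 0.75
```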
Metrics Based on the Confusion Matrix
4 total measures
1. Accuracy: P(Actual = Predicted).
2. F1: the harmonic average of Precision and Recall
3. Matthews correlation coefficient (MCC): AKA the Pearson correlation coefficient.
4. Proficiency: the proportion of the information contained
in the Actual distribution which is captured by the
Predictor.
Metric Requirements
Meaning: the meaning of the metric should be transparent, without resorting to averages of meaningful values
Discrimination:
  Weak: its value is 1 for the perfect predictor (and only for it)
  Strong: additionally, its value is 0 for a worthless (random with any base rate) predictor (and only for such a predictor)
Universality: the metric should be usable in any setting, whether binary or multi-class, classification (a unique class is assigned to each example) or categorization/community detection (an example can be placed into multiple categories or communities)
Accuracy
P(Actual = Predicted) = (tp + tn)/N
Perfect: 1.
Mislabeled: 0.
San Diego Weather Forecast:
Accuracy = 362/365 = 99.2%
The predictor is worthless!
Does not detect random predictors.
F1-Score
The harmonic mean of Precision and Recall: 2·tp / (2·tp + fp + fn)
Perfect: 1
0 if either Precision or Recall is 0
Correctly handles SDWF (because Recall = 0)...
...But only if we label rain as True!
Otherwise: Recall = 100%, Precision = 99.2%, F1 = 99.6%
F1 is Asymmetric (Positive vs Negative)
Why "1"?! Fβ = (1 + β²)·tp / ((1 + β²)·tp + β²·fn + fp)
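The Fβ formula transcribes directly into code; the counts below are illustrative, and β = 1 recovers the harmonic mean of Precision and Recall:

```python
def f_beta(tp, fn, fp, beta=1.0):
    """F-beta; beta > 1 weights Recall more, beta < 1 weights Precision more."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

print(f_beta(tp=15, fn=5, fp=45))             # 0.375
print(f_beta(tp=15, fn=5, fp=45, beta=2.0))   # recall-weighted F2
```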
Matthews correlation coefficient
AKA Phi coefficient, Pearson correlation coefficient:
(tp × tn − fp × fn) / √[(tp + fp) × (tp + fn) × (fp + tn) × (fn + tn)]
Range: [−1, 1]
Perfect: 1
Mislabeled: −1
Random: 0
Handles San Diego Weather Forecast
Hard to generalize to non-binary classifiers.
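A direct transcription of the formula; the example counts are predictor A from the "2 Against 2" slides below:

```python
import math

def mcc(tp, fn, fp, tn):
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (fp + tn) * (fn + tn))
    return num / den if den else 0.0   # convention: 0 when a marginal is empty

print(round(mcc(tp=2, fn=3, fp=0, tn=45), 4))      # 0.6124
print(mcc(tp=20, fn=0, fp=0, tn=180))              # perfect predictor: 1.0
```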
Uncertainty coefficient
AKA Proficiency: α = I(Predicted; Actual) / H(Actual)
Range: [0, 1].
Measures the share of information (percentage of bits) contained in the Actual which is captured by the Predictor.
1 for both Perfect and Mislabeled predictors.
0 for random predictors.
Handles San Diego Weather Forecast and all the possible quirks – the best.
Easily generalizes to any number of categories.
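A minimal sketch of the binary Proficiency, computed in bits from the four confusion counts; the example counts are predictor A from the "2 Against 2" slides below:

```python
import math

def proficiency(tp, fn, fp, tn):
    """Uncertainty coefficient: I(Predicted; Actual) / H(Actual), in bits."""
    n = tp + fn + fp + tn
    at, af = tp + fn, fp + tn          # actual marginals
    pt, pf = tp + fp, fn + tn          # predicted marginals
    cells = [(tp, pt, at), (fn, pf, at), (fp, pt, af), (tn, pf, af)]
    # I(P; A) = sum over cells of O log2(O / E); empty cells contribute 0.
    info = sum(o / n * math.log2(o * n / (p * a)) for o, p, a in cells if o)
    h = -sum(a / n * math.log2(a / n) for a in (at, af) if a)   # H(Actual)
    return info / h

print(round(proficiency(tp=2, fn=3, fp=0, tn=45), 4))   # 0.3096
```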
Comparison
[Figure: Binary Predictor Metric Comparison (base rate = 10%). Accuracy, F-score, Pearson's φ, and Proficiency (in %) plotted for i = 0..200 with confusion matrix tp = 0.1·(200−i), fn = 0.1·i, fp = 0.9·i, tn = 0.9·(200−i); i.e., sliding from the perfect predictor (i=0: TP=20, FN=0, FP=0, TN=180) through the random one (i=100: TP=10, FN=10, FP=90, TN=90) to the mislabeled one (i=200: TP=0, FN=20, FP=180, TN=0).]
2 Against 2 – take 1
A: tp = 2   fn = 3        B: tp = 5   fn = 0
   fp = 0   tn = 45          fp = 7   tn = 38
               A         B
Proficiency    30.96%    49.86%
Pearson's φ    61.24%    59.32%
Accuracy       94.00%    86.00%
F1-score       57.14%    58.82%
2 Against 2 – take 2
A: tp = 3   fn = 2        B: tp = 5   fn = 0
   fp = 2   tn = 43          fp = 7   tn = 38
               A         B
Proficiency    28.96%    49.86%
Pearson's φ    55.56%    59.32%
Accuracy       92.00%    86.00%
F1-score       60.00%    58.82%
Proficiency – The Odd One Out
A: tp = 3   fn = 2        B: tp = 5   fn = 0
   fp = 1   tn = 44          fp = 6   tn = 39
               A         B
Proficiency    35.55%    53.37%
Pearson's φ    63.89%    62.76%
Accuracy       94.00%    88.00%
F1-score       66.67%    62.50%
Accuracy – The Odd One Out
A: tp = 1   fn = 4        B: tp = 5    fn = 0
   fp = 0   tn = 45          fp = 13   tn = 32
               A         B
Proficiency    14.77%    34.57%
Pearson's φ    42.86%    44.44%
Accuracy       92.00%    74.00%
F1-score       33.33%    43.48%
F1-score – The Odd One Out
A: tp = 1   fn = 4        B: tp = 2   fn = 3
   fp = 0   tn = 45          fp = 2   tn = 43
               A         B
Proficiency    14.77%    14.71%
Pearson's φ    42.86%    39.32%
Accuracy       92.00%    90.00%
F1-score       33.33%    44.44%
Predictor Re-Labeling
For a predictor P, let 1 − P be the re-labeled predictor, i.e.,
when P predicts 1, 1 − P predicts 0 and vice versa.
Then
Accuracy(1 − P) = 1 − Accuracy(P)
φ(1 − P) = −φ(P)
α(1 − P) = α(P)
No similar simple relationship exists for F1.
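The first two identities can be checked numerically (the α-invariance follows the same pattern); a self-checking sketch with illustrative counts:

```python
import math

def acc_phi(tp, fn, fp, tn):
    """Accuracy and phi for a binary confusion matrix."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    phi = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (fp + tn) * (fn + tn))
    return acc, phi

# Relabeling 1-P swaps TP with FN and FP with TN in the confusion matrix.
tp, fn, fp, tn = 15, 5, 45, 135
acc, phi = acc_phi(tp, fn, fp, tn)
acc_flip, phi_flip = acc_phi(fn, tp, tn, fp)   # (TP', FN', FP', TN') = (FN, TP, TN, FP)

assert math.isclose(acc_flip, 1 - acc)   # Accuracy(1-P) = 1 - Accuracy(P)
assert math.isclose(phi_flip, -phi)      # phi(1-P) = -phi(P)
```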
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Multi-Class Prediction
Examples:
  Character recognition: mislabeling is bad
  Group detection: label rotation is fine
Metrics:
  Accuracy = P(Actual = Predicted)
  No Recall, Precision, F1!
Pearson's φ
Define
φ² = χ²/N = Σ_i Σ_j (O_ij − E_ij)² / E_ij
where
O_ij = P(Predicted = i & Actual = j)
E_ij = P(Predicted = i) × P(Actual = j)
0 for a worthless (independent) predictor
Perfect predictor: depends on the data
Proficiency
Same as before!
α = I(Predicted; Actual) / H(Actual)
H(A) = −Σ_i P(A = i) log2 P(A = i)
I(P; A) = Σ_i Σ_j O_ij log2 (O_ij / E_ij)
0 for the worthless predictor
1 for the perfect (and mislabeled!) predictor
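Both φ² and α generalize verbatim to a K × K confusion matrix; a sketch, checked on a perfect 3-class predictor, for which α = 1 while φ² comes out to K − 1 = 2 (so φ depends on the problem, not just on correctness):

```python
import math

def multiclass_phi2_alpha(counts):
    """phi^2 = chi^2/N and proficiency alpha from a square confusion
    matrix of counts (rows = Predicted, columns = Actual)."""
    n = sum(map(sum, counts))
    pred = [sum(row) for row in counts]          # predicted marginal counts
    act = [sum(col) for col in zip(*counts)]     # actual marginal counts
    phi2 = info = 0.0
    for i, row in enumerate(counts):
        for j, o in enumerate(row):
            e = pred[i] * act[j] / n             # expected count under independence
            phi2 += (o - e) ** 2 / e / n
            if o:                                # empty cells contribute 0 to I
                info += o / n * math.log2(o * n / (pred[i] * act[j]))
    h = -sum(a / n * math.log2(a / n) for a in act if a)   # H(Actual)
    return phi2, info / h

phi2, alpha = multiclass_phi2_alpha([[30, 0, 0], [0, 10, 0], [0, 0, 10]])
print(round(phi2, 6), round(alpha, 6))   # 2.0 1.0
```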
φ vs α
φ is to the Chi-squared test as α is to the Likelihood-ratio test.
Neyman-Pearson lemma: the Likelihood-ratio test is the most powerful test.
This Metric is Old! Why is it Ignored?
Tradition : My teacher used it
Inertia : I used it previously
Cost : Log is more computationally expensive than ratios
Not anymore!
Intuition : Information Theory is hard
Intuition is learned: start Information Theory in High
School!
Mislabeled = Perfect : Can be confusing or outright
undesirable
Use the Hungarian algorithm
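The Hungarian-algorithm step can be sketched with SciPy's linear_sum_assignment; for brevity the assignment here is scored by raw counts rather than the pairwise mutual informations, and the rotated confusion matrix is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Confusion matrix (rows = Predicted, columns = Actual) of a predictor
# whose labels are rotated: it answers (actual + 1) mod 3.
counts = np.array([[ 0,  0, 20],
                   [25,  0,  0],
                   [ 0, 15,  0]])

# Optimal one-to-one relabeling that maximizes the matched mass.
rows, cols = linear_sum_assignment(counts, maximize=True)

# Rename predicted label i to its matched actual label cols[i].
aligned = np.zeros_like(counts)
aligned[cols] = counts
print(np.diag(aligned).sum())   # all 60 observations back on the diagonal
```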
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Multi-Label Categorization
Examples:
  Text Categorization: mislabeling is bad (but may indicate problems with the taxonomy)
  Community Detection: label rotation is fine
Metrics:
  No Accuracy: it cannot handle partial matches
  Precision & Recall work again!
Precision & Recall
Recall = (Σ_i #{objects correctly classified as c_i}) / (Σ_i #{objects actually in c_i})
       = (Σ_i #{o_j | c_i ∈ Actual(o_j) ∩ Predicted(o_j)}) / (Σ_i #{o_j | c_i ∈ Actual(o_j)})
       = (Σ_j #[Actual(o_j) ∩ Predicted(o_j)]) / (Σ_j #Actual(o_j))
Precision = (Σ_i #{objects correctly classified as c_i}) / (Σ_i #{objects classified as c_i})
          = (Σ_i #{o_j | c_i ∈ Actual(o_j) ∩ Predicted(o_j)}) / (Σ_i #{o_j | c_i ∈ Predicted(o_j)})
          = (Σ_j #[Actual(o_j) ∩ Predicted(o_j)]) / (Σ_j #Predicted(o_j))
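The final per-object forms of Recall and Precision translate directly into code over label sets; the two objects and their labels below are illustrative:

```python
def multilabel_precision_recall(actual, predicted):
    """`actual` and `predicted` map each object to its set of labels."""
    overlap = sum(len(actual[o] & predicted[o]) for o in actual)
    precision = overlap / sum(len(predicted[o]) for o in actual)
    recall = overlap / sum(len(actual[o]) for o in actual)
    return precision, recall

actual    = {"o1": {"a", "b"}, "o2": {"c"}}
predicted = {"o1": {"a"},      "o2": {"c", "d", "e"}}
p, r = multilabel_precision_recall(actual, predicted)
print(p, r)   # precision 1/2, recall 2/3
```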
Precision & Recall – ?!
The above is the “macro” Precision & Recall (and F1)
Can also define “micro” Precision & Recall (and F1)
There is some confusion as to which is which
Side Note: a single label per object ⟹ Precision = Recall = Accuracy = F1
Proficiency: Definition
Introduce binary random variables:
Ac_i := (c_i ∈ Actual)
Pc_i := (c_i ∈ Predicted)
Define:
α = I(Pc_1, …, Pc_n ; Ac_1, …, Ac_n) / H(Ac_1, …, Ac_n)
Problem: cannot compute!
KDD Cup 2005 Taxonomy: 67 categories
Cartesian product: 2^67 > 10^20
800k examples
Proficiency: Estimate
Numerator: assume that Ac_i is independent of everything but Pc_i (similar to Naïve Bayes).
Denominator: use subadditivity of entropy, H(A × B) ≤ H(A) + H(B).
Define:
α = (Σ_i I(Pc_i; Ac_i)) / (Σ_i H(Ac_i)) = (Σ_i H(Ac_i) α(Pc_i, Ac_i)) / (Σ_i H(Ac_i))
where
α(Pc_i, Ac_i) = I(Pc_i; Ac_i) / H(Ac_i)
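A sketch of this estimate over per-category (predicted, actual) indicator pairs; the two hypothetical categories below are predicted perfectly and at chance, respectively:

```python
import math
from collections import Counter

def binary_mi_and_h(pairs):
    """I(Pc; Ac) and H(Ac) for one category, from (predicted, actual) booleans."""
    n = len(pairs)
    joint = Counter(pairs)
    pm = Counter(p for p, _ in pairs)    # predicted marginal
    am = Counter(a for _, a in pairs)    # actual marginal
    mi = sum(c / n * math.log2(c * n / (pm[p] * am[a]))
             for (p, a), c in joint.items())
    h = -sum(c / n * math.log2(c / n) for c in am.values())
    return mi, h

def estimated_proficiency(per_category_pairs):
    """alpha = sum_i I(Pc_i; Ac_i) / sum_i H(Ac_i) -- the independence estimate."""
    mis, hs = zip(*(binary_mi_and_h(p) for p in per_category_pairs))
    return sum(mis) / sum(hs)

cat1 = [(True, True)] * 3 + [(False, False)] * 5                        # perfect
cat2 = [(True, True), (True, False), (False, True), (False, False)] * 2  # chance
alpha = estimated_proficiency([cat1, cat2])
print(round(alpha, 4))
```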
Proficiency: Permuted
Recover re-labeling invariance.
Let M(c) be the optimal assignment with the cost matrix being the pairwise mutual informations.
Define the Permuted Proficiency metric:
α̃ = (Σ_i I(M(Pc_i); Ac_i)) / (Σ_i H(Ac_i)) = (Σ_i H(Ac_i) α(M(Pc_i), Ac_i)) / (Σ_i H(Ac_i))
M is optimal ⟹ α ≤ α̃
(equality iff the optimal assignment is the identity.)
Proficiency: Properties
Meaning: (an estimate of) the share of the information contained in the actual distribution recovered by the classifier.
Strong Discrimination: yes!
Universality: the independence assumption above weakens the claim that the metric has the same meaning across all domains and data sets.
Example: KDD Cup 2005
800 queries, 67 categories, 3 human labelers
              Actual:    labeler 1   labeler 2   labeler 3
              Predicted: labeler 2   labeler 3   labeler 1
Precision                63.48%      36.50%      58.66%
Recall                   41.41%      58.62%      55.99%
α                        24.73%      28.06%      33.26%
α̃ (permuted)             25.02%      28.62%      33.51%
Reassigned categories    9           12          11
Each Human Against Dice
Pit each of the three human labelers against a random labeler with the same category probability distribution:
                      Labeler 1    Labeler 2    Labeler 3
F1                    14.3%        7.7%         19.2%
categories/example    3.7 ± 1.1    2.4 ± 0.9    3.8 ± 1.1
examples/category     44 ± 56      28 ± 31      48 ± 71
More examples/category and categories/example ⟹ higher F1!
Example: Academic Setting
Consider a typical University department:
Every professor serves on 9 administrative committees out of
10 available.
Worthless Predictor
Assign each professor to 9 random committees.
Performance
Precision = Recall = F1 = 90%
Proficiency: α = 0
Example: Brand Detection
Laclavik, Steingold, Ciglan, WSDM 2016
10k queries, 1k brands
90% of queries have no brands, 9% one brand
Precision = 30%, Recall = 93%, F1 = 45%.
Proficiency: α = 67%!
Precision & Recall don't care about no-brand queries!
Add 10k queries without brands: Precision & Recall stay the same; Proficiency: α = 69%.
Replace the empty brand sets with an explicit None brand:
  Original: Precision = 78%, Recall = 81%, F1 = 80%, α = 61%.
  Extra empties: Precision = 89%, Recall = 91%, F1 = 90%, α = 65%.
Numeric Stability
Think of the data as an infinite stream of observations, and view the actually available data as a sample.
How would the metrics change if the sample were different?
All metrics have approximately the same variability
(standard deviation):
≈ 1% for 800 observations of KDD Cup 2005
≈ 0.5% for 10,000 observations in the Magnetic data set
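This variability can be estimated with a bootstrap over per-example outcomes; a sketch with an illustrative 800-example sample at 92% accuracy (the standard deviation comes out near the 1% quoted above for 800 observations):

```python
import random

def bootstrap_std(outcomes, metric, reps=200, seed=0):
    """Std of a metric over bootstrap resamples of per-example outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    vals = [metric([rng.choice(outcomes) for _ in range(n)])
            for _ in range(reps)]
    mean = sum(vals) / reps
    return (sum((v - mean) ** 2 for v in vals) / reps) ** 0.5

# 800 examples; 1 = correctly classified, 0 = not (92% accuracy).
outcomes = [1] * 736 + [0] * 64
std = bootstrap_std(outcomes, metric=lambda xs: sum(xs) / len(xs))
print(round(std, 3))   # close to 0.01, i.e. about 1%
```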
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Summary
If you know the costs – use the expected value.
If you know what you want (Recall/Precision &c) – use it.
If you want a general metric, use Proficiency instead of F1.
Implementation
Python code in https://github.com/Magnetic/proficiency-metric
Contributions of implementations in other languages are welcome!
Table of Contents
Introduction: predictors and their evaluation
Binary Prediction
Multi-Class Prediction
Multi-Label Categorization
Summary
Conclusion
Conclusion
Questions?
 
Michael Galvin, Sr. Data Scientist, Metis at MLconf ATL 2016
Michael Galvin, Sr. Data Scientist, Metis at MLconf ATL 2016Michael Galvin, Sr. Data Scientist, Metis at MLconf ATL 2016
Michael Galvin, Sr. Data Scientist, Metis at MLconf ATL 2016
 
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
Ewa Dominowska, Engineering Manager, Facebook at MLconf SEA - 5/20/16
 
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
 
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Sam Steingold, Lead Data Scientist, Magnetic Media Online at MLconf SEA - 5/20/16

  • 1. Information Theoretic Metrics for Multi-Class Predictor Evaluation
    Sam Steingold, Michal Laclavík
    MLconf Seattle, 2016-05-20 (Slide 1 of 48)
  • 2. Who We Are: A Fast Growing Startup
    One of the fastest growing tech startups: 50% YoY growth.
    Marketing Tech Platform: Email + Site + Ads + Mobile.
    700+ customers in Retail, Auto, Travel, CPG, Finance.
    Four engineering locations: NY, SF, London, Ann Arbor.
    275+ employees. We are hiring aggressively!
  • 3. Table of Contents: Introduction: predictors and their evaluation; Binary Prediction; Multi-Class Prediction; Multi-Label Categorization; Summary; Conclusion.
  • 4. Predictor: a predictor is a black box which spits out an estimate of an unknown parameter. E.g.:
    Will it rain tomorrow?
    Will this person buy this product?
    Is this person a terrorist?
    Is this stock a good investment?
  • 5. Examples
    Perfect: always right. Mislabeled: always the opposite. Random: independent of the actual.
    San Diego Weather Forecast. Actual: 3 days of rain per 365 days. Predict: sunshine always!
    Coin flip. Actual: true half the time. Predict: true if the coin lands Heads.
  • 6. Why Evaluate Predictors?
    Which one is better? How much to pay for one?
    You can always flip the coin yourself, so the random predictor is the least valuable!
    When to use this one and not that one?
  • 7. Confusion Matrix + Cost
    Predictor (counts):
                      Predicted
                      Sun   Rain   Hurricane
    Actual Sun        100    10        1
           Rain         5    20        6
           Hurricane    0     3        2
    Costs:
                      Predicted
                      Sun   Rain   Hurricane
    Actual Sun          0     1        3
           Rain         2     0        2
           Hurricane   10     5        0
    Total cost (i.e., predictor value) = 45
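The cost-based evaluation above is just an element-wise product of the confusion matrix and the cost matrix, summed over all (actual, predicted) cells. A minimal sketch; the 2x2 matrices below are hypothetical, not the slide's weather example:

```python
def total_cost(confusion, costs):
    """Sum of count * cost over all (actual, predicted) cells."""
    return sum(n * c
               for row_n, row_c in zip(confusion, costs)
               for n, c in zip(row_n, row_c))

# Hypothetical 2x2 example: rows = actual, columns = predicted.
confusion = [[8, 2],
             [1, 9]]
costs = [[0, 1],   # mispredicting an actual positive costs 1
         [5, 0]]   # mispredicting an actual negative costs 5
print(total_cost(confusion, costs))  # 2*1 + 1*5 = 7
```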
  • 8. Confusion/Costs Matrix
    Probability / Cost:
                        Predicted
                        Target         Non-target
    Actual Bought       1% / $1        9% / $0
           Did not buy  9% / ($0.1)    81% / $0
    Profitable: expected value of one customer is $0.001 > 0.
    Worthless: the Predicted and Actual are independent!
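The "profitable yet worthless" verdict can be checked numerically: the expected value is the probability-weighted sum of the payoffs, and worthlessness means each joint probability factors into the product of its marginals. A minimal sketch using the slide's numbers:

```python
# Joint probabilities (rows: actual, cols: predicted) and payoffs from the slide.
p = [[0.01, 0.09],   # bought:      targeted / not targeted
     [0.09, 0.81]]   # did not buy: targeted / not targeted
payoff = [[1.0, 0.0],
          [-0.1, 0.0]]

ev = sum(pij * vij for pr, vr in zip(p, payoff) for pij, vij in zip(pr, vr))
print(ev)  # 0.01*1 - 0.09*0.1 = 0.001 > 0: "profitable"

# Independence check: P(actual, predicted) == P(actual) * P(predicted).
p_actual = [sum(row) for row in p]       # [0.10, 0.90]
p_pred = [sum(col) for col in zip(*p)]   # [0.10, 0.90]
independent = all(abs(p[i][j] - p_actual[i] * p_pred[j]) < 1e-12
                  for i in range(2) for j in range(2))
print(independent)  # True: the predictor carries no information
```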
  • 9. Table of Contents: Introduction: predictors and their evaluation; Binary Prediction; Multi-Class Prediction; Multi-Label Categorization; Summary; Conclusion.
  • 10. What if the Cost is Unknown?
    Total N              Predicted
                         True (PT)      False (PF)
    Actual True (AT)     TP             FN (type 2)
           False (AF)    FP (type 1)    TN
    Perfect: FN = FP = 0. Mislabeled: TP = TN = 0.
    Random (Predicted and Actual are independent):
    TP/N = (PT/N)(AT/N), FN/N = (PF/N)(AT/N), FP/N = (PT/N)(AF/N), TN/N = (PF/N)(AF/N)
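The factorization above lets us build a random predictor's confusion matrix directly from the marginals; the numbers below (N = 200, 10% base rate, a hypothetical 30% predicted-positive rate) are a made-up illustration:

```python
N = 200
AT, PT = 20, 60          # actual positives, predicted positives (hypothetical)
AF, PF = N - AT, N - PT

# For a random predictor each cell is the product of its marginals.
TP = PT * AT / N   # 60*20/200  = 6
FN = PF * AT / N   # 140*20/200 = 14
FP = PT * AF / N   # 60*180/200 = 54
TN = PF * AF / N   # 140*180/200 = 126
assert TP + FN + FP + TN == N
print(TP, FN, FP, TN)  # 6.0 14.0 54.0 126.0
```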
  • 11. Metrics Based on the Confusion Matrix: 8 partial measures
    1. Positive predictive value (PPV, Precision): TP/PT
    2. False discovery rate (FDR): FP/PT
    3. False omission rate (FOR): FN/PF
    4. Negative predictive value (NPV): TN/PF
    5. True positive rate (TPR, Sensitivity, Recall): TP/AT
    6. False positive rate (FPR, Fall-out): FP/AF
    7. False negative rate (FNR): FN/AT
    8. True negative rate (TNR, Specificity): TN/AF
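All eight partial measures are cell/marginal ratios, so they fall out of one small function. A sketch, evaluated at the i=100 (random) point of the comparison slide later in the deck:

```python
def partial_measures(tp, fn, fp, tn):
    """The eight cell/marginal ratios from the slide."""
    PT, PF = tp + fp, fn + tn   # predicted true / false
    AT, AF = tp + fn, fp + tn   # actual true / false
    return {"PPV": tp / PT, "FDR": fp / PT,
            "FOR": fn / PF, "NPV": tn / PF,
            "TPR": tp / AT, "FPR": fp / AF,
            "FNR": fn / AT, "TNR": tn / AF}

# Random predictor with 10% base rate: tp=10, fn=10, fp=90, tn=90.
m = partial_measures(tp=10, fn=10, fp=90, tn=90)
print(m["PPV"], m["TPR"])  # 0.1 0.5
```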
  • 12. Metrics Based on the Confusion Matrix: 4 total measures
    1. Accuracy: P(Actual = Predicted).
    2. F1: the harmonic mean of Precision and Recall.
    3. Matthews Correlation Coefficient (MCC), AKA the Pearson correlation coefficient.
    4. Proficiency: the proportion of the information contained in the Actual distribution which is captured by the Predictor.
  • 13. Metric Requirements
    Meaning: the meaning of the metric should be transparent, without resorting to averages of meaningful values.
    Discrimination:
      Weak: its value is 1 for the perfect predictor (and only for it).
      Strong: additionally, its value is 0 for a worthless (random, with any base rate) predictor (and only for such a predictor).
    Universality: the metric should be usable in any setting, whether binary or multi-class, classification (a unique class is assigned to each example) or categorization/community detection (an example can be placed into multiple categories or communities).
  • 14. Accuracy: P(Actual = Predicted) = (tp + tn)/N
    Perfect: 1. Mislabeled: 0.
    San Diego Weather Forecast: Accuracy = 362/365 = 99.2%, yet the predictor is worthless!
    Accuracy does not detect random predictors.
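The San Diego number is easy to reproduce: predicting "sun" every day is wrong only on the 3 rainy days. A minimal sketch:

```python
# San Diego Weather Forecast: 3 rainy days out of 365, predict "sun" always.
days, rainy = 365, 3
tp, fn = 0, rainy        # taking rain as the True class
fp, tn = 0, days - rainy

accuracy = (tp + tn) / days
print(round(accuracy, 3))  # 0.992 -- yet the predictor is worthless
```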
  • 15. F1-Score: the harmonic mean of Precision and Recall: 2·tp / (2·tp + fp + fn)
    Perfect: 1. Zero if either Precision or Recall is 0.
    Correctly handles the San Diego Weather Forecast (because Recall = 0)...
    ...but only if we label rain as True! Otherwise: Recall = 100%, Precision = 99.2%, F1 = 99.6%.
    F1 is asymmetric (Positive vs Negative).
    Why "1"? The general form is F_beta = (1 + beta^2)·tp / ((1 + beta^2)·tp + beta^2·fn + fp).
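The labeling asymmetry the slide points out can be demonstrated directly: the same forecast scores 0 or 99.6% depending on which class is called True. A sketch of F_beta (F1 is beta = 1):

```python
def f_beta(tp, fn, fp, beta=1.0):
    """F_beta = (1 + b^2)*tp / ((1 + b^2)*tp + b^2*fn + fp)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

# San Diego Weather Forecast, predicting "sun" always (3 rainy days / 365):
print(f_beta(tp=0, fn=3, fp=0))               # 0.0   with rain labeled True
print(round(f_beta(tp=362, fn=0, fp=3), 3))   # 0.996 with sun labeled True
```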
• 16. Matthews Correlation Coefficient — a.k.a. the Phi coefficient or Pearson correlation coefficient: MCC = (tp·tn − fp·fn) / √((tp + fp)·(tp + fn)·(fp + tn)·(fn + tn)). Range: [−1, 1]. Perfect: 1. Mislabeled: −1. Random: 0. Handles the San Diego Weather Forecast. Hard to generalize to non-binary classifiers.
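A straight transcription of the MCC formula above, as a sketch (names ours):

```python
import math

def mcc(tp: float, fp: float, fn: float, tn: float) -> float:
    """Matthews correlation coefficient of a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (fp + tn) * (fn + tn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Predictor A from slide 19 (tp=2, fn=3, fp=0, tn=45) gives phi = 61.24%.
assert abs(mcc(2, 0, 3, 45) - 0.6124) < 1e-4
# Re-labeling the predictor (swapping its rows) flips the sign.
assert abs(mcc(3, 45, 2, 0) + mcc(2, 0, 3, 45)) < 1e-12
```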
• 17. Uncertainty Coefficient — a.k.a. Proficiency: α = I(Predicted; Actual) / H(Actual). Range: [0, 1]. Measures the share of information (percentage of bits) contained in the Actual which is captured by the Predictor. 1 for both Perfect and Mislabeled predictors. 0 for random predictors. Handles the San Diego Weather Forecast and all the possible quirks the best. Easily generalizes to any number of categories.
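For the binary case the uncertainty coefficient follows directly from the 2×2 counts; a minimal sketch with entropies in bits (our transcription, not the authors' implementation):

```python
import math

def proficiency(tp: float, fp: float, fn: float, tn: float) -> float:
    """alpha = I(Predicted; Actual) / H(Actual) for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    joint = [[tp / n, fp / n], [fn / n, tn / n]]   # rows: Predicted, cols: Actual
    pred = [sum(row) for row in joint]             # marginal of Predicted
    act = [sum(col) for col in zip(*joint)]        # marginal of Actual
    mi = sum(joint[i][j] * math.log2(joint[i][j] / (pred[i] * act[j]))
             for i in (0, 1) for j in (0, 1) if joint[i][j] > 0)
    h_act = -sum(p * math.log2(p) for p in act if p > 0)
    return mi / h_act

# Predictors A and B from slide 19: alpha = 30.96% and 49.86%.
assert abs(proficiency(2, 0, 3, 45) - 0.3096) < 1e-4
assert abs(proficiency(5, 7, 0, 38) - 0.4986) < 1e-4
```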
• 18. Comparison — [Plot: "Binary Predictor Metric Comparison (base rate = 10%)"; for i = 0..200 the confusion matrix is tp = 0.1·(200−i), fn = 0.1·i, fp = 0.9·i, tn = 0.9·(200−i), interpolating from the perfect predictor at i = 0 (TP=20, TN=180) through the random one at i = 100 (TP=10, FN=10, FP=90, TN=90) to the mislabeled one at i = 200 (FN=20, FP=180); the curves show Accuracy, F-score, Pearson's phi, and Proficiency in %.]
• 19. 2 Against 2 – take 1 — A: (tp=2, fn=3, fp=0, tn=45); B: (tp=5, fn=0, fp=7, tn=38). Proficiency: A 30.96%, B 49.86%. Pearson's φ: A 61.24%, B 59.32%. Accuracy: A 94.00%, B 86.00%. F1-score: A 57.14%, B 58.82%.
• 20. 2 Against 2 – take 2 — A: (tp=3, fn=2, fp=2, tn=43); B: (tp=5, fn=0, fp=7, tn=38). Proficiency: A 28.96%, B 49.86%. Pearson's φ: A 55.56%, B 59.32%. Accuracy: A 92.00%, B 86.00%. F1-score: A 60.00%, B 58.82%.
• 21. Proficiency – The Odd One Out — A: (tp=3, fn=2, fp=1, tn=44); B: (tp=5, fn=0, fp=6, tn=39). Proficiency: A 35.55%, B 53.37%. Pearson's φ: A 63.89%, B 62.76%. Accuracy: A 94.00%, B 88.00%. F1-score: A 66.67%, B 62.50%.
• 22. Accuracy – The Odd One Out — A: (tp=1, fn=4, fp=0, tn=45); B: (tp=5, fn=0, fp=13, tn=32). Proficiency: A 14.77%, B 34.57%. Pearson's φ: A 42.86%, B 44.44%. Accuracy: A 92.00%, B 74.00%. F1-score: A 33.33%, B 43.48%.
• 23. F1-score – The Odd One Out — A: (tp=1, fn=4, fp=0, tn=45); B: (tp=2, fn=3, fp=2, tn=43). Proficiency: A 14.77%, B 14.71%. Pearson's φ: A 42.86%, B 39.32%. Accuracy: A 92.00%, B 90.00%. F1-score: A 33.33%, B 44.44%.
• 24. Predictor Re-Labeling — For a predictor P, let 1 − P be the re-labeled predictor, i.e., when P predicts 1, 1 − P predicts 0 and vice versa. Then: Accuracy(1 − P) = 1 − Accuracy(P); φ(1 − P) = −φ(P); α(1 − P) = α(P). No similarly simple relationship exists for F1.
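The three re-labeling identities are easy to verify numerically; a small self-contained check (a sketch, with the metric formulas inlined):

```python
import math

def metrics(tp, fp, fn, tn):
    """Accuracy, Matthews phi, and proficiency alpha of a 2x2 matrix."""
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    phi = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (fp + tn) * (fn + tn))
    joint = [[tp / n, fp / n], [fn / n, tn / n]]
    pred = [sum(row) for row in joint]
    act = [sum(col) for col in zip(*joint)]
    mi = sum(joint[i][j] * math.log2(joint[i][j] / (pred[i] * act[j]))
             for i in (0, 1) for j in (0, 1) if joint[i][j] > 0)
    h = -sum(p * math.log2(p) for p in act if p > 0)
    return acc, phi, mi / h

# Re-labeling 1-P swaps the predictor's rows: (tp, fp, fn, tn) -> (fn, tn, tp, fp).
a1, p1, al1 = metrics(3, 2, 2, 43)
a2, p2, al2 = metrics(2, 43, 3, 2)
assert abs(a2 - (1 - a1)) < 1e-12   # Accuracy(1-P) = 1 - Accuracy(P)
assert abs(p2 + p1) < 1e-12         # phi(1-P) = -phi(P)
assert abs(al2 - al1) < 1e-12       # alpha(1-P) = alpha(P)
```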
• 25. Table of Contents — Introduction: predictors and their evaluation; Binary Prediction; Multi-Class Prediction; Multi-Label Categorization; Summary; Conclusion.
• 26. Multi-Class Prediction — Examples: character recognition (mislabeling is bad); group detection (label rotation is fine). Metrics: Accuracy = P(Actual = Predicted); no Recall, Precision, or F1!
• 27. Pearson's φ — Define φ² = χ²/N = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ, where Oᵢⱼ = P(Predicted = i & Actual = j) and Eᵢⱼ = P(Predicted = i) × P(Actual = j). 0 for a worthless (independent) predictor; for the perfect predictor the value depends on the data.
• 28. Proficiency — Same as before! α = I(Predicted; Actual) / H(Actual), where H(A) = −Σᵢ P(A = i) log₂ P(A = i) and I(P; A) = Σᵢ Σⱼ Oᵢⱼ log₂(Oᵢⱼ / Eᵢⱼ). 0 for the worthless predictor; 1 for the perfect (and mis-labeled!) predictor.
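The multi-class formulas above translate directly to code; a sketch over a joint count matrix (our own transcription, not the authors' implementation):

```python
import math

def proficiency_mc(counts):
    """alpha = I(Predicted; Actual) / H(Actual) for a joint count matrix
    counts[i][j] = #(Predicted = i and Actual = j), entropies in bits."""
    n = sum(map(sum, counts))
    o = [[c / n for c in row] for row in counts]     # O_ij
    pred = [sum(row) for row in o]                   # P(Predicted = i)
    act = [sum(col) for col in zip(*o)]              # P(Actual = j)
    mi = sum(o[i][j] * math.log2(o[i][j] / (pred[i] * act[j]))
             for i in range(len(o)) for j in range(len(o[i]))
             if o[i][j] > 0)
    h = -sum(p * math.log2(p) for p in act if p > 0)
    return mi / h

# Perfect and mislabeled 3-class predictors both score 1 ...
assert abs(proficiency_mc([[10, 0, 0], [0, 20, 0], [0, 0, 30]]) - 1.0) < 1e-12
assert abs(proficiency_mc([[0, 20, 0], [0, 0, 30], [10, 0, 0]]) - 1.0) < 1e-12
# ... while an independent (worthless) predictor scores 0.
assert abs(proficiency_mc([[1, 2], [2, 4]])) < 1e-12
```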
• 29. φ vs α — φ is to the Chi-squared test as α is to the Likelihood-ratio test. By the Neyman–Pearson lemma, the likelihood-ratio test is the most powerful test.
• 30. This Metric is Old! Why is it Ignored? — Tradition: my teacher used it. Inertia: I used it previously. Cost: log is more computationally expensive than ratios — not anymore! Intuition: Information Theory is hard — intuition is learned: start Information Theory in high school! Mislabeled = Perfect: can be confusing or outright undesirable — use the Hungarian algorithm.
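The Hungarian-algorithm fix for the Mislabeled = Perfect issue amounts to searching for the best re-labeling of the predicted classes. A toy sketch that brute-forces all permutations and scores them by correct-prediction counts (the slides suggest pairwise mutual information as the cost matrix; for realistic class counts use a proper O(n³) assignment solver such as scipy.optimize.linear_sum_assignment):

```python
from itertools import permutations

def best_relabeling(counts):
    """Permutation perm (predicted class i -> actual class perm[i]) that
    maximizes correct predictions; brute force, feasible for small k only."""
    k = len(counts)
    return max(permutations(range(k)),
               key=lambda perm: sum(counts[i][perm[i]] for i in range(k)))

# A perfectly mislabeled 3-class predictor: every "0" is really a "1", etc.
counts = [[0, 20, 0],
          [0, 0, 30],
          [10, 0, 0]]
assert best_relabeling(counts) == (1, 2, 0)
```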
• 31. Table of Contents — Introduction: predictors and their evaluation; Binary Prediction; Multi-Class Prediction; Multi-Label Categorization; Summary; Conclusion.
• 32. Multi-Label Categorization — Examples: text categorization (mislabeling is bad, but may indicate problems with the taxonomy); community detection (label rotation is fine). Metrics: no Accuracy — it cannot handle partial matches; Precision & Recall work again!
• 33. Precision & Recall — Recall = Σᵢ #{objects correctly classified as cᵢ} / Σᵢ #{objects actually in cᵢ} = Σᵢ #{oⱼ | cᵢ ∈ Actual(oⱼ) ∩ Predicted(oⱼ)} / Σᵢ #{oⱼ | cᵢ ∈ Actual(oⱼ)} = Σⱼ #[Actual(oⱼ) ∩ Predicted(oⱼ)] / Σⱼ #Actual(oⱼ). Precision = Σᵢ #{objects correctly classified as cᵢ} / Σᵢ #{objects classified as cᵢ} = Σᵢ #{oⱼ | cᵢ ∈ Actual(oⱼ) ∩ Predicted(oⱼ)} / Σᵢ #{oⱼ | cᵢ ∈ Predicted(oⱼ)} = Σⱼ #[Actual(oⱼ) ∩ Predicted(oⱼ)] / Σⱼ #Predicted(oⱼ).
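The final (pooled-over-objects) form of each chain above can be sketched with label sets; the toy data and names are ours:

```python
def precision_recall(actual, predicted):
    """Pooled multi-label Precision and Recall over objects whose
    actual and predicted labels are given as sets."""
    correct = sum(len(a & p) for a, p in zip(actual, predicted))
    precision = correct / sum(len(p) for p in predicted)
    recall = correct / sum(len(a) for a in actual)
    return precision, recall

actual = [{"sports"}, {"news", "finance"}, {"tech"}]
predicted = [{"sports", "news"}, {"finance"}, {"auto"}]
p, r = precision_recall(actual, predicted)
# 2 correct label assignments out of 4 predicted and 4 actual.
assert p == 0.5 and r == 0.5
```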
• 34. Precision & Recall – ?! — The above is the "macro" Precision & Recall (and F1). One can also define "micro" Precision & Recall (and F1); there is some confusion as to which is which. Side note: with a single label per object, Precision = Recall = Accuracy = F1.
• 35. Proficiency: Definition — Introduce binary random variables Acᵢ := [cᵢ ∈ Actual] and Pcᵢ := [cᵢ ∈ Predicted], and define α = I(×ᵢ Pcᵢ; ×ᵢ Acᵢ) / H(×ᵢ Acᵢ). Problem: this cannot be computed! The KDD Cup 2005 taxonomy has 67 categories, so the Cartesian product has 2⁶⁷ > 10²⁰ outcomes — against 800k examples.
• 36. Proficiency: Estimate — Numerator: assume that Acᵢ is independent of everything but Pcᵢ (similar to Naïve Bayes). Denominator: use subadditivity of entropy, H(A × B) ≤ H(A) + H(B). Define α̂ = Σᵢ I(Pcᵢ; Acᵢ) / Σᵢ H(Acᵢ) = Σᵢ H(Acᵢ)·α(Pcᵢ, Acᵢ) / Σᵢ H(Acᵢ), where α(Pcᵢ, Acᵢ) = I(Pcᵢ; Acᵢ) / H(Acᵢ).
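A sketch of this estimate over per-category membership indicators (our transcription under the stated independence assumption; see the authors' repo for the real implementation):

```python
import math

def h(p):
    """Binary entropy in bits of a probability p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def mi(joint):
    """Mutual information of two binary variables from joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(joint[x][y] * math.log2(joint[x][y] / (px[x] * py[y]))
               for x in (0, 1) for y in (0, 1) if joint[x][y] > 0)

def estimated_proficiency(actual, predicted, categories):
    """alpha-hat = sum_c I(Pc; Ac) / sum_c H(Ac), where Ac and Pc are
    the per-category membership indicators of the label sets."""
    n = len(actual)
    num = den = 0.0
    for c in categories:
        joint = [[0.0, 0.0], [0.0, 0.0]]
        for a, p in zip(actual, predicted):
            joint[int(c in p)][int(c in a)] += 1 / n
        num += mi(joint)
        den += h(joint[0][1] + joint[1][1])   # P(Ac = 1)
    return num / den if den else 0.0

cats = ["a", "b"]
actual = [{"a"}, {"b"}, {"a", "b"}]
# Perfect labeling recovers all the information; a constant labeling none.
assert abs(estimated_proficiency(actual, actual, cats) - 1.0) < 1e-9
assert abs(estimated_proficiency(actual, [{"a"}] * 3, cats)) < 1e-9
```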
• 37. Proficiency: Permuted — To recover re-labeling invariance, let M(c) be the optimal assignment of predicted to actual categories, with the cost matrix given by the pairwise mutual informations. Define the Permuted Proficiency metric α̃ = Σᵢ I(M(Pcᵢ); Acᵢ) / Σᵢ H(Acᵢ) = Σᵢ H(Acᵢ)·α(M(Pcᵢ), Acᵢ) / Σᵢ H(Acᵢ). Since M is optimal, α̂ ≤ α̃, with equality iff the optimal assignment is the identity.
• 38. Proficiency: Properties — Meaning: (an estimate of) the share of the information contained in the actual distribution recovered by the classifier. Strong Discrimination: yes! Universality: the independence assumption above weakens the claim that the metric has the same meaning across all domains and data sets.
• 39. Example: KDD Cup 2005 — 800 queries, 67 categories, 3 human labelers; each labeler is evaluated against another: (Actual, Predicted) = (labeler 1, labeler 2), (labeler 2, labeler 3), (labeler 3, labeler 1). Precision: 63.48%, 36.50%, 58.66%. Recall: 41.41%, 58.62%, 55.99%. α̂ (estimated Proficiency): 24.73%, 28.06%, 33.26%. α̃ (Permuted Proficiency): 25.02%, 28.62%, 33.51%. Categories reassigned: 9, 12, 11.
• 40. Each Human Against Dice — Pit each of the three human labelers against the random labeler with the same category probability distribution. F1: Labeler 1 14.3%, Labeler 2 7.7%, Labeler 3 19.2%. Categories/example: 3.7 ± 1.1, 2.4 ± 0.9, 3.8 ± 1.1. Examples/category: 44 ± 56, 28 ± 31, 48 ± 71. More examples/category and categories/example ⟹ higher F1!
• 41. Example: Academic Setting — Consider a typical university department: every professor serves on 9 administrative committees out of the 10 available. Worthless predictor: assign each professor to 9 random committees. Performance: Precision = Recall = F1 = 90%; Proficiency: α = 0.
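Since each assignment here simply omits one of the 10 committees, the claim can be checked exactly by averaging over all (actual, predicted) omission pairs — a toy verification of ours, not code from the talk:

```python
import math

K = 10
committees = set(range(K))
# Each actual and each predicted assignment omits exactly one committee;
# enumerate all (actual, predicted) omission pairs with equal weight.
pairs = [(committees - {a}, committees - {p})
         for a in range(K) for p in range(K)]
correct = sum(len(a & p) for a, p in pairs)
precision = correct / sum(len(p) for _, p in pairs)
recall = correct / sum(len(a) for a, _ in pairs)
assert precision == recall == 0.9

# The per-committee membership indicators of Actual and Predicted are
# independent, so I(Pc; Ac) = 0 for every committee c and alpha = 0.
# Check committee 0 explicitly:
n = len(pairs)
joint = [[0.0, 0.0], [0.0, 0.0]]
for a, p in pairs:
    joint[int(0 in p)][int(0 in a)] += 1 / n
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
mi = sum(joint[x][y] * math.log2(joint[x][y] / (px[x] * py[y]))
         for x in (0, 1) for y in (0, 1) if joint[x][y] > 0)
assert abs(mi) < 1e-9
```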
• 42. Example: Brand Detection — Laclavik, Steingold, Ciglan, WSDM 2016. 10k queries, 1k brands; 90% of queries have no brands, 9% have one brand. Precision = 30%, Recall = 93%, F1 = 45%; Proficiency: α = 67%! Precision & Recall don't care about the no-brand queries: add 10k queries without brands and Precision & Recall stay the same, while Proficiency becomes α = 69%. Replace the empty brands with a None brand: original data give Precision = 78%, Recall = 81%, F1 = 80%, α = 61%; with the extra empties, Precision = 89%, Recall = 91%, F1 = 90%, α = 65%.
• 43. Numeric Stability — Think of the data as an infinite stream of observations, and view the actually available data as a sample. How would the metrics change if the sample were different? All metrics have approximately the same variability (standard deviation): ≈ 1% for the 800 observations of KDD Cup 2005; ≈ 0.5% for the 10,000 observations in the Magnetic data set.
• 44. Table of Contents — Introduction: predictors and their evaluation; Binary Prediction; Multi-Class Prediction; Multi-Label Categorization; Summary; Conclusion.
• 45. Summary — If you know the costs, use the expected value. If you know what you want (Recall, Precision, etc.), use it. If you want a general metric, use Proficiency instead of F1.
• 46. Implementation — Python code at https://github.com/Magnetic/proficiency-metric. Contributions of implementations in other languages are welcome!
• 47. Table of Contents — Introduction: predictors and their evaluation; Binary Prediction; Multi-Class Prediction; Multi-Label Categorization; Summary; Conclusion.
• 48. Conclusion — Questions?