The Relationship Between
Precision-Recall and ROC Curves
By F. Noorbehbahani
Fall 2013
Abstract
• Receiver Operator Characteristic (ROC) curves are commonly used to
present results for binary decision problems in machine learning.
• However, when dealing with highly skewed datasets, Precision-Recall
(PR) curves give a more informative picture of an algorithm's
performance.
• We show that a deep connection exists between ROC space and PR
space, such that a curve dominates in ROC space if and only if it
dominates in PR space.
11/22/2013 By F. Noorbehbahani
• Provost et al. (1998) have argued that simply using accuracy results can
be misleading.
• They recommended using Receiver Operator Characteristic (ROC) curves when
evaluating binary decision problems; these curves show how the number of
correctly classified positive examples varies with the number of incorrectly
classified negative examples.
• However, ROC curves can present an overly optimistic view of an
algorithm's performance if there is a large skew.
• Drummond and Holte (2000; 2004) have recommended using cost
curves to address this issue. Cost curves are an excellent alternative to
ROC curves, but discussing them is beyond the scope of this paper.
• Precision-Recall (PR) curves, often used in Information Retrieval
(Manning & Schutze, 1999; Raghavan et al., 1989), have been cited as
an alternative to ROC curves for tasks with a large skew in the class
distribution.
• An important difference between ROC space and PR space is the visual
representation of the curves.
• Looking at PR curves can expose differences between algorithms that
are not apparent in ROC space.
• The performances of the algorithms appear comparable in ROC space; in PR
space, however, we can see that Algorithm 2 has a clear advantage over
Algorithm 1.
• This difference exists because in this domain the number of negative
examples greatly exceeds the number of positive examples. Consequently, a
large change in the number of false positives leads to only a small change
in the false positive rate used in ROC analysis.
• Precision, on the other hand, compares false positives to true positives
rather than to true negatives, and so captures the effect of the large
number of negative examples on the algorithm's performance.
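The skew effect above can be made concrete with a small numeric sketch. The counts below are hypothetical, chosen only to illustrate the point: two classifiers with the same recall but very different false-positive counts look nearly identical in ROC space yet far apart in PR space.

```python
# Hypothetical skewed domain: 100 positives, 10,000 negatives.
P, N = 100, 10_000

def fpr(fp):
    # false positive rate = FP / N
    return fp / N

def precision(tp, fp):
    # precision = TP / (TP + FP)
    return tp / (tp + fp)

tp = 80                 # both classifiers find 80 of the 100 positives
fp_a, fp_b = 100, 400   # classifier B makes four times the false positives

print(fpr(fp_a), fpr(fp_b))                       # 0.01 vs 0.04: close in ROC space
print(precision(tp, fp_a), precision(tp, fp_b))   # ~0.44 vs ~0.17: far apart in PR space
```

A 300-instance jump in false positives moves FPR by only 0.03, but cuts precision by more than half.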
Review of ROC and Precision-Recall
• In a binary decision problem, a classifier labels examples as either
positive or negative. The decisions made by the classifier can be
represented in a structure known as a confusion matrix or contingency
table.
• The confusion matrix has four categories: True positives (TP) are
examples correctly labeled as positive.
• False positives (FP) refer to negative examples incorrectly labeled as
positive. True negatives (TN) correspond to negative examples correctly
labeled as negative.
• Finally, false negatives (FN) refer to positive examples incorrectly
labeled as negative.
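A minimal sketch of tallying the four confusion-matrix cells from true and predicted labels (the label encoding, 1 = positive and 0 = negative, is an assumption for illustration):

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 1)
```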
The Inaccuracy of Accuracy
• A tacit assumption in the use of classification accuracy: the class
distribution among examples is constant and relatively balanced.
• As the class distribution becomes more skewed, evaluation based on
accuracy breaks down.
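A sketch of why accuracy breaks down, with hypothetical counts: on a 1%-positive dataset, a classifier that always predicts "negative" reaches 99% accuracy while detecting no positives at all.

```python
P, N = 10, 990                 # hypothetical counts: 1% of examples are positive
tp, fp, tn, fn = 0, 0, N, P    # the "always negative" classifier

accuracy = (tp + tn) / (P + N)
recall = tp / P

print(accuracy, recall)  # 0.99 and 0.0: high accuracy, useless classifier
```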
• Any classifier that appears in the lower right triangle performs worse
than random guessing. This triangle is therefore usually empty in ROC
graphs.
• Given an ROC graph in which a classifier's performance appears to be
slightly better than random, it is natural to ask: “is this classifier's
performance truly significant, or is it only better than random by
chance?”
• There is no conclusive test for this, but Forman (2002) has shown a
methodology that addresses this question with ROC curves.
Random Performance
Curves in ROC space
• Many classifiers, such as decision trees or rule sets, are designed to
produce only a class decision, i.e., a Y or N on each instance.
• When such a discrete classifier is applied to a test set, it yields a
single confusion matrix, which in turn corresponds to one ROC point.
• Thus, a discrete classifier produces only a single point in ROC space.
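As a small illustration with hypothetical confusion-matrix counts, the single ROC point is just the pair (FPR, TPR) computed from that one matrix:

```python
# Hypothetical counts from one discrete classifier on one test set.
tp, fp, tn, fn = 63, 28, 72, 37

# One confusion matrix -> one (FPR, TPR) point in ROC space.
roc_point = (fp / (fp + tn), tp / (tp + fn))
print(roc_point)  # (0.28, 0.63)
```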
Curves in ROC space
• Some classifiers, such as a Naive Bayes classifier or a neural network,
naturally yield an instance probability or score: a numeric value that
represents the degree to which an instance is a member of a class.
• These values can be strict probabilities, in which case they adhere to
standard theorems of probability; or they can be general, uncalibrated
scores, in which case the only property that holds is that a higher score
indicates a higher probability.
• We shall call both probabilistic classifiers, in spite of the fact that
the output may not be a proper probability.
• Such a ranking or scoring classifier can be used with a threshold to
produce a discrete (binary) classifier: if the classifier output is above
the threshold, the classifier produces a Y, else an N.
• Each threshold value produces a different point in ROC space.
• Computationally, re-classifying the test set at every threshold is a
poor way of generating an ROC curve; the next section describes a more
efficient and careful method.
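A minimal sketch of such an efficient method: sort the instances by score once, then sweep the threshold downward, emitting a new ROC point whenever the score changes. Names and the example data are illustrative, not from the source.

```python
def roc_points(scores, labels):
    """labels: 1 = positive, 0 = negative. Returns [(FPR, TPR), ...]."""
    P = sum(labels)
    N = len(labels) - P
    ranked = sorted(zip(scores, labels), key=lambda sy: -sy[0])  # sort once
    points, tp, fp = [], 0, 0
    prev_score = None
    for score, y in ranked:
        if score != prev_score:            # emit a point between distinct scores
            points.append((fp / N, tp / P))
            prev_score = score
        if y == 1:
            tp += 1
        else:
            fp += 1
    points.append((fp / N, tp / P))        # final point is (1, 1)
    return points

print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1]))
```

One sort plus one linear pass replaces a full re-classification per threshold.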
Class skew
• ROC curves have an attractive property: they are insensitive to changes
in class distribution. If the proportion of positive to negative instances
changes in a test set, the ROC curves will not change.
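This insensitivity follows because TPR and FPR are each computed within a single class. A small sketch with hypothetical counts: multiplying the negative class by ten leaves the ROC point unchanged, while precision (a PR-space quantity) drops sharply.

```python
tp, fn = 50, 50      # 100 positives
fp, tn = 20, 180     # 200 negatives

def roc_point(tp, fp, tn, fn):
    return (fp / (fp + tn), tp / (tp + fn))  # (FPR, TPR)

before = roc_point(tp, fp, tn, fn)
after = roc_point(tp, 10 * fp, 10 * tn, fn)  # ten times as many negatives

prec_before = tp / (tp + fp)
prec_after = tp / (tp + 10 * fp)

print(before == after)          # True: the ROC point does not move
print(prec_before, prec_after)  # ~0.71 vs 0.2: precision does
```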
Creating scoring classifiers
• Many classifier models are discrete: they are designed to produce only a
class label from each test instance.
• However, we often want to generate a full ROC curve from a classifier
instead of just a single point. To this end we want to generate scores
from a classifier rather than just a class label. There are several ways
of producing such scores.
• Many discrete classifier models may easily be converted to scoring
classifiers by "looking inside" them at the instance statistics they keep.
• For example, a decision tree determines the class label at a leaf node
from the proportion of instances at the node; the class decision is simply
the most prevalent class. These class proportions may serve as a score.
Creating scoring classifiers
• A rule learner keeps similar statistics on rule confidence, and the
confidence of a rule matching an instance can be used as a score.
• Even if a classifier only produces a class label, an aggregation of
labels may be used to generate a score. MetaCost (Domingos, 1999) employs
bagging to generate an ensemble of discrete classifiers, each of which
produces a vote. The set of votes can be used to generate a score.
• Finally, some combination of scoring and voting can be employed. For
example, rules can provide basic probability estimates, which may then
be used in weighted voting.
Area under an ROC Curve (AUC)
• An ROC curve is a two-dimensional depiction of classifier performance.
To compare classifiers we may want to reduce ROC performance to a
single scalar value representing expected performance.
• A common method is to calculate the area under the ROC curve,
abbreviated AUC.
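A minimal sketch of AUC as the trapezoidal area under a list of ROC points (this assumes the points are already sorted by FPR; the two example curves are illustrative):

```python
def auc(points):
    """Trapezoidal area under ROC points [(FPR, TPR), ...] sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # one trapezoid per segment
    return area

perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # ideal classifier
random_guess = [(0.0, 0.0), (1.0, 1.0)]          # the diagonal

print(auc(perfect), auc(random_guess))  # 1.0 and 0.5
```

AUC also has a useful statistical reading: it equals the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one.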
Averaging ROC curves
• Two methods for averaging ROC curves:
• vertical averaging
• threshold averaging
Vertical averaging
• Vertical averaging takes vertical samples of the ROC curves for fixed FP
rates and averages the corresponding TP rates.
• Such averaging is appropriate when the FP rate can indeed be fixed by
the researcher, or when a single-dimensional measure of variation is
desired.
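A sketch of vertical averaging over two hypothetical curves, sampling each curve's TPR at fixed FPR values with linear interpolation between curve points:

```python
def tpr_at(curve, fpr):
    """curve: [(FPR, TPR), ...] sorted by FPR; linear interpolation."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                return max(y0, y1)            # vertical segment
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    return curve[-1][1]

def vertical_average(curves, fprs):
    """Average the interpolated TPRs of several curves at each fixed FPR."""
    return [(f, sum(tpr_at(c, f) for c in curves) / len(curves))
            for f in fprs]

c1 = [(0.0, 0.0), (0.5, 0.8), (1.0, 1.0)]
c2 = [(0.0, 0.0), (0.5, 0.4), (1.0, 1.0)]
print(vertical_average([c1, c2], [0.0, 0.5, 1.0]))
```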
Threshold averaging
• Instead of sampling points based on their positions in ROC space, as
vertical averaging does, threshold averaging samples based on the
thresholds that produced those points.
• The method must generate a set of thresholds to sample; then, for each
threshold, it finds the corresponding point on each ROC curve and
averages them.
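A sketch under one simplifying assumption of mine: each curve is stored as (threshold, FPR, TPR) triples, and for each sampled threshold we take from every curve the point whose threshold is closest, then average both coordinates:

```python
def point_at(curve, thr):
    """Nearest point by threshold; curve: [(threshold, FPR, TPR), ...]."""
    return min(curve, key=lambda p: abs(p[0] - thr))

def threshold_average(curves, thresholds):
    """Average both (FPR, TPR) coordinates at each sampled threshold."""
    out = []
    for thr in thresholds:
        pts = [point_at(c, thr) for c in curves]
        out.append((sum(p[1] for p in pts) / len(pts),
                    sum(p[2] for p in pts) / len(pts)))
    return out

c1 = [(0.9, 0.0, 0.2), (0.5, 0.2, 0.6), (0.1, 0.8, 1.0)]
c2 = [(0.9, 0.1, 0.3), (0.5, 0.3, 0.7), (0.1, 0.9, 1.0)]
print(threshold_average([c1, c2], [0.9, 0.5, 0.1]))
```

Unlike vertical averaging, the averaged point can move in both dimensions, which is why this method can also summarize horizontal variation between curves.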
Multi-class ROC graphs
Multi-class AUC
Relationship between ROC Space
and PR Space
• One important definition we need for the next theorem is the notion
that one curve dominates another, meaning that the other curve is
beneath it or equal to it everywhere.
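The dominance notion can be sketched as a simple check on two curves sampled at the same x positions (an illustration of the idea, not the paper's formal definition over continuous curves):

```python
def dominates(a, b):
    """a, b: y-values of two curves sampled at the same x positions.
    Curve a dominates curve b if a is at or above b everywhere."""
    return all(ya >= yb for ya, yb in zip(a, b))

curve_a = [0.0, 0.5, 0.8, 1.0]
curve_b = [0.0, 0.3, 0.8, 0.9]
print(dominates(curve_a, curve_b), dominates(curve_b, curve_a))  # True False
```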
Costs and Classification Accuracy
Cost-sensitive Classifier Evaluation using Cost
Curves
• Methods for evaluating the performance of classifiers fall into two broad
categories:
• Numerical: produce a single number summarizing a classifier's performance
• accuracy, expected cost, precision, recall, and area under a performance curve (AUC)
• Graphical: depict performance in a plot that typically has just two or three dimensions,
so that it can be easily inspected by humans
• ROC curves, precision-recall curves, DET curves, regret graphs, loss difference plots, skill
plots, prevalence-value-accuracy plots, and the method presented in this talk, cost curves
• Graphical methods are especially useful when there is uncertainty about the
misclassification costs or the class distribution that will occur when the
classifier is deployed.
• In this setting, graphical measures can present a classifier's actual
performance for a wide variety of operating points (combinations of costs
and class distributions), whereas the best a numerical measure can do is
represent the average performance across a set of operating points.
• Cost curves are perhaps the ideal graphical method in this setting
because they directly show performance as a function of the
misclassification costs and class distribution.
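In Drummond and Holte's construction, a single classifier with a given false positive rate (FPR) and false negative rate (FNR) becomes a straight line in cost space: normalized expected cost as a function of PC(+), the probability-cost value of the positive class. A sketch of that line (the numeric rates are hypothetical):

```python
def cost_line(fpr, fnr):
    """Cost-space line for one classifier: NE(pc) = (FNR - FPR) * pc + FPR,
    where pc is PC(+), the probability-cost of the positive class in [0, 1]."""
    return lambda pc: (fnr - fpr) * pc + fpr

line = cost_line(fpr=0.1, fnr=0.3)

# At pc = 0 the cost is just the FPR; at pc = 1 it is the FNR.
print(line(0.0), line(1.0), line(0.5))
```

Plotting such lines for many classifiers, their lower envelope shows at a glance which classifier is cheapest at each operating point.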
Cost curves have the following advantages over
ROC curves
• Cost curves directly show performance on their y-axis, whereas ROC
curves do not explicitly depict performance. This means performance
and performance differences can be easily seen in cost curves but not in
ROC curves.
• When applied to a set of cost curves, the natural way of averaging two-
dimensional curves produces a cost curve that represents the average of
the performances represented by the given curves. By contrast, there is
no agreed-upon way to average ROC curves, and none of the proposed
averaging methods produces an ROC curve representing average
performance.
Thanks for your attention

More Related Content

Similar to Noorbehbahani classification evaluation measure

AIAA-MAO-DSUS-2012
AIAA-MAO-DSUS-2012AIAA-MAO-DSUS-2012
AIAA-MAO-DSUS-2012OptiModel
 
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...ssuser4b1f48
 
A new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid dataA new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid dataMark Heckmann
 
Ekreg ho-11-spatial ec 231112
Ekreg ho-11-spatial ec 231112Ekreg ho-11-spatial ec 231112
Ekreg ho-11-spatial ec 231112Catur Purnomo
 
Combining Lazy Learning, Racing and Subsampling for Effective Feature Selection
Combining Lazy Learning, Racing and Subsampling for Effective Feature SelectionCombining Lazy Learning, Racing and Subsampling for Effective Feature Selection
Combining Lazy Learning, Racing and Subsampling for Effective Feature SelectionGianluca Bontempi
 
BWM: Best Worst Method
BWM: Best Worst MethodBWM: Best Worst Method
BWM: Best Worst MethodJafar Rezaei
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringAllen Wu
 
Bidanset Lombard JPTAA
Bidanset Lombard JPTAABidanset Lombard JPTAA
Bidanset Lombard JPTAAPaul Bidanset
 
27.docking protein-protein and protein-ligand
27.docking protein-protein and protein-ligand27.docking protein-protein and protein-ligand
27.docking protein-protein and protein-ligandAbhijeet Kadam
 
Tutorial rpo
Tutorial rpoTutorial rpo
Tutorial rpomosi2005
 
Simple Regression Analysis ch12.pptx
Simple Regression Analysis ch12.pptxSimple Regression Analysis ch12.pptx
Simple Regression Analysis ch12.pptxSoumyaBansal7
 
High-Dimensional Network Estimation using ECL
High-Dimensional Network Estimation using ECLHigh-Dimensional Network Estimation using ECL
High-Dimensional Network Estimation using ECLHPCC Systems
 
Multivariate gaussian distribution
Multivariate gaussian distribution Multivariate gaussian distribution
Multivariate gaussian distribution IsabelBailey3
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
 
ABDO_MLROM_PHYSOR2016
ABDO_MLROM_PHYSOR2016ABDO_MLROM_PHYSOR2016
ABDO_MLROM_PHYSOR2016Mohammad Abdo
 
Rank Monotonicity in Centrality Measures (A report about Quality guarantees f...
Rank Monotonicity in Centrality Measures (A report about Quality guarantees f...Rank Monotonicity in Centrality Measures (A report about Quality guarantees f...
Rank Monotonicity in Centrality Measures (A report about Quality guarantees f...Mahdi Cherif
 
Lessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleLessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleDomonkos Tikk
 

Similar to Noorbehbahani classification evaluation measure (20)

AIAA-MAO-DSUS-2012
AIAA-MAO-DSUS-2012AIAA-MAO-DSUS-2012
AIAA-MAO-DSUS-2012
 
AI_Session 19 Backtracking CSP.pptx
AI_Session 19 Backtracking CSP.pptxAI_Session 19 Backtracking CSP.pptx
AI_Session 19 Backtracking CSP.pptx
 
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...
 
A new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid dataA new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid data
 
Ekreg ho-11-spatial ec 231112
Ekreg ho-11-spatial ec 231112Ekreg ho-11-spatial ec 231112
Ekreg ho-11-spatial ec 231112
 
Combining Lazy Learning, Racing and Subsampling for Effective Feature Selection
Combining Lazy Learning, Racing and Subsampling for Effective Feature SelectionCombining Lazy Learning, Racing and Subsampling for Effective Feature Selection
Combining Lazy Learning, Racing and Subsampling for Effective Feature Selection
 
BWM: Best Worst Method
BWM: Best Worst MethodBWM: Best Worst Method
BWM: Best Worst Method
 
Graph cluster randomization
Graph cluster randomizationGraph cluster randomization
Graph cluster randomization
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
Bidanset Lombard JPTAA
Bidanset Lombard JPTAABidanset Lombard JPTAA
Bidanset Lombard JPTAA
 
27.docking protein-protein and protein-ligand
27.docking protein-protein and protein-ligand27.docking protein-protein and protein-ligand
27.docking protein-protein and protein-ligand
 
Tutorial rpo
Tutorial rpoTutorial rpo
Tutorial rpo
 
Simple Regression Analysis ch12.pptx
Simple Regression Analysis ch12.pptxSimple Regression Analysis ch12.pptx
Simple Regression Analysis ch12.pptx
 
High-Dimensional Network Estimation using ECL
High-Dimensional Network Estimation using ECLHigh-Dimensional Network Estimation using ECL
High-Dimensional Network Estimation using ECL
 
Multivariate gaussian distribution
Multivariate gaussian distribution Multivariate gaussian distribution
Multivariate gaussian distribution
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
dock.ppt
dock.pptdock.ppt
dock.ppt
 
ABDO_MLROM_PHYSOR2016
ABDO_MLROM_PHYSOR2016ABDO_MLROM_PHYSOR2016
ABDO_MLROM_PHYSOR2016
 
Rank Monotonicity in Centrality Measures (A report about Quality guarantees f...
Rank Monotonicity in Centrality Measures (A report about Quality guarantees f...Rank Monotonicity in Centrality Measures (A report about Quality guarantees f...
Rank Monotonicity in Centrality Measures (A report about Quality guarantees f...
 
Lessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleLessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scale
 

Recently uploaded

Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...navyadasi1992
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 

Recently uploaded (20)

Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 

Noorbehbahani classification evaluation measure

  • 1. The Relationship Between Precision-Recall and ROC Curves BY. F.NOORBEHBAHANI FALL 2013
  • 2. Abstract • Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. • However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. • We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. 11/22/2013By F.noorbehbahani 2
  • 3. • Provost et al. (1998) have argued that simply using accuracy results can be misleading. • They recommended when evaluating binary decision problems to use Receiver Operator Characteristic (ROC) curves, which show how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples. • However, ROC curves can present an overly optimistic view of an algorithm's performance if there is a large skew. • Drummond and Holte (2000; 2004) have recommended using cost curves to address this issue. Cost curves are an excellent alternative to ROC curves, but discussing them is beyond the scope of this paper. 11/22/2013By F.noorbehbahani 3
  • 4. • Precision-Recall (PR) curves, often used in Information Retrieval (Manning & Schutze, 1999; Raghavan et al., 1989), have been cited as an alternative to ROC curves for tasks with a large skew in the class distribution. • An important difference between ROC space and PR space is the visual representation of the curves. • Looking at PR curves can expose differences between algorithms that are not apparent in ROC space. 11/22/2013By F.noorbehbahani 4
  • 6. • The performances of the algorithms appear to be comparable in ROC space, however, in PR space we can see that Algorithm 2 has a clear advantage over Algorithm 1. • This difference exists because in this domain the number of negative examples greatly exceeds the number of positives examples. Consequently, a large change in the number of false positives can lead to a small change in the false positive rate used in ROC analysis. • Precision, on the other hand, by comparing false positives to true positives rather than true negatives, captures the effect of the large number of negative examples on the algorithm's performance. 11/22/2013By F.noorbehbahani 6
  • 7. Review of ROC and Precision-Recall • In a binary decision problem, a classier labels examples as either positive or negative. The decision made by the classier can be represented in a structure known as a confusion matrix or contingency table. • The confusion matrix has four categories: True positives (TP) are examples correctly labeled as positives. • False positives (FP) refer to negative examples incorrectly labeled as positive. True negatives (TN) correspond to negatives correctly labeled as negative. • Finally, false negatives (FN) refer to positive examples incorrectly labeled as negative. 11/22/2013By F.noorbehbahani 7
  • 10. The Inaccuracy of Accuracy • A tacit assumption in the use of classification accuracy : • class distribution among examples • is constant • and relatively balanced • As the class distribution becomes more skewed evaluation based on accuracy breaks down 11/22/2013By F.noorbehbahani 10
  • 12. • Any classier that appears in the lower right triangle performs worse than random guessing. This triangle is therefore usually empty in ROC graphs. • Given an ROC graph in which a classifier's performance appears to be slightly better than random, it is natural to ask “is this classifier's performance truly significant or is it only better than random by chance?" • There is no conclusive test for this, but Forman (2002) has shown a methodology that addresses this question with ROC curves. 11/22/2013By F.noorbehbahani 12
  • 14. Curves in ROC space • Many classifiers, such as decision trees or rule sets, are designed to produce only a class decision, i.e., a Y or N on each instance. • When such a discrete classier is applied to a test set, it yields a single confusion matrix, which in turn corresponds to one ROC point. • Thus, a discrete classier produces only a single point in ROC space. 11/22/2013By F.noorbehbahani 14
  • 15. Curves in ROC space • Some classifiers, such as a Naive Bayes classier or a neural network, naturally yield an instance probability or score, a numeric value that represents the degree to which an instance is a member of a class. • These values can be strict probabilities, in which case they adhere to standard theorems of probability; or they can be general, uncalibrated scores, in which case the only property that holds is that a higher score indicates a higher probability. • We shall call both a probabilistic classier, in spite of the fact that the output may not be a proper probability. 11/22/2013By F.noorbehbahani 15
  • 16. • Such a ranking or scoring classier can be used with a threshold to produce a discrete (binary) classier: if the classier output is above the threshold, the classier produces a Y, else a N. • Each threshold value produces a different point in ROC space. • Computationally, this is a poor way of generating an ROC curve, and the next section describes a more efficient and careful method. 11/22/2013By F.noorbehbahani 16
  • 19. Class skew • ROC curves have an attractive property: they are insensitive to changes in class distribution. If the proportion of positive to negative instances changes in a test set, the ROC curves will not change.
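The invariance follows because the TP rate is computed only from the positive instances and the FP rate only from the negatives, so neither depends on the ratio between the two classes. A small numeric check (the confusion-matrix figures are chosen purely for illustration):

```python
def rates(tp, fn, fp, tn):
    # TPR uses only the positive column of the confusion matrix,
    # FPR only the negative column, so each is invariant to skew.
    return tp / (tp + fn), fp / (fp + tn)

balanced = rates(tp=80, fn=20, fp=10, tn=90)      # 100 pos, 100 neg
skewed   = rates(tp=80, fn=20, fp=100, tn=900)    # same classifier, negatives x10
```

Both calls return the same (TPR, FPR) pair, whereas precision drops from 80/90 to 80/180 under the same skew, which is exactly why PR curves tell a different story on skewed data.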
  • 21. Creating scoring classifiers • Many classifier models are discrete: they are designed to produce only a class label for each test instance. • However, we often want to generate a full ROC curve from a classifier instead of just a single point. To this end we want to generate scores from a classifier rather than just a class label. There are several ways of producing such scores. • Many discrete classifier models may easily be converted to scoring classifiers by "looking inside" them at the instance statistics they keep. • For example, a decision tree determines the class label of a leaf node from the proportion of instances at the node; the class decision is simply the most prevalent class. These class proportions may serve as a score.
  • 22. Creating scoring classifiers • A rule learner keeps similar statistics on rule confidence, and the confidence of a rule matching an instance can be used as a score. • Even if a classifier only produces a class label, an aggregation of such labels may be used to generate a score. MetaCost (Domingos, 1999) employs bagging to generate an ensemble of discrete classifiers, each of which produces a vote. The set of votes can be used to generate a score. • Finally, some combination of scoring and voting can be employed. For example, rules can provide basic probability estimates, which may then be used in weighted voting.
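The vote-aggregation idea above can be sketched in a few lines: the fraction of ensemble members voting positive serves as a score in [0, 1]. The stump classifiers below are hypothetical stand-ins for a bagged ensemble, not MetaCost's actual implementation:

```python
def vote_score(ensemble, x):
    # MetaCost-style aggregation: each discrete member casts a 0/1 vote,
    # and the positive-vote fraction becomes the instance's score.
    votes = [clf(x) for clf in ensemble]
    return sum(votes) / len(votes)

# Hypothetical discrete "classifiers": decision stumps on a scalar input.
stumps = [lambda x: int(x > 0.3),
          lambda x: int(x > 0.5),
          lambda x: int(x > 0.7)]
```

Sweeping a threshold over `vote_score` then yields multiple ROC points from classifiers that individually produce only one.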
  • 23. Area under an ROC Curve (AUC) • An ROC curve is a two-dimensional depiction of classifier performance. To compare classifiers we may want to reduce ROC performance to a single scalar value representing expected performance. • A common method is to calculate the area under the ROC curve, abbreviated AUC.
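A useful way to compute AUC without constructing the curve is its rank-statistic interpretation: AUC equals the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one. A minimal sketch (quadratic in the number of instances; fine for illustration):

```python
def auc(scores, labels):
    # AUC as P(score of random positive > score of random negative),
    # counting ties as one half (the Wilcoxon-Mann-Whitney statistic).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0, random scoring about 0.5, and a fully inverted ranking 0.0.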
  • 25. Averaging ROC curves • There are two methods for averaging ROC curves: • vertical averaging • threshold averaging
  • 28. Vertical averaging • Vertical averaging takes vertical samples of the ROC curves at fixed FP rates and averages the corresponding TP rates. • Such averaging is appropriate when the FP rate can indeed be fixed by the researcher, or when a single-dimensional measure of variation is desired.
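Vertical averaging can be sketched as follows, assuming each curve is a list of (FPR, TPR) points sorted by FPR, with linear interpolation between points (function names and the interpolation choice are my own):

```python
def vertical_average(curves, fpr_grid):
    # For each fixed FP rate on the grid, read each curve's TP rate
    # (interpolating between its points) and average across curves.
    def tpr_at(curve, fpr):
        for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
            if x0 <= fpr <= x1:
                if x1 == x0:                      # vertical segment
                    return max(y0, y1)
                t = (fpr - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return curve[-1][1]
    return [(f, sum(tpr_at(c, f) for c in curves) / len(curves))
            for f in fpr_grid]
```

The result is itself a set of (FPR, mean TPR) points; the spread of the per-curve TPRs at each grid value can additionally be used for confidence bars.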
  • 29. Threshold averaging • Instead of sampling points based on their positions in ROC space, as vertical averaging does, threshold averaging samples based on the thresholds that produced those points. • The method first generates a set of thresholds to sample; then, for each threshold, it finds the corresponding point on each ROC curve and averages them.
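A simplified sketch of this idea, assuming each curve is a list of (threshold, FPR, TPR) triples: for each sampled threshold, take each curve's point with the nearest threshold and average both coordinates. This nearest-point selection is my own simplification of the scan over sorted thresholds, but the averaging step is the same:

```python
def threshold_average(curves, thresholds):
    # curves: one list of (threshold, fpr, tpr) triples per classifier.
    # Unlike vertical averaging, both the FP and TP rates are averaged,
    # so the result need not lie at the sampled FP rates.
    avg = []
    for t in thresholds:
        pts = [min(c, key=lambda p: abs(p[0] - t)) for c in curves]
        avg.append((sum(p[1] for p in pts) / len(pts),
                    sum(p[2] for p in pts) / len(pts)))
    return avg
```

Because points are matched by threshold rather than by position, this method can show variation in both dimensions of ROC space.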
  • 32. Relationship between ROC Space and PR Space
  • 34. • One important definition we need for our next theorem is the notion that one curve dominates another curve, meaning that the dominated curve lies beneath the dominating curve or is equal to it at every point.
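On a shared grid of x-values, the dominance definition reduces to a pointwise comparison of y-values. A tiny sketch (representation is my own; the paper's definition is over continuous curves):

```python
def dominates(curve_a, curve_b):
    # True if curve_a is at or above curve_b at every sampled x-value.
    # Curves are equal-length lists of y-values on the same x-grid.
    return all(ya >= yb for ya, yb in zip(curve_a, curve_b))
```

Note that dominance is a partial order: when two curves cross, neither dominates the other, which is exactly the case where the ROC/PR equivalence theorem gives no ranking.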
  • 35. Costs and Classification Accuracy
  • 40. Cost-sensitive Classifier Evaluation using Cost Curves • Methods for evaluating the performance of classifiers fall into two broad categories: • Numerical: produce a single number summarizing a classifier's performance • accuracy, expected cost, precision, recall, and area under a performance curve (AUC). • Graphical: depict performance in a plot that typically has just two or three dimensions so that it can be easily inspected by humans • ROC curves, precision-recall curves, DET curves, regret graphs, loss difference plots, skill plots, prevalence-value-accuracy plots, and the method presented in this talk, cost curves. • Graphical methods are especially useful when there is uncertainty about the misclassification costs or the class distribution that will occur when the classifier is deployed.
  • 41. • In this setting, graphical measures can present a classifier's actual performance for a wide variety of different operating points (combinations of costs and class distributions), • whereas the best a numerical measure can do is represent the average performance across a set of operating points. • Cost curves are perhaps the ideal graphical method in this setting because they directly show performance as a function of the misclassification costs and class distribution.
  • 44. Cost curves have the following advantages over ROC curves: • Cost curves directly show performance on their y-axis, whereas ROC curves do not explicitly depict performance. This means performance and performance differences can be easily seen in cost curves but not in ROC curves. • When applied to a set of cost curves, the natural way of averaging two-dimensional curves produces a cost curve that represents the average of the performances represented by the given curves. By contrast, there is no agreed-upon way to average ROC curves, and none of the proposed averaging methods produces an ROC curve representing average performance.
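In the Drummond-Holte formulation, each (FPR, TPR) classifier becomes a straight line in cost space: normalized expected cost as a function of the probability-cost value pc on the x-axis, running from (0, FPR) to (1, FNR). A minimal sketch (the figures in the usage line are illustrative only):

```python
def cost_line(fpr, tpr, pc):
    # Normalized expected cost at probability-cost value pc:
    #   NEC(pc) = (FNR - FPR) * pc + FPR
    # a straight line from (0, FPR) at pc=0 to (1, FNR) at pc=1.
    fnr = 1 - tpr
    return (fnr - fpr) * pc + fpr

# The "natural averaging" advantage: averaging two cost curves pointwise
# gives, at every operating point, the average of the two performances.
avg_cost = lambda pc: (cost_line(0.1, 0.8, pc) + cost_line(0.3, 0.9, pc)) / 2
```

Because each classifier is a line, the pointwise average of a set of cost curves is itself a well-defined curve, which is precisely the property that ROC averaging lacks.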
  • 45. Thanks for your attention