Algorithm evaluation using Item Response Theory
• Sevvandi Kandanaarachchi, RMIT University
• AustMS 2020
• December 11th 2020
• Joint work with Prof Kate Smith-Miles
Overview
• Algorithm portfolio evaluation
• Introduction to Item Response Theory (IRT)
• Mapping IRT to algorithm evaluation
• New metrics and reinterpretation
• Anomaly detection algorithm portfolio
• Diagnostics
Algorithm Portfolio Evaluation
• Results from many algorithms on many problems
• How do we evaluate the portfolio of algorithms?
• Statistical methods: Friedman test, post-hoc tests -> ranking of algorithms
• The ranking reflects average performance only
• Individual characteristics are buried under average performance
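To see how averaging can bury individual characteristics, here is a minimal Python sketch. The algorithm names A, B, C, the problems p1 to p4 and all scores are made up for illustration. The ranking is a plain average rank (the quantity a Friedman-style comparison is built on), not the full test.

```python
from statistics import mean

# Hypothetical scores (higher is better): problem -> algorithm -> score.
scores = {
    "p1": {"A": 0.90, "B": 0.80, "C": 0.20},
    "p2": {"A": 0.90, "B": 0.80, "C": 0.30},
    "p3": {"A": 0.30, "B": 0.40, "C": 0.90},
    "p4": {"A": 0.20, "B": 0.40, "C": 0.90},
}


def rank_row(row):
    """Rank algorithms on one problem (1 = best); ties share the average rank."""
    ordered = sorted(row.values(), reverse=True)
    ranks = {}
    for name, value in row.items():
        positions = [i + 1 for i, v in enumerate(ordered) if v == value]
        ranks[name] = mean(positions)
    return ranks


def average_ranks(table):
    """Average each algorithm's rank over all problems."""
    per_problem = [rank_row(row) for row in table.values()]
    names = list(next(iter(table.values())))
    return {name: mean(r[name] for r in per_problem) for name in names}


print(average_ranks(scores))
```

With these made-up scores all three algorithms receive the same average rank of 2, although A is best on p1 and p2, C is best on p3 and p4, and B is always in the middle.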
Item Response Theory
• Latent trait models used in social sciences/psychometrics
• Unobservable characteristics and observed outcomes
• Verbal or mathematical ability
• Racial prejudice or stress proneness
• Political inclinations
• Intrinsic “quality” that cannot be measured directly
IRT in education
• 𝑁 students (participants) answer 𝑛 questions (test items)
• Student ability (latent trait continuum)
• Test item discrimination
• Test item difficulty
Dichotomous IRT
• Multiple choice
• True or false
• $\phi(x_{ij} = 1 \mid \theta_i, \alpha_j, d_j, \gamma_j) = \gamma_j + \dfrac{1 - \gamma_j}{1 + \exp(-\alpha_j(\theta_i - d_j))}$
• 𝑥𝑖𝑗 - outcome/score of examinee 𝑖 for item 𝑗
• 𝜃𝑖 - examinee’s (𝑖) ability
• 𝛾𝑗 - guessing parameter for item 𝑗
• 𝑑𝑗 - difficulty parameter
• 𝛼𝑗 - discrimination
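The dichotomous model above can be sketched in a few lines of Python. The function name `prob_correct` and the example parameter values are illustrative choices, not taken from the slides.

```python
import math


def prob_correct(theta, alpha, d, gamma=0.0):
    """P(x_ij = 1 | theta_i, alpha_j, d_j, gamma_j) for the logistic model above.

    theta: examinee ability, alpha: item discrimination,
    d: item difficulty, gamma: guessing parameter.
    """
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - d)))


# Illustrative item: discrimination 1.5, difficulty 0, guessing 0.25.
for theta in (-2.0, 0.0, 2.0):
    print(theta, prob_correct(theta, alpha=1.5, d=0.0, gamma=0.25))
```

When the ability equals the difficulty, the probability sits halfway between the guessing level and 1. For very low ability it approaches the guessing parameter.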
Polytomous IRT
• Letter grades
• Score out of 5
• 𝜃 is the ability
• For each score there is a curve
• 𝑃(𝑥𝑖𝑗 = 𝑘|𝜃𝑖, 𝑑𝑗, 𝛼𝑗)
• For a given ability, which score are you most likely to get?
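The slides do not say which polytomous model is used. As an assumption, the sketch below uses a graded response model with ordered thresholds to show how "one curve per score" gives a most likely score for a given ability. The function names, thresholds and discrimination value are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(theta, alpha, thresholds):
    """P(x = k | theta) for k = 0..len(thresholds), graded response model (assumed).

    thresholds must be increasing and alpha positive for valid probabilities.
    """
    # Cumulative probabilities P(x >= k), with P(x >= 0) = 1 at the start
    # and 0 appended for the score beyond the highest category.
    cumulative = [1.0] + [logistic(alpha * (theta - d)) for d in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def most_likely_score(theta, alpha, thresholds):
    """Score whose curve is highest at this ability."""
    probs = category_probs(theta, alpha, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical item with scores 0..3.
for theta in (-3.0, 0.5, 3.0):
    print(theta, most_likely_score(theta, alpha=2.0, thresholds=[-1.0, 0.0, 1.0]))
```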
Continuous IRT
• Grades out of 100
• A 2D surface of probabilities
• 𝑃(𝑧𝑖𝑗|𝜃𝑖, 𝑑𝑗, 𝛼𝑗)
Mapping algorithm evaluation to IRT
• Item characteristics
• Difficulty, discrimination
• Person characteristic
• Ability
• In traditional IRT, examinees >> questions (far more examinees than questions)
[Diagram: IRT model with a person (doing something) and a test (inanimate)]
Mapping IRT to algorithm evaluation (Standard)
• Dataset (item) characteristics
• Difficulty, discrimination
• Algorithm (person) characteristic
• Ability
• We are evaluating datasets more than algorithms!
[Diagram: IRT model with an algorithm (doing something) and a dataset (inanimate)]
New Inverted Mapping
• Dataset (person) characteristic
• Person ability → dataset easiness
• Algorithm (item) characteristics
• Item difficulty → algo. easiness threshold
• Item discrimination → algo. stability and anomalousness
• Now we are evaluating algorithms more than datasets.
[Diagram: IRT model with an algorithm (doing something) and a dataset (inanimate)]
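In terms of data layout, the two mappings differ only in which axis of the performance table plays the role of persons. A minimal sketch, assuming a table with one row per dataset and one column per algorithm (the values and function names are hypothetical):

```python
# Hypothetical performance values: one row per dataset, one column per algorithm.
performance = [
    [0.9, 0.7],  # dataset 0
    [0.8, 0.6],  # dataset 1
    [0.4, 0.5],  # dataset 2
    [0.2, 0.6],  # dataset 3
]


def standard_mapping(perf):
    """Algorithms act as persons (rows) and datasets as items (columns)."""
    return [list(column) for column in zip(*perf)]


def inverted_mapping(perf):
    """Datasets act as persons (rows) and algorithms as items (columns)."""
    return [list(row) for row in perf]


persons_by_items = inverted_mapping(performance)
print(len(persons_by_items), "persons (datasets),", len(persons_by_items[0]), "items (algorithms)")
```

With many datasets and few algorithms, the inverted layout also has far more persons than items, which matches the traditional IRT setting of examinees >> questions.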
What are these new parameters?
• IRT - 𝜃𝑖 - ability of examinee 𝑖
• As 𝜃 increases, the probability of a higher score increases
• What is 𝜃𝑖 in terms of a dataset?
• 𝜃𝑖 - easiness of the dataset
What are these new parameters?
• IRT - 𝛼𝑗 - discrimination of item 𝑗
• 𝛼𝑗 increases → slope of curve increases
• What is 𝛼𝑗 in terms of an algorithm?
• 𝛼𝑗 - lack of stability/robustness of algo
• 1/|𝛼𝑗| - stability/robustness of algo
Stable algorithms
• Education – a question with a nearly flat curve (𝛼𝑗 close to 0) doesn’t give any information
• Algorithms – algorithms with such a curve are really stable
• Stability = 1/|𝛼𝑗|
Anomalous algorithms
• Algorithms that perform poorly on easy datasets and well on difficult datasets
• Negative discrimination
• In education – such items are discarded or revised
• If an algorithm is anomalous, it is interesting
• Anomalousness = sign(𝛼𝑗)
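Given fitted discrimination parameters, the two metrics above are direct to compute. A minimal sketch, where the algorithm names and 𝛼 values are made up and a zero discrimination is treated as infinitely stable (an assumption, since the slides do not cover that case):

```python
import math


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def stability(alpha):
    """Stability = 1/|alpha|; a flat curve (alpha == 0) is treated as infinitely stable."""
    return math.inf if alpha == 0 else 1.0 / abs(alpha)


def anomalousness(alpha):
    """Anomalousness = sign(alpha); a negative sign means negative discrimination."""
    return sign(alpha)


# Hypothetical fitted discrimination parameters.
discrimination = {"algo_1": 2.0, "algo_2": 0.1, "algo_3": -1.5}
for name, alpha in discrimination.items():
    print(name, stability(alpha), anomalousness(alpha))
```

Here `algo_2` would be the most stable, and `algo_3` would be flagged as anomalous because its discrimination is negative.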
Fitting Continuous IRT models
• Continuous models
• Does not fit items (algorithms) with negative discrimination
• 𝛼𝑗 - discrimination parameter, 𝛾𝑗 - scaling parameter (for this formulation)
• Assumption: 𝛼𝑗 > 0, 𝛾𝑗 > 0
• 𝐶𝑗 - covariance term
• 𝑡 - the iteration
• Negative covariance stops convergence
[Equation not recovered from the slide; its annotations read "Minimize this" and "Variance term"]
Fitting continuous IRT models
• Probability of score, given the ability
• Works if both 𝛼𝑗 > 0, 𝛾𝑗 > 0 OR 𝛼𝑗 < 0, 𝛾𝑗 < 0 → sign(𝛼𝑗) = sign(𝛾𝑗)
• So modify the original assumption 𝛼𝑗 > 0, 𝛾𝑗 > 0 to sign(𝛼𝑗) = sign(𝛾𝑗)
Anomaly detection (8 algos, 3142 datasets)
What about the latent trait?
(dataset easiness spectrum)
Dataset easiness and algorithm performance
Latent trait occupancy!
How much latent trait do you occupy?
Diagnostics
How well does the IRT model fit?
• Difference 𝑦𝑖𝑗 = |𝑥𝑖𝑗 − 𝑥̂𝑖𝑗|
• Cumulative distribution of these differences
• 𝑃(𝑦𝑖𝑗 ≤ 𝑐) for different 𝑐
• Model goodness curve (MGC)
• Area under this curve (AUMGC)
• Higher AUMGC is better
• Same idea for polytomous and continuous
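The model goodness curve can be sketched as follows. This is an illustration, not the airt implementation. It assumes scores are scaled to [0, 1], evaluates the curve on an evenly spaced grid of 𝑐 values and uses the trapezoidal rule for the area. The function names and data are hypothetical.

```python
def model_goodness_curve(actual, predicted, num_points=101):
    """Return (cs, curve) where curve[k] = P(|x - x_hat| <= cs[k])."""
    diffs = [abs(a - p) for a, p in zip(actual, predicted)]
    cs = [k / (num_points - 1) for k in range(num_points)]
    curve = [sum(d <= c for d in diffs) / len(diffs) for c in cs]
    return cs, curve


def area_under(xs, ys):
    """Trapezoidal area under the points (xs, ys)."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


# Hypothetical actual and predicted scores for one algorithm.
actual = [0.9, 0.8, 0.4, 0.2]
predicted = [0.85, 0.7, 0.5, 0.3]
cs, curve = model_goodness_curve(actual, predicted)
print(area_under(cs, curve))
```

A model whose predictions match the actual scores exactly gives a curve that is 1 everywhere and an AUMGC of 1. Larger prediction errors push the curve down and reduce the area.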
Effectiveness of algorithms
• Effective algorithms give better performances for most datasets
• 𝑃(𝑥𝑖𝑗 ≥ 𝑐) - Actual
• 𝑃(𝑥̂𝑖𝑗 ≥ 𝑐) - Predicted
• Area under these curves
• Area Under Actual Effectiveness Curve (AUAEC)
• Area Under Predicted Effectiveness Curve (AUPEC)
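Both areas come from the same calculation applied to actual and predicted scores. A minimal sketch, assuming scores scaled to [0, 1], an evenly spaced grid of 𝑐 values and the trapezoidal rule. The function name and data are hypothetical and this is not the airt implementation.

```python
def effectiveness_area(values, num_points=101):
    """Area under the curve c -> P(value >= c) for c on a grid over [0, 1]."""
    cs = [k / (num_points - 1) for k in range(num_points)]
    curve = [sum(v >= c for v in values) / len(values) for c in cs]
    return sum(
        (cs[i + 1] - cs[i]) * (curve[i] + curve[i + 1]) / 2
        for i in range(num_points - 1)
    )


# Hypothetical actual and model-predicted scores of one algorithm over datasets.
actual = [0.9, 0.8, 0.4, 0.2]
predicted = [0.85, 0.7, 0.5, 0.3]
auaec = effectiveness_area(actual)
aupec = effectiveness_area(predicted)
print((auaec, aupec))
```

An algorithm that scores highly on most datasets keeps the curve near 1 for longer and so has a larger area. Comparing the pair (AUAEC, AUPEC) shows how closely the model's predicted effectiveness tracks the actual one.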
Actual and Predicted effectiveness
• We can plot (AUAEC, AUPEC) as well.
Summary
• Evaluating a portfolio of algorithms
• Use Item Response Theory from psychometrics
• Adapting it to accommodate negative discrimination
• Inverting the intuitive mapping -> elegant reinterpretation
• A richer understanding of algorithms
• Includes additional diagnostics to test the goodness of the IRT model
• R package airt (on CRAN)
• https://sevvandi.github.io/airt/
• Pre-print: http://bit.ly/algorithmirt
• Comprehensive Algorithm Portfolio Evaluation using Item Response Theory
• More applications included
27
28

More Related Content

What's hot

Kaggle Days Madrid - Alberto Danese
Kaggle Days Madrid - Alberto DaneseKaggle Days Madrid - Alberto Danese
Kaggle Days Madrid - Alberto DaneseAlberto Danese
 
Inconsistent Outliers
Inconsistent OutliersInconsistent Outliers
Inconsistent OutliersNeil Rubens
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableAditya Bhattacharya
 
Instance Space Analysis for Search Based Software Engineering
Instance Space Analysis for Search Based Software EngineeringInstance Space Analysis for Search Based Software Engineering
Instance Space Analysis for Search Based Software EngineeringAldeida Aleti
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learningShishir Choudhary
 
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Abdel Salam Sayyad
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learningKnoldus Inc.
 
Practical Constraint Solving for Generating System Test Data
Practical Constraint Solving for Generating System Test DataPractical Constraint Solving for Generating System Test Data
Practical Constraint Solving for Generating System Test DataLionel Briand
 
Case Study Research in Software Engineering
Case Study Research in Software EngineeringCase Study Research in Software Engineering
Case Study Research in Software Engineeringalessio_ferrari
 
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
A Top-N Recommender System Evaluation Protocol Inspired by Deployed SystemsA Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
A Top-N Recommender System Evaluation Protocol Inspired by Deployed SystemsAlan Said
 
Kaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityKaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityAlberto Danese
 
AIAA Future of Fluids 2018 Moser
AIAA Future of Fluids 2018 MoserAIAA Future of Fluids 2018 Moser
AIAA Future of Fluids 2018 MoserQiqi Wang
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnSarah Guido
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineeringalessio_ferrari
 
Module 4: Model Selection and Evaluation
Module 4: Model Selection and EvaluationModule 4: Model Selection and Evaluation
Module 4: Model Selection and EvaluationSara Hooker
 
Machine learning - session 2
Machine learning - session 2Machine learning - session 2
Machine learning - session 2Luis Borbon
 
Module 1.2 data preparation
Module 1.2  data preparationModule 1.2  data preparation
Module 1.2 data preparationSara Hooker
 

What's hot (20)

Kaggle Days Madrid - Alberto Danese
Kaggle Days Madrid - Alberto DaneseKaggle Days Madrid - Alberto Danese
Kaggle Days Madrid - Alberto Danese
 
Learning from data
Learning from dataLearning from data
Learning from data
 
Inconsistent Outliers
Inconsistent OutliersInconsistent Outliers
Inconsistent Outliers
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretable
 
Instance Space Analysis for Search Based Software Engineering
Instance Space Analysis for Search Based Software EngineeringInstance Space Analysis for Search Based Software Engineering
Instance Space Analysis for Search Based Software Engineering
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learning
 
VST2022.pdf
VST2022.pdfVST2022.pdf
VST2022.pdf
 
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
 
Statistical learning intro
Statistical learning introStatistical learning intro
Statistical learning intro
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
 
Practical Constraint Solving for Generating System Test Data
Practical Constraint Solving for Generating System Test DataPractical Constraint Solving for Generating System Test Data
Practical Constraint Solving for Generating System Test Data
 
Case Study Research in Software Engineering
Case Study Research in Software EngineeringCase Study Research in Software Engineering
Case Study Research in Software Engineering
 
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
A Top-N Recommender System Evaluation Protocol Inspired by Deployed SystemsA Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems
 
Kaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML InterpretabilityKaggle Days Paris - Alberto Danese - ML Interpretability
Kaggle Days Paris - Alberto Danese - ML Interpretability
 
AIAA Future of Fluids 2018 Moser
AIAA Future of Fluids 2018 MoserAIAA Future of Fluids 2018 Moser
AIAA Future of Fluids 2018 Moser
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-Learn
 
Survey Research In Empirical Software Engineering
Survey Research In Empirical Software EngineeringSurvey Research In Empirical Software Engineering
Survey Research In Empirical Software Engineering
 
Module 4: Model Selection and Evaluation
Module 4: Model Selection and EvaluationModule 4: Model Selection and Evaluation
Module 4: Model Selection and Evaluation
 
Machine learning - session 2
Machine learning - session 2Machine learning - session 2
Machine learning - session 2
 
Module 1.2 data preparation
Module 1.2  data preparationModule 1.2  data preparation
Module 1.2 data preparation
 

Similar to Algorithm Portfolio Evaluation using Item Response Theory

Algorithm evaluation using Item Response Theory
Algorithm evaluation using Item Response TheoryAlgorithm evaluation using Item Response Theory
Algorithm evaluation using Item Response TheoryCSIRO
 
Explainable algorithm evaluation.pptx
Explainable algorithm evaluation.pptxExplainable algorithm evaluation.pptx
Explainable algorithm evaluation.pptxCSIRO
 
Explainable algorithm evaluation from lessons in education
Explainable algorithm evaluation from lessons in educationExplainable algorithm evaluation from lessons in education
Explainable algorithm evaluation from lessons in educationCSIRO
 
Getting better at detecting anomalies by using ensembles
Getting better at detecting anomalies by using ensemblesGetting better at detecting anomalies by using ensembles
Getting better at detecting anomalies by using ensemblesCSIRO
 
Explainable insights on algorithm performance
Explainable insights on algorithm performanceExplainable insights on algorithm performance
Explainable insights on algorithm performanceCSIRO
 
Anomalies! You can't escape them.
Anomalies! You can't escape them.Anomalies! You can't escape them.
Anomalies! You can't escape them.CSIRO
 
From ensembles to computer networks
From ensembles to computer networksFrom ensembles to computer networks
From ensembles to computer networksCSIRO
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 
Anomalies and events keep us on our toes
Anomalies and events keep us on our toesAnomalies and events keep us on our toes
Anomalies and events keep us on our toesCSIRO
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptxDr.Shweta
 
Creativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data ScienceCreativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data ScienceDamianMingle
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptxShree Shree
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!Khalid Salama
 

Similar to Algorithm Portfolio Evaluation using Item Response Theory (20)

Algorithm evaluation using Item Response Theory
Algorithm evaluation using Item Response TheoryAlgorithm evaluation using Item Response Theory
Algorithm evaluation using Item Response Theory
 
Explainable algorithm evaluation.pptx
Explainable algorithm evaluation.pptxExplainable algorithm evaluation.pptx
Explainable algorithm evaluation.pptx
 
Explainable algorithm evaluation from lessons in education
Explainable algorithm evaluation from lessons in educationExplainable algorithm evaluation from lessons in education
Explainable algorithm evaluation from lessons in education
 
Getting better at detecting anomalies by using ensembles
Getting better at detecting anomalies by using ensemblesGetting better at detecting anomalies by using ensembles
Getting better at detecting anomalies by using ensembles
 
Explainable insights on algorithm performance
Explainable insights on algorithm performanceExplainable insights on algorithm performance
Explainable insights on algorithm performance
 
Anomalies! You can't escape them.
Anomalies! You can't escape them.Anomalies! You can't escape them.
Anomalies! You can't escape them.
 
From ensembles to computer networks
From ensembles to computer networksFrom ensembles to computer networks
From ensembles to computer networks
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Anomalies and events keep us on our toes
Anomalies and events keep us on our toesAnomalies and events keep us on our toes
Anomalies and events keep us on our toes
 
Mini datathon
Mini datathonMini datathon
Mini datathon
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
Creativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data ScienceCreativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data Science
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Nbvtalkonfeatureselection
NbvtalkonfeatureselectionNbvtalkonfeatureselection
Nbvtalkonfeatureselection
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
Seminar nov2017
Seminar nov2017Seminar nov2017
Seminar nov2017
 

More from CSIRO

The painful removal of tiling artefacts in hypersprectral data
The painful removal of tiling artefacts in hypersprectral dataThe painful removal of tiling artefacts in hypersprectral data
The painful removal of tiling artefacts in hypersprectral dataCSIRO
 
The painful removal of tiling artefacts in ToF-SIMS data
The painful removal of tiling artefacts in ToF-SIMS dataThe painful removal of tiling artefacts in ToF-SIMS data
The painful removal of tiling artefacts in ToF-SIMS dataCSIRO
 
Sophisticated tools for spatio-temporal data exploration
Sophisticated tools for spatio-temporal data explorationSophisticated tools for spatio-temporal data exploration
Sophisticated tools for spatio-temporal data explorationCSIRO
 
A time series of networks. Is everything OK? Are there anomalies?
A time series of networks. Is everything OK? Are there anomalies?A time series of networks. Is everything OK? Are there anomalies?
A time series of networks. Is everything OK? Are there anomalies?CSIRO
 
Anomalous Networks
Anomalous NetworksAnomalous Networks
Anomalous NetworksCSIRO
 
Four, fast geostatistical methods - a comparison
Four, fast geostatistical methods - a comparisonFour, fast geostatistical methods - a comparison
Four, fast geostatistical methods - a comparisonCSIRO
 
Comparison of geostatistical methods for spatial data
Comparison of geostatistical methods for spatial dataComparison of geostatistical methods for spatial data
Comparison of geostatistical methods for spatial dataCSIRO
 
Mathematics of anomalies
Mathematics of anomaliesMathematics of anomalies
Mathematics of anomaliesCSIRO
 
Here is the anomalow-down!
Here is the anomalow-down!Here is the anomalow-down!
Here is the anomalow-down!CSIRO
 
Looking out for anomalies
Looking out for anomaliesLooking out for anomalies
Looking out for anomaliesCSIRO
 

More from CSIRO (10)

The painful removal of tiling artefacts in hypersprectral data
The painful removal of tiling artefacts in hypersprectral dataThe painful removal of tiling artefacts in hypersprectral data
The painful removal of tiling artefacts in hypersprectral data
 
The painful removal of tiling artefacts in ToF-SIMS data
The painful removal of tiling artefacts in ToF-SIMS dataThe painful removal of tiling artefacts in ToF-SIMS data
The painful removal of tiling artefacts in ToF-SIMS data
 
Sophisticated tools for spatio-temporal data exploration
Sophisticated tools for spatio-temporal data explorationSophisticated tools for spatio-temporal data exploration
Sophisticated tools for spatio-temporal data exploration
 
A time series of networks. Is everything OK? Are there anomalies?
A time series of networks. Is everything OK? Are there anomalies?A time series of networks. Is everything OK? Are there anomalies?
A time series of networks. Is everything OK? Are there anomalies?
 
Anomalous Networks
Anomalous NetworksAnomalous Networks
Anomalous Networks
 
Four, fast geostatistical methods - a comparison
Four, fast geostatistical methods - a comparisonFour, fast geostatistical methods - a comparison
Four, fast geostatistical methods - a comparison
 
Comparison of geostatistical methods for spatial data
Comparison of geostatistical methods for spatial dataComparison of geostatistical methods for spatial data
Comparison of geostatistical methods for spatial data
 
Mathematics of anomalies
Mathematics of anomaliesMathematics of anomalies
Mathematics of anomalies
 
Here is the anomalow-down!
Here is the anomalow-down!Here is the anomalow-down!
Here is the anomalow-down!
 
Looking out for anomalies
Looking out for anomaliesLooking out for anomalies
Looking out for anomalies
 

Recently uploaded

Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 

Recently uploaded (20)

Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 

Algorithm Portfolio Evaluation using Item Response Theory

  • 1. Algorithm evaluation using Item Response Theory • Sevvandi Kandanaarachchi, RMIT University • AustMS 2020 • December 11th 2020 • Joint work with Prof Kate Smith-Miles
  • 2. Overview • Algorithm portfolio evaluation • Introduction to Item Response Theory (IRT) • Mapping IRT to algorithm evaluation • New metrics and reinterpretation • Anomaly detection algo. portfolio • Diagnostics
  • 3. Algorithm Portfolio Evaluation • Results from many algorithms on many problems • How do we evaluate the portfolio of algorithms? • Statistical methods: Friedman test, post-hoc tests → ranking of algorithms • These rankings describe average performance • Individual characteristics are buried under average performance
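The point about averages hiding individual behaviour can be illustrated with a small sketch. It computes mean ranks per algorithm, the kind of summary that rank-based tests such as the Friedman test build on. The function names and the score table are hypothetical and chosen only for illustration.

```python
from statistics import mean

def rank_row(scores):
    """Rank algorithms on one problem: 1 = best (highest score); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next algorithm has the same score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions in the tie block
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def average_ranks(perf):
    """perf[i][j] = score of algorithm j on problem i. Returns the mean rank per algorithm."""
    per_problem = [rank_row(row) for row in perf]
    return [mean(col) for col in zip(*per_problem)]

# Hypothetical scores for 3 algorithms on 4 problems (higher is better).
# Algorithm 0 is always second; algorithms 1 and 2 swap between best and worst.
perf = [
    [0.6, 0.9, 0.3],
    [0.6, 0.9, 0.3],
    [0.6, 0.2, 0.7],
    [0.6, 0.2, 0.7],
]
avg = average_ranks(perf)
print(avg)
```

All three hypothetical algorithms end up with the same mean rank, although one is steady and the other two are specialists. That difference is what an average ranking cannot show.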
  • 4. Item Response Theory • Latent trait models used in social sciences/psychometrics • Unobservable characteristics and observed outcomes • Verbal or mathematical ability • Racial prejudice or stress proneness • Political inclinations • Intrinsic “quality” that cannot be measured directly
  • 5. IRT in education • 𝑁 students (participants) answer 𝑛 questions (test items) • Student ability (latent trait continuum) • Test item discrimination • Test item difficulty
  • 6. Dichotomous IRT • Multiple choice • True or false • $\phi(x_{ij} = 1 \mid \theta_i, \alpha_j, d_j, \gamma_j) = \gamma_j + \dfrac{1 - \gamma_j}{1 + \exp(-\alpha_j(\theta_i - d_j))}$ • $x_{ij}$ - outcome/score of examinee $i$ for item $j$ • $\theta_i$ - ability of examinee $i$ • $\gamma_j$ - guessing parameter for item $j$ • $d_j$ - difficulty parameter for item $j$ • $\alpha_j$ - discrimination parameter for item $j$
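The dichotomous model on this slide can be evaluated directly. The sketch below is a plain transcription of that formula; the function name `p_correct` and the parameter values are illustrative assumptions, not taken from the talk.

```python
import math

def p_correct(theta, alpha, d, gamma):
    """Probability of a correct response: gamma + (1 - gamma) / (1 + exp(-alpha * (theta - d)))."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - d)))

# Hypothetical item: discrimination 1.5, difficulty 0.0, guessing 0.2.
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(p_correct(theta, alpha=1.5, d=0.0, gamma=0.2), 3))
```

When the ability equals the difficulty, the probability sits halfway between the guessing level and 1. For very low abilities it approaches the guessing level.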
  • 7. Polytomous IRT • Letter grades • Score out of 5 • 𝜃 is the ability • For each score there is a curve • 𝑃(𝑥𝑖𝑗 = 𝑘|𝜃𝑖, 𝑑𝑗, 𝛼𝑗) • For a given ability, which score are you most likely to get?
  • 8. Continuous IRT • Grades out of 100 • A 2D surface of probabilities • 𝑃(𝑧𝑖𝑗|𝜃𝑖, 𝑑𝑗, 𝛼𝑗)
  • 9. Mapping algorithm evaluation to IRT • Item characteristics • Difficulty, discrimination • Person characteristic • Ability • In traditional IRT • examinees ≫ questions • Diagram: the IRT model links a person (doing something) and a test (inanimate)
  • 10. Mapping IRT to algorithm evaluation (Standard) • Dataset (item) characteristics • Difficulty, discrimination • Algorithm (person) characteristic • Ability • We are evaluating datasets more than algorithms! • Diagram: the IRT model links an algorithm (doing something) and a dataset (inanimate)
  • 11. New Inverted Mapping • Dataset (person) characteristic • Person ability → dataset easiness • Algorithm (item) characteristics • Item difficulty → algo. easiness threshold • Item discrimination → algo. stability and anomalousness • Now we are evaluating algorithms more than datasets. • Diagram: the IRT model links an algorithm (doing something) and a dataset (inanimate)
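In terms of data layout, the two mappings differ only in which side of the results plays the person (rows) and which plays the item (columns). The sketch below assumes the common persons-by-items convention for an IRT response matrix. The results dictionary and the helper `response_matrix` are hypothetical.

```python
def response_matrix(results, persons, items, person_is_algorithm):
    """Build a persons-by-items matrix from results[algorithm][dataset]."""
    if person_is_algorithm:
        # Standard mapping: algorithms are persons, datasets are items.
        return [[results[p][i] for i in items] for p in persons]
    # Inverted mapping: datasets are persons, algorithms are items.
    return [[results[i][p] for i in items] for p in persons]

# Hypothetical performance values in [0, 1] for 2 algorithms on 3 datasets.
results = {
    "algo_a": {"d1": 0.9, "d2": 0.4, "d3": 0.7},
    "algo_b": {"d1": 0.8, "d2": 0.6, "d3": 0.1},
}
algos = ["algo_a", "algo_b"]
datasets = ["d1", "d2", "d3"]

standard = response_matrix(results, algos, datasets, person_is_algorithm=True)   # 2 x 3
inverted = response_matrix(results, datasets, algos, person_is_algorithm=False)  # 3 x 2
print(standard)
print(inverted)
```

With the inverted layout, the item parameters (difficulty and discrimination) are estimated per algorithm, and the single person parameter is estimated per dataset.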
  • 12. What are these new parameters? • IRT - 𝜃𝑖 - ability of examinee 𝑖 • As 𝜃 increases, the probability of a higher score increases • What is 𝜃𝑖 in terms of a dataset? • 𝜃𝑖 - easiness of the dataset
  • 13. What are these new parameters? • IRT - 𝛼𝑗 - discrimination of item 𝑗 • 𝛼𝑗 increases → slope of curve increases • What is 𝛼𝑗 in terms of an algorithm? • 𝛼𝑗 - lack of stability/robustness of algo • 1/|𝛼𝑗| - stability/robustness of algo
  • 14. Stable algorithms • Algorithms with small |𝛼𝑗|, so performance changes little as dataset easiness changes • Education – such a question doesn’t give any information • Algorithms – these algorithms are really stable • Stability = 1/|𝛼𝑗|
  • 15. Anomalous algorithms • Algorithms that perform poorly on easy datasets and well on difficult datasets • Negative discrimination • In education – such items are discarded or revised • If an algorithm is anomalous, it is interesting • Anomalousness = sign(𝛼𝑗)
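The two algorithm metrics defined on the preceding slides follow directly from a fitted discrimination parameter. The sketch below assumes the discrimination values are already available. The algorithm names and values are hypothetical, and returning infinity for a zero discrimination is a convention chosen here, not something stated in the talk.

```python
import math

def stability(alpha):
    """Stability = 1/|alpha|; a flat curve (alpha = 0) is treated as infinitely stable (assumption)."""
    return math.inf if alpha == 0 else 1.0 / abs(alpha)

def anomalousness(alpha):
    """sign(alpha): -1 for negative discrimination (anomalous), +1 for positive, 0 for a flat curve."""
    return (alpha > 0) - (alpha < 0)

# Hypothetical discrimination values for three made-up algorithms.
alphas = {"algo_steady": 0.25, "algo_sensitive": 2.0, "algo_contrary": -1.0}
summary = {name: (stability(a), anomalousness(a)) for name, a in alphas.items()}
for name, (stab, sign) in summary.items():
    print(name, "stability:", stab, "sign:", sign)
```

Here `algo_contrary` has negative discrimination, which matches the slide's description of an algorithm that does poorly on easy datasets and well on difficult ones.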
  • 16. Fitting continuous IRT models • Continuous models • Does not fit items (algorithms) with negative discrimination • [Equation not recovered from the slide; its annotations read "Minimize this" and "Variance term"] • 𝛼𝑗 - discrimination parameter, 𝛾𝑗 - scaling parameter (for this formulation) • Assumption: 𝛼𝑗 > 0, 𝛾𝑗 > 0 • 𝐶𝑗 - covariance term • 𝑡 - the iteration • Negative covariance stops convergence
  • 17. Fitting continuous IRT models • Probability of score, given the ability • Works if both 𝛼𝑗 > 0, 𝛾𝑗 > 0 or 𝛼𝑗 < 0, 𝛾𝑗 < 0, i.e. sign(𝛼𝑗) = sign(𝛾𝑗) • So modify the original assumption 𝛼𝑗 > 0, 𝛾𝑗 > 0 to sign(𝛼𝑗) = sign(𝛾𝑗)
  • 18. Anomaly detection (8 algos, 3142 datasets)
  • 19. What about the latent trait? (dataset easiness spectrum)
  • 21. Dataset easiness and algorithm performance
  • 22. Dataset easiness and algorithm performance • Latent trait occupancy: how much of the latent trait do you occupy?
  • 24. How well does the IRT model fit? • Difference $y_{ij} = |x_{ij} - \hat{x}_{ij}|$ • Cumulative distribution of these differences • $P(y_{ij} \le c)$ for different $c$ • Model goodness curve (MGC) • Area under this curve (AUMGC) • Higher AUMGC is better • Same idea for polytomous and continuous
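A minimal sketch of the model goodness curve described above follows. It assumes actual and predicted scores are scaled to [0, 1] and uses an evenly spaced grid of thresholds with a trapezoidal area. The function names, grid size, and data are illustrative assumptions, not the airt package's implementation.

```python
def model_goodness_curve(actual, predicted, n_grid=101):
    """Return thresholds c in [0, 1] and the fraction of absolute errors <= c at each one."""
    errors = [abs(x - x_hat) for x, x_hat in zip(actual, predicted)]
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    curve = [sum(e <= c for e in errors) / len(errors) for c in grid]
    return grid, curve

def area_under(grid, curve):
    """Trapezoidal area under a curve sampled on grid."""
    return sum((grid[k + 1] - grid[k]) * (curve[k] + curve[k + 1]) / 2
               for k in range(len(grid) - 1))

# Hypothetical actual scores and two sets of hypothetical model predictions.
actual = [0.9, 0.5, 0.2, 0.7]
close_fit = [0.85, 0.55, 0.25, 0.7]
poor_fit = [0.4, 0.9, 0.8, 0.1]

aumgc_close = area_under(*model_goodness_curve(actual, close_fit))
aumgc_poor = area_under(*model_goodness_curve(actual, poor_fit))
print(aumgc_close, aumgc_poor)
```

A model whose predictions are close to the actual scores reaches a cumulative proportion of 1 at small thresholds, so its area is closer to 1.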
  • 25. Effectiveness of algorithms • Effective algorithms give better performances for most datasets • $P(x_{ij} \ge c)$ - Actual • $P(\hat{x}_{ij} \ge c)$ - Predicted • Area under these curves • Area Under Actual Effectiveness Curve (AUAEC) • Area Under Predicted Effectiveness Curve (AUPEC)
  • 26. Actual and Predicted effectiveness • We can plot (AUAEC, AUPEC) as well.
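The effectiveness curves can be sketched in the same way as the goodness curve: for one algorithm, compute the fraction of datasets whose score is at least c, then take the area under that curve. The sketch assumes scores in [0, 1] and a trapezoidal area. The function names and numbers are hypothetical.

```python
def effectiveness_curve(scores, n_grid=101):
    """Return thresholds c in [0, 1] and the fraction of scores >= c at each one."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    curve = [sum(s >= c for s in scores) / len(scores) for c in grid]
    return grid, curve

def area_under(grid, curve):
    """Trapezoidal area under a curve sampled on grid."""
    return sum((grid[k + 1] - grid[k]) * (curve[k] + curve[k + 1]) / 2
               for k in range(len(grid) - 1))

# Hypothetical scores of one algorithm across four datasets, and hypothetical
# scores predicted for the same datasets by a fitted IRT model.
actual = [0.9, 0.5, 0.2, 0.7]
predicted = [0.85, 0.55, 0.25, 0.7]

auaec = area_under(*effectiveness_curve(actual))     # actual effectiveness
aupec = area_under(*effectiveness_curve(predicted))  # predicted effectiveness
print((auaec, aupec))
```

For scores in [0, 1], the area under such a curve is approximately the mean score. In a plot of the pair (AUAEC, AUPEC), points near the diagonal therefore indicate that the model predicts about the same overall effectiveness as was observed.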
  • 27. Summary • Evaluating a portfolio of algorithms • Use Item Response Theory from psychometrics • Accommodating it to include negative discrimination • Inverting the intuitive mapping → elegant reinterpretation • A richer understanding of algorithms • Includes additional diagnostics to test the goodness of the IRT model • R package airt (on CRAN) • https://sevvandi.github.io/airt/ • Pre-print: "Comprehensive Algorithm Portfolio Evaluation using Item Response Theory", http://bit.ly/algorithmirt • More applications included