Banulescu-Radu (LEO) WiMLDS 13/04/2021 1 / 39
Data Science for Financial Fraud Detection
Denisa BANULESCU-RADU
University of Orléans, LEO
WiMLDS 13th of April 2021
Background
• Since 2015: Associate Professor – University of Orléans, LEO
• 2016: Young Researcher Award in Economics – Autorité des Marchés Financiers
• 2015: Thesis Prize – Fondation Banque de France
• 2014-2015: Max Weber Postdoctoral Fellow – European University Institute
• 2011-2014: PhD in Economics – Maastricht University and University of Orléans
Dissertation title: "Four essays in financial econometrics"
Main research interests
Outline
1 Econometrics vs Machine Learning
2 General aspects of fraud
3 Main challenges and solutions
4 Case studies
4.1 Case 1: Insurance fraud detection
4.2 Case 2: Social fraud detection
5 Conclusion
Econometrics vs Machine Learning
“there are a number of areas where there would be opportunities for fruitful collaboration between econometrics and machine learning”
Hal Varian (2014) - Professor of Economics (University of Michigan) & Chief Economist (Google)
General aspects of fraud
Fraud detection - Why is it important?
Definition of fraud
Definition
• Baesens et al. (2015): “Fraud is an uncommon, well-considered, imperceptibly concealed, time-evolving, and often carefully organized crime which appears in many types of forms.”
Typologies of fraud
Main challenges and solutions
Main CHALLENGES and solutions
Main challenges and SOLUTIONS
1. Main tools used to fight fraud
2. Deal with imbalanced datasets
3. Evaluation of fraud detection models
4. Improving the interpretability of fraud detection models
“if the users do not trust a model or a prediction, they will not use it” (Ribeiro et al., 2016)
• LIME method (Ribeiro et al., 2016)
• SHAP (SHapley Additive exPlanations) values (Lundberg and Lee, 2017)
BUT ... to what extent do we need fraud detection models to be interpretable?
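To make the idea behind SHAP values concrete, here is a minimal sketch that computes exact Shapley attributions by enumerating every coalition of features. It is only an illustration under stated assumptions: the scoring function `toy_score`, its three feature names and the all-zero baseline are hypothetical, and replacing absent features by a baseline value is a simplification, not the estimator implemented in the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attribution of f(x) - f(baseline) to each feature.

    Features outside a coalition are set to their baseline value
    (an illustrative simplification, assumed for this sketch).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_score(z):
    """Hypothetical fraud score with one interaction term (made up)."""
    claim_amount, prior_claims, days_since_contract = z
    return (0.2 * claim_amount + 0.5 * prior_claims
            + 0.3 * claim_amount * prior_claims
            - 0.1 * days_since_contract)


x = [2.0, 1.0, 0.5]          # the claim being explained (hypothetical)
baseline = [0.0, 0.0, 0.0]   # reference point (hypothetical)
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The attributions add up to the gap between the score of the explained claim and the score at the baseline, which is the additivity property that gives SHAP its name. Exact enumeration grows exponentially with the number of features, so it is only practical for very small examples like this one.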
Case studies
Case 1: Insurance fraud detection
General framework
• Fraudulent claims represented 10% of the total number of claims in 2019 (Insurance Europe)
• Negative record for France: €2.5 billion in 2014, of which only €219 million was recovered (ALFA)
Methodology
DATA
• 45,954 home insurance claims for the period 2013 to 2017
• French insurance company
• 0.76% of claims are fraudulent
Technical tools
• Logistic LASSO (Cox, 1958; Tibshirani, 1996)
• Random forest (Breiman, 2001)
• Extreme Gradient Boosting or XGBoost (Chen and Guestrin, 2016)
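For reference, the logistic LASSO in the list above is usually written as a penalized maximum likelihood problem. The notation is assumed here and does not come from the slides: $y_i \in \{0,1\}$ flags a fraudulent claim, $x_i$ is the vector of $p$ claim features, and $\lambda \ge 0$ is the penalty weight.

$$
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0,\,\beta}\; \left\{ -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,(\beta_0 + x_i^\top\beta) - \log\!\big(1 + e^{\beta_0 + x_i^\top\beta}\big) \Big] + \lambda \sum_{j=1}^{p} |\beta_j| \right\}
$$

The $\ell_1$ penalty can shrink some coefficients exactly to zero, so the estimator also selects variables. The penalty weight $\lambda$ is typically chosen by cross-validation.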
Resampling techniques to deal with imbalanced data
• Random Oversampling
• Synthetic Minority Oversampling TEchnique or SMOTE (Chawla et al., 2002)
• ADAptive SYNthetic sampling or ADASYN (He et al., 2008)
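The first two resampling techniques in this list can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: the four two-feature `fraud_rows` are made up, and the function names are hypothetical. Random oversampling duplicates minority rows, while SMOTE interpolates between a minority row and one of its nearest minority neighbours. ADASYN, which adapts the number of synthetic points to local difficulty, is not covered.

```python
import random


def random_oversample(minority, n_new, seed=0):
    """Duplicate existing minority rows, drawn with replacement."""
    rng = random.Random(seed)
    return [rng.choice(minority) for _ in range(n_new)]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows on segments joining a minority row
    to one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (squared Euclidean distance)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical fraudulent claims described by two numeric features
fraud_rows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
duplicates = random_oversample(fraud_rows, n_new=6)
synthetic = smote(fraud_rows, n_new=6, k=2)
print(duplicates)
print(synthetic)
```

In practice, resampling is normally applied to the training data only, so that the test set keeps the original class proportions.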
Performance metrics
• AUC-ROC, AUC-PR, Brier score, Log-Loss, F-measure
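With 0.76% of claims fraudulent, a model that labels every claim as legitimate would reach 99.24% accuracy while detecting nothing, which is why the metrics above are preferred. The sketch below gives plain-Python versions of each one on a made-up sample of ten claims. The labels `y`, the scores `p` and the 0.5 threshold are assumptions for illustration, and the step-wise average precision is only one common way to estimate AUC-PR.

```python
from math import log


def brier_score(y, p):
    """Mean squared gap between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def log_loss(y, p, eps=1e-15):
    """Average negative log-likelihood, with probabilities clipped."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * log(pi) + (1 - yi) * log(1 - pi)
    return total / len(y)


def f_measure(y, p, threshold=0.5):
    """Harmonic mean of precision and recall at a given threshold."""
    pred = [1 if pi >= threshold else 0 for pi in p]
    tp = sum(1 for yi, qi in zip(y, pred) if yi == 1 and qi == 1)
    fp = sum(1 for yi, qi in zip(y, pred) if yi == 0 and qi == 1)
    fn = sum(1 for yi, qi in zip(y, pred) if yi == 1 and qi == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y, p):
    """Share of (fraud, legitimate) pairs ranked correctly; ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y, p):
    """Step-wise area under the precision-recall curve.

    Assumes no tied scores between fraud and legitimate cases.
    """
    order = sorted(range(len(y)), key=lambda i: -p[i])
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y[i] == 1:
            tp += 1
            ap += tp / rank  # precision at each recovered fraud case
    return ap / sum(y)


# Made-up sample: 2 fraudulent claims out of 10
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p = [0.05, 0.10, 0.20, 0.12, 0.30, 0.02, 0.15, 0.60, 0.70, 0.40]
print(auc_roc(y, p), average_precision(y, p), f_measure(y, p))
print(brier_score(y, p), log_loss(y, p))
```

AUC-ROC and AUC-PR judge the ranking of claims, the Brier score and log-loss judge the quality of the predicted probabilities, and the F-measure judges the decisions taken at one threshold.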
• Interpretation of results: SHAP value method (global/individual level)
Figure 1: Fraudulent case
Figure 2: Non-fraudulent case
Case 2: Social fraud detection
General framework
• Controlling the risks of social and fiscal fraud and combating illegal work are
also important problems for social justice and economic efficiency
• French mutual organization
• collects data systematically from their beneficiaries
• organizes regular controls on a subsample of their taxpayers
• manages a fraud detection system to identify those who do not pay
their contributions
Objective: Estimate the tax shortfall.
Definition
The tax shortfall is defined as the potential sum of the tax adjustments that could have been imposed on companies that committed fraud or made erroneous social declarations, had they actually been audited, when in reality they were not.
Remarks
• the two decisions are neither sequential nor conditional
• the decisions are linked
Methodology: Estimation by Maximum Likelihood
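Control decision

$$
C_i = \begin{cases} 1 & \text{if } C_i^* = X_{c,i}\beta_c + \varepsilon_{c,i} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{1}
$$

Fraud decision

$$
\tilde{D}_i = \begin{cases} 1 & \text{if } D_i^* = X_{d,i}\beta_d + \varepsilon_{d,i} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{2}
$$

Potential tax shortfall

$$
M_i^* = \begin{cases} X_{m,i}\beta_m + \varepsilon_{m,i} & \text{if } \tilde{D}_i = 1 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{3}
$$

$$
\begin{pmatrix} \varepsilon_{c,i} \\ \varepsilon_{d,i} \\ \varepsilon_{m,i} \end{pmatrix} \sim N(0, \Sigma) \quad \text{with} \quad \Sigma = DRD \tag{4}
$$

$$
D = \begin{pmatrix} \sigma_c & 0 & 0 \\ 0 & \sigma_d & 0 \\ 0 & 0 & \sigma_m \end{pmatrix}, \qquad
R = \begin{pmatrix} 1 & \rho_{cd} & \rho_{cm} \\ \rho_{cd} & 1 & \rho_{dm} \\ \rho_{cm} & \rho_{dm} & 1 \end{pmatrix} \tag{5}
$$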
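The structure of equations (1) to (5) can be illustrated by simulating the latent system. This is only a sketch under stated assumptions: the linear indexes `xb_c`, `xb_d`, `xb_m` stand in for the terms $X_{c,i}\beta_c$, $X_{d,i}\beta_d$ and $X_{m,i}\beta_m$ and are held constant across firms, and all numerical values are hypothetical. It builds $\Sigma = DRD$ element by element, draws correlated errors through a Cholesky factor, and applies the three decision rules. It does not perform the maximum likelihood estimation itself.

```python
import random
from math import sqrt


def covariance(sigmas, R):
    """Sigma = D R D with D = diag(sigmas): sigma_j * rho_jk * sigma_k."""
    n = len(sigmas)
    return [[sigmas[j] * R[j][k] * sigmas[k] for k in range(n)]
            for j in range(n)]


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1):
            s = sum(L[j][m] * L[k][m] for m in range(k))
            if j == k:
                L[j][j] = sqrt(A[j][j] - s)
            else:
                L[j][k] = (A[j][k] - s) / L[k][k]
    return L


def simulate(n, xb_c, xb_d, xb_m, sigmas, R, seed=0):
    """Draw (control, fraud, shortfall) for n firms from equations (1)-(3)."""
    rng = random.Random(seed)
    L = cholesky(covariance(sigmas, R))
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Correlated errors: eps = L z, so Cov(eps) = L L^T = Sigma
        eps_c, eps_d, eps_m = (
            sum(L[j][k] * z[k] for k in range(j + 1)) for j in range(3)
        )
        control = 1 if xb_c + eps_c > 0 else 0          # equation (1)
        fraud = 1 if xb_d + eps_d > 0 else 0            # equation (2)
        shortfall = xb_m + eps_m if fraud == 1 else 0.0  # equation (3)
        rows.append((control, fraud, shortfall))
    return rows


# Hypothetical parameter values, chosen only for illustration
sigmas = [1.0, 1.0, 2.0]                       # sigma_c, sigma_d, sigma_m
R = [[1.0, 0.3, 0.2],
     [0.3, 1.0, 0.5],
     [0.2, 0.5, 1.0]]                          # rho_cd, rho_cm, rho_dm
Sigma = covariance(sigmas, R)
rows = simulate(1000, xb_c=-1.0, xb_d=-0.5, xb_m=3.0, sigmas=sigmas, R=R)
print(Sigma)
print(sum(c for c, _, _ in rows), sum(d for _, d, _ in rows))
```

Because the three errors are correlated, the control and fraud outcomes are linked even though neither decision is conditional on the other, which matches the remarks made earlier in this case study.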
Conclusion
Thank you for your attention!

More Related Content

Similar to Fraud detection by Denisa Banulescu-Radu

20687-39027-1-PB.pdf
20687-39027-1-PB.pdf20687-39027-1-PB.pdf
20687-39027-1-PB.pdfIjictTeam
 
Covid-19 Data Analysis and Visualization
Covid-19 Data Analysis and VisualizationCovid-19 Data Analysis and Visualization
Covid-19 Data Analysis and VisualizationIRJET Journal
 
Computer Invention And Its Effect On The Human Body
Computer Invention And Its Effect On The Human BodyComputer Invention And Its Effect On The Human Body
Computer Invention And Its Effect On The Human BodyJessica Myers
 
NEW CORONA VIRUS DISEASE 2022: SOCIAL DISTANCING IS AN EFFECTIVE MEASURE (COV...
NEW CORONA VIRUS DISEASE 2022: SOCIAL DISTANCING IS AN EFFECTIVE MEASURE (COV...NEW CORONA VIRUS DISEASE 2022: SOCIAL DISTANCING IS AN EFFECTIVE MEASURE (COV...
NEW CORONA VIRUS DISEASE 2022: SOCIAL DISTANCING IS AN EFFECTIVE MEASURE (COV...IRJET Journal
 
Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Shantanu Deshpande
 
Here are 4 discussion posts by class mates from the 495 class that.docx
Here are 4 discussion posts by class mates from the 495 class that.docxHere are 4 discussion posts by class mates from the 495 class that.docx
Here are 4 discussion posts by class mates from the 495 class that.docxpooleavelina
 
Aon Retail & Wholesale Inperspective Nov 2016
Aon Retail & Wholesale Inperspective Nov 2016Aon Retail & Wholesale Inperspective Nov 2016
Aon Retail & Wholesale Inperspective Nov 2016Graeme Cross
 
Data science landscape in the insurance industry
Data science landscape in the insurance industryData science landscape in the insurance industry
Data science landscape in the insurance industryStefano Perfetti
 
Data4Impact Expert Workshop Report
Data4Impact Expert Workshop ReportData4Impact Expert Workshop Report
Data4Impact Expert Workshop ReportData4Impact
 
cse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitycse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitysandeepnani2260
 
Predictive preventative-or-intelligence-led-policing
Predictive preventative-or-intelligence-led-policingPredictive preventative-or-intelligence-led-policing
Predictive preventative-or-intelligence-led-policingYellow Pages of Pakistan
 
Predictive-Preventative-or-Intelligence-Led-Policing
Predictive-Preventative-or-Intelligence-Led-PolicingPredictive-Preventative-or-Intelligence-Led-Policing
Predictive-Preventative-or-Intelligence-Led-PolicingMartin Smith
 
Wireless Communication, Sensing and REM: A Security Perspective
Wireless Communication, Sensing and REM: A Security PerspectiveWireless Communication, Sensing and REM: A Security Perspective
Wireless Communication, Sensing and REM: A Security PerspectiveIRJET Journal
 
Assessing The Nature Of Risk Management Implementation In Manufacturing Small...
Assessing The Nature Of Risk Management Implementation In Manufacturing Small...Assessing The Nature Of Risk Management Implementation In Manufacturing Small...
Assessing The Nature Of Risk Management Implementation In Manufacturing Small...Yolanda Ivey
 
Mind the Gaps: AML and Fraud Global Benchmark Survey
Mind the Gaps: AML and Fraud Global Benchmark Survey Mind the Gaps: AML and Fraud Global Benchmark Survey
Mind the Gaps: AML and Fraud Global Benchmark Survey Paul Hamilton
 

Similar to Fraud detection by Denisa Banulescu-Radu (20)

20687-39027-1-PB.pdf
20687-39027-1-PB.pdf20687-39027-1-PB.pdf
20687-39027-1-PB.pdf
 
Covid-19 Data Analysis and Visualization
Covid-19 Data Analysis and VisualizationCovid-19 Data Analysis and Visualization
Covid-19 Data Analysis and Visualization
 
Computer Invention And Its Effect On The Human Body
Computer Invention And Its Effect On The Human BodyComputer Invention And Its Effect On The Human Body
Computer Invention And Its Effect On The Human Body
 
NEW CORONA VIRUS DISEASE 2022: SOCIAL DISTANCING IS AN EFFECTIVE MEASURE (COV...
NEW CORONA VIRUS DISEASE 2022: SOCIAL DISTANCING IS AN EFFECTIVE MEASURE (COV...NEW CORONA VIRUS DISEASE 2022: SOCIAL DISTANCING IS AN EFFECTIVE MEASURE (COV...
NEW CORONA VIRUS DISEASE 2022: SOCIAL DISTANCING IS AN EFFECTIVE MEASURE (COV...
 
Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques Prediction of Corporate Bankruptcy using Machine Learning Techniques
Prediction of Corporate Bankruptcy using Machine Learning Techniques
 
Here are 4 discussion posts by class mates from the 495 class that.docx
Here are 4 discussion posts by class mates from the 495 class that.docxHere are 4 discussion posts by class mates from the 495 class that.docx
Here are 4 discussion posts by class mates from the 495 class that.docx
 
Aon Retail & Wholesale Inperspective Nov 2016
Aon Retail & Wholesale Inperspective Nov 2016Aon Retail & Wholesale Inperspective Nov 2016
Aon Retail & Wholesale Inperspective Nov 2016
 
Data science landscape in the insurance industry
Data science landscape in the insurance industryData science landscape in the insurance industry
Data science landscape in the insurance industry
 
Data4Impact Expert Workshop Report
Data4Impact Expert Workshop ReportData4Impact Expert Workshop Report
Data4Impact Expert Workshop Report
 
Pwc gdpr survey 2018
Pwc gdpr survey 2018Pwc gdpr survey 2018
Pwc gdpr survey 2018
 
cse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber securitycse-csp batch4 review-1.1.pptx cyber security
cse-csp batch4 review-1.1.pptx cyber security
 
Oxford workshop
Oxford workshopOxford workshop
Oxford workshop
 
Predictive preventative-or-intelligence-led-policing
Predictive preventative-or-intelligence-led-policingPredictive preventative-or-intelligence-led-policing
Predictive preventative-or-intelligence-led-policing
 
Predictive-Preventative-or-Intelligence-Led-Policing
Predictive-Preventative-or-Intelligence-Led-PolicingPredictive-Preventative-or-Intelligence-Led-Policing
Predictive-Preventative-or-Intelligence-Led-Policing
 
The Principles of CSR
The Principles of CSRThe Principles of CSR
The Principles of CSR
 
Wireless Communication, Sensing and REM: A Security Perspective
Wireless Communication, Sensing and REM: A Security PerspectiveWireless Communication, Sensing and REM: A Security Perspective
Wireless Communication, Sensing and REM: A Security Perspective
 
Questionnaire on Financial Consumer Protection measures re COVID-19- Summary ...
Questionnaire on Financial Consumer Protection measures re COVID-19- Summary ...Questionnaire on Financial Consumer Protection measures re COVID-19- Summary ...
Questionnaire on Financial Consumer Protection measures re COVID-19- Summary ...
 
Assessing The Nature Of Risk Management Implementation In Manufacturing Small...
Assessing The Nature Of Risk Management Implementation In Manufacturing Small...Assessing The Nature Of Risk Management Implementation In Manufacturing Small...
Assessing The Nature Of Risk Management Implementation In Manufacturing Small...
 
Cipfa Workshops Scotland
Cipfa Workshops ScotlandCipfa Workshops Scotland
Cipfa Workshops Scotland
 
Mind the Gaps: AML and Fraud Global Benchmark Survey
Mind the Gaps: AML and Fraud Global Benchmark Survey Mind the Gaps: AML and Fraud Global Benchmark Survey
Mind the Gaps: AML and Fraud Global Benchmark Survey
 

More from Paris Women in Machine Learning and Data Science

More from Paris Women in Machine Learning and Data Science (20)

Managing international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha DimbanManaging international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha Dimban
 
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria KnorpsOptimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
 
Perspectives, by M. Pannegeon
Perspectives, by M. PannegeonPerspectives, by M. Pannegeon
Perspectives, by M. Pannegeon
 
Evaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled dataEvaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled data
 
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
 
An age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-PierreAn age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-Pierre
 
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle LautréApplying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
 
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure SoulierHow to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
 
Global Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna AbreuGlobal Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna Abreu
 
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie DelonPlug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
 
Sales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca IannuzziSales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca Iannuzzi
 
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta BinkyteIdentifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta Binkyte
 
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
 
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
 
Sandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI projectSandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI project
 
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
 
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdfKhrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
 
Iana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdfIana Iatsun_ML in production_20Dec2022.pdf
Iana Iatsun_ML in production_20Dec2022.pdf
 
41 WiMLDS Kyiv Paris Poznan.pdf
41 WiMLDS Kyiv Paris Poznan.pdf41 WiMLDS Kyiv Paris Poznan.pdf
41 WiMLDS Kyiv Paris Poznan.pdf
 
Emergency plan to secure winter: what are the measures set up by RTE?
Emergency plan to secure winter: what are the measures set up by RTE?Emergency plan to secure winter: what are the measures set up by RTE?
Emergency plan to secure winter: what are the measures set up by RTE?
 

Recently uploaded

Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excelysmaelreyes
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxviniciusperissetr
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证nhjeo1gg
 

Recently uploaded (20)

Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptx
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
在线办理UM毕业证迈阿密大学毕业证成绩单留信学历认证
 

Fraud detection by Denisa Banulescu-Radu

  • 1. Banulescu-Radu (LEO) WiMLDS 13/04/2021 1 / 39
  • 2. Data Science for Financial Fraud Detection Denisa BANULESCU-RADU University of Orléans, LEO WiMLDS 13th of April 2021 Banulescu-Radu (LEO) WiMLDS 13/04/2021 2 / 39
  • 3. Background • Since 2015: Associate Professor – University of Orléans, LEO • 2016: Young Researcher Award in Economics – Autorité des Marchés Financiers • 2015: Thesis Prize – Fondation Banque de France • 2014-2015: Max Weber Postdoctoral Fellow – European University Institute • 2011-2014: PhD in Economics – Maastricht University and University of Orléans Title dissertation: "Four essays in financial econometrics" Banulescu-Radu (LEO) WiMLDS 13/04/2021 3 / 39
  • 4. Main research interests Banulescu-Radu (LEO) WiMLDS 13/04/2021 4 / 39
  • 5. Outline 1 Econometrics vs Machine Learning 2 General aspects of fraud 3 Main challenges and solutions 4 Case studies 4.1 Case 1: Insurance fraud detection 4.2 Case 2: Social fraud detection 5 Conclusion Banulescu-Radu (LEO) WiMLDS 13/04/2021 5 / 39
  • 6. Econometrics vs Machine Learning Outline 1 Econometrics vs Machine Learning 2 General aspects of fraud 3 Main challenges and solutions 4 Case studies 4.1 Case 1: Insurance fraud detection 4.2 Case 2: Social fraud detection 5 Conclusion Banulescu-Radu (LEO) WiMLDS 13/04/2021 6 / 39
  • 7. Econometrics vs Machine Learning Econometrics vs Machine Learning Banulescu-Radu (LEO) WiMLDS 13/04/2021 7 / 39
  • 8. Econometrics vs Machine Learning Econometrics vs Machine Learning Banulescu-Radu (LEO) WiMLDS 13/04/2021 8 / 39
  • 9. Econometrics vs Machine Learning “there are a number of areas where there would be opportunities for fruitful collaboration between econometrics and machine learning ” Hal Varian (2014) - Professor of Economics (University of Michigan) & Chief Economist (Google) Banulescu-Radu (LEO) WiMLDS 13/04/2021 9 / 39
  • 10. General aspects of fraud Outline 1 Econometrics vs Machine Learning 2 General aspects of fraud 3 Main challenges and solutions 4 Case studies 4.1 Case 1: Insurance fraud detection 4.2 Case 2: Social fraud detection 5 Conclusion
  • 11. General aspects of fraud Fraud detection - Why is it important?
  • 12. General aspects of fraud Definition of fraud
      Definition (Baesens et al., 2015): Fraud is an uncommon, well-considered, imperceptibly concealed, time-evolving, and often carefully organized crime which appears in many types of forms.
  • 13. General aspects of fraud Typologies of fraud
  • 14. Main challenges and solutions Outline 1 Econometrics vs Machine Learning 2 General aspects of fraud 3 Main challenges and solutions 4 Case studies 4.1 Case 1: Insurance fraud detection 4.2 Case 2: Social fraud detection 5 Conclusion
  • 15. Main challenges and solutions Main CHALLENGES and solutions
  • 16. Main challenges and solutions Main CHALLENGES and solutions
  • 17. Main challenges and solutions Main CHALLENGES and solutions
  • 18. Main challenges and solutions Main CHALLENGES and solutions
  • 19. Main challenges and solutions Main challenges and SOLUTIONS 1. Main tools used to fight fraud
  • 20. Main challenges and solutions Main challenges and SOLUTIONS 2. Deal with imbalanced datasets
  • 21. Main challenges and solutions Main challenges and SOLUTIONS 2. Deal with imbalanced datasets
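The slide content on imbalanced datasets did not survive the transcript, but the insurance case study later in the deck names random oversampling and SMOTE as resampling techniques. The sketch below illustrates both ideas in plain Python on a hypothetical toy dataset. The function names `random_oversample` and `smote_like`, the toy points, and the choice of `k` are assumptions made for illustration; this is not the implementation used in the talk, and the interpolation step is only a simplified version of the SMOTE idea.

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """Duplicate minority rows at random until both classes have equal size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    n_extra = max(n_majority - len(minority_rows), 0)
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return X + extra, y + [minority] * len(extra)

def smote_like(X, y, n_new, k=2, minority=1, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        # k nearest minority neighbours of `base` (squared Euclidean distance)
        neighbours = sorted(
            (p for p in pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * n_new

# Hypothetical toy data: six legitimate claims (0) and two fraudulent ones (1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote_like(X, y, n_new=4, k=1)
print(y_ros.count(0), y_ros.count(1))  # 6 6
print(y_sm.count(0), y_sm.count(1))    # 6 6
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keep the original fraud rate.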
  • 22. Main challenges and solutions Main challenges and SOLUTIONS
  • 23. Main challenges and solutions Main challenges and SOLUTIONS 3. Evaluation of fraud detection models
  • 24. Main challenges and solutions Main challenges and SOLUTIONS 4. Improving the interpretability of fraud detection models
      “if the users do not trust a model or a prediction, they will not use it” (Ribeiro et al., 2016)
      • LIME method (Ribeiro et al., 2016)
      • SHAP (SHapley Additive exPlanations) values (Lundberg and Lee, 2017)
      BUT ... to what extent do we need fraud detection models to be interpretable?
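To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a tiny hypothetical scoring function by enumerating every coalition of features. The scoring function, its feature names (`amount`, `late_declaration`, `prior_claims`) and the all-zero baseline are assumptions made for illustration. The SHAP library approximates this quantity efficiently for real models, whereas this brute-force version is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical fraud score with an interaction between the first two features.
def score(z):
    amount, late_declaration, prior_claims = z
    return (0.1 * amount + 0.3 * late_declaration
            + 0.2 * amount * late_declaration + 0.05 * prior_claims)

x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(score, x, baseline)

# The contributions add up to the gap between the prediction and the baseline.
print([round(v, 6) for v in phi])  # [0.4, 0.5, 0.15]
print(round(sum(phi), 6), round(score(x) - score(baseline), 6))  # 1.05 1.05
```

The interaction term is split equally between the two features involved, which is the fairness property that makes Shapley-based attributions attractive for explaining an individual prediction.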
  • 25. Case studies Outline 1 Econometrics vs Machine Learning 2 General aspects of fraud 3 Main challenges and solutions 4 Case studies 4.1 Case 1: Insurance fraud detection 4.2 Case 2: Social fraud detection 5 Conclusion
  • 26. Case studies Case 1: Insurance fraud detection Outline 1 Econometrics vs Machine Learning 2 General aspects of fraud 3 Main challenges and solutions 4 Case studies 4.1 Case 1: Insurance fraud detection 4.2 Case 2: Social fraud detection 5 Conclusion
  • 27. Case studies Case 1: Insurance fraud detection General framework
      • Fraudulent claims represented 10% of the total number of claims in 2019 (Insurance Europe)
      • Negative record for France: €2.5 billion in 2014, of which only €219 million was recovered (ALFA)
  • 28. Case studies Case 1: Insurance fraud detection Methodology
      Data
      • 45,954 home insurance claims for the period 2013 to 2017
      • French insurance company
      • 0.76% of claims are fraudulent
      Technical tools
      • Logistic LASSO (Cox, 1958; Tibshirani, 1996)
      • Random forest (Breiman, 2001)
      • Extreme Gradient Boosting, or XGBoost (Chen and Guestrin, 2016)
      Resampling techniques to deal with imbalanced data
      • Random oversampling
      • Synthetic Minority Oversampling TEchnique, or SMOTE (Chawla et al., 2002)
      • ADAptive SYNthetic sampling, or ADASYN (He et al., 2008)
      Performance metrics
      • AUC-ROC, AUC-PR, Brier score, Log-Loss, F-measure
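The performance metrics listed on this slide can each be written in a few lines. The sketch below is a plain-Python illustration on hypothetical labels and predicted probabilities. The function names, the toy scores, the 0.5 threshold for the F-measure, and the use of average precision as the area under the precision-recall curve are assumptions made for illustration, not the evaluation code used in the case study.

```python
from math import log

def brier_score(y, p):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def log_loss(y, p, eps=1e-15):
    """Average negative log-likelihood of the labels (probabilities clipped)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * log(pi) + (1 - yi) * log(1 - pi)
    return total / len(y)

def f_measure(y, p, threshold=0.5):
    """F1 score of the decision rule 'flag as fraud if p >= threshold'."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc_roc(y, p):
    """Probability that a random fraud case outscores a random legitimate one
    (ties count one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y, p):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(p, y), key=lambda t: -t[0])
    tp, total = 0, 0.0
    for rank, (_, yi) in enumerate(ranked, start=1):
        if yi == 1:
            tp += 1
            total += tp / rank  # precision at each recalled fraud case
    return total / tp

# Hypothetical scores: eight legitimate claims followed by two fraudulent ones.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p = [0.05, 0.10, 0.20, 0.15, 0.30, 0.02, 0.40, 0.60, 0.70, 0.35]

print(auc_roc(y, p))            # 0.875
print(average_precision(y, p))  # 0.75
print(f_measure(y, p))          # 0.5
print(round(brier_score(y, p), 5), round(log_loss(y, p), 5))
```

AUC-ROC and AUC-PR judge the ranking of claims, the Brier score and Log-Loss judge the predicted probabilities themselves, and the F-measure judges one particular decision threshold, which is one reason to report several of them on data where fraud is rare.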
  • 29. Case studies Case 1: Insurance fraud detection Methodology
  • 30. Case studies Case 1: Insurance fraud detection
      • Interpretation of results: SHAP value method (global/individual level)
      Figure 1: Fraudulent case. Figure 2: Non-fraudulent case.
  • 31. Case studies Case 2: Social fraud detection Outline 1 Econometrics vs Machine Learning 2 General aspects of fraud 3 Main challenges and solutions 4 Case studies 4.1 Case 1: Insurance fraud detection 4.2 Case 2: Social fraud detection 5 Conclusion
  • 32. Case studies Case 2: Social fraud detection General framework
      • Controlling the risks of social and fiscal fraud and combating illegal work are also important problems for social justice and economic efficiency
      • French mutual organization, which:
          • collects data systematically from its beneficiaries
          • organizes regular controls on a subsample of its taxpayers
          • manages a fraud detection system to identify those who do not pay their contributions
  • 33. Case studies Case 2: Social fraud detection General framework
      Objective: Estimate the tax shortfall.
      Definition: The tax shortfall is the potential sum of the tax adjustments that would have been imposed on companies that committed fraud or made erroneous social declarations, had they actually been audited, when in reality they were not.
  • 34. Case studies Case 2: Social fraud detection Remarks • the two decisions are neither sequential nor conditional • the decisions are linked
  • 35. Case studies Case 2: Social fraud detection
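  • 36. Case studies Case 2: Social fraud detection Methodology: Estimation by Maximum Likelihood
      Control decision:
      $$C_i = \begin{cases} 1 & \text{if } C_i^* = X_{c,i}\beta_c + \varepsilon_{c,i} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{1}$$
      Fraud decision:
      $$\tilde{D}_i = \begin{cases} 1 & \text{if } D_i^* = X_{d,i}\beta_d + \varepsilon_{d,i} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{2}$$
      Potential tax shortfall:
      $$M_i^* = \begin{cases} X_{m,i}\beta_m + \varepsilon_{m,i} & \text{if } \tilde{D}_i = 1 \\ 0 & \text{otherwise} \end{cases} \qquad \forall i = 1, \dots, n \tag{3}$$
      Error distribution:
      $$\begin{pmatrix} \varepsilon_{c,i} \\ \varepsilon_{d,i} \\ \varepsilon_{m,i} \end{pmatrix} \sim N(0, \Sigma) \quad \text{with } \Sigma = DRD \tag{4}$$
      $$D = \begin{pmatrix} \sigma_c & 0 & 0 \\ 0 & \sigma_d & 0 \\ 0 & 0 & \sigma_m \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & \rho_{cd} & \rho_{cm} \\ \rho_{cd} & 1 & \rho_{dm} \\ \rho_{cm} & \rho_{dm} & 1 \end{pmatrix} \tag{5}$$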
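As a rough illustration of the structure in equations (1) to (5), the sketch below builds the covariance matrix as the product DRD and simulates the control decision, the fraud decision and the potential shortfall with correlated normal errors. It only simulates the data-generating process and is not the maximum-likelihood estimator from the talk. All numerical values, the constant linear indices standing in for the X·β terms, and the function names are assumptions made for illustration.

```python
import random
from math import sqrt

def covariance(sigmas, rho_cd, rho_cm, rho_dm):
    """Covariance D R D, with D = diag(sigmas) and R the correlation matrix."""
    R = [[1.0, rho_cd, rho_cm],
         [rho_cd, 1.0, rho_dm],
         [rho_cm, rho_dm, 1.0]]
    return [[sigmas[i] * R[i][j] * sigmas[j] for j in range(3)] for i in range(3)]

def cholesky(S):
    """Lower-triangular L with L L^T = S (S must be positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def simulate(n, index_c, index_d, index_m, S, seed=0):
    """Draw n units; index_* are hypothetical constant values of X * beta."""
    rng = random.Random(seed)
    L = cholesky(S)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Correlated errors (eps_c, eps_d, eps_m) = L z
        eps = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        c = 1 if index_c + eps[0] > 0 else 0        # control decision, eq. (1)
        d = 1 if index_d + eps[1] > 0 else 0        # fraud decision, eq. (2)
        m = index_m + eps[2] if d == 1 else 0.0     # potential shortfall, eq. (3)
        rows.append((c, d, m))
    return rows

# Assumed parameter values: sigma_c = sigma_d = 1, sigma_m = 2.
Sigma = covariance([1.0, 1.0, 2.0], rho_cd=0.5, rho_cm=0.3, rho_dm=0.4)
rows = simulate(1000, index_c=0.2, index_d=-1.0, index_m=5.0, S=Sigma)

controlled = sum(c for c, _, _ in rows)
fraudulent = sum(d for _, d, _ in rows)
print(controlled, fraudulent)
```

A non-zero correlation between the control and fraud errors is what links the two decisions: units selected for control are then not a random sample with respect to fraud, so the shortfall among uncontrolled units cannot simply be extrapolated from the controlled ones.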
  • 37. Conclusion Outline 1 Econometrics vs Machine Learning 2 General aspects of fraud 3 Main challenges and solutions 4 Case studies 4.1 Case 1: Insurance fraud detection 4.2 Case 2: Social fraud detection 5 Conclusion
  • 38. Conclusion Thank you for your attention!