FRAUD DETECTION
BIG DATA ANALYSIS
(HEALTHCARE APPLICATION)
MAHDI ESMAILOGHLI MASMAILLOGLI@GMAIL.COM BIGDATA.CEIT.AUT.AC.IR
WHAT IS
FRAUD?
“… any illegal act characterized by deceit, concealment, or
violation of trust. These acts are not dependent upon the
threat of violence or physical force. Frauds are perpetrated
by parties and organizations to obtain money, property, or
services; to avoid payment or loss of services; or to secure
personal or business advantage.”
International Professional
Practices Framework (IPPF)
DEFINITION
WHERE COULD FRAUD BE
FOUND…
DOMAIN OF APPLICATION
WHERE CAN FRAUD BE FOUND?
▸ Healthcare systems
▸ Credit cards
▸ Social networks
▸ Satellite or military control systems
▸ …
FRAUD IN
HEALTHCARE
DIFFERENCES
WHAT ARE THE CHARACTERISTICS OF HEALTHCARE DOMAIN DATA?
▸ The complexity and the number of fields in this kind of data are
tremendous.
▸ The people or organizations involved aim to profit at the expense of others.
▸ The data is really BIG and sometimes arrives as a stream
▸ Many kinds of data: images, raw text, sound, …
▸ The data are unlabeled and hard to classify
▸ Concept drift
SOME NOTES ON THE IMPORTANCE OF
Big Data in Healthcare
TIPS
ROLE OF BIG DATA IN HEALTHCARE
▸ DNA data is one of the most important public datasets on
Amazon.
▸ Stanford’s Big Data conference is all about healthcare
▸ Microsoft has established an academic unit to work on
healthcare
▸ Many countries lose money to FRAUD in healthcare
(up to 10% of annual US healthcare expenditure)
EXAMPLES
SOME FRAUDS THAT TRADITIONAL HEALTHCARE SYSTEMS HAVE TO FACE
▸ Altering a patient’s insurance identification document
▸ A doctor always prescribing the same fixed brands of drugs
▸ Prescribing more expensive drugs than is usual for the same
disease
▸ A patient obtaining certain drugs more often than usual
▸ and many more…
METHODS OF FRAUD DETECTION IN
HEALTHCARE
SOLUTIONS
DETECTING HEALTHCARE FRAUD
▸ Statistical
▸ Machine learning and Data mining
▸ Graph analysis
FRAUD DETECTION USING
STATISTICAL METHODS
STATISTICAL METHODS
STATISTICAL METHODS…
▸ Uses a set of rules
▸ The rules are defined by a domain expert
▸ An application computes initial statistical parameters, e.g.:
▸ Average number of drugs per prescription
▸ Total price per disease
▸ These are then compared with new data. If a large
difference is found, the ALARM GOES OFF
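The baseline-and-compare idea on this slide can be sketched in a few lines of Python. This is only an illustration: the sample history, the function names, and the three-standard-deviation threshold are assumptions made for the example, not values from the slides.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Return (mean, standard deviation) of a historical metric."""
    return mean(history), stdev(history)

def is_suspicious(value, baseline, max_sigma=3.0):
    """Flag a value lying more than max_sigma deviations from the mean.

    max_sigma=3.0 is an assumed rule; in practice a domain expert sets it.
    """
    avg, sd = baseline
    if sd == 0:
        return value != avg
    return abs(value - avg) / sd > max_sigma

# Hypothetical history: number of drugs in past prescriptions.
drugs_per_prescription = [2, 3, 2, 4, 3, 3, 2, 3, 4, 3]
baseline = build_baseline(drugs_per_prescription)

# Compare new prescriptions against the baseline.
for new_count in (3, 12):
    if is_suspicious(new_count, baseline):
        print(f"ALARM: {new_count} drugs in one prescription")
```

Because the baseline is just two numbers, each new record is checked in constant time, which is why this style of rule suits streaming data.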
STATISTICAL METHODS
PROS AND CONS
▸ Very simple and easy to implement
▸ Low computation overhead
▸ Very easy to use on stream data
▸ Low flexibility
▸ Cannot cope with concept drift
▸ Adding rules is hard
▸ Everything is based on the domain expert’s knowledge
▸ The defined set of rules may not be complete
FRAUD DETECTION USING
MACHINE LEARNING AND DATA
MINING ALGORITHMS
MACHINE LEARNING AND DATA MINING ALGORITHMS
MACHINE LEARNING ALGORITHMS
▸ Choose one or more machine learning algorithms based
on the data
▸ Use them to learn from the data and detect fraud
▸ If the data are labeled, classification is a good fit
▸ Otherwise, use clustering
▸ Or use clustering to label the data and then apply classification
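The last option, clustering first and classifying afterwards, might look like the sketch below. It is written in plain Python so that it stays self-contained. The sample claims, the choice of k-means with k=2, the 1-nearest-neighbour classifier, and above all the rule that the smaller cluster is the suspicious one are assumptions for illustration only.

```python
import math
from collections import Counter

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

def nearest_neighbor_label(point, labeled_points):
    """1-nearest-neighbour classifier over (point, label) pairs."""
    return min(labeled_points, key=lambda item: math.dist(point, item[0]))[1]

# Hypothetical unlabeled claims: (drugs per prescription, total price).
claims = [(2, 30), (3, 35), (2, 28), (3, 40),
          (12, 400), (2, 33), (11, 380), (3, 31)]

# Step 1: cluster the unlabeled data.
_, assignment = kmeans(claims, k=2)

# Step 2: turn clusters into labels.
# ASSUMPTION: the smaller cluster holds the suspicious claims.
counts = Counter(assignment)
minority = min(counts, key=counts.get)
labeled = [(p, "suspicious" if a == minority else "normal")
           for p, a in zip(claims, assignment)]

# Step 3: classify new claims with the pseudo-labels.
for new_claim in ((10, 350), (3, 33)):
    print(new_claim, nearest_neighbor_label(new_claim, labeled))
```

In a real pipeline the pseudo-labels would normally be reviewed by a domain expert before any classifier is trained on them.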
FRAUD DETECTION USING
GRAPH ANALYSIS
GRAPH BASED FRAUD DETECTION
GRAPH ANALYSIS
▸ It has been gaining popularity since 2015
▸ It is still only an auxiliary system used alongside machine
learning algorithms
▸ It cannot cover all aspects of fraud
▸ But it is handy
USING PAGE RANK FOR
HEALTHCARE FRAUD DETECTION
HORTONWORKS
USING PAGE RANK FOR HEALTHCARE FRAUD DETECTION
DATA FIELDS
▸ NPI (National Provider Identifier)
▸ Speciality
▸ Procedure Code
▸ Count
PAGE RANK AND
PERSONALIZED PAGE RANK
EXAMPLE
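PAGE RANK ON DATA
[Figure: provider graph with nodes coloured by speciality (Dermatologist, Surgeon, Internist); Page Rank scores shown on the eight nodes: 13.5%, 13.5%, 9.5%, 13.3%, 17.6%, 9.5%, 13.7%, 9.5%]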
EXAMPLE
PERSONALIZED PAGE RANK ON DERMATOLOGIST SPECIALITY
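[Figure: the same provider graph (Dermatologist, Surgeon, Internist); Personalized Page Rank scores shown on the eight nodes: 24.1%, 24.1%, 18.7%, 15.4%, 7.9%, 2.9%, 4.1%, 2.9%]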
ENVIRONMENT OF
THE PAPER
GRAPH BASED
FRAUD DETECTION
ENVIRONMENT OF THE PAPER
▸ Dataset: CMS Medicare Part-B
▸ Built on Apache Hadoop and Apache Pig
▸ 8 nodes
▸ 4 cores per node
▸ 64 GB of memory per node
▸ Total execution time: 3 hours
STEPS OF THE ALGORITHM
Step 1
STEP 1
COMPUTE THE SIMILARITY BETWEEN PROVIDERS
▸ Compute the similarity between providers based on
shared procedures
▸ If the similarity of two providers exceeds a threshold, an
edge connects them
▸ Locality-Sensitive Hashing (LSH) and DIMSUM could help, but the paper did not use them
▸ 880K providers => 774 billion similarity computations
▸ My dataset: ~140 providers => 20K similarity computations
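Step 1 can be sketched as follows. The slides do not say which similarity measure the paper uses, so cosine similarity over procedure counts is an assumption here, as are the sample NPIs, the procedure codes, and the 0.8 threshold.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(count * b[code] for code, count in a.items() if code in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_edges(providers, threshold):
    """Connect every provider pair whose similarity exceeds the threshold."""
    edges = []
    for (npi_a, vec_a), (npi_b, vec_b) in combinations(providers.items(), 2):
        if cosine_similarity(vec_a, vec_b) > threshold:
            edges.append((npi_a, npi_b))
    return edges

# Hypothetical providers: NPI -> {procedure code: count}.
providers = {
    "1001": {"P1": 40, "P2": 10},
    "1002": {"P1": 35, "P2": 12},
    "1003": {"P9": 50},
}
edges = build_edges(providers, threshold=0.8)  # threshold is an assumed value
print(edges)

# The slide's figures match n * n comparisons for n providers.
print(880_000 ** 2)  # 774,400,000,000, about 774 billion
print(140 ** 2)      # 19,600, about 20K
```

Since similarity is symmetric, the loop above visits each unordered pair once, roughly half of the n * n figure. Either way the cost grows quadratically with the number of providers, which is the bottleneck that approximate methods such as LSH aim to reduce.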
STEPS OF THE ALGORITHM
Step 2
STEP 2
COMPUTING PERSONALIZED PAGE RANK FOR EACH SPECIALITY
▸ Loop over all specialities
▸ For each speciality apply Personalized Page Rank to the
graph
▸ Identify anomalous providers: nodes whose PR_speciality(node) is
high but whose speciality is not the one used for the Page
Rank calculation
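Step 2 can be illustrated with a small power-iteration version of Personalized Page Rank in plain Python. This is a sketch, not the paper's Pig code or the GraphX variant mentioned later: the toy graph, the damping factor of 0.85, and the 0.2 score threshold used to decide what counts as "high" are all assumptions.

```python
def personalized_pagerank(neighbors, seeds, damping=0.85, iterations=100):
    """Power iteration where the random jump lands only on seed nodes."""
    nodes = list(neighbors)
    jump = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * jump[n] for n in nodes}
        for n in nodes:
            if neighbors[n]:
                # Spread this node's rank evenly over its neighbours.
                share = damping * rank[n] / len(neighbors[n])
                for m in neighbors[n]:
                    new_rank[m] += share
            else:
                # Dangling node: send its rank back to the seed set.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * jump[m]
        rank = new_rank
    return rank

# Hypothetical similarity graph (undirected, stored as adjacency lists).
neighbors = {
    "d1": ["d2", "s1"],
    "d2": ["d1", "s1"],
    "s1": ["d1", "d2", "i1"],
    "i1": ["s1", "i2"],
    "i2": ["i1"],
}
speciality = {"d1": "Dermatologist", "d2": "Dermatologist",
              "s1": "Surgeon", "i1": "Internist", "i2": "Internist"}

def anomalies_for(target, threshold=0.2):
    """Providers scoring high for `target` without having that speciality."""
    seeds = {n for n, s in speciality.items() if s == target}
    rank = personalized_pagerank(neighbors, seeds)
    return [n for n in neighbors
            if rank[n] > threshold and speciality[n] != target]

# Loop over all specialities, as on the slide.
for target in sorted(set(speciality.values())):
    print(target, anomalies_for(target))
```

In this toy graph the surgeon "s1" is linked to both dermatologists, so it collects a high dermatology score even though it is not a dermatologist, which is the kind of node the slide calls anomalous.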
EXAMPLE
PERSONALIZED PAGE RANK ON DERMATOLOGIST SPECIALITY
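[Figure: the same provider graph (Dermatologist, Surgeon, Internist); Personalized Page Rank scores shown on the eight nodes: 24.1%, 24.1%, 18.7%, 15.4%, 7.9%, 2.9%, 4.1%, 2.9%]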
OUR ANALYSIS
IMPLEMENTATION ON APACHE
SPARK
SPARK IMPLEMENTATION
WHAT WE DID IN SPARK
▸ Implemented from scratch
▸ Modified the Page Rank algorithm in Spark GraphX
▸ Each Personalized Page Rank run uses 100 iterations
▸ The dataset contains 20,000 raw records
▸ Running the algorithm took 20 minutes on a 4-core Core i7
MacBook Pro with 4 GB of memory (most of the memory
was occupied by the OS)
SOME RESULTS OF
FRAUD DETECTION
RESULTS
ALGORITHM SPEED ANALYSIS
SPEED ANALYSIS BASED ON ITERATION COUNT
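[Chart: measured value against iteration count; y-axis from 0 to 700, unit not stated on the slide]
▸ 10 iterations: 68
▸ 25 iterations: 120
▸ 50 iterations: 249
▸ 75 iterations: 462
▸ 100 iterations: 690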
SOLUTION
ANALYSIS
SOLUTION ANALYSIS
CONS. AND PROS.
▸ The algorithm needs to compute the similarity for all pairs of providers.
▸ It considers only one aspect of fraud, so it is not complete
▸ Slow and needs a huge amount of memory (because of the
up-front similarity computation): 2 GB of data needs 512 GB of RAM
▸ Hard to add new data and update the graph
▸ High cost of part 2
▸ Rules must be defined in order to use graph analysis (other papers)
SOLUTION ANALYSIS
CONS. AND PROS.
▸ Part 1 needs a shuffle => reduced performance
▸ Modeling as a graph => easy to understand and analyze
▸ A new approach to fraud detection that is still progressing
▸ LSH could be used, but 100% accuracy was wanted
SUGGESTIONS
FUTURE
▸ Using other centrality algorithms
▸ Using algorithms like community detection instead of
clustering
▸ If we add patient data we can do more (in a bipartite
graph we can detect fraud among more popular providers).