Presentation of a successful telecom fraud analytics project at the 3rd International Conference on Business Analytics and Intelligence, Indian Institute of Management Bangalore
2. A Quick Intro – Telecom Frauds
Fraud Analytics With Machine Learning &
Engineering
• Have you received missed calls from unknown overseas numbers?
• Have you heard of PBX hacking and corporations facing huge bills?
3. Problem Definition
• The telecom industry loses 46.3 billion USD globally to various frauds
• 10% of operators have bad debt due to fraud
• Detection is a cat-and-mouse game: fraud patterns change to evade the available data mining techniques
• Raising timely alerts while processing a huge volume of call records is a challenge
• Alerts with many false positives increase operational expenses
4. Importance to Telecom Industry & Society
• An efficient, self-adaptive detection mechanism can significantly reduce the loss due to fraud (about 2.1% of revenue) and the operational cost
• Less “bad money” enters the system
5. Data Source
• More than 1 TB of Call Detail Records (CDRs) from a reputed wholesale carrier, used as historical data
• Tested on a few weeks of live CDRs from the same carrier
6. Analytics Technique
• The basic components of FAME are:
– A self-adaptive machine learning methodology
– An actionable dashboard on which the operations and investigations teams act on alerts; their feedback is sent to the machine learning model to adjust its weights
– A high-performance big data platform for data processing and machine learning
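The slides do not say how operator feedback adjusts the model weights. As one possible illustration only, the sketch below applies a multiplicative-weights update to an ensemble: every detector that disagreed with the operator's verdict has its weight shrunk. The function name, the detector names, and the `learning_rate` value are all hypothetical.

```python
import math

def update_weights(weights, model_votes, is_fraud, learning_rate=0.5):
    """One multiplicative-weights step driven by a single operator verdict.

    weights     -- dict: detector name -> current non-negative weight
    model_votes -- dict: detector name -> True if that detector raised the alert
    is_fraud    -- the operator's verdict on the alert (hypothetical feedback field)
    """
    updated = {}
    for name, weight in weights.items():
        disagreed = model_votes[name] != is_fraud
        # Shrink the weight of every detector that disagreed with the operator.
        updated[name] = weight * math.exp(-learning_rate) if disagreed else weight
    total = sum(updated.values())
    # Renormalise so the weights keep summing to 1.
    return {name: weight / total for name, weight in updated.items()}

# Hypothetical detectors with equal starting weights.
weights = {"boxplot": 1 / 3, "clustering": 1 / 3, "mahalanobis": 1 / 3}
votes = {"boxplot": True, "clustering": False, "mahalanobis": True}

# The operator confirms the alert was fraud, so "clustering" (which missed it) loses weight.
weights = update_weights(weights, votes, is_fraud=True)
```

Repeating this step for each reviewed alert gradually favours the detectors that agree with the investigators most often.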
7. How it detects and adapts …
[Figure: detection and adaptation flow, steps 1 to 10. Components: CDR Feed; Fraud Detection Model Pipeline; Novelty Detection Pipeline / Stacking; Actionable Dashboards; Pattern validation and tuning workbench. Flow labels: Remaining Data; Frauds detected; New Patterns; More frauds; New model addition / Tuning of existing; Operators feedback. Actors: Analyst, Operator.]
8. Novelty Detection Pipeline
• Novelty detection is run separately for origin and destination numbers
• Several contextual anomaly detection methods are used and their outputs are combined
• Examples of the algorithms used:
– Box-plot based outlier detection
– Clustering to find clusters with a distinct centroid
– Mahalanobis distance, flagging records where Mdist > ɸ · IQR
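As a rough sketch of two of the detectors listed above, the code below fits a box-plot fence and a Mahalanobis distance on historical per-number features and then scores new numbers. Several things here are assumptions rather than details from the slides: the two features (calls per hour, mean call duration in seconds), the toy values, ɸ = 5, reading the threshold as ɸ times the IQR of the historical distances, and combining the detectors with a logical OR.

```python
import statistics

def quartile_spread(values):
    """Return (Q1, Q3, IQR) of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q1, q3, q3 - q1

def boxplot_outlier(history, value, k=1.5):
    """Box-plot rule: the value lies above the upper fence Q3 + k * IQR."""
    _, q3, iqr = quartile_spread(history)
    return value > q3 + k * iqr

def fit_mahalanobis(points):
    """Fit the mean and 2x2 sample covariance; return a distance function."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2

    def distance(point):
        dx, dy = point[0] - mx, point[1] - my
        # d^2 = v^T S^-1 v, with the 2x2 inverse written out by hand.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        return d2 ** 0.5

    return distance

# Toy history: (calls per hour, mean duration in seconds) per number. Illustrative only.
history = [(8, 60), (12, 60), (10, 50), (10, 70), (9, 55), (11, 65), (9, 65), (11, 55)]
new_numbers = [(10, 62), (40, 5)]
phi = 5.0  # assumed multiplier; the slides give no value

distance = fit_mahalanobis(history)
_, _, dist_iqr = quartile_spread([distance(p) for p in history])
calls_history = [x for x, _ in history]

flags = []
for point in new_numbers:
    by_boxplot = boxplot_outlier(calls_history, point[0])
    by_mahalanobis = distance(point) > phi * dist_iqr
    # Assumed combination rule: flag when either detector fires.
    flags.append(by_boxplot or by_mahalanobis)
```

Fitting on history and scoring only new records keeps a single extreme number from inflating the covariance it is measured against.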
10. Fraud Detection Pipeline
• Use historical data and flag records with the “Novelty Detection Pipeline”
• Verify the flagged records and label them
• Build separate models (logistic regression, random forest, and threshold-based) for different fraud patterns
• Combine the outputs of the models
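The slides do not specify how the model outputs are combined. A minimal sketch, assuming a weighted average of per-model fraud scores, is shown below with a logistic model and a threshold rule; a random forest's predicted probability could be added as one more entry in `scores`. All field names, coefficients, limits, and weights are hypothetical.

```python
import math

def logistic_score(record, coef, intercept):
    """Fraud probability from a logistic model: sigmoid(intercept + w . x)."""
    z = intercept + sum(coef[field] * record[field] for field in coef)
    return 1.0 / (1.0 + math.exp(-z))

def threshold_score(record, field, limit):
    """Rule-based model: 1.0 when the field exceeds its limit, else 0.0."""
    return 1.0 if record[field] > limit else 0.0

def combine(scores, weights):
    """Weighted average of per-model fraud scores, each in [0, 1]."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * scores[name] for name in scores) / total

# Hypothetical aggregated features for one number.
record = {"calls_per_hour": 40, "mean_duration_s": 5}

scores = {
    "logistic": logistic_score(
        record, {"calls_per_hour": 0.2, "mean_duration_s": -0.05}, intercept=-3.0
    ),
    "threshold": threshold_score(record, "calls_per_hour", limit=30),
}
weights = {"logistic": 0.6, "threshold": 0.4}

fraud_score = combine(scores, weights)
is_alert = fraud_score >= 0.5  # assumed alerting cut-off
```

Keeping each pattern's model separate and combining only the scores means a new pattern can be covered by adding one entry rather than retraining everything.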
11. ACTIONABLE DASHBOARD
System Behind Magic …
[Figure: system architecture. Labels: CDR feed from telecom system; big data platform powered by Hadoop & Spark; ensemble of self-adaptive algorithms; integration; facets; feedback.]
13. Accuracy Results
[Chart: true positive, false positive, and accuracy (scale 0 to 1), shown separately for B-Number and A-Number detection.]
• Accuracy is reported individually for origin (A-number) and destination (B-number) detection
• The combined mechanism has fewer than 5% false positives
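For reference, the three quantities on the chart can be computed from confusion-matrix counts as sketched below. This assumes "false positive" means the false-positive rate FP / (FP + TN), which the slides do not state, and the counts are made-up illustrative numbers, not results from the project.

```python
def detection_rates(tp, fp, tn, fn):
    """Compute true-positive rate, false-positive rate, and accuracy from counts."""
    return {
        "true_positive_rate": tp / (tp + fn),   # share of real frauds that were caught
        "false_positive_rate": fp / (fp + tn),  # share of legitimate numbers flagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Illustrative counts only; these are not the figures reported in the presentation.
example = detection_rates(tp=90, fp=4, tn=96, fn=10)
```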
14. What Next …
• Test on different types of telecom fraud
• Extend this industrialized approach to other areas (such as network intrusion detection)
• Productize it as a cloud-based service as well as an on-premise implementation
15. Contact Us @
Amartya Kumar Das
amartya_das_2014@cba.isb.edu
https://in.linkedin.com/pub/amartya-das/b/72b/637
Subhadip Paul
Subhadip_paul_2014@cba.isb.edu
https://in.linkedin.com/in/subhadippaul
Pranab Kumar Dash
Pranab_dash_2014@cba.isb.edu
www.linkedin.com/profile/view?id=19155039
Sudarson Roy Pratihar
sudarson_pratihar_2014@cba.isb.edu
www.linkedin.com/in/sudarson
Follow us #FAMETELCO