Drive Away Fraudsters With Driverless AI
Agenda
• Problem
• Approach
• Experiments
• Conclusion
©2017 PayPal Inc. Confidential and proprietary.
PROBLEM
Fraud Prevention @ PayPal
• Robust feature engineering, machine learning and statistical models
• Highly scalable and multi-layered infrastructure software
• Superior team of data scientists, researchers, financial and intelligence analysts
Collusion Fraud – An Example Scenario
• Buyer purchases an item using PayPal
• Seller ships the item to the buyer
• Buyer finds the box empty & asks PayPal for a refund
• PayPal can refund the buyer (eligible transactions under Buyer Protection)
Collusion Fraud – An Example Scenario (continued)
• PayPal asks the seller for proof
• Seller gives proof
• PayPal repays the seller (Seller Protection)
• Buyer and seller split the money
• PayPal incurs the loss
Collusion Fraud
• What if many such buyers collude with many such sellers?
• What if such buyers & sellers stay below the radar (transaction amount < threshold)?
• What if such sellers behave well with the majority of buyers & collude with only a few?
• How do we detect whether buyers and sellers are legitimate or colluding with each other?
Collusion Fraud
• Can we exploit the network structure of fraudsters to solve collusion fraud?
[Figure: graph linking Buyer nodes, IP nodes (IP1, IP2, IP3) and Seller1 nodes; edge labels: "Logs in", "Purchase item", "Ships good", "Ships empty box"]
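A minimal sketch of how such a network could be represented and queried, assuming events are stored as a typed edge list. The node names, relation labels, and the `shared_ip_buyers` helper are hypothetical illustrations and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical events: (source node, relation, target node).
EDGES = [
    ("buyer_a", "logs_in", "ip_1"),
    ("buyer_b", "logs_in", "ip_1"),
    ("buyer_c", "logs_in", "ip_2"),
    ("buyer_a", "purchase_item", "seller_1"),
    ("buyer_b", "purchase_item", "seller_1"),
    ("buyer_c", "purchase_item", "seller_1"),
    ("seller_1", "ships_empty_box", "buyer_a"),
    ("seller_1", "ships_empty_box", "buyer_b"),
    ("seller_1", "ships_good", "buyer_c"),
]


def shared_ip_buyers(edges):
    """Group buyers that received an empty box by the IP they logged in from."""
    empty_box_buyers = {dst for _, rel, dst in edges if rel == "ships_empty_box"}
    buyers_by_ip = defaultdict(set)
    for src, rel, dst in edges:
        if rel == "logs_in" and src in empty_box_buyers:
            buyers_by_ip[dst].add(src)
    # Keep only IPs shared by more than one such buyer.
    return {ip: sorted(b) for ip, b in buyers_by_ip.items() if len(b) > 1}


suspicious = shared_ip_buyers(EDGES)
print(suspicious)
```

A hand-written rule like this only covers one pattern; the approach in the following slides learns node features from the graph instead.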
APPROACH
Traditional ML vs End-to-End Learning
• Traditional ML – Significant human effort in engineering features & labelling
• End-to-End Learning – Use an algorithm (such as deep learning) to learn feature representations automatically
• Need tabular data
[Figure: Traditional ML: Raw Data, Expert Driven Feature Engineering, Features, Algorithm, Models. End-to-End Learning: Raw Data, Feature Learning (Representation Learning)]
Approach
• Learn features from graphs & use Driverless AI to engineer additional features and build the model.
• Feature learning on graphs is a fairly new research area & its full potential has not been realized.
• Graph-based feature learning helps to understand the fraud network & thus prevent collusion and other organized crimes.
[Figure: Raw Data, Representation Learning (Automatic Feature), Expert Engineered Feature, Tabular Data, Driverless AI (Feature Engineering + Model Training), Models]
Approach – Graph Based Representation Learning
• Idea
  • You shall know a word (node) by the company (neighbor) it keeps* (Firth, J. R. 1957)
• Word2Vec* - Continuous feature representation for words
  • Suppose a user searches for “hotel”; we also want to match “motel”
  • One-hot representation (discrete) cannot express that similarity
  • Build a dense vector for each word to predict other words from its context
  • Two algorithms – Skip-Gram (SG) (predict context words from target) & Continuous Bag of Words (CBOW) (predict target from context)
• Graph Based Representation Learning
  • Learn continuous feature representation for nodes
  • Representation incorporates the community a node belongs to & the role it plays
*source: Prof. Manning’s lecture notes (http://web.stanford.edu/class/cs224n/)
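The contrast between one-hot and dense representations can be sketched with cosine similarity. The three-word vocabulary and the dense vectors below are made-up values for illustration, not learned embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


VOCAB = ["hotel", "motel", "banana"]

# One-hot: each word gets its own axis, so distinct words are orthogonal.
one_hot = {
    w: [1.0 if i == j else 0.0 for j in range(len(VOCAB))]
    for i, w in enumerate(VOCAB)
}

# Dense vectors: hand-picked numbers standing in for learned embeddings.
dense = {
    "hotel": [0.9, 0.8, 0.1],
    "motel": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

one_hot_sim = cosine(one_hot["hotel"], one_hot["motel"])
dense_sim = cosine(dense["hotel"], dense["motel"])
dense_far = cosine(dense["hotel"], dense["banana"])
print(one_hot_sim, round(dense_sim, 3), round(dense_far, 3))
```

With one-hot vectors every pair of distinct words has similarity zero, while dense vectors can place related words close together. The same reasoning carries over when nodes take the place of words.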
Algorithm – node2vec* (Grover & Leskovec, 2016)
*source: https://arxiv.org/pdf/1607.00653.pdf
• Node2Vec
  • In the example graph, s1, s2, s3, s4 and u belong to the same community
  • u & s6 also play the role of hub
  • “Neighborhood preserving” graph-based objective function, optimized using SGD
  • Formulated as a maximum likelihood estimation (MLE) problem
  • BFS & DFS sampling to generate neighbors
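A minimal sketch of the neighborhood sampling step, assuming the second-order random walk described in the cited node2vec paper: from the current node, the unnormalized weight is 1/p for returning to the previous node, 1 for moving to a node that is also adjacent to the previous node, and 1/q for moving farther away. The toy graph and the function name `node2vec_walk` are hypothetical. This is not the reference implementation and it ignores edge weights.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one biased walk of up to `length` nodes from `start`."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:            # return to the previous node
                weights.append(1.0 / p)
            elif nxt in adj[prev]:     # stay close to the previous node (BFS-like)
                weights.append(1.0)
            else:                      # move farther away (DFS-like)
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected graph; node names are illustrative only.
ADJ = {
    "u": {"s1", "s2", "s3"},
    "s1": {"u", "s2"},
    "s2": {"u", "s1", "s3"},
    "s3": {"u", "s2", "s4"},
    "s4": {"s3"},
}

walk = node2vec_walk(ADJ, "u", 8, p=1.0, q=0.5)
print(walk)
```

A value of q below 1 makes the walk more DFS-like and a value above 1 makes it more BFS-like. In the paper, the sampled walks are then used as "sentences" for skip-gram training to obtain the node vectors.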
Implementation Framework
[Figure: Raw Events (HDFS), Graph DB, node2vec, Node Representation Feature vector; Human Expert, Expert Engineered Feature vector; Driverless AI (Feature Engineering + Model Training)]
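One way the two feature vectors in this framework could be combined into a single tabular file is sketched below. The feature names, values, and helper functions are hypothetical, and the slides do not say how the join was actually done.

```python
import csv
import io

# Hypothetical node2vec output: node id -> embedding vector.
node_embeddings = {
    "buyer_a": [0.12, -0.40],
    "seller_1": [0.33, 0.05],
}

# Hypothetical expert-engineered features for the same nodes.
expert_features = {
    "buyer_a": {"txn_count_30d": 4, "refund_rate": 0.5},
    "seller_1": {"txn_count_30d": 120, "refund_rate": 0.02},
}


def build_rows(embeddings, expert):
    """Join learned and expert features on node id, one flat row per node."""
    rows = []
    for node_id in sorted(embeddings.keys() & expert.keys()):
        row = {"node_id": node_id}
        row.update({f"emb_{i}": v for i, v in enumerate(embeddings[node_id])})
        row.update(expert[node_id])
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = build_rows(node_embeddings, expert_features)
csv_text = to_csv(rows)
print(csv_text)
```

The resulting table has one row per node and one column per feature, which is the tabular shape an automated feature-engineering and model-training tool expects as input.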
EXPERIMENTS
Datasets
• Training data
  • Subset of 1 year of transactions
  • 1.5 billion edges & 0.5 million nodes
• Test data
  • 3 months
• Number of features
  • 400 - 600
Environment & Tools
• Node2Vec – Node representation learning
• Driverless AI – Feature engineering & model training
• Spark – Data preparation/pre-processing
• Hardware – GPU server
  • 4 Pascal 100 GPUs
  • 160 CPU cores
  • 1 TB RAM
Experiment
• Training time (subset of data) – Driverless AI on GPU is about 6x faster
  • Laptop (accuracy 1) – ~2 hours
  • GPU (accuracy 1) – 21 minutes; (accuracy 5) – 58 minutes
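The quoted speedup can be checked from the two accuracy-1 timings. This assumes the laptop figure of "~2 hours" is taken as 120 minutes, so the result is approximate.

```python
# Timings from the slide; the laptop value is approximate ("~2 hours").
laptop_minutes = 2 * 60
gpu_minutes = 21

speedup = laptop_minutes / gpu_minutes
print(f"{speedup:.1f}x")
```

This gives roughly 5.7x, which the slide rounds to 6x.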
Experiment
• Top 5 variables from DAI
• AUC – 0.9477
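For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one. The sketch below computes it by direct pairwise comparison on made-up labels and scores. It is not the evaluation code behind the reported figure.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = fraud, 0 = legitimate.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The pairwise form is quadratic in the number of samples, so it is only suitable for small illustrations. Production tools use a sort-based computation.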
CONCLUSIONS
Conclusions
• Graph-based representation learning yields a robust feature set for complex fraud patterns such as collusion fraud
• Driverless AI not only helps to engineer additional features automatically but also significantly improves model training time (under 2 hours)
• Journey into DAI is just beginning…
• Next steps
  • Evaluate Driverless AI results on out-of-time data sets
  • Evaluate Driverless AI directly on raw data
  • Evaluate representation learning on edges and weighted graphs
  • Machine learning on graphs
Acknowledgements
Driverless AI Team @ H2O.ai

Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data Scientist, PayPal
