Randomness and fraud
Michael Manapat (@mlmanapat)
Stripe

- About Stripe
- Feature generation: fraudsters are “pseudorandom”
- Model training: “customized” random forests
- Model evaluation: counterfactual offline evaluation

What is Stripe?
“Full stack” for e-commerce:
- credit cards
- Checkout (APMs: Alipay, Bitcoin)
- fraud (beta)
- etc.
Merchant fraud
Transaction fraud

Fraudsters are “pseudorandom”

Example 1: “Random” e-mail addresses
john.smith123@gmail.com 
elizabeth.jones456@outlook.com 
... 
sdkfsdfsdUjsd@live.com 
hkjhghfghVgj@yahoo.com

What features detect this kind of regularity?
- Distribution of letter/digit/period/domain frequencies + measures of distributional difference
- Log-likelihood ratio: good at low counts

              Digit     No digit
Sample (p)    9         1
Overall (q)   200,000   200,000

Difference in log-likelihood from a single model for the matrix vs. a model for each row
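
The log-likelihood ratio for the digit/no-digit table can be sketched as below. This is a minimal sketch, assuming the statistic meant is the standard G² statistic for a contingency table (2 × Σ observed × ln(observed / expected)); the function name `log_likelihood_ratio` is hypothetical and does not come from the talk.

```python
import math


def log_likelihood_ratio(table):
    """G^2 statistic for a contingency table given as a list of rows of counts.

    Compares one shared distribution for all rows (the "single model")
    against a separate distribution for each row.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # the term 0 * ln(0) is taken as 0
            expected = row_totals[i] * col_totals[j] / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Counts from the slide: sample (digit, no digit) vs. overall population.
sample_vs_overall = [[9, 1], [200_000, 200_000]]
llr = log_likelihood_ratio(sample_vs_overall)

# A sample that mirrors the overall 50/50 split scores close to zero.
balanced = log_likelihood_ratio([[5, 5], [200_000, 200_000]])
```

Even with only ten observations in the sample row, the skewed 9:1 split produces a clearly non-zero score, which is the "good at low counts" property the slide points to.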

Example 2: Distribution of user agents
- Transform so that it’s less “conditional”
- Get rid of the distribution entirely: (# distinct user agents) / (# distinct IPs)
@jvns @kelleyrivoire
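
The distinct-user-agents-per-IP ratio is simple to compute over a batch of events. A minimal sketch, assuming events arrive as (ip, user_agent) pairs; that layout and the name `user_agents_per_ip` are illustrative assumptions, not something described in the talk.

```python
def user_agents_per_ip(events):
    """Ratio of distinct user agents to distinct IPs in a batch of events.

    `events` is an iterable of (ip, user_agent) pairs (assumed layout).
    Returns 0.0 for an empty batch.
    """
    ips = set()
    agents = set()
    for ip, user_agent in events:
        ips.add(ip)
        agents.add(user_agent)
    if not ips:
        return 0.0
    return len(agents) / len(ips)


# Made-up events: three user agents seen across two IPs.
events = [
    ("10.0.0.1", "Mozilla/5.0 (X11; Linux x86_64)"),
    ("10.0.0.1", "Mozilla/5.0 (Windows NT 6.1)"),
    ("10.0.0.1", "curl/7.35.0"),
    ("10.0.0.2", "Mozilla/5.0 (X11; Linux x86_64)"),
]
ratio = user_agents_per_ip(events)
```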

Distributed random forest learning

At each node, pick a feature X and a value v
Splitting on X < v should minimize I(L) + I(R) - I(D)
I: “Impurity”
“PLANET”
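
The split rule can be sketched for a single numeric feature as follows. This is a single-machine illustration, not the distributed PLANET procedure. It assumes L and R are the rows with X < v and X >= v, D is all rows at the node, impurity is Gini, and each child's impurity is weighted by its share of the rows (a common convention; the slide writes the sum without weights). The names `gini` and `best_split` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs, ys):
    """Return (v, change) minimizing I(L) + I(R) - I(D) for the split X < v.

    Returns None if the feature has fewer than two distinct values.
    """
    n = len(xs)
    parent = gini(ys)
    best = None
    # Skip the smallest value: X < min(xs) would leave L empty.
    for v in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < v]
        right = [y for x, y in zip(xs, ys) if x >= v]
        change = (len(left) / n) * gini(left) + (len(right) / n) * gini(right) - parent
        if best is None or change < best[1]:
            best = (v, change)
    return best


# Toy data: labels flip from 0 to 1 between X = 2 and X = 3.
xs = [1, 1, 2, 2, 3, 3, 4, 4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
v, change = best_split(xs, ys)
```

Since I(D) is the same for every candidate split at a node, minimizing the expression is equivalent to minimizing the children's combined impurity.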

Trained trees in Python with scikit, but...
- Our ETL pipeline runs on Hadoop and writes Parquet to HDFS
- Treatment of categorical variables is suboptimal (“x[1] <= 0.500”)
- No customization (impurity: “gini” or “entropy”)

“Brushfire” (@avibryant, @daniellesucher)
- Implemented in Scala (Scalding)
- Distributed learning approach modeled on Google’s PLANET paper
- Native support for ordered/ordinal/categorical vars
- Highly customizable/modular (e.g., splitting function)

Customization
We don’t necessarily want to maximize impurity drop with each split

X:   1    2    3    4
Y:   0   10   80   95

We have a “split budget” (after enough splits/tree levels we’ll run out of data)

We want to choose splits so we improve the ROC curve in the region of interest (even at the expense of total AUC)
[Figure: ROC curve annotated “Want improvement here” and “Don’t care about improvement here”]

scikit (left) vs. brushfire (right)
Fixed FPR: +7 percentage points in recall in region of interest

Brushfire to be open-sourced in the next month
(Talk this weekend at PNW Scala)

Counterfactual offline evaluation
Li, Chen, Kleban, Gupta: “Counterfactual Estimation and Optimization of Click Metrics for Search Engines”

Every conversion results in some benefit b
Every chargeback results in some cost c

Margin = 30%, product costs $10
Conversion: $10 - $7 (CGS) = $3
Chargeback: -$7 (CGS) - $15 (fee) = -$22

The relative sizes of b and c determine tolerance for false positives and false negatives.
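
The arithmetic for b and c can be written out directly. A small sketch using the slide's numbers; it reads "product costs $10" as a $10 sale price, and the variable names are illustrative.

```python
price = 10.00           # sale price from the slide
margin = 0.30           # 30% margin
chargeback_fee = 15.00  # fee charged on a chargeback, from the slide

cost_of_goods = price * (1 - margin)  # CGS: about $7
b = price - cost_of_goods             # benefit of a conversion: about $3
c = cost_of_goods + chargeback_fee    # cost of a chargeback: about $22

# How many good conversions it takes to pay for one chargeback.
conversions_per_chargeback = c / b
```

With these numbers one chargeback wipes out the profit from a little more than seven conversions (22 / 3 ≈ 7.3), which is why the tolerance for false negatives is low relative to false positives.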

Train a model on charge history (@ryw90)

#   Outcome      Payoff
1   Conversion   b
2   Conversion   b
3   Chargeback   -c
4   Conversion   b

Historical total payoff: 3b - c

Evaluate it on charge history

#   Outcome      Payoff   Class   New outcome       Payoff
1   Conversion   b        Good    Conversion (TN)   b
2   Conversion   b        Good    Conversion (TN)   b
3   Disputed     -c       Fraud   Blocked (TP)      0
4   Conversion   b        Fraud   Blocked (FP)      0

Historical total payoff: 3b - c
Payoff with model: 2b
Improvement: 2b - (3b - c) = c - b > 0
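
The comparison between the historical payoff and the payoff with the model can be replayed in a few lines. A minimal sketch: the values b = 3 and c = 22 are the example figures from the margin slide, and the record layout is an assumption for illustration.

```python
b, c = 3.0, 22.0  # example benefit and cost

# (outcome, model_blocks) for the four charges in the history.
history = [
    ("conversion", False),  # true negative
    ("conversion", False),  # true negative
    ("chargeback", True),   # true positive: loss avoided
    ("conversion", True),   # false positive: sale lost
]


def payoff(outcome):
    """Payoff of a charge that was allowed to go through."""
    return b if outcome == "conversion" else -c


# Every charge went through historically; blocked charges pay 0 under the model.
historical = sum(payoff(outcome) for outcome, _ in history)
with_model = sum(payoff(outcome) for outcome, blocked in history if not blocked)
improvement = with_model - historical
```

The improvement equals c - b: the model saves one chargeback and gives up one conversion.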

Model evaluation possible because of charge log without interventions
Intervention better than no intervention if
(odds of fraud) x (c/b) x (recall/FPR) > 1
What happens with the next model-building iteration?
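
The intervention condition follows from comparing the expected payoff per charge with and without blocking. A short derivation, assuming a blocked charge pays 0 and writing π for the fraction of charges that are fraudulent (π is notation introduced here, not from the slides):

```latex
\begin{align*}
\text{no intervention:}\quad & (1-\pi)\,b - \pi\,c \\
\text{intervention:}\quad & (1-\pi)(1-\mathrm{FPR})\,b - \pi\,(1-\mathrm{recall})\,c \\
\text{difference:}\quad & \pi\,\mathrm{recall}\,c - (1-\pi)\,\mathrm{FPR}\,b > 0
  \iff \frac{\pi}{1-\pi}\cdot\frac{c}{b}\cdot\frac{\mathrm{recall}}{\mathrm{FPR}} > 1
\end{align*}
```

Here π / (1 - π) is the odds of fraud, so the three factors match the three terms on the slide.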

Where does the new training data come from?

#   Outcome      Payoff
1   Conversion   b
2   Conversion   b
3   Blocked      0
4   Blocked      0

New model: “good” Conversion or chargeback?
An A/B test would be complex/time-consuming

One answer: introduce randomness in policy

#   Score   Original action   P(Block)   Randomized action   Outcome      Payoff
1   5       Allow             0.05       Allow               Conversion   b
2   20      Allow             0.10       Allow               Conversion   b
3   10      Allow             0.07       Block               N/A          0
4   50      Block             0.50       Allow               Chargeback   -c
5   65      Block             0.90       Allow               Conversion   b

Log of scores/probabilities/actions
Evaluate performance of model on events where original action == randomized action
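
A randomized policy with logging might look like the sketch below. The threshold rule for the original action, the record layout, and the function names are all assumptions made for illustration; the talk does not describe the production implementation. The scores and P(Block) values are the ones in the table.

```python
import random


def randomized_action(p_block, rng):
    """Block with probability p_block, otherwise allow."""
    return "block" if rng.random() < p_block else "allow"


def log_decision(score, threshold, p_block, rng):
    """Build one log record: score, propensity, and both actions."""
    original = "block" if score >= threshold else "allow"
    return {
        "score": score,
        "original_action": original,
        "p_block": p_block,
        "randomized_action": randomized_action(p_block, rng),
    }


rng = random.Random(0)  # seeded only so the sketch is reproducible
# (score, P(Block)) pairs from the table.
scored = [(5, 0.05), (20, 0.10), (10, 0.07), (50, 0.50), (65, 0.90)]
log = [log_decision(score, 50, p, rng) for score, p in scored]

# Events usable for evaluating the original policy.
matching = [r for r in log if r["original_action"] == r["randomized_action"]]
```

Logging the probability alongside the action is what makes the later reweighting possible; without it the log cannot be corrected for how often each action was taken.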

...but weight by inverse of expected probability

#   Score   Original action   P(Allow)   P(Block)   Randomized action   Outcome      Payoff
1   5       Allow             0.95       0.05       Allow               Conversion   b
2   20      Allow             0.90       0.10       Allow               Conversion   b

Average payoff: [(1/0.95)b + (1/0.9)b] / [(1/0.95) + (1/0.9)] = b

Intuition: If the action has a probability p and we see it in the log, there were ~1/p total such events

Similarly for the candidate model...

#   Score   Old model   P(Allow)   P(Block)   Randomized action   Outcome      Payoff   New model
2   20      Allow       0.90       0.10       Allow               Conversion   b        Allow
4   50      Block       0.50       0.50       Allow               Chargeback   -c       Allow
5   65      Block       0.10       0.90       Allow               Conversion   b        Allow

Average payoff: [(1/0.9)b + (1/0.5)(-c) + (1/0.1)b] / [(1/0.9) + (1/0.5) + (1/0.1)] = 0.85b - 0.15c
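
The inverse-probability-weighted averages for the incumbent and the candidate can be checked numerically. A minimal sketch of the self-normalized estimate: each event is a (probability of the logged action, payoff) pair taken from the tables, restricted to events where the policy's action matches the randomized action. The function name and the values b = 3, c = 22 are illustrative.

```python
def weighted_average_payoff(events):
    """Self-normalized inverse-probability-weighted average payoff.

    `events` is a list of (p, payoff) pairs, where p is the probability
    with which the logged action was taken.
    """
    weights = [1.0 / p for p, _ in events]
    total = sum(w * payoff for w, (_, payoff) in zip(weights, events))
    return total / sum(weights)


b, c = 3.0, 22.0  # example benefit and cost

# Incumbent model: rows 1 and 2 (allowed, and the randomized action was Allow).
incumbent = weighted_average_payoff([(0.95, b), (0.90, b)])

# Candidate model: rows 2, 4 and 5 (candidate allows, randomized action was Allow).
candidate = weighted_average_payoff([(0.90, b), (0.50, -c), (0.10, b)])
```

The incumbent estimate is exactly b. For the candidate, the coefficients on b and c work out to 100/118 ≈ 0.85 and 18/118 ≈ 0.15, so with these example values the estimate is about -0.81 per charge.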

- Compute the expected payoff offline (arbitrarily many “experiments”)
- Need more data as incumbent model/policy and candidate diverge
- Propensity function controls the “exploitation”/“exploration” tradeoff
- Keep merchant experience good (adds bias)

Technical issues:
- Propensity score function not generally a sigmoid
- Multiple actions
- Events must be IID

- Fraudsters generate randomness in non-random ways (LLR good at low counts)
- We can improve our random forest performance by biasing the training (get lift where you need it)
- Randomizing actions in production makes counterfactual evaluation easier (and faster)

Thanks
mlm@stripe.com
@mlmanapat

Machine learning at Stripe:
- Avi Bryant (@avibryant)
- Chris Wu (@chriswu_)
- Dan Frank (@danielhfrank)
- Danielle Sucher (@daniellesucher)
- Julia Evans (@jvns)
- Kelley Rivoire (@kelleyrivoire)
- Ryan Wang (@ryw90)
