We stand on the shoulders of giants when it comes to using the latest and greatest machine learning libraries. However, not much is said about how to deploy and monitor your beautiful model once it's out in production. I want to talk about the successes and potential perils of building real-time machine learning solutions, explain machine learning in general for a non-technical audience, and describe the architecture and approach we've taken at Ravelin to use machine learning to stop fraud.
2. what actually is fraud
architecting flexible data ‘plumbing’
building solid data products on top of them
3. stephen whitworth
2 years at Hailo as data scientist/jack of some trades out of
university
product and marketplace analytics, agent based
modelling, data engineering, ‘ML’ services
data science/engineering at ravelin, specifically
focused on our detection capabilities
4. what is ravelin?
online fraud detection and prevention platform
stream application/server data to our events API
we give fraud probability + beautiful data visualisation
backed by techstars/passion/playfair/amadeus/indeed.com
founder/wonga founder amongst other great investors
12. receive firehose through API
decode arbitrary data and store
extract hundreds of features
run through N models and rule engine to get probability
http/slack/whatever notification to customer
in 100-300ms (ish)
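The flow on this slide can be sketched roughly as below. This is a minimal illustration, not Ravelin's implementation: the feature names, the two toy models, the example rule, and the choice to average model scores are all assumptions made up for the example, and storage and customer notification are left out.

```python
import json
import time
from typing import Callable, Dict, List

Features = Dict[str, float]
Model = Callable[[Features], float]


def decode(raw: bytes) -> dict:
    """Decode an arbitrary JSON payload (a real system would also store it)."""
    return json.loads(raw)


def extract_features(event: dict) -> Features:
    """Toy feature extraction; these two features are hypothetical."""
    return {
        "amount": float(event.get("amount", 0.0)),
        "new_card": 1.0 if event.get("card_age_days", 0) < 1 else 0.0,
    }


def high_amount_model(f: Features) -> float:
    """Hypothetical model: larger orders look riskier, capped at 1.0."""
    return min(f["amount"] / 1000.0, 1.0)


def new_card_model(f: Features) -> float:
    """Hypothetical model: brand-new cards look riskier."""
    return 0.8 if f["new_card"] else 0.1


def rule_engine(f: Features, score: float) -> float:
    """Hypothetical rule: a large order on a brand-new card is forced high."""
    if f["new_card"] and f["amount"] > 500:
        return max(score, 0.95)
    return score


def score_event(raw: bytes, models: List[Model]) -> dict:
    """Decode, extract features, run N models plus rules, and time the whole thing."""
    start = time.perf_counter()
    features = extract_features(decode(raw))
    average = sum(m(features) for m in models) / len(models)  # assumed combination
    probability = rule_engine(features, average)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"probability": probability, "elapsed_ms": elapsed_ms}


models = [high_amount_model, new_card_model]
result = score_event(b'{"amount": 750, "card_age_days": 0}', models)
print(result["probability"])  # the rule overrides the averaged model score here
```

Timing the whole path in one place makes it easy to check a latency budget such as the 100-300ms mentioned on the slide.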
17. postgres: solid, start here
dynamodb: very high throughput, low latency data
bigquery: to answer any question you could possibly have
elasticsearch: rich querying in a reasonable amount of time
graph db: haven’t decided, recommendations?
18. asynchronous systems
firehoses
nice deployment patterns
‘lambda architecture’ - the append only log
services store their own interpretation of events
services are almost entirely decoupled
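As a rough sketch of the append-only log idea: events are only ever appended, and each service keeps its own interpretation of them. The class and service names here are hypothetical, and the log is an in-process, synchronous stand-in for what would be an asynchronous message system in production.

```python
from collections import defaultdict
from typing import Callable, List


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[dict] = []
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        # Replay history first, so a new service can build its view from scratch.
        for event in self._events:
            handler(event)
        self._subscribers.append(handler)

    def append(self, event: dict) -> None:
        self._events.append(event)
        for handler in self._subscribers:
            handler(event)


class OrderCountService:
    """Interprets the log as 'number of orders per customer'."""

    def __init__(self) -> None:
        self.orders_per_customer = defaultdict(int)

    def handle(self, event: dict) -> None:
        if event["type"] == "order":
            self.orders_per_customer[event["customer"]] += 1


class SpendService:
    """Interprets the same log as 'total spend per customer'."""

    def __init__(self) -> None:
        self.total_spend = defaultdict(float)

    def handle(self, event: dict) -> None:
        if event["type"] == "order":
            self.total_spend[event["customer"]] += event["amount"]


log = EventLog()
counts = OrderCountService()
log.subscribe(counts.handle)
log.append({"type": "order", "customer": "a", "amount": 10.0})
log.append({"type": "order", "customer": "a", "amount": 5.0})
log.append({"type": "login", "customer": "a"})

# A service deployed later catches up by replaying the log.
spend = SpendService()
log.subscribe(spend.handle)
print(counts.orders_per_customer["a"], spend.total_spend["a"])
```

Neither service knows the other exists; both depend only on the shape of the events, which is what keeps them decoupled.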
21. ‘a random forest is like a room full of
experts who have seen different
cases of fraud from different
perspectives’
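To make the analogy concrete, here is a deliberately simplified sketch: each 'expert' is a one-split decision stump trained on a bootstrap sample (different cases) using one randomly chosen feature (a different perspective), and the forest's answer is the fraction of experts voting fraud. A real random forest uses deeper trees and samples features at every split; the data and feature names below are made up.

```python
import random
from typing import List, Tuple

Row = Tuple[float, float]  # (order amount, cards on account): hypothetical features

# Toy, made-up training data: label 1 = fraud, 0 = legitimate.
X: List[Row] = [(10, 1), (25, 1), (40, 2), (60, 1),
                (400, 4), (550, 5), (700, 6), (900, 7)]
y: List[int] = [0, 0, 0, 0, 1, 1, 1, 1]

Expert = Tuple[int, float]  # (feature index, threshold)


def train_expert(rng: random.Random) -> Expert:
    """Train one stump on a bootstrap sample, looking at one random feature."""
    idx = [rng.randrange(len(X)) for _ in X]  # sample cases with replacement
    feature = rng.randrange(2)                # this expert's 'perspective'
    values = sorted(X[i][feature] for i in idx)
    best_threshold, best_errors = values[0] - 1, len(idx) + 1
    # Try each candidate threshold; the stump says 'fraud' when value > threshold.
    for threshold in [values[0] - 1] + values:
        errors = sum((X[i][feature] > threshold) != bool(y[i]) for i in idx)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature, best_threshold


def fraud_probability(experts: List[Expert], row: Row) -> float:
    """Fraction of experts that vote 'fraud' for this row."""
    votes = sum(row[feature] > threshold for feature, threshold in experts)
    return votes / len(experts)


rng = random.Random(0)
experts = [train_expert(rng) for _ in range(25)]

print(fraud_probability(experts, (1000, 8)))  # fraud-like on both features
print(fraud_probability(experts, (5, 1)))     # legitimate-looking on both features
print(fraud_probability(experts, (500, 1)))   # experts may disagree by perspective
```

The third case is the interesting one: experts looking at the amount and experts looking at the card count can reach different verdicts, and the vote share is what gets reported as a probability.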
23. precision: of all the cases I flagged as fraud, what % were actually fraud?
recall: out of all of the fraudsters, what % did I catch?
implicit tradeoff between conversion and fraud loss
‘accuracy’ a useless metric for fraud
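A small worked example, using made-up numbers, of why accuracy misleads when fraud is rare. Assume 1,000 orders of which 10 are fraudulent, and compare a model that never flags anything with one that flags 20 orders and catches 8 of the fraudsters.

```python
from typing import Dict, List


def metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Precision, recall and accuracy for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return {
        # Defined as 0.0 when nothing is flagged, to avoid dividing by zero.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Made-up data: 10 fraudulent orders followed by 990 legitimate ones.
y_true = [1] * 10 + [0] * 990

# Model A never flags anything.
never_flag = [0] * 1000

# Model B catches 8 of the 10 fraudsters and wrongly flags 12 good customers.
model_b = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

a = metrics(y_true, never_flag)
b = metrics(y_true, model_b)
print(a)  # accuracy 0.99, but recall 0.0: it catches no fraud at all
print(b)  # accuracy 0.986, precision 0.4, recall 0.8
```

The do-nothing model has the higher accuracy while catching no fraud. Precision and recall also expose the tradeoff: flagging more orders tends to raise recall (less fraud loss) while lowering precision (more good customers blocked, which hurts conversion).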