20. 20
Credit Card Fraud Example
• What are we trying to predict?
– This is the Label or Target outcome:
– Fraud or Not Fraud
• What are the “if questions” or properties we can use to predict?
– These are the Features:
– Is the amount spent today > historical daily average?
– Are there transactions close in time at locations far apart?
– Is the number of transactions today > historical average?
– Are there purchases from a new state or a foreign country?
Credit Card Transaction Features
Number of Transactions last 24 hours
Total $ Amount last 24 hours
Average Amount last 24 hours
Average Amount last 24 hours compared to historical use
Location and Time difference since Last Transaction
Average transaction fraud risk of merchant type
Merchant types for day compared to historical use
Features derived from transaction history
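As a rough illustration of how a few of these features could be derived from a transaction history, here is a minimal sketch. The function name `transaction_features`, the `(timestamp, amount)` record layout, and the sample data are all hypothetical and not taken from the slides.

```python
from datetime import datetime, timedelta


def transaction_features(history, now, window=timedelta(hours=24)):
    """Derive simple features for one card.

    history: list of (timestamp, amount) tuples (hypothetical layout).
    now: the time at which the features are computed.
    """
    # Split the history into the last 24 hours and everything older.
    recent = [amt for ts, amt in history if now - window < ts <= now]
    older = [amt for ts, amt in history if ts <= now - window]

    count = len(recent)
    total = sum(recent)
    avg_recent = total / count if count else 0.0
    avg_hist = sum(older) / len(older) if older else 0.0

    return {
        "num_transactions_24h": count,
        "total_amount_24h": total,
        "avg_amount_24h": avg_recent,
        # Ratio of the recent average to the historical average;
        # None when there is no older history to compare against.
        "avg_amount_vs_historical": avg_recent / avg_hist if avg_hist else None,
    }


# Made-up sample data for illustration only.
now = datetime(2021, 1, 10, 12, 0)
history = [
    (datetime(2021, 1, 1, 9, 0), 20.0),
    (datetime(2021, 1, 5, 18, 30), 40.0),
    (datetime(2021, 1, 10, 8, 0), 100.0),
    (datetime(2021, 1, 10, 11, 0), 200.0),
]
features = transaction_features(history, now)
```

A ratio well above 1 for `avg_amount_vs_historical` corresponds to the "amount spent today > historical daily average" question.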
21. 21
Decision Tree For Classification
• Tree of decisions about features
• Each node asks an IF-THEN-ELSE question about a feature
• Each leaf gives a decision and the probability that it is correct
• Example decision nodes (each branches YES / NO):
– Is the amount spent in 24 hours > average?
– Is the number of states used > 2?
– Are there multiple purchases today from risky merchants?
• Example leaf outcomes: Fraud 90%, Not Fraud 50%, Fraud 90%, Not Fraud 30%
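The decision tree on this slide can be read as nested IF-THEN-ELSE statements. The sketch below is one possible reading, not the slide's definitive structure: the transcript does not show which leaf hangs off which branch, so the assignment of leaves to branches is an assumption, as are the function name `classify`, its parameter names, and the reading of "multiple" as more than one.

```python
def classify(amount_24h, avg_daily_amount, num_states, risky_purchases):
    """Return (decision, probability) by walking a small hand-written tree.

    The branch-to-leaf mapping is assumed; the slide only lists the
    questions and the four leaf outcomes.
    """
    if amount_24h > avg_daily_amount:
        # Assumed YES branch of the root question.
        if num_states > 2:
            return ("Fraud", 0.90)
        return ("Not Fraud", 0.50)
    # Assumed NO branch of the root question.
    if risky_purchases > 1:  # "multiple" read as more than one
        return ("Fraud", 0.90)
    return ("Not Fraud", 0.30)


# Hypothetical transaction summary: high spend across three states.
decision = classify(amount_24h=900.0, avg_daily_amount=120.0,
                    num_states=3, risky_purchases=0)
```

In practice the questions and thresholds are learned from labeled data rather than written by hand; the hand-written form only shows how a trained tree is evaluated.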
25. 25
House Price Regression Example
Label (Y axis): house price
Feature (X axis): house size (square meters)
Data point: (size, price)
House Price = intercept + coefficient * house size
y = a + bx
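To make y = a + bx concrete, here is a minimal sketch that estimates the intercept a and coefficient b by ordinary least squares. The slide does not say how the line is fitted, so least squares is an assumption, and the function name `fit_line` and the sample sizes and prices are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data points: house size in square meters, price.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150000.0, 210000.0, 250000.0, 290000.0]

a, b = fit_line(sizes, prices)

# Predict the price of a 90 square meter house with the fitted line.
predicted = a + b * 90.0
```

Here a is the intercept and b is the coefficient on house size, matching the formula on the slide.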
55. 55
GPUs have been responsible for much of the advancement of deep learning in the past several years
https://developer.nvidia.com/deep-learning-software
56. 56
RAPIDS: End-to-End Accelerated GPU Data Science
Pipeline stages: Data Preparation, Model Training, Visualization
– cuDF, cuIO: Analytics
– cuML: Machine Learning
– cuGraph: Graph Analytics
– PyTorch, TensorFlow, MxNet: Deep Learning
– cuxfilter, pyViz, plotly: Visualization
Common layers: Dask, GPU Memory
https://developer.nvidia.com/blog/building-an-accelerated-data-science-ecosystem-rapids-hits-two-years/
57. 57
GPU DataFrame and Apache Arrow
• All systems utilize the same memory format
• No overhead for cross-system communication
• Projects can share functionality (e.g., Parquet-to-Arrow reader)
Source: Apache Arrow home page - https://arrow.apache.org/
60. 60
BENEFITS OF GPU-ACCELERATED SPARK 3.0
Accelerate data science pipelines without code changes
One pipeline, from ingest to data prep to training
Data preparation and model training are both GPU-accelerated
Infrastructure is consolidated and simplified