1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
How to Leverage AWS Machine Learning
Services to Analyze and Optimize Your
Google DoubleClick Campaign Manager
Data at Scale
Abraham Bagherjeiran, Advertising Science Tech Lead, Amazon A9
Shashi Prabhakar, Manager, Solutions Architect
Vijay Sathish, Solutions Builder, Solutions Architect
ATC302
November 27, 2017
2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
We have plenty of data—but we now need
insights
3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine learning is core to digital advertising…
How are digital advertising customers using AWS for machine learning?
• Audience targeting
• Channel attribution
• Lookalike modeling
• Click fraud detection
• Traffic shaping
• Campaign pacing
• Multi-channel optimization
• Bid evaluation
• Identity enrichment
4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
…But getting started can be challenging
Data processing is time consuming: Without the right tools, preparation of digital advertising data sets (sparse data, weak signals) into machine trainable formats can be laborious
Scaling infrastructure poses challenges: Provisioning infrastructure to handle multi-PB digital advertising data sets can pose challenges
ML expertise is scarce: Many digital advertising customers are early in establishing data science capability, or existing resources are thinly stretched
5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS simplifies your machine learning workflow
Data processing is time consuming: AWS handles the undifferentiated data processing tasks, letting you focus on model development
Scaling infrastructure poses challenges: AWS enables rapid scale up/down of compute resources, reducing time and cost of experimentation
ML expertise is scarce: AWS provides simple frameworks to stand up ML workflows, helping you get started quickly
6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS enables end-to-end data science process
CRISP-DM: Cross-Industry Standard Process for Data Mining
• Business understanding
• Data understanding
• Data preparation
• Modeling
• Evaluation
• Deployment
7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How we will help today
1 Overview: Using AWS to simplify your machine learning workflows
2 Demo: Sample ML workflow on AWS
3 Customer use case: Amazon A9 and lookalike modeling walkthrough
8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dataset and workflow of machine learning
Google DoubleClick Campaign Manager data
• Activity
• Impression
Data transformation
• csv.gz to Parquet format
Workflow for machine learning
• Create data features
• Train a model to predict user conversion
• Model evaluation and iteration
9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DoubleClick data fields
Activity
• Activity ID # unique ID per conversion
• User ID # specific user who clicked through and purchased
• Site ID (DCM) # site on which ad was viewed
• Browser/Platform ID # browser used (e.g. Chrome)
• Operating system ID # operating system used (e.g. Mac OS)
• Ad ID # advertisement ID
Impression
• Event time # time that the impression (viewing of ad) occurred
• User ID # specific user who viewed the ad
• Site ID (DCM) # site on which ad was viewed
• Browser/Platform ID # browser used (e.g. Chrome)
• Operating system ID # operating system used (e.g. Mac OS)
10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS handles undifferentiated data processing tasks in ML workflow
Ad-hoc data exploration: Explore and visualize your raw data directly from Amazon S3
• Amazon S3: Data lake
• AWS Glue: Automated ETL processing
• Amazon Athena: Serverless query
• Amazon QuickSight: Visualization
Data feature engineering for ML training: Rapidly create new data features for experimentation
• Amazon S3: Data lake
• AWS Glue
• Amazon EMR Spark ML: Scalable infrastructure for data processing
11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data exploration/visualization through SQL querying
• Store your raw DoubleClick log data in S3
• Use Athena to perform serverless queries of your raw data with standard SQL
• QuickSight produces rapid visualizations
• AWS Glue automates cumbersome ETL tasks
12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Demo
Data exploration and visualization
14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data feature engineering
S3: Raw data → EMR Spark (data preparation and feature engineering) → S3: Machine trainable data → EMR Spark (MLlib) (model training and evaluation)
Data science notebook interface
15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Demo
Data feature engineering
17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Abraham Bagherjeiran
Advertising Science Tech Lead, Amazon A9
19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Advertising decision problem
20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Advertising decision problem
Targeting
Selection Bidding
21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Advertising decision problem
Targeting
22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ad targeting as triggers
If: Function applied to user returns true
Then: User can be targeted with ad
= Set of all ads triggered for the user
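One way to read the trigger definition above is as a collection of predicate functions over a user record. The sketch below is illustrative only: the user fields, ad names, and trigger rules are invented for the example and do not come from the talk.

```python
from typing import Any, Callable, Dict, Set

# A user is a plain dict of attributes; a trigger is a predicate over it.
User = Dict[str, Any]
Trigger = Callable[[User], bool]

# Hypothetical ads, each with its own (hypothetical) trigger function.
TRIGGERS: Dict[str, Trigger] = {
    "ad_running_shoes": lambda u: "running" in u.get("interests", ()),
    "ad_news_app": lambda u: u.get("current_activity") == "reading news",
    "ad_retarget": lambda u: bool(u.get("visited_advertiser_site", False)),
}


def triggered_ads(user: User, triggers: Dict[str, Trigger] = TRIGGERS) -> Set[str]:
    """Return the set of all ads whose trigger function returns true for the user."""
    return {ad for ad, fn in triggers.items() if fn(user)}


example_user = {"interests": ["running", "music"], "visited_advertiser_site": True}
print(sorted(triggered_ads(example_user)))
```

Keeping each trigger as an independent function means new targeting rules can be added without touching the selection logic.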
23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Types of ad targeting
State: Triggered always for a long-lived property of a user; for example, age, gender, declared interests
Behavior: Triggered on activity in the past; for example, visit to advertiser site, frequent visits to category of sites
Context: Triggered on current activity; for example, reading news, playing a game, listening to music while jogging at night in a park
Predictive: Triggered on activity in the future; for example, likely to buy a product, will be a high-value customer
24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Predictive targeting
Target
Users
Reached
Users
Targetable Population
26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lookalike modeling is predictive targeting
Similarity: Reached users “look like” target users
• How to define similarity?
• How does similarity translate to advertiser performance?
• Best fit: High-value customer list
Classification: Reached users “perform like” target users
• Solves directly for advertiser performance
• Easily translated into revenue metrics
• Best fit: Ad-related activities (purchases, leads, signups)
27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Classification-based lookalikes
Targeting Trigger
If Pr[user has activity within n days | user] > threshold
User: Representation: Event history for the user
Days: Attribution window: How long to check for targets
Probability: Classifier: Trained ML model to predict probability value
Threshold: Control: Guarantees on performance targets and cost
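The targeting trigger above can be sketched as a probability score compared against a threshold. The talk does not say which classifier was used; the logistic scoring function, feature names, weights, and threshold below are assumptions chosen only to make the example concrete.

```python
import math
from typing import Dict


def score(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Stand-in classifier: logistic score over a sparse feature dict.

    Approximates Pr[user has activity within n days | user]; a real
    deployment would load a trained model instead.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_targeted(features: Dict[str, float], weights: Dict[str, float],
                bias: float, threshold: float) -> bool:
    """Targeting trigger: fire only when the predicted probability exceeds the threshold."""
    return score(features, weights, bias) > threshold


# Hypothetical weights and user features, for illustration only.
weights = {"ad_view_Ad2": 1.0, "site_view_S1": 0.25}
user = {"ad_view_Ad2": 2.0, "site_view_S1": 4.0}
print(is_targeted(user, weights, bias=-2.0, threshold=0.6))
```

Raising the threshold trades reach for precision, which is why the slide describes it as the control on performance targets and cost.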
28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data processing architecture
Training (weekly): Amazon S3: Raw data → Amazon EMR Spark → S3: Machine trainable data → EMR Spark (MLlib) → S3: Model
Scoring (daily): EMR: Score and apply threshold → S3: User List → Import into Campaign Manager and run ad campaign ($)
29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
User representation
Group: Many ad events per user
{"event_time":"1472689299702272","user_id":"U1","advertiser_id":"A1","campaign_id":"C1","ad_id":"Ad1","rendering_id":"R1","creative_version":"1","site_id_(dcm)":"S1","placement_id":"P1","country_code":"US","state/region":"MA","browser/platform_id":"27","browser/platform_version":"0.0","operating_system_id":"7","designated_market_area_(dma)_id":"8","city_id":"17311","zip/postal_code":"02190","event_type":"VIEW","event_sub-type":"VIEW","partner1_id":"P1"}
{"event_time":"1472689535977600","user_id":"U1","advertiser_id":"A1","campaign_id":"C1","ad_id":"Ad2","rendering_id":"R2","creative_version":"1","site_id_(dcm)":"S1","placement_id":"P2","country_code":"US","state/region":"MA","browser/platform_id":"27","browser/platform_version":"0.0","operating_system_id":"7","designated_market_area_(dma)_id":"8","city_id":"17311","zip/postal_code":"02190","event_type":"VIEW","event_sub-type":"VIEW","partner1_id":"P1"}
{"event_time":"1472689585099264","user_id":"U1","advertiser_id":"A1","campaign_id":"C1","ad_id":"Ad2","rendering_id":"R2","creative_version":"1","site_id_(dcm)":"S1","placement_id":"P2","country_code":"US","state/region":"MA","browser/platform_id":"27","browser/platform_version":"0.0","operating_system_id":"7","designated_market_area_(dma)_id":"8","city_id":"17311","zip/postal_code":"02190","event_type":"VIEW","event_sub-type":"VIEW","partner1_id":"P1"}
{"event_time":"1472689620020480","user_id":"U1","advertiser_id":"A1","campaign_id":"C1","ad_id":"Ad3","rendering_id":"R3","creative_version":"1","site_id_(dcm)":"S1","placement_id":"P3","country_code":"US","state/region":"MA","browser/platform_id":"27","browser/platform_version":"0.0","operating_system_id":"7","designated_market_area_(dma)_id":"8","city_id":"17311","zip/postal_code":"02190","event_type":"VIEW","event_sub-type":"VIEW","partner1_id":"P1"}
Reduce: Bag of Events, single record per user
{"user_id":"U1","ad_view_Ad1":1,"ad_view_Ad2":2,"ad_view_Ad3":1,"site_view_S1":4,"state_MA":4,"browser_27":4,"dma_8":4}
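The group and reduce steps can be illustrated in plain Python with a counter per user. This is a minimal sketch, not the Spark job from the talk: the function name is hypothetical, only the fields that appear in the reduced record are used, and every event is assumed to be a VIEW as in the sample records.

```python
from collections import Counter, defaultdict


def bag_of_events(events):
    """Group events by user, then reduce each group to one record of counts."""
    counts = defaultdict(Counter)
    for e in events:
        c = counts[e["user_id"]]
        # One counter per (feature type, value) pair seen in the event.
        c["ad_view_" + e["ad_id"]] += 1
        c["site_view_" + e["site_id_(dcm)"]] += 1
        c["state_" + e["state/region"]] += 1
        c["browser_" + e["browser/platform_id"]] += 1
        c["dma_" + e["designated_market_area_(dma)_id"]] += 1
    return {user: dict(c) for user, c in counts.items()}


# The four sample events, trimmed to the fields used above.
events = [
    {"user_id": "U1", "ad_id": ad, "site_id_(dcm)": "S1", "state/region": "MA",
     "browser/platform_id": "27", "designated_market_area_(dma)_id": "8"}
    for ad in ["Ad1", "Ad2", "Ad2", "Ad3"]
]
print(bag_of_events(events)["U1"])
```

The result for user U1 carries the same counts as the reduced record shown on the slide.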
30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
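Code snippet: Group by user
from pyspark.sql.functions import col

USER_ID = 'User ID'
# Read raw events, drop rows whose user ID is "0", and cache the result.
# `spark` (a SparkSession) and `source` (the input path) are defined elsewhere.
events = (spark.read.csv(source, header=True)
          .filter(col(USER_ID) != "0")
          .repartition(1000)
          .persist())
user_history = events.groupBy(USER_ID)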
31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Attribution window: Identifying target users
[Timeline: Training Period, Blind Period, Target Period; axis: days relative to target time, from -N to K]
Training: Accumulate user history
Target: Count activities within this period
Blind: Data latency between event and feedback
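As a rough illustration of the three periods, the sketch below assigns a day offset (relative to target time 0) to a period. The layout is an assumption: the slide does not give exact boundaries, so the blind period is placed immediately before day 0, the training period before that, and the default lengths are arbitrary.

```python
from typing import Optional


def assign_period(day: int, train_days: int = 30, blind_days: int = 2,
                  target_days: int = 7) -> Optional[str]:
    """Map a day offset to 'training', 'blind', 'target', or None.

    Assumed layout (not confirmed by the talk):
      training: [-(train_days + blind_days), -blind_days)
      blind:    [-blind_days, 0)
      target:   [0, target_days)
    """
    if -(train_days + blind_days) <= day < -blind_days:
        return "training"
    if -blind_days <= day < 0:
        return "blind"
    if 0 <= day < target_days:
        return "target"
    return None  # outside every window


for d in (-5, -1, 0, 10):
    print(d, assign_period(d))
```

Events in the training period feed the user history, events in the target period feed the labels, and blind-period events are left out of both.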
32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Classification model: Setting the class labels
Positive: Users with at least one activity in the target period
Negative: Users with no activity in the target period
Other variations:
• Count-regression: Total number of activities completed
• Ordinal regression: Ordered sequence of activities
• Delay prediction: Predict delay to the first activity
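The positive/negative rule and the count-regression variation can be sketched without Spark. The function and variable names here are hypothetical; the inputs are assumed to be a list of users with training history and a list with one user ID per activity observed in the target period.

```python
from collections import Counter


def make_labels(history_users, target_activity_users):
    """Return (binary_labels, count_targets) keyed by user ID.

    history_users: users who have a training-period history.
    target_activity_users: one user ID per activity in the target period.
    """
    counts = Counter(target_activity_users)
    # Positive (1): at least one activity in the target period; negative (0): none.
    binary = {u: int(counts[u] > 0) for u in history_users}
    # Count-regression target: total number of activities completed.
    count_target = {u: counts[u] for u in history_users}
    return binary, count_target


binary, count_target = make_labels(["U1", "U2", "U3"], ["U1", "U1", "U3", "U9"])
print(binary)
print(count_target)
```

Users who appear only in the target period (U9 in this example) have no history to learn from and are left out of both outputs.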
33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
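Code snippet: Setting labels, training model
import logging
from pyspark.sql.functions import col, max as spark_max

logger = logging.getLogger(__name__)
USER_ID = 'User ID'
ACTIVITY_ID = 'Activity ID'
TARGET_ACTIVITY_ID = 12345

def create_labels(histories, activities):
    """
    Create labels
    :param histories: user histories during training period
    :param activities: activities during target period
    :return: histories with a 0/1 'label' column based on user activity during target period
    """
    logger.info('create labels by joining impressions and activities')
    # 1 when the activity is the target activity, else 0
    target_activities = activities.withColumn('label', (col(ACTIVITY_ID) == TARGET_ACTIVITY_ID).cast('int'))
    # A user is positive if any of their activities is the target activity
    target_activities_by_user = target_activities.groupBy(USER_ID).agg(spark_max('label').alias('label'))
    # Left join keeps every user with history; users with no target-period activity become 0
    labels = histories.join(target_activities_by_user, USER_ID, 'left_outer').fillna(0, subset=['label'])
    return labels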
34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Evaluation
Reach: Proportion of targetable users eligible to see ad
Total Budget = Reach * Frequency Cap
Precision: Probability that reached user performs the activity
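Both metrics can be estimated from scored users with known outcomes. The sketch below is an assumption-laden illustration: `scores` and `labels` are hypothetical parallel lists over the targetable population, and a user counts as reached when the score exceeds the threshold.

```python
def reach_and_precision(scores, labels, threshold):
    """Estimate reach and precision for a given score threshold.

    reach: fraction of the targetable population whose score exceeds the threshold.
    precision: fraction of reached users who performed the activity (label 1).
    """
    reached = [y for s, y in zip(scores, labels) if s > threshold]
    reach = len(reached) / len(scores) if scores else 0.0
    precision = sum(reached) / len(reached) if reached else 0.0
    return reach, precision


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 0, 1]
print(reach_and_precision(scores, labels, threshold=0.5))
```

Sweeping the threshold over held-out data shows the trade-off directly: a lower threshold increases reach and usually lowers precision.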
35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Before you start: Data checks
Check #1: Users have enough history (days) in exploration ad campaign.
[Plot; x-axis: Day of exploration campaign]
36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Before you start: Data checks
Check #2: How many different browser and OS types were there?
[Plot; x-axis: Day of exploration campaign]
2.2 OS
oss = []
for i in range(1, 31):
    date = '201609%02d' % i
    prefix = 'results/feature_sets/%s/%s/part' % (date, OS)
    os = get_pd_from_s3_csv(bucket_name, prefix, compression=None)
    oss.append({'date': date, 'count': os.shape[0]})
df = pd.DataFrame(oss)
df['date'] = pd.to_datetime(df['date'])
df.set_index('date').plot(title='os')
[Plot: os; x-axis: Day of exploration campaign]
37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Before you start: Data checks
Check #3: How many different sites and ads?
2.4 Sites
[Plots; x-axis: Day of exploration campaign]
2.5 Co-occurrence of features with class
• Label positive and negative class based on Activity for each user
def create_segments(activities):
    activities.groupBy(ACTIVITY_ID).count().sort(desc('count')).sh…
    segment = activities.withColumn('tag', (col(ACTIVITY_ID) == '3…
    max('tag').alias('tag'))
    return segment
2.5.1 Total
freq_total_20160901 = get_pd_from_s3_json(bucket_name, 'features/…
display(freq_total_20160901)
pos = freq_total_20160901.query('label==1')['count'].values[0]
neg = freq_total_20160901.query('label==0')['count'].values[0]
total_ratio = (pos / (pos + neg))
print('positive ratio: ', total_ratio)
   count     label
0  43426343  0
38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Before you start: Targetable population
Use event attributes with
medium cardinality and high
coverage
Feature Types
• Sites: 25
• Browser: 24
• OS: 13
• Ads: 180
39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Evaluating a lookalike segment
3X precision lift
by targeting
top 10%
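Precision lift compares precision among the top-scored users with the base rate across the whole population. The sketch below shows one way to compute it; the function name and toy data are hypothetical, and the 3X figure on the slide comes from the talk's own data, not from this example.

```python
def precision_lift(scores, labels, top_fraction=0.10):
    """Precision among the top-scored fraction divided by the overall positive rate."""
    if not scores or sum(labels) == 0:
        raise ValueError("need at least one user and one positive label")
    # Rank users from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    top_precision = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_precision / base_rate


# Toy data: 10 users, scores already in descending order.
toy_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(precision_lift(toy_scores, toy_labels, top_fraction=0.2))
```

A lift above 1 means the model concentrates converting users at the top of the ranking; a lift of 1 is no better than targeting at random.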
40. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS helps with lookalike modeling
Amazon S3: Data stays within AWS during processing, reducing transfer costs
Amazon EMR: Spin up cluster to run each processing step
AWS Data Pipeline: Automate lookalike modeling process daily
41. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Making this better
Amazon EC2: Build your own programmatic bidder on AWS
Amazon Kinesis: Real-time scoring and updates to lookalike model
Amazon SNS: Real-time notifications on user activity to update model
Amazon DynamoDB: User history store updated in real time
42. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ready to start building?
1 Start ML experimentation today
ML for digital advertising framework with sample ML model for predicting user conversion:
• Sample DoubleClick Campaign Manager data set
• Pre-populated data science notebook to guide you through structured process
2 Explore ML on AWS options
ML for digital advertising discovery workshop to help dive deeper on:
• Devising an ML implementation plan
• Building a business use case
• Choosing the right data science tools
Learn more at https://aws.amazon.com/digital-marketing/ml/
43. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!