SlideShare a Scribd company logo
1 of 24
Download to read offline
Algorithmic Digital
Attribution Using Spark
Anny (Yunzhu) Chen and William (Zhenyu) Yan
Adobe
Agenda
•  Digital attribution
•  Algorithmic digital attribution
•  Spark implementation
•  Lessons learned
Path of media touches to conversion
Display"
• See a display
ads of Adobe
Creative Cloud"
Paid Search"
• Search the term
“Creative Cloud”
and click the
promoted link"
Organic
Search"
• Search the term
“Adobe Creative
Cloud” and click
an organic link"
Signup"
• Signup for a free
trial"
Email"
• Receive an email
from Adobe"
Subscribe to
Adobe Creative
Cloud
(positive)
Not subscribe to
Adobe Creative
Cloud
(negative)
A customer may receive various kinds of media touch points before deciding
whether to subscribe to a product (conversion) or not
What’s digital attribution?
4
•  A digital attribution model determines how credits for conversions are assigned to
media touch points
•  It is quite important in performance monitoring and budget planning
Display"
• Saw a
display ads
of Adobe
Creative
Cloud"
• 0.1"
Paid Search"
• Search the
term
“Creative
Cloud” and
click the
promoted
link"
• 0.2"
Organic
Search"
• Search the
term “Adobe
Creative
Cloud” and
click an
organic link"
• 0.2"
Signup"
• Signup for a
free trial"
• 0.2"
Email"
• Received an
email from
Adobe"
• 0.3"
Convert!
Digital attribution model at a glance
Models!
Consider all
media touch
points!
Time decay!
Data driven!
(vs. rule based)!
Last interaction, first interaction,
last paid search click!
no" no" no"
Linear! yes" no" no"
Time decay! yes" yes" no"
Position based! yes" no" no"
Algorithmic! yes" yes " yes"
Algorithmic Attribution
•  Rule based: predetermined weights based on rules
•  Algorithmic: machine learning or statistical models are used to
determine the weights
6
Algorithmic Attribution Modeling
The attribution model is based on a combination of
•  Distributed-lag econometric model
•  Discrete-time survival model
Some highlights:
•  The basic idea is to compare the media touches in positive paths vs. those in negative paths and thus
infer their effects
•  Logistic link function
•  Time decay parameters to address time-decaying of media effects
•  Fit media touch effects and decay parameters simultaneously
•  Constraints on coefficients (combining with rules)
•  Stratified sampling
•  Bias reduction:
–  Using control variables, such as duration of exposure
–  Causal modeling
•  Maximum likelihood estimation
Tokenization
•  Tokenization is used to group neighboring events together
•  why is tokenization needed?
Tokenization
DISPLAY" DISPLAY" DISPLAY" DISPLAY" DISPLAY"
DISPLAY"
•  The effect of a media touch is hardly doubled or tripled if it is sent repeatedly
to a customer twice or three times in a short period
•  Tokenization is used to group neighboring events together
•  If you see the same advertisement 5 times in one
hour, will you be impressed 5 times or just have the
impression “I saw that ad”?"
Why Spark?
Raw data"
Big data
tools"
Hive, Pig,
Impala,
Presto…"
Big data
machine
learning tool"
Spark" Mahout"
Python, R,
Java, C++
…"
Big data Small data
Rule based Algorithmic
Iterative update
Data process and stitching
Event"
•  Media touch events; conversion events; other user behavior events"
•  Data fields: customer id , time stamp, event type, campaign id and etc."
Path"
•  Stitch the events of the same customer by “id” "
•  Generate both positive paths and negative paths"
Model"
•  Each path is a record"
•  Positive/negative label is the outcome"
Algorithm evolution
First model"
•  Fix time decay"
•  Logistic regression using
Mllib logistic regression with
SGD"
Second model"
•  Include both regression
coefficients and decay
parameters in the model"
•  Optimize alternatively in
each iteration"
•  Customize and extend the
original Mllib logistic
regression"
Third model "
•  Second model + Causal
modeling (matched sample)"
Implementation
architecture
Data storage: s3"
Computing platform: Spark
standalone cluster on AWS (Amazon
Web Service)"
Monitor: web UI provided by Spark standalone
cluster"
Server side encryption
Bastion host
Model building"
Data processing and tokenization"
Generate paths"
Parallel algorithm"
Save model to S3"
Attribution"
Data processing and tokenization"
Retrieve model from S3"
Generate positive paths"
Attribution with model"
Implementation
Detailed Workflow
Lessons learned
•  Memory management
–  Each record was transformed from String-based to Byte-based using a hash
function
•  Tremendously increased the speed
•  Reduced shuffle size in the next groupBy step
Lessons learned
•  Memory management
–  Write separate classes for model training and attribution computation
•  Model training needs complex transformation and intensive iteration
•  Attribution computation needs more information from each objects
Lessons Learned (cont’d)
•  Cache before iterative computation
–  Only cache the RDDs right before entering the model
–  Unpersist unused RDDs to save space
•  Clear jobs of stopped apps in workers from time to time
–  Check and clean ~/spark/work folder in workers
–  Check and clean /mnt/spark folder in workers if necessary
•  Be careful when dealing with unserialized Java objects
Lessons Learned (cont’d)
•  Errors of one line of code doesn’t necessarily come from that line
–  Spark is lazy evaluation
–  Errors arise at action step, but may come from the previous
transformation steps
•  Adjustment of step size
–  Time decay must be between 0 and 1
–  After the matching, sample size decreased dramatically, the step
size has to be adjusted accordingly
Thank you!

More Related Content

What's hot

OpenStack Backup, Restore, DR (Freezer)
OpenStack Backup, Restore, DR (Freezer)OpenStack Backup, Restore, DR (Freezer)
OpenStack Backup, Restore, DR (Freezer)Saad Zaher
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan BlueDatabricks
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkIlya Ganelin
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon GershinskyDatabricks
 
Twitter 與 ELK 基本使用
Twitter 與 ELK 基本使用Twitter 與 ELK 基本使用
Twitter 與 ELK 基本使用Mark Dai
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Julien Le Dem
 
Background Tasks Without a Separate Service: Hangfire for ASP.NET - KCDC - Ju...
Background Tasks Without a Separate Service: Hangfire for ASP.NET - KCDC - Ju...Background Tasks Without a Separate Service: Hangfire for ASP.NET - KCDC - Ju...
Background Tasks Without a Separate Service: Hangfire for ASP.NET - KCDC - Ju...Matthew Groves
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...XinliShang1
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
Elastic Cloud Enterprise @ Cisco
Elastic Cloud Enterprise @ CiscoElastic Cloud Enterprise @ Cisco
Elastic Cloud Enterprise @ CiscoElasticsearch
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
When Kafka Meets the Scaling and Reliability needs of World's Largest Retaile...
When Kafka Meets the Scaling and Reliability needs of World's Largest Retaile...When Kafka Meets the Scaling and Reliability needs of World's Largest Retaile...
When Kafka Meets the Scaling and Reliability needs of World's Largest Retaile...confluent
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Databricks
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo SanchezGoDataDriven
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtSiligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtJon Su
 
Building Fullstack Graph Applications With Neo4j
Building Fullstack Graph Applications With Neo4j Building Fullstack Graph Applications With Neo4j
Building Fullstack Graph Applications With Neo4j Neo4j
 

What's hot (20)

OpenStack Backup, Restore, DR (Freezer)
OpenStack Backup, Restore, DR (Freezer)OpenStack Backup, Restore, DR (Freezer)
OpenStack Backup, Restore, DR (Freezer)
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan Blue
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 
Twitter 與 ELK 基本使用
Twitter 與 ELK 基本使用Twitter 與 ELK 基本使用
Twitter 與 ELK 基本使用
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020Data lineage and observability with Marquez - subsurface 2020
Data lineage and observability with Marquez - subsurface 2020
 
Background Tasks Without a Separate Service: Hangfire for ASP.NET - KCDC - Ju...
Background Tasks Without a Separate Service: Hangfire for ASP.NET - KCDC - Ju...Background Tasks Without a Separate Service: Hangfire for ASP.NET - KCDC - Ju...
Background Tasks Without a Separate Service: Hangfire for ASP.NET - KCDC - Ju...
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Elastic Cloud Enterprise @ Cisco
Elastic Cloud Enterprise @ CiscoElastic Cloud Enterprise @ Cisco
Elastic Cloud Enterprise @ Cisco
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
When Kafka Meets the Scaling and Reliability needs of World's Largest Retaile...
When Kafka Meets the Scaling and Reliability needs of World's Largest Retaile...When Kafka Meets the Scaling and Reliability needs of World's Largest Retaile...
When Kafka Meets the Scaling and Reliability needs of World's Largest Retaile...
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
 
Data science at OLX
Data science at OLXData science at OLX
Data science at OLX
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtSiligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
 
Building Fullstack Graph Applications With Neo4j
Building Fullstack Graph Applications With Neo4j Building Fullstack Graph Applications With Neo4j
Building Fullstack Graph Applications With Neo4j
 

Viewers also liked

Advanced attribution model
Advanced attribution model Advanced attribution model
Advanced attribution model Aspa Lekka
 
How To Make Your Marketing Match Your Reality (#mozcon 2015)
How To Make Your Marketing Match Your Reality (#mozcon 2015)How To Make Your Marketing Match Your Reality (#mozcon 2015)
How To Make Your Marketing Match Your Reality (#mozcon 2015)Dana DiTomaso
 
How to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 DayHow to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 DayPhillip Law
 
Multi touch attribution & attribution modeling - GAUC Sydney Melbourne - 2013...
Multi touch attribution & attribution modeling - GAUC Sydney Melbourne - 2013...Multi touch attribution & attribution modeling - GAUC Sydney Melbourne - 2013...
Multi touch attribution & attribution modeling - GAUC Sydney Melbourne - 2013...Vinoaj Vijeyakumaar
 
Attribution Modeling - Case Study
Attribution Modeling - Case StudyAttribution Modeling - Case Study
Attribution Modeling - Case StudyConcur
 
Attribution Super Modeling with Google Analytics
Attribution Super Modeling with Google AnalyticsAttribution Super Modeling with Google Analytics
Attribution Super Modeling with Google Analyticselliottkoppel
 
GAUC 2017 Workshop Attribution with Google Analytics: Peter Falcone (Google)
GAUC 2017 Workshop Attribution with Google Analytics: Peter Falcone (Google)GAUC 2017 Workshop Attribution with Google Analytics: Peter Falcone (Google)
GAUC 2017 Workshop Attribution with Google Analytics: Peter Falcone (Google)e-dialog GmbH
 
Introduction to Google Analytics
Introduction to Google AnalyticsIntroduction to Google Analytics
Introduction to Google AnalyticsDana DiTomaso
 
Webinar: Survival Analysis for Marketing Attribution - July 17, 2013
Webinar: Survival Analysis for Marketing Attribution - July 17, 2013Webinar: Survival Analysis for Marketing Attribution - July 17, 2013
Webinar: Survival Analysis for Marketing Attribution - July 17, 2013Revolution Analytics
 
Operational Attribution with Google Analytics
Operational Attribution with Google AnalyticsOperational Attribution with Google Analytics
Operational Attribution with Google AnalyticsJonathan Breton
 
Attribution Modeling and Big Data, Google
Attribution Modeling and Big Data, GoogleAttribution Modeling and Big Data, Google
Attribution Modeling and Big Data, GoogleInnovation Enterprise
 
Markov model for the online multichannel attribution problem
Markov model for the online multichannel attribution problemMarkov model for the online multichannel attribution problem
Markov model for the online multichannel attribution problemadavide1982
 
Webinar: Improve Campaign Results with Multi-Channel Funnels and Acquisio Att...
Webinar: Improve Campaign Results with Multi-Channel Funnels and Acquisio Att...Webinar: Improve Campaign Results with Multi-Channel Funnels and Acquisio Att...
Webinar: Improve Campaign Results with Multi-Channel Funnels and Acquisio Att...Acquisio
 
Red Door Interactive: Contribution-Attribution-Mix, Oh My! Creating Content f...
Red Door Interactive: Contribution-Attribution-Mix, Oh My! Creating Content f...Red Door Interactive: Contribution-Attribution-Mix, Oh My! Creating Content f...
Red Door Interactive: Contribution-Attribution-Mix, Oh My! Creating Content f...AMASanDiego
 
Marketing Attribution 101: Understanding Attribution and Calculating Cost of ...
Marketing Attribution 101: Understanding Attribution and Calculating Cost of ...Marketing Attribution 101: Understanding Attribution and Calculating Cost of ...
Marketing Attribution 101: Understanding Attribution and Calculating Cost of ...Kissmetrics on SlideShare
 

Viewers also liked (16)

Advanced attribution model
Advanced attribution model Advanced attribution model
Advanced attribution model
 
How To Make Your Marketing Match Your Reality (#mozcon 2015)
How To Make Your Marketing Match Your Reality (#mozcon 2015)How To Make Your Marketing Match Your Reality (#mozcon 2015)
How To Make Your Marketing Match Your Reality (#mozcon 2015)
 
How to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 DayHow to Build an Attribution Solution in 1 Day
How to Build an Attribution Solution in 1 Day
 
Multi touch attribution & attribution modeling - GAUC Sydney Melbourne - 2013...
Multi touch attribution & attribution modeling - GAUC Sydney Melbourne - 2013...Multi touch attribution & attribution modeling - GAUC Sydney Melbourne - 2013...
Multi touch attribution & attribution modeling - GAUC Sydney Melbourne - 2013...
 
Attribution Modeling - Case Study
Attribution Modeling - Case StudyAttribution Modeling - Case Study
Attribution Modeling - Case Study
 
Attribution Super Modeling with Google Analytics
Attribution Super Modeling with Google AnalyticsAttribution Super Modeling with Google Analytics
Attribution Super Modeling with Google Analytics
 
GAUC 2017 Workshop Attribution with Google Analytics: Peter Falcone (Google)
GAUC 2017 Workshop Attribution with Google Analytics: Peter Falcone (Google)GAUC 2017 Workshop Attribution with Google Analytics: Peter Falcone (Google)
GAUC 2017 Workshop Attribution with Google Analytics: Peter Falcone (Google)
 
Web Analytics Attribution
Web Analytics AttributionWeb Analytics Attribution
Web Analytics Attribution
 
Introduction to Google Analytics
Introduction to Google AnalyticsIntroduction to Google Analytics
Introduction to Google Analytics
 
Webinar: Survival Analysis for Marketing Attribution - July 17, 2013
Webinar: Survival Analysis for Marketing Attribution - July 17, 2013Webinar: Survival Analysis for Marketing Attribution - July 17, 2013
Webinar: Survival Analysis for Marketing Attribution - July 17, 2013
 
Operational Attribution with Google Analytics
Operational Attribution with Google AnalyticsOperational Attribution with Google Analytics
Operational Attribution with Google Analytics
 
Attribution Modeling and Big Data, Google
Attribution Modeling and Big Data, GoogleAttribution Modeling and Big Data, Google
Attribution Modeling and Big Data, Google
 
Markov model for the online multichannel attribution problem
Markov model for the online multichannel attribution problemMarkov model for the online multichannel attribution problem
Markov model for the online multichannel attribution problem
 
Webinar: Improve Campaign Results with Multi-Channel Funnels and Acquisio Att...
Webinar: Improve Campaign Results with Multi-Channel Funnels and Acquisio Att...Webinar: Improve Campaign Results with Multi-Channel Funnels and Acquisio Att...
Webinar: Improve Campaign Results with Multi-Channel Funnels and Acquisio Att...
 
Red Door Interactive: Contribution-Attribution-Mix, Oh My! Creating Content f...
Red Door Interactive: Contribution-Attribution-Mix, Oh My! Creating Content f...Red Door Interactive: Contribution-Attribution-Mix, Oh My! Creating Content f...
Red Door Interactive: Contribution-Attribution-Mix, Oh My! Creating Content f...
 
Marketing Attribution 101: Understanding Attribution and Calculating Cost of ...
Marketing Attribution 101: Understanding Attribution and Calculating Cost of ...Marketing Attribution 101: Understanding Attribution and Calculating Cost of ...
Marketing Attribution 101: Understanding Attribution and Calculating Cost of ...
 

Similar to Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, Adobe)

Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1Bill Liu
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02BIWUG
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointJoris Poelmans
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Sonya Liberman
 
Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure applicationCodecamp Romania
 
#SPFestSEA Introduction to #MicrosoftGraph
#SPFestSEA Introduction to #MicrosoftGraph#SPFestSEA Introduction to #MicrosoftGraph
#SPFestSEA Introduction to #MicrosoftGraphVincent Biret
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemPierre Gutierrez
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Amazon Web Services
 
CCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysisCCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysiswalk2talk srl
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceChristian Beedgen
 
Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...Looker
 
Web Analytics Primer
Web Analytics PrimerWeb Analytics Primer
Web Analytics PrimerChad Richeson
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...Johann Schleier-Smith
 

Similar to Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, Adobe) (20)

Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Machine learning
Machine learningMachine learning
Machine learning
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePoint
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure application
 
#SPFestSEA Introduction to #MicrosoftGraph
#SPFestSEA Introduction to #MicrosoftGraph#SPFestSEA Introduction to #MicrosoftGraph
#SPFestSEA Introduction to #MicrosoftGraph
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
Workshop: Make the Most of Customer Data Platforms - David Raab
Workshop: Make the Most of Customer Data Platforms - David RaabWorkshop: Make the Most of Customer Data Platforms - David Raab
Workshop: Make the Most of Customer Data Platforms - David Raab
 
CCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysisCCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysis
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics Service
 
Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...Creating a Single Source of Truth: Leverage all of your data with powerful an...
Creating a Single Source of Truth: Leverage all of your data with powerful an...
 
Web Analytics Primer
Web Analytics PrimerWeb Analytics Primer
Web Analytics Primer
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent P...
 
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 

Recently uploaded (20)

How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 

Digital Attribution Modeling Using Apache Spark-(Anny Chen and William Yan, Adobe)

  • 1. Algorithmic Digital Attribution Using Spark Anny (Yunzhu) Chen and William (Zhenyu) Yan Adobe
  • 2. Agenda •  Digital attribution •  Algorithmic digital attribution •  Spark implementation •  Lessons learned
  • 3. Path of media touches to conversion Display" • See a display ads of Adobe Creative Cloud" Paid Search" • Search the term “Creative Cloud” and click the promoted link" Organic Search" • Search the term “Adobe Creative Cloud” and click an organic link" Signup" • Signup for a free trial" Email" • Receive an email from Adobe" Subscribe to Adobe Creative Cloud (positive) Not subscribe to Adobe Creative Cloud (negative) A customer may receive various kinds of media touch points before deciding whether to subscribe to a product (conversion) or not
  • 4. What’s digital attribution? 4 •  A digital attribution model determines how credits for conversions are assigned to media touch points •  It is quite important in performance monitoring and budget planning Display" • Saw a display ads of Adobe Creative Cloud" • 0.1" Paid Search" • Search the term “Creative Cloud” and click the promoted link" • 0.2" Organic Search" • Search the term “Adobe Creative Cloud” and click an organic link" • 0.2" Signup" • Signup for a free trial" • 0.2" Email" • Received an email from Adobe" • 0.3" Convert!
  • 5. Digital attribution model at a glance Models! Consider all media touch points! Time decay! Data driven! (vs. rule based)! Last interaction, first interaction, last paid search click! no" no" no" Linear! yes" no" no" Time decay! yes" yes" no" Position based! yes" no" no" Algorithmic! yes" yes " yes"
  • 6. Algorithmic Attribution •  Rule based: predetermined weights based on rules •  Algorithmic: machine learning or statistical models are used to determine the weights 6
  • 7. Algorithmic Attribution Modeling The attribution model is based on a combination of •  Distributed-lag econometric model •  Discrete-time survival model Some highlights: •  The basic idea is to compare the media touches in positive paths vs. those in negative paths and thus infer their effects •  Logistic link function •  Time decay parameters to address time-decaying of media effects •  Fit media touch effects and decay parameters simultaneously •  Constraints on coefficients (combining with rules) •  Stratified sampling •  Bias reduction: –  Using control variables, such as duration of exposure –  Causal modeling •  Maximum likelihood estimation
  • 8. Tokenization •  Tokenization is used to group neighboring events together •  why is tokenization needed?
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. Tokenization DISPLAY" DISPLAY" DISPLAY" DISPLAY" DISPLAY" DISPLAY" •  The effect of a media touch is hardly doubled or tripled if it is sent repeatedly to a customer twice or three times in a short period •  Tokenization is used to group neighboring events together •  If you see the same advertisement 5 times in one hour, will you be impressed 5 times or just have the impression “I saw that ad”?"
  • 15. Why Spark? Raw data" Big data tools" Hive, Pig, Impala, Presto…" Big data machine learning tool" Spark" Mahout" Python, R, Java, C++ …" Big data Small data Rule based Algorithmic Iterative update
  • 16. Data process and stitching Event" •  Media touch events; conversion events; other user behavior events" •  Data fields: customer id , time stamp, event type, campaign id and etc." Path" •  Stitch the events of the same customer by “id” " •  Generate both positive paths and negative paths" Model" •  Each path is a record" •  Positive/negative label is the outcome"
  • 17. Algorithm evolution First model" •  Fix time decay" •  Logistic regression using Mllib logistic regression with SGD" Second model" •  Include both regression coefficients and decay parameters in the model" •  Optimize alternatively in each iteration" •  Customize and extend the original Mllib logistic regression" Third model " •  Second model + Causal modeling (matched sample)"
  • 18. Implementation architecture Data storage: s3" Computing platform: Spark standalone cluster on AWS (Amazon Web Service)" Monitor: web UI provided by Spark standalone cluster" Server side encryption Bastion host
  • 19. Model building" Data processing and tokenization" Generate paths" Parallel algorithm" Save model to S3" Attribution" Data processing and tokenization" Retrieve model from S3" Generate positive paths" Attribution with model" Implementation Detailed Workflow
  • 20. Lessons learned •  Memory management –  Each record was transformed from String-based to Byte-based using a hash function •  Tremendously increased the speed •  Reduced shuffle size in the next groupBy step
  • 21. Lessons learned •  Memory management –  Write separate classes for model training and attribution computation •  Model training needs complex transformation and intensive iteration •  Attribution computation needs more information from each objects
  • 22. Lessons Learned (cont’d) •  Cache before iterative computation –  Only cache the RDDs right before entering the model –  Unpersist unused RDDs to save space •  Clear jobs of stopped apps in workers from time to time –  Check and clean ~/spark/work folder in workers –  Check and clean /mnt/spark folder in workers if necessary •  Be careful when dealing with unserialized Java objects
  • 23. Lessons Learned (cont’d) •  Errors of one line of code doesn’t necessarily come from that line –  Spark is lazy evaluation –  Errors arise at action step, but may come from the previous transformation steps •  Adjustment of step size –  Time decay must be between 0 and 1 –  After the matching, sample size decreased dramatically, the step size has to be adjusted accordingly