I presented this at the London Measurecamp Conference in September 2016. This is an overview of how to build an attribution solution with Python and Tableau. It is meant as a starter solution.
1. How to build an attribution
solution in a day
- well maybe a couple of days ;)
Dr. Phillip Law
2. • By the end of this talk I want to give you the tools and methods to be
able to go away and build your own solution.
• I started this on Monday and got something usable in a day, then it
took me a couple of days to polish it up and iron out any bugs.
• It’s not perfect, but it gives you an attribution solution with contextual
data and, importantly, the power to slice and dice.
4. Overview
• Blurb About Me and the Company I work For.
• Quick Overview of Rules Based attribution.
• Discuss the tools I used and why; this is effectively an ETL process using
Python and Tableau.
• How I extracted the data.
• What Transformations I did using Python.
• What Transformations I did in Tableau.
• Run through the models in Tableau.
• Limitations (Scalability)
• Next Steps (Improve the models, Allocation of Credit, Bayesian
attribution)
11. 1. Improving Marketing Performance
Improving conversion throughout the customer journey and reducing inefficiency.
2. Brand Building
In a world where customer experience is the brand experience, digital has great power.
3. Future Proofing
It is particularly important to work to future web standards and consumer patterns to create lasting solutions.
12. Reporting & implementation (Adobe and GA)
SEO & PPC
Data Science (modelling)
Data Visualisation (D3, Tableau)
Growth
Audits
Optimisation (A/B Testing, Videos)
Analytics
21. Raw Data Feed
Because of the file size you’ll probably need to get it delivered to an FTP server.
You can ask for the full data feed. This file is delivered hourly and contains all the data. The file is huge; you can also get it delivered to S3 on the Amazon cloud, which is nice.
23. Process this Data File in Python (4 Steps)
(Did this whole thing in 140 lines of code)
Step 1: Clean the file (remove all rows where page views equals zero), flag the first
touch point in each visit, count page views per visit, create a sort key.
Step 2: Group by tracking ID and sort by time (need to sort by the sort key), flag the
conversion event (only one conversion event per visitor).
Step 3: Read the file in backwards, create the attribution window, count touch points back from the
conversion, write the conversion time to the same row as the conversion touch point.
Step 4: Re-order the file, as step 3 reversed the order.
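The four steps above can be sketched roughly as follows. This is not the original 140-line script, just a minimal in-memory illustration. The field names (`visitor_id`, `visit_id`, `ts`, `channel`, `page_views`, `converted`), the 30-day window, and the choice to treat the first hit of each visit as the touch point are all assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

WINDOW_SECONDS = 30 * 24 * 3600  # assumed 30-day attribution window


def build_touchpoints(rows, window=WINDOW_SECONDS):
    """rows: dicts with hypothetical keys visitor_id, visit_id, ts (epoch
    seconds), channel, page_views, converted."""
    # Step 1: drop rows with zero page views and create a sort key.
    clean = [dict(r) for r in rows if r["page_views"] > 0]
    for r in clean:
        r["sort_key"] = (r["visitor_id"], r["ts"])

    # Step 2: sort so each visitor's rows are contiguous and in time order.
    clean.sort(key=itemgetter("sort_key"))

    out = []
    for _, hits in groupby(clean, key=itemgetter("visitor_id")):
        hits = list(hits)

        # Step 1 (per visit): count page views and flag the first touch.
        visit_views = {}
        for h in hits:
            visit_views[h["visit_id"]] = (
                visit_views.get(h["visit_id"], 0) + h["page_views"]
            )
        seen_visits = set()
        for h in hits:
            h["first_touch_in_visit"] = h["visit_id"] not in seen_visits
            seen_visits.add(h["visit_id"])
            h["visit_page_views"] = visit_views[h["visit_id"]]

        # Step 2 (continued): keep only the first conversion per visitor.
        conv_seen = False
        for h in hits:
            h["is_conversion"] = bool(h["converted"]) and not conv_seen
            conv_seen = conv_seen or h["is_conversion"]

        # Step 3: walk backwards from the conversion, number the touch
        # points inside the window and copy the conversion time onto them.
        conv_ts, steps = None, 0
        for h in reversed(hits):
            if h["is_conversion"]:
                conv_ts, steps = h["ts"], 0
            h["conversion_ts"] = None
            h["touches_from_conversion"] = None
            if (conv_ts is not None and h["first_touch_in_visit"]
                    and conv_ts - h["ts"] <= window):
                steps += 1
                h["conversion_ts"] = conv_ts
                h["touches_from_conversion"] = steps

        # Step 4: reversed() only iterates, so hits is still in time order.
        out.extend(hits)
    return out
```

Here `touches_from_conversion == 1` marks the last touch before the conversion, which is the kind of field you can then slice on in Tableau. With a real hourly feed you would stream the file rather than hold it in memory, which is where the backwards read and the re-order in step 4 come from.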
29. Traditional rules for assigning credit are arbitrary and do not reflect the true
weighting of a touch point.
Weightings are skewed towards channels that retarget.
There is a method from game theory that has been mathematically proven to allocate
credit in a fair way.
Shapley Value
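To make the Shapley value concrete, here is a minimal sketch that computes it exactly for a small set of channels. The channel names and the conversion numbers per channel combination are made up for illustration, and `shapley_values` is a hypothetical helper, not something from the talk. Each channel's credit is its average marginal contribution over every combination of the other channels.

```python
from itertools import combinations
from math import factorial


def shapley_values(channels, value):
    """value(frozenset_of_channels) -> worth of that combination."""
    n = len(channels)
    result = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for k in range(n):  # size of the combination that ch joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of ch when added to s.
                total += weight * (value(s | {ch}) - value(s))
        result[ch] = total
    return result


# Hypothetical conversions observed for each channel combination.
conversions = {
    frozenset(): 0,
    frozenset({"PPC"}): 10,
    frozenset({"Email"}): 20,
    frozenset({"PPC", "Email"}): 40,
}

credit = shapley_values(["PPC", "Email"], lambda s: conversions.get(s, 0))
print(credit)
```

The credits always add up to the value of all channels together (40 in this toy data). The exact calculation loops over every subset, so it only scales to a handful of channels; beyond that you would need sampling or another approximation.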
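35. $\Delta p = p(\text{convert} \mid \text{PPC}) - p(\text{convert} \mid \overline{\text{PPC}})$
$p(\text{convert} \mid \text{PPC})$: probability someone converts given that they have seen a PPC ad.
$p(\text{convert} \mid \overline{\text{PPC}})$: probability they would have converted anyway.
$\Delta p$: the difference in probabilities is the impact that channel has on conversion.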
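As a rough illustration of the difference in probabilities, the sketch below estimates both conversion rates from a list of journeys, including the ones that did not convert. The journey data and the `conversion_lift` helper are hypothetical. It is a plain frequency estimate, not the full Bayesian model.

```python
def conversion_lift(journeys, channel):
    """journeys: list of (set_of_channels_seen, converted) pairs."""
    exposed = [conv for chans, conv in journeys if channel in chans]
    unexposed = [conv for chans, conv in journeys if channel not in chans]
    if not exposed or not unexposed:
        raise ValueError("need journeys both with and without the channel")
    p_exposed = sum(exposed) / len(exposed)        # p(convert | channel)
    p_unexposed = sum(unexposed) / len(unexposed)  # p(convert | no channel)
    return p_exposed, p_unexposed, p_exposed - p_unexposed


# Made-up journeys: channels seen, and whether the visitor converted.
journeys = [
    ({"PPC", "Email"}, True),
    ({"PPC"}, True),
    ({"PPC"}, False),
    ({"PPC", "SEO"}, False),
    ({"Email"}, True),
    ({"SEO"}, False),
    ({"Email"}, False),
    ({"SEO", "Email"}, False),
]

p_ppc, p_no_ppc, delta_p = conversion_lift(journeys, "PPC")
print(p_ppc, p_no_ppc, delta_p)
```

With this toy data half of the PPC journeys convert against a quarter of the non-PPC journeys, so the estimated impact of PPC is 0.25.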
36. Advantages
• Using the Shapley value provides a more “true” allocation
of the influence of channels.
• The Bayesian model takes into account user journeys that don’t
convert, which helps you understand which unconverted users are the best
prospects for conversion.