PhD Seminar Riezlern 2016

Leveraging Regularity in Predicting Customer
Lifetime Value
Michael Platzer & Thomas Reutterer
Seminar Riezlern 2016

Warm Up
Two customers: same Recency, same Frequency
[Figure: purchase timelines of Customer A and Customer B, from 1-Jan-16, 09:00 to 21-Jun-16, 10:28]
1) Which customer would you prefer? The regular one, or the clumpy one?
2) Which type of customer is more prevalent? The regular ones, or the clumpy ones?

1. Intro to BTYD models
2. On the Subject of Regularity
3. Our Pareto/GGG model
4. Our (M)BG/CNBD-k model
5. Our BTYDplus R package

Buy-Till-You-Die
Non-contractual setting:
A customer purchases until she stops purchasing.
However, the dropout event is not observed.
[Figure: customer timeline, labeled "alive!" and "dead?"]

Key Issues in the Management
of Customer Relationships
Given: Purchase history of customer
cohort in non-contractual setting.
Example:
CD Sales
Broadening the context:
• purchase ≈ transaction ≈ event …
• customer relationship ≈ channel activity ≈
service activity …
Questions:
How valuable is that cohort?
How many purchases to expect?
Who will still be active?
Who will be most active?
When will next purchase take place?

BTYD “Gold Standard”
Pareto/NBD
Schmittlein, Morrison and Colombo, 1987
Assumptions
1. Purchase process (while ‘alive’): NBD (Ehrenberg 1959)
• Purchases follow a Poisson process, i.e. exponentially distributed
inter-transaction times, itt_{i,j} ~ Exponential(λ_i)
• λ_i are Gamma(r, α) distributed across customers
2. Dropout (‘death’) process: Pareto
• The customer’s (unobserved) lifetime is exponentially distributed,
lifetime τ_i ~ Exponential(μ_i)
• μ_i are Gamma(s, β) distributed across customers
3. λ and μ vary independently
→ parameter estimation of (r, α, s, β) via Maximum Likelihood
→ closed-form solutions for key expressions: P(alive), # of future purchases
→ requires only recency/frequency summary statistics (x, tx, T) per customer
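The assumptions above can be read as a generative recipe. The following is a minimal R sketch of that recipe, not the authors' code: the function name simulate_pareto_nbd is hypothetical, α and β are assumed to be rate parameters of the Gamma distributions, the parameter values are picked from the simulation grid shown later in the deck, and x counts only purchases after the initial one at time 0.

```r
# Sketch: simulate recency/frequency summaries under the Pareto/NBD assumptions.
# Assumption: alpha and beta are *rate* parameters of the Gamma distributions.
set.seed(1)

simulate_pareto_nbd <- function(n, T.cal, r, alpha, s, beta) {
  lambda <- rgamma(n, shape = r, rate = alpha)  # purchase rates, Gamma(r, alpha)
  mu     <- rgamma(n, shape = s, rate = beta)   # dropout rates, Gamma(s, beta)
  tau    <- rexp(n, rate = mu)                  # unobserved lifetimes
  rows <- lapply(seq_len(n), function(i) {
    horizon <- min(tau[i], T.cal)               # purchases stop at death or end of observation
    times <- numeric(0)
    t <- rexp(1, rate = lambda[i])              # exponential inter-transaction times
    while (t <= horizon) {
      times <- c(times, t)
      t <- t + rexp(1, rate = lambda[i])
    }
    data.frame(x     = length(times),                          # frequency
               t.x   = if (length(times) > 0) max(times) else 0, # recency
               T.cal = T.cal,
               alive = tau[i] > T.cal)                         # not observable in real data
  })
  do.call(rbind, rows)
}

cbs_sim <- simulate_pareto_nbd(n = 1000, T.cal = 52,
                               r = 0.75, alpha = 5, s = 0.75, beta = 15)
head(cbs_sim)
```

The alive column is available only because the data are simulated; in a real cohort it is exactly the quantity that P(alive) has to estimate from (x, tx, T).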

BTYD Models
• BG/NBD (Fader, Hardie, and Lee 2005)
Discrete time defection process (after any transaction) instead of continuous
• MBG/NBD (Batislam et al. 2007), CBG/NBD (Hoppe and Wagner 2007)
Customers can drop out at time zero (immediately after first purchase)
• PDO/NBD (Jerath et al. 2011)
Defection opportunities tied to calendar time (indep. of transaction timing)
• GG/NBD (Bemmaor and Glady 2012)
Flexible lifetime model, departing from exponential (Gamma-Gompertz)
• Pareto/NBD variant (Abe 2009)
Hierarchical Bayes extension of Pareto/NBD (allows dependencies between λ_i and μ_i)
→ All modify the dropout process, but not the purchase process

Regularity improves
Predictability
[Figure: timeline of events, past vs. future]
next event?
Well, so what?

Buy-Till-You-Die Setting
still alive?
Customers A and B exhibit the same Recency and Frequency,
yet we come to different assessments regarding P(alive).

Regularity in Purchase Timings
• Erlang-k: Herniter (1971)
• Gamma: Wheat & Morrison (1990)
• CNBD: Chatfield and Goodhardt (1973), Schmittlein and Morrison (1983),
Morrison and Schmittlein (1988)
• CNBD Models: Gupta (1991), Wu and Chen (2000), Schweidel and Fader (2009)

Irregularity in Purchase Timings
• RFMC: Zhang, Bradlow and Small (2015)

Empirical Findings
Data Sets
• Grocery: k_wheat = 2.5
• Donations: k_wheat = 2.2
• Health Supplements: k_wheat = 2.1
• Office Supply: k_wheat = 1.8
• CD Sales: k_wheat = 1.0
• Fashion & Accessories: k_wheat = 0.6
Grocery Categories
• Coffee pads: k_wheat = 3.1
• Detergents: k_wheat = 2.8
• Toilet Paper: k_wheat = 2.8
• Cat food: k_wheat = 2.8
…
• Light bulbs: k_wheat = 1.9
• Cosmetics & perfumes: k_wheat = 1.6
• Sparkling Wine: k_wheat = 1.6
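The slide does not say how k_wheat is computed. A plausible reading, given the Wheat & Morrison (1990) reference in the deck, is their moment estimator: for two consecutive Gamma(k) interpurchase times of the same customer, M = itt1 / (itt1 + itt2) follows Beta(k, k) with variance 1 / (4(2k + 1)), regardless of the customer's purchase rate. The R sketch below checks this on simulated data; the function name estimate_k_wheat and all parameter values are assumptions for illustration.

```r
# Sketch of a Wheat & Morrison style regularity estimate (assumed reading of k_wheat).
set.seed(123)

estimate_k_wheat <- function(itt1, itt2) {
  m <- itt1 / (itt1 + itt2)   # M ~ Beta(k, k) if both gaps share shape k and the same rate
  v <- var(m)                 # Var(M) = 1 / (4 * (2k + 1))
  (1 - 4 * v) / (8 * v)       # solve the variance equation for k
}

k_true <- 2.5                                    # hypothetical "true" regularity
n      <- 20000                                  # one pair of gaps per customer
lambda <- rgamma(n, shape = 2, rate = 12)        # heterogeneous purchase rates (assumed)
itt1   <- rgamma(n, shape = k_true, rate = k_true * lambda)
itt2   <- rgamma(n, shape = k_true, rate = k_true * lambda)

k_hat <- estimate_k_wheat(itt1, itt2)
print(round(k_hat, 2))
```

Because the customer-specific rate cancels in the ratio M, the estimate needs only two interpurchase times per customer and is unaffected by heterogeneity in λ.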

Pareto/GGG
Platzer and Reutterer, forthcoming
Customer Level
• Purchase Process: While alive, a customer purchases with Gamma-distributed
waiting times, i.e. itt_{i,j} ~ Gamma(k_i, k_i λ_i)
• Dropout Process: Each customer remains alive for an exponentially
distributed lifetime with death rate μ_i, i.e. lifetime τ_i ~ Exponential(μ_i)
Heterogeneity across Customers
• λ_i ~ Gamma(r, α)
• μ_i ~ Gamma(s, β)
• k_i ~ Gamma(t, γ)
• λ_i, μ_i, k_i vary independently
Pareto/GGG =
Pareto/NBD + Varying Regularity

Gamma Distributed
Interpurchase Times
[Figure: interpurchase timing patterns for k=0.3 (clumpy), k=1 (Exponential, random) and k=8 (Erlang-8, regular)]
Coefficient of Variation = 1 / sqrt(k)
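A quick numerical check of this relation, as a sketch: with interpurchase times drawn from Gamma(shape = k, rate = kλ), the mean is 1/λ and the standard deviation is 1/(λ sqrt(k)), so the ratio depends on k only. The value of lambda and the sample size below are arbitrary assumptions; the three k values are the ones shown on the slide.

```r
# Coefficient of variation of Gamma(shape = k, rate = k * lambda) interpurchase times.
set.seed(42)
lambda <- 1 / 6                                  # assumed: one purchase per 6 weeks on average
ks <- c(clumpy = 0.3, random = 1, regular = 8)   # shape values from the slide

cv_empirical <- sapply(ks, function(k) {
  itt <- rgamma(1e5, shape = k, rate = k * lambda)
  sd(itt) / mean(itt)
})
cv_theory <- 1 / sqrt(ks)

print(round(rbind(cv_theory, cv_empirical), 2))
```

The mean interpurchase time stays at 1/λ for every k, so k changes only how regular the timing is, not how often the customer buys.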

Pareto/GGG
Estimation via MCMC
Component-wise Slice Sampling
within Gibbs with Data Augmentation
☹ Significantly increased computational costs
(2 mins for drawing 1’000 customers)
☺ but…
• Posterior Distributions instead of Point Estimates
• Also for Individual Level Parameters
• Direct Simulation of Key Metrics of Managerial Interest
• And only one additional summary statistic required

Simulation Study
Design
160 scenarios covering a wide range of parameter settings
(similar to simulation design from BG/BB paper)
• N = {1000, 4000}
• r = {0.25, 0.75}, α = {5, 15}
• s = {0.25, 0.75}, β = {5, 15}
• (t, γ) = {(1.6, 0.4), (5, 2.5), (6, 4), (8, 8), (17, 20)}
=> Total of 400’000 simulated customers
=> Total of 64 billion individual-level parameter draws (via slice sampling)
Compare individual-level forecast accuracy of Pareto/GGG vs. Pareto/NBD
in terms of mean absolute error (MAE). Study the relative improvement in MAE.
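The comparison metric can be sketched in a few lines of R. The slides do not spell out the formula, so the definition used here, lift = 1 - MAE(model) / MAE(benchmark), is an assumption, and the numbers are toy values invented purely to show the arithmetic.

```r
# Sketch: relative improvement (lift) in MAE of a model over a benchmark.
# Assumed definition: lift = 1 - MAE(model) / MAE(benchmark).
mae <- function(actual, predicted) mean(abs(actual - predicted))

relative_lift <- function(actual, pred_benchmark, pred_model) {
  1 - mae(actual, pred_model) / mae(actual, pred_benchmark)
}

# Toy values (hypothetical, not taken from the study):
actual         <- c(0, 2, 1, 4)         # holdout transactions per customer
pred_benchmark <- c(1, 1, 1, 2)         # e.g. forecasts from a benchmark model
pred_model     <- c(0.5, 1.5, 1, 3)     # e.g. forecasts from the extended model

lift <- relative_lift(actual, pred_benchmark, pred_model)
print(lift)
```

A positive lift means the model's individual-level forecasts are closer to the observed holdout counts than the benchmark's; zero means no improvement.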

Simulation Study
Regularity improves Predictability
• bigger lift for bigger regularity
• even for mildly regular patterns we see lift
• no lift for random and clumpy customers

Simulation Study
Lift in Predictive Accuracy
by Segment

Simulation Study
Interplay of Recency,
Frequency and Regularity
Assumptions: mean(itt) = 6 weeks, mean(lifetime) = 52 weeks
Same RF, but different P(alive) for different k! Particularly when
the customer is already “overdue”.
Regular customers are less likely, and clumpy customers are more likely,
to be still alive when compared to the randomly purchasing customer.
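The direction of this effect can be reproduced with a small R sketch. It is a simplification, not the Pareto/GGG computation: the individual parameters are treated as known (no heterogeneity, no MCMC), with mean(itt) = 6 weeks and mean(lifetime) = 52 weeks as on the slide, and a customer whose last purchase was 12 weeks ago (an assumed "overdue" gap). Conditional on being alive at the last purchase, either the customer is still alive and simply has not bought yet, or she died at some point during the gap without buying before that.

```r
# Sketch: P(alive) for known individual-level parameters, by regularity k.
# Assumptions: Gamma(k, k * lambda) interpurchase times, exponential lifetime,
# and `gap` = weeks elapsed since the last observed purchase.
palive_known_params <- function(k, gap, mean_itt = 6, mean_lifetime = 52) {
  lambda <- 1 / mean_itt
  mu     <- 1 / mean_lifetime
  # probability that the next purchase takes longer than y weeks
  surv_itt <- function(y) pgamma(y, shape = k, rate = k * lambda, lower.tail = FALSE)
  # still alive at the end of the gap, and no purchase during the gap
  alive <- exp(-mu * gap) * surv_itt(gap)
  # died at time y within the gap, and no purchase before y
  dead <- integrate(function(y) mu * exp(-mu * y) * surv_itt(y),
                    lower = 0, upper = gap)$value
  alive / (alive + dead)
}

ks <- c(clumpy = 0.3, random = 1, regular = 8)
p_alive <- sapply(ks, palive_known_params, gap = 12)
print(round(p_alive, 2))
```

For the same gap, a long silence is much stronger evidence of dropout for a regular customer than for a clumpy one, which is the ordering described above.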

Empirical Findings
→ regularity varies across, but also within, datasets
[Figure: distribution of regularity per dataset, ranging over regular, Poisson and clumpy]

Empirical Findings
→ improved predictive accuracy for datasets with regular patterns
[Table: median(k) and relative lift in MAE per dataset]

Empirical Findings
→ estimates for next transaction timings differ when
regularity is taken into consideration

(M)BG/CNBD-k
Customer Level
• Purchase Process: While alive, a customer purchases with Erlang-k
distributed waiting times, i.e. itt_{i,j} ~ Erlang-k(λ_i)
• Dropout Process: A customer drops out at a (re-)purchase event with
probability p_i
Heterogeneity across Customers
• λ_i ~ Gamma(r, α)
• p_i ~ Beta(a, b)
• λ_i, p_i vary independently
BG/CNBD-k =
BG/NBD + Fixed Regularity
MBG/CNBD-k =
MBG/NBD + Fixed Regularity

(M)BG/CNBD-k
Closed-Form Expressions
• Likelihood → 100-1000x faster parameter estimation via MLE than MCMC
• P(X(t)=x | r, α, a, b) → approximate Unconditional Expectation
• P(alive | r, α, a, b, x, tx, T) → key component for Conditional Expectation
• Conditional Expected Transactions → “pretty good” approximation possible
Erlang-k = Poisson with every kth event counted
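The "every kth event counted" view can be checked by simulation. The R sketch below generates a Poisson process with rate lambda, keeps every kth event, and compares the resulting waiting times with what an Erlang-k distribution implies (mean k / lambda, coefficient of variation 1 / sqrt(k)). The values of k and lambda are arbitrary assumptions, and lambda denotes the rate of the underlying Poisson process here, which may differ from the parameterization used in the model.

```r
# Sketch: Erlang-k waiting times as every k-th event of a Poisson process.
set.seed(7)
k      <- 3        # assumed regularity
lambda <- 0.5      # assumed rate of the underlying Poisson process
n      <- 30000    # number of counted (k-th) events

poisson_times <- cumsum(rexp(n * k, rate = lambda))      # all Poisson event times
counted_times <- poisson_times[seq(k, n * k, by = k)]    # keep every k-th event
erlang_itt    <- diff(c(0, counted_times))               # waiting times between counted events

c(mean_itt = mean(erlang_itt), mean_theory = k / lambda,
  cv_itt = sd(erlang_itt) / mean(erlang_itt), cv_theory = 1 / sqrt(k))
```

Each counted waiting time is a sum of k independent exponential gaps, which is why larger k yields more regular timing.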

Simulation Study
Design
324 scenarios covering a wide range of parameter settings – 5 repeats each
(similar to simulation design from BG/NBD paper)
• N = 4000, T.cal = 52, T.star = {4, 16, 52}
• r = {0.25, 0.50, 0.75}, α = {5, 10, 15}
• s = {0.50, 0.75, 1.00}, β = {2.5, 5, 10}
• k = {1, 2, 3, 4}
=> total of 1’300’000 simulated customers
Compare individual-level forecast accuracy of Pareto/GGG vs. Pareto/NBD
in terms of mean absolute error (MAE). Study the relative improvement in MAE.

Simulation Study
Example

Simulation Study
Results
• bigger lift for bigger regularity
• even for mildly regular patterns we see lift

Empirical Findings
Results
Findings
1. MBG/NBD is either on par with or better than BG/NBD
2. MBG/CNBD-k sees a lift in forecast accuracy if regularity is present
3. MBG/CNBD-k comes close to Pareto/GGG

Empirical Findings
Results
Yet to come: Study Lift by Retail Category

BTYDplus
• https://github.com/mplatzer/BTYDplus
• GPL-3 license
• Implementations of
• MBG/NBD – Batislam et al. (2007)
• GammaGompertz/NBD – Bemmaor & Glady (2012)
• (M)BG/CNBD-k – Platzer and Reutterer (forthcoming)
• Pareto/NBD (MCMC) – Ma and Liu (2007)
• Pareto/NBD variant (MCMC) – Abe (2009)
• Pareto/GGG (MCMC) – Platzer and Reutterer (forthcoming)
• Fully tested and documented, incl. demos
• Vignette will be coming
…
Users

BTYDplus
demo
> elog
cust date
1: 4 1997-01-18
2: 4 1997-08-02
3: 4 1997-12-12
4: 18 1997-01-04
5: 21 1997-01-01
---
6914: 23556 1997-07-26
6915: 23556 1997-09-27
6916: 23556 1998-01-03
6917: 23556 1998-06-07
6918: 23569 1997-03-25
> (cbs <- elog2cbs(elog, per="week",
T.cal=as.Date("1997-09-30"), T.tot=as.Date("1998-06-30")))
cust x t.x litt T.cal T.star x.star
1: 4 1 28.000000 3.3322045 36.42857 39 1
2: 18 0 0.000000 0.0000000 38.42857 39 0
3: 21 1 1.714286 0.5389965 38.85714 39 0
4: 50 0 0.000000 0.0000000 38.85714 39 0
5: 60 0 0.000000 0.0000000 34.42857 39 0
---
2353: 23537 0 0.000000 0.0000000 27.00000 39 2
2354: 23551 5 24.285714 5.5243721 27.00000 39 0
2355: 23554 0 0.000000 0.0000000 27.00000 39 1
2356: 23556 4 26.571429 6.3127713 27.00000 39 2
2357: 23569 0 0.000000 0.0000000 27.00000 39 0
Transform event-log to summary stats
(optionally one can split data into calibration and holdout)
• cust = customer ID
• calibration summary stats: x = Frequency, t.x = Recency,
litt = Sum Over Logarithmic Intertransaction Times
• holdout summary stats: T.star, x.star
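To make the summary statistics concrete, here is a base-R sketch that recomputes them for customer 4 from the event log shown above. It is not the elog2cbs implementation: the helper summarise_customer is hypothetical, time is measured in weeks since the customer's first purchase, and at most one transaction per day is assumed.

```r
# Sketch: recency/frequency/regularity summaries from an event log (base R only).
elog_demo <- data.frame(
  cust = c(4, 4, 4),
  date = as.Date(c("1997-01-18", "1997-08-02", "1997-12-12"))
)
T.cal <- as.Date("1997-09-30")   # end of the calibration period

summarise_customer <- function(dates, T.cal) {
  dates <- sort(dates)
  first <- dates[1]
  cal   <- dates[dates <= T.cal]              # transactions within calibration
  itt   <- as.numeric(diff(cal)) / 7          # intertransaction times in weeks
  data.frame(
    x      = length(cal) - 1,                 # frequency: repeat transactions
    t.x    = as.numeric(max(cal) - first) / 7, # recency: time of last transaction
    litt   = sum(log(itt)),                   # sum over log intertransaction times
    T.cal  = as.numeric(T.cal - first) / 7,   # observed calibration length
    x.star = sum(dates > T.cal)               # transactions in the holdout period
  )
}

cbs_demo <- do.call(rbind, lapply(split(elog_demo$date, elog_demo$cust),
                                  summarise_customer, T.cal = T.cal))
print(cbs_demo)
```

The values for customer 4 match the first row of the cbs table above (x = 1, t.x = 28, litt = 3.3322045, T.cal = 36.42857, x.star = 1); litt is the one additional summary statistic beyond recency and frequency.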

BTYDplus
demo MBG/CNBD-k
> params <- mbgcnbd.EstimateParameters(cbs)
> round(params, 2)
k r alpha a b
1.00 0.52 6.17 0.89 1.62
> cbs$xstar_est <- mbgcnbd.ConditionalExpectedTransactions(params, cbs$T.star, cbs$x, cbs$t.x,
cbs$T.cal)
> cbs$palive_est <- mbgcnbd.PAlive(params, cbs$x, cbs$t.x, cbs$T.cal)
> cbs
cust x t.x litt T.cal T.star x.star palive_est xstar_est
1: 4 1 28.000000 3.3322045 36.42857 39 1 0.6771113 0.7838636
2: 18 0 0.000000 0.0000000 38.42857 39 0 0.3919457 0.1558104
3: 21 1 1.714286 0.5389965 38.85714 39 0 0.1711458 0.1890291
4: 50 0 0.000000 0.0000000 38.85714 39 0 0.3907532 0.1540336
5: 60 0 0.000000 0.0000000 34.42857 39 0 0.4037292 0.1742668
---
2353: 23537 0 0.000000 0.0000000 27.00000 39 2 0.4294331 0.2206554
2354: 23551 5 24.285714 5.5243721 27.00000 39 0 0.8222069 3.9501015
2355: 23554 0 0.000000 0.0000000 27.00000 39 1 0.4294331 0.2206554
2356: 23556 4 26.571429 6.3127713 27.00000 39 2 0.8557381 3.4019351
2357: 23569 0 0.000000 0.0000000 27.00000 39 0 0.4294331 0.2206554
palive_est = P(alive), xstar_est = E(X(T+T.star))

BTYDplus
demo Pareto/GGG
> params_draws <- pggg.mcmc.DrawParameters(cbs)
> round(summary(params_draws$level_2)$quantiles[, "50%"], 2)
t gamma r alpha s beta
45.31 43.36 0.55 10.74 0.66 12.51
> est_draws <- mcmc.DrawFutureTransactions(cbs, params_draws, cbs$T.star)
> cbs$palive_est <- sapply(params_draws$level_1, function(draws) mean(as.matrix(draws)[, 'z']))
> cbs$xstar_est <- apply(est_draws, 2, mean)
> cbs
cust x t.x litt T.cal T.star x.star palive_est xstar_est
1: 4 1 28.000000 3.3322045 36.42857 39 1 0.92 0.77
2: 18 0 0.000000 0.0000000 38.42857 39 0 0.26 0.08
3: 21 1 1.714286 0.5389965 38.85714 39 0 0.17 0.11
4: 50 0 0.000000 0.0000000 38.85714 39 0 0.33 0.05
5: 60 0 0.000000 0.0000000 34.42857 39 0 0.34 0.27
---
2353: 23537 0 0.000000 0.0000000 27.00000 39 2 0.38 0.15
2354: 23551 5 24.285714 5.5243721 27.00000 39 0 0.95 4.55
2355: 23554 0 0.000000 0.0000000 27.00000 39 1 0.36 0.17
2356: 23556 4 26.571429 6.3127713 27.00000 39 2 1.00 3.41
2357: 23569 0 0.000000 0.0000000 27.00000 39 0 0.51 0.31
palive_est = P(alive), xstar_est = E(X(T+T.star))

Questions?
michael.platzer@gmail.com
thomas.reutterer@wu.ac.at
Try BTYDplus !!!

Appendix
• C Measure by Zhang, Bradlow, Small 2015
• MCMC Sampling Scheme

ZBS: Clumpiness Measure C
a metric-based approach
Predicting Customer Value Using Clumpiness: From RFM to RFMC
Zhang, Bradlow, Small
• Introduce metric C which captures the “non-randomness” in timing patterns
• Straightforward calculation at individual-level;
• Useful for descriptive analysis and segmentation;

Main Empirical Findings
• Capturing timing patterns adds predictive power
• When controlling for R and F, clumpy customers tend to be more
active in the future
→ Both findings are supported, and can be explained, by our
model-based approach

Shortcomings
• Requires many transactions at the individual level
• Metric C will be skewed when dealing with different acquisition
dates and churn settings
→ Both are appropriately handled by our model-based approach

→ Sparse individual-level data mandates a model-based approach

Parameter Estimation via MCMC
Component-wise Slice Sampling within Gibbs with Data Augmentation