Ubisoft

Matching as an Alternative to A/B Testing
Christoph Safferling
Head of Game Analytics
Ubisoft Blue Byte
Games Industry Analytics Forum
May 9th, 2013

Self-selection in games
in games, we routinely change things and want to test whether the change was successful
game changes: quest changes, new items, etc.
shop configurations: number of items, allocation, prices, etc.
...and many more examples!
players self-select into the group that maximises their utility (fun)
most game variables are the result of a player’s decision:
exogeneity, E[ε|X] = 0, (usually) does not hold

Treatment effects
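test the effect of a treatment on an outcome:
E[Y|X, D = 1] − E[Y|X, D = 0] = E[Y(1) − Y(0)|X]
with Y as the outcome, X as the observable data, and D as the treatment dummy
(the equality holds only if, given X, D is independent of the potential outcomes Y(0) and Y(1))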
we are interested in the average treatment effect on the treated:
ATT = E[Y(1) − Y(0)|D = 1]
= E[Y(1)|D = 1] − E[Y(0)|D = 1]
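To see why self-selection matters, here is a minimal Python simulation with a made-up data-generating process (the function names `simulate`, `naive_difference` and `true_att` and all parameter values are hypothetical): players with a higher value of the observed characteristic X opt into the treatment more often and would have higher outcomes anyway. Because a simulation knows both potential outcomes, it can compute the ATT directly and compare it with the naive difference in means.

```python
import random
import statistics


def simulate(n=100_000, effect=2.0, seed=0):
    """Hypothetical data: returns (x, d, y0, y1) tuples with self-selection on x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                      # observed characteristic
        y0 = 1.0 + 3.0 * x + rng.gauss(0.0, 1.0)     # potential outcome without treatment
        y1 = y0 + effect                             # same treatment effect for everyone
        d = 1 if x + rng.gauss(0.0, 1.0) > 0 else 0  # players with high x opt in more often
        rows.append((x, d, y0, y1))
    return rows


def naive_difference(rows):
    """E[Y | D = 1] - E[Y | D = 0], using only the outcomes actually observed."""
    treated = [y1 for _, d, _, y1 in rows if d == 1]
    control = [y0 for _, d, y0, _ in rows if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def true_att(rows):
    """E[Y(1) - Y(0) | D = 1]; computable only because the simulation knows Y(0)."""
    return statistics.fmean(y1 - y0 for _, d, y0, y1 in rows if d == 1)


rows = simulate()
print(f"naive: {naive_difference(rows):.2f}, true ATT: {true_att(rows):.2f}")
```

With these settings the true ATT is 2 by construction, while the naive difference in means comes out well above 4, because it also picks up the difference in X between the two self-selected groups.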

E[Y(0)|D = 1] is a counterfactual: it is unobservable
proper control groups (A/B testing!) provide a consistent estimator
sometimes, A/B testing is not available or feasible
one alternative econometric modeling strategy: the matching estimator
reproduce the treatment group among the non-treated:
find individuals who are alike in their observable characteristics X and differ only in their treatment status (and hence possibly their outcome): “statistical twins”
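A minimal sketch of the simplest variant, one-to-one nearest-neighbour matching with replacement, assuming the covariates are already numeric and comparably scaled (the function name `nn_match_att` is hypothetical). Each treated player's closest non-treated “twin” supplies the missing Y(0), and averaging the differences estimates the ATT.

```python
import math


def nn_match_att(treated, controls):
    """ATT via one-to-one nearest-neighbour matching with replacement.

    treated, controls: lists of (x, y), where x is a tuple of numeric covariates
    and y the observed outcome. A control may serve as twin for several treated units.
    """
    if not treated or not controls:
        raise ValueError("need at least one treated and one control observation")
    diffs = []
    for x_t, y_t in treated:
        # statistical twin: the control closest to x_t in Euclidean distance
        _, y_twin = min(controls, key=lambda c: math.dist(x_t, c[0]))
        diffs.append(y_t - y_twin)  # y_twin stands in for the unobservable Y(0)
    return sum(diffs) / len(diffs)
```

In practice one would usually match on a propensity score or a scaled distance rather than raw Euclidean distance, since covariates such as level and logins live on very different scales.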

Assumptions and problems
Conditional Independence Assumption: given X, we assume the potential outcomes Y(0), Y(1) to be independent of the treatment D (for the ATT, independence of Y(0) suffices)
→ conditional on observed characteristics, selection bias is removed
Common Support: we assume 0 < P(D = 1|X) < 1
→ observations without a match in the other group are excluded
Curse of Dimensionality: adding variables to X improves the matching quality, but makes finding matches more difficult!
→ e.g. for a continuous variable X1: P(X1 = x) = 0, so exact matches essentially never occur
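Coarsened exact matching (CEM), the approach behind the cem columns in the results tables, tackles both problems above: continuous variables are coarsened into bins so that exact matches become possible, and strata without both treated and control players are dropped, which enforces common support. The sketch below is a simplified illustration, not the cem software itself; the cutpoints are assumed to be chosen by the analyst, and `coarsen` and `cem_att` are hypothetical names. Controls are reweighted so that each stratum counts in proportion to its number of treated players.

```python
import bisect
from collections import defaultdict


def coarsen(x, cutpoints):
    """Map each covariate to a bin index; cutpoints[j] is a sorted list for covariate j."""
    return tuple(bisect.bisect_right(cuts, v) for v, cuts in zip(x, cutpoints))


def cem_att(rows, cutpoints):
    """rows: list of (x, d, y). Returns (att, matched treated, matched controls)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        strata[coarsen(x, cutpoints)][d].append(y)
    # common support: keep only strata containing both treated and control units
    matched = [s for s in strata.values() if s[0] and s[1]]
    m_t = sum(len(s[1]) for s in matched)
    m_c = sum(len(s[0]) for s in matched)
    if m_t == 0:
        raise ValueError("no stratum contains both treated and control units")
    treated_mean = sum(sum(s[1]) for s in matched) / m_t
    # CEM weight of a control in stratum s: (m_c / m_t) * (m_t^s / m_c^s);
    # these weights sum to m_c, so dividing by m_c gives the weighted mean
    weighted = sum((m_c / m_t) * (len(s[1]) / len(s[0])) * sum(s[0]) for s in matched)
    control_mean = weighted / m_c
    return treated_mean - control_mean, m_t, m_c
```

Coarser bins keep more players but leave more imbalance within each stratum; finer bins do the opposite, which is the curse-of-dimensionality trade-off in miniature.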

Several matching algorithms
one-to-one matching estimators
    with/without replacement
    nearest-neighbour
    within-caliper
smoothed matching estimators
    k-nearest neighbour
    radius matching
weighted smoothed matching estimators
    kernel smoothing
    local linear regression smoothing
Mahalanobis distance matching
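The estimators listed above differ mainly in which non-treated players serve as twins and how they are weighted. A minimal sketch for a single treated player, assuming a user-supplied distance function; `counterfactual` and `mahalanobis` are hypothetical names, the Gaussian kernel and its bandwidth are an illustrative choice (other kernels are common), and local linear regression smoothing is omitted for brevity.

```python
import math


def mahalanobis(a, b, inv_cov):
    """sqrt((a - b)' S^-1 (a - b)); inv_cov is the inverse covariance S^-1 as a list of rows."""
    d = [ai - bi for ai, bi in zip(a, b)]
    n = len(d)
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n)))


def counterfactual(x_t, controls, dist, method="knn", k=1, caliper=None,
                   radius=None, bandwidth=None):
    """Estimate Y(0) of one treated unit as a weighted mean of control outcomes.

    controls: list of (x, y); dist: distance function between covariate vectors.
    "knn": k nearest neighbours (k=1 is nearest neighbour), optionally within a caliper;
    "radius": all controls with distance <= radius, equally weighted;
    "kernel": all controls, Gaussian weights exp(-0.5 * (d / bandwidth)^2).
    Returns None when no control qualifies, i.e. the treated unit stays unmatched.
    """
    scored = sorted((dist(x_t, x_c), y_c) for x_c, y_c in controls)
    if method == "knn":
        chosen = scored[:k]
        if caliper is not None:
            # within-caliper matching: discard neighbours that are too far away
            chosen = [(d, y) for d, y in chosen if d <= caliper]
        weights = [1.0] * len(chosen)
    elif method == "radius":
        if radius is None:
            raise ValueError("radius matching needs a radius")
        chosen = [(d, y) for d, y in scored if d <= radius]
        weights = [1.0] * len(chosen)
    elif method == "kernel":
        if bandwidth is None or bandwidth <= 0:
            raise ValueError("kernel matching needs a positive bandwidth")
        chosen = scored
        weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d, _ in chosen]
    else:
        raise ValueError(f"unknown method: {method}")
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * y for w, (_, y) in zip(weights, chosen)) / total
```

Averaging the difference between each treated outcome and its estimated counterfactual gives the ATT; for Mahalanobis distance matching, pass `dist=lambda a, b: mahalanobis(a, b, inv_cov)` with the inverse covariance matrix of the covariates.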

Zeropayments in TSO Russia
payment conversion in TSO RU was low
one explanation: players find the payment process “scary”
“zeropayments” guide the player through the payment process, offering a small reward for completing a fake payment

Results of the treatment
group                                                lifetime pay-to-active (relative to reference a)
reference: TSO RU                                    a
paid at least once in addition to the zeropayment    5.9a
paid after their zeropayment                         3.5a
paid after their zeropayment, not paid before        1.6a

Matching results (tobit)
                    (1)          (2)          (5)          (6)
                    tobit full   tobit2 full  tobit cem    tobit2 cem
had zero payments   7.376        19.71        -356.3       -350.1
                    (0.974)      (0.931)      (0.270)      (0.276)
level               315.3∗∗      354.1∗∗      674.4        696.4
                    (0.007)      (0.000)      (0.177)      (0.179)
level squared       -0.796       -1.441       -9.274       -9.635
                    (0.709)      (0.416)      (0.291)      (0.289)
uniqueLogins        -26.27∗∗     -28.22∗∗     -33.35       -34.78
                    (0.018)      (0.007)      (0.199)      (0.204)
rating for week     -407.0†      -400.7†      39.74        42.50
                    (0.076)      (0.076)      (0.915)      (0.908)
guild               647.9∗∗      651.2∗∗      639.6        627.8
                    (0.012)      (0.011)      (0.388)      (0.400)
age                 53.18∗∗      52.37∗∗      185.4        171.8
                    (0.024)      (0.025)      (0.264)      (0.288)
(additional controls, including intercept)
N                   12376        19522        4114         6894
pseudo R2           0.162        0.189        0.139        0.158
p-values in parentheses; † p < 0.10, ∗∗ p < 0.05; full = full sample, cem = sample after coarsened exact matching
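A tobit model suits outcomes that pile up at zero, such as spending in a free-to-play game where most players never pay. The slides do not spell out the exact tobit and tobit2 specifications, so the following is only a sketch of the textbook type I Tobit log-likelihood, left-censored at zero, under that assumption; `tobit_loglik` and `normal_cdf` are hypothetical helper names. Maximising this function over beta and sigma with a numerical optimiser yields coefficient estimates like those in the table.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loglik(beta, sigma, X, y):
    """Type I Tobit: y* = x'beta + e, e ~ N(0, sigma^2), observed y = max(0, y*)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for x, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x))
        if yi <= 0:
            # censored observation: probability that the latent y* is <= 0
            ll += math.log(normal_cdf(-xb / sigma))
        else:
            # uncensored observation: normal density of y* at yi
            z = (yi - xb) / sigma
            ll += -0.5 * math.log(2 * math.pi) - 0.5 * z * z - math.log(sigma)
    return ll
```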

Matching results (zero-inflated negbin)
                    (1)          (2)          (5)          (6)
                    zinb full    zinb2 full   zinb cem     zinb2 cem
had zero payments   0.111        0.110        0.540∗∗      0.538∗∗
                    (0.463)      (0.466)      (0.005)      (0.006)
level               0.148∗∗      0.150∗∗      -0.153       -0.255†
                    (0.012)      (0.010)      (0.332)      (0.096)
level squared       -0.00211∗∗   -0.00213∗∗   0.00429      0.00617∗∗
                    (0.036)      (0.032)      (0.155)      (0.035)
uniqueLogins        -0.0180∗∗    -0.0180∗∗    -0.0308∗∗    -0.0310∗∗
                    (0.007)      (0.006)      (0.005)      (0.005)
rating for week     0.747∗∗      0.748∗∗      1.662∗∗      1.653∗∗
                    (0.000)      (0.000)      (0.000)      (0.000)
guild               -0.112       -0.112       0.280        0.297
                    (0.319)      (0.319)      (0.286)      (0.264)
age                 0.0383∗∗     0.0383∗∗     0.119        0.192†
                    (0.012)      (0.012)      (0.308)      (0.096)
(additional controls, including intercept and inflate regression)
N                   12376        19522        4114         6894
p-values in parentheses; † p < 0.10, ∗∗ p < 0.05; full = full sample, cem = sample after coarsened exact matching
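The zero-inflated negative binomial treats a count outcome as a mixture: with probability π an observation is a structural zero (in this context, for example, a player who would never pay), otherwise it is drawn from a negative binomial count. A minimal sketch of that probability mass function, assuming the common NB2 parameterisation with mean μ and dispersion α (variance μ + αμ²); the function names are hypothetical, and the slides do not state which parameterisation was used.

```python
import math


def nb_pmf(y, mu, alpha):
    """NB2 probability of count y with mean mu and dispersion alpha (both > 0)."""
    if mu <= 0 or alpha <= 0:
        raise ValueError("mu and alpha must be positive")
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, pi):
    """Zero-inflated NB: structural zero with probability pi, else an NB2 count."""
    if y == 0:
        return pi + (1 - pi) * nb_pmf(0, mu, alpha)
    return (1 - pi) * nb_pmf(y, mu, alpha)
```

If the count equation uses the usual log link, a coefficient β multiplies the expected count in the non-inflated part by exp(β); for “had zero payments” in the cem columns that is exp(0.540) ≈ 1.72, holding the other controls fixed.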

further reading
Rosenbaum, P. R. and D. B. Rubin (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1), pp. 41-55.
Heckman, J. J., H. Ichimura, and P. Todd (1997). Matching as an Econometric Evaluation Estimator: Evidence From Evaluating a Job Training Programme. Review of Economic Studies 64, pp. 605-654.
Angrist, J. D. and A. B. Krueger (1999). Empirical Strategies in Labor Economics. In O. C. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, vol. 3, pp. 1277-1366. Amsterdam: Elsevier.
Blackwell, M., S. Iacus, G. King, and G. Porro (2009). cem: Coarsened exact matching in Stata. Stata Journal 9 (4), pp. 524-546.
Iacus, S., G. King, and G. Porro (June 2008). Matching for causal inference without balance checking. UNIMI – Research Papers in Economics, Business, and Statistics 1073, Università degli Studi di Milano.
Lechner, M. (2002). Some practical issues in the evaluation of heterogeneous labour market programmes by matching methods. Journal of the Royal Statistical Society, Series A, 165, pp. 59-82.
Leuven, E. and B. Sianesi (April 2003). PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Statistical Software Components S432001, Boston College Department of Economics.
