How many newspapers should be distributed to each store for sale every day? The data science group at The New York Times addresses this optimization problem using custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. I'll describe our modeling and data engineering approaches, written in Python and hosted on Google Cloud Platform.
Deploying Data Science for Distribution of The New York Times - Anne Bauer
1. Deploying Data Science for Distribution of The New York Times
Anne Bauer
anne.bauer@nytimes.com
Lead Data Scientist, NYTimes
PyData, 2018-10-17
2. Single copy newspaper distribution
1. people still buy physical newspapers?
2. algorithms
3. experiments to test the algorithms
4. ...we need to modify the algorithms
5. app architecture
5. How many papers should we deliver to each store each day?
Too many or too few: a waste of $$, or missed sales!
“Single copy” optimization
6. Single copy: the process
Weekly process
• Stores report sales from 1-2 weeks ago (depending on the distributor)
• We pick up the data via FTP, ingest them into our systems
• Our models are retrained, predictions run
• Predictions are handed off via FTP to the circulation department
Turnaround time ~ few hours
9. Single copy: the existing algorithm
Heuristics with many if/then statements
• Highest sale over recent weeks × A + B
• A, B are extremely hand-tuned by store type, location, ...
• Interspersed amid 4600 lines of COBOL
• Difficult to modify to include, e.g., print site cost differences
• Quintessential time series modeling problem. Perfect for data science!
10. Single copy newspaper distribution
1. people still buy physical newspapers?
2. algorithms
3. experiments to test the algorithms
4. ...we need to modify the algorithms
5. app architecture
11. Algorithm components
The problem is separable into two parts:
Prediction: Given previous sales, how many papers will sell
next Thursday?
Policy: We think N papers will sell, with a known
uncertainty distribution. How many should we send (draw)?
13. AR(1)
Prediction
• Xₜ = c + φ Xₜ₋₁ + εₜ
• Today’s sale is a linear function of last week(s)
• One model per store per day of week
• Use the past year’s data to fit c and φ
• AR(1) vs. AR(N) and training window chosen via cross-validation
Policy
• Draw = ceil(demand)
• Bump: if there have been recent sell-outs, send an extra
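The AR(1) prediction and ceiling policy above can be sketched in a few lines. This is an illustrative NumPy least-squares fit, not the statsmodels AR model the talk actually used, and the sell-out "bump" is reduced to a caller-supplied extra:

```python
import math
import numpy as np

def fit_ar1(sales):
    """Fit X_t = c + phi * X_{t-1} by ordinary least squares.

    `sales` is one store's history for a single weekday,
    ordered oldest to newest (e.g. the past year of Thursdays).
    """
    x_prev = np.asarray(sales[:-1], dtype=float)
    x_curr = np.asarray(sales[1:], dtype=float)
    A = np.column_stack([np.ones_like(x_prev), x_prev])
    (c, phi), *_ = np.linalg.lstsq(A, x_curr, rcond=None)
    return c, phi

def draw(c, phi, last_sale, bump=0):
    """Policy: ceil the predicted demand; add a bump after recent sell-outs."""
    demand = c + phi * last_sale
    return math.ceil(demand) + bump

# Toy history: a store that reliably sells about 10 papers on Thursdays.
history = [9, 10, 11, 10, 9, 10, 11, 10]
c, phi = fit_ar1(history)
print(draw(c, phi, last_sale=history[-1]))
```

One such model is fitted per store per day of week, so each fit stays tiny and the weekly retraining loop is embarrassingly parallel.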
14. AR(1)
Implementation
• Python 2, with statsmodels AR model. Single script.
• Plots (matplotlib pngs) hosted using Flask to monitor draws & sales
• Run by cron on a local server
• No separate dev/prd environments; code “deployed” via scp
16. Poisson Regression
Prediction
• Today’s sale is a linear function of the previous week(s) and the previous year
• One model per store per day of week
• Use the past year’s data to fit model parameters
• Feature time scales chosen via cross-validation
• Assume the sales are drawn from a Poisson distribution rather than Gaussian
• Sell-outs considered in the likelihood function
17. Poisson Regression
b: # papers bought
d: # papers delivered (the draw)
z: demand (Poisson distributed latent variable)
λ: Poisson parameter for the demand distribution
Each store has a different λ each day. z for that store & day is drawn from a Poisson distribution with that λ.
Parameterize the Poisson parameter λ as a log-linear function of the features X: λ = exp(θᵀX).
θ are the parameters fitted via maximum likelihood.
18. Poisson Regression
b: # papers bought
d: # papers delivered (the draw)
z: demand (Poisson distributed latent variable)
λ: Poisson parameter for the demand distribution
Probability of the # bought given the demand depends on whether
the demand exceeded the # delivered (i.e. whether there was a sell-out)
Use this probability in a maximum likelihood estimation of the parameters θ that describe λ
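A minimal sketch of the sell-out-aware likelihood described above, assuming an intercept-only feature matrix and hypothetical names throughout; the real model's features and fitting code are not shown in the talk:

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize

def neg_log_likelihood(theta, X, bought, delivered):
    """Sell-out-aware Poisson likelihood.

    lam = exp(theta . x) parameterizes demand z ~ Poisson(lam).
    If bought < delivered, demand was fully observed: use P(z = bought).
    If bought == delivered (a sell-out), demand is censored: all we
    know is z >= delivered, so use P(z >= delivered) instead.
    """
    lam = np.exp(X @ theta)
    sold_out = bought >= delivered
    ll = np.where(
        sold_out,
        np.log(poisson.sf(delivered - 1, lam)),  # P(z >= d)
        poisson.logpmf(bought, lam),
    )
    return -ll.sum()

# Toy data: true demand rate 8, but the draw is capped at 9 papers,
# so sell-outs censor the observed sales.
rng = np.random.default_rng(0)
demand = rng.poisson(8.0, size=200)
delivered = np.full(200, 9)
bought = np.minimum(demand, delivered)
X = np.ones((200, 1))  # intercept-only: one constant lambda

res = minimize(neg_log_likelihood, x0=[0.0], args=(X, bought, delivered))
print(np.exp(res.x[0]))  # fitted lambda, close to the true 8
```

Treating sell-out days as censored rather than as observed demand is the key point: a naive fit to `bought` would systematically underestimate demand at exactly the stores where more papers should be sent.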
19. Poisson Regression
Policy: Newsvendor Algorithm
• Profit = price × min(d, z) – cost × d
• Take the derivative of the expected profit and set it to zero, which implies:
Probability(z <= d) = (price-cost)/price
• Optimal draw: the smallest integer d such that
Probability(z <= d) >= (price-cost)/price
• The probability comes from the CDF of the Poisson distribution for the
demand prediction z; brute force find the best d.
z = demand
d = draw = # delivered
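The newsvendor policy on this slide is short enough to write out directly; a sketch using the brute-force search the slide describes, with illustrative price and cost values:

```python
from scipy.stats import poisson

def optimal_draw(lam, price, cost):
    """Newsvendor policy for Poisson-distributed demand.

    Returns the smallest draw d with P(z <= d) >= (price - cost) / price,
    found by brute force over the Poisson CDF as on the slide.
    """
    critical_ratio = (price - cost) / price
    d = 0
    while poisson.cdf(d, lam) < critical_ratio:
        d += 1
    return d

# Expected demand of 10 papers, $2.50 cover price, $1.00 unit cost:
print(optimal_draw(10, 2.5, 1.0))  # -> 11
```

Note the draw exceeds the expected demand: with a 60% profit margin, the cost of a missed sale outweighs the cost of an unsold paper, so it pays to overstock.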
20. Poisson Regression
Implementation: refactored code!
• Models abstracted to sklearn-like classes to allow for easy future
expansion with plug & play model integration
• Common library of functions to:
• get data from the DB
• calculate costs
• check data quality
• ...
• __init__()
• query()
• transform()
• fit()
• predict()
• policy()
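The interface listed above can be sketched as an abstract base class; the method names come from the slide, but the signatures and docstrings here are illustrative assumptions, not the actual NYT codebase:

```python
class DrawModel:
    """Sklearn-style interface allowing plug-and-play model integration."""

    def __init__(self, store_id, weekday):
        self.store_id = store_id
        self.weekday = weekday

    def query(self):
        """Pull this store/weekday's sales history via the common DB library."""
        raise NotImplementedError

    def transform(self, raw):
        """Clean raw sales records into model-ready features."""
        raise NotImplementedError

    def fit(self, X, y):
        """Train the demand model (AR(1), Poisson regression, ...)."""
        raise NotImplementedError

    def predict(self, X):
        """Predicted demand for next week."""
        raise NotImplementedError

    def policy(self, demand):
        """Turn a demand prediction into a draw (e.g. newsvendor)."""
        raise NotImplementedError
```

With a shared interface like this, swapping the AR(1) model for the Poisson regression (or any future model) only changes which subclass the pipeline instantiates.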
21. Single copy newspaper distribution
1. people still buy physical newspapers?
2. algorithms
3. experiments to test the algorithms
4. ...we need to modify the algorithms
5. app architecture
22. Treatment & Control groups: match sales
Simple approach
• Take a random sample that approximates the total sales distribution
• For each member of this “treatment” sample, find closest match in mean sales
Trial & error checks!
• Exclude cases with any large differences in sales during the training period
• Only consider matches with the same production costs (~print site)
• Make sure treatment & control sell the paper on the same weekdays
• Better no match than a distant match
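The matching step above can be sketched as a greedy nearest-neighbor pass on mean sales with a distance cutoff, honoring "better no match than a distant match." The function name, dict shapes, and cutoff value are illustrative assumptions:

```python
def match_controls(treatment_means, pool_means, max_gap=2.0):
    """Greedy nearest-neighbor matching on mean sales.

    treatment_means / pool_means map store_id -> mean weekly sales.
    Each treatment store gets the unused pool store closest in mean
    sales, but only if within max_gap; otherwise it gets no match.
    Returns {treatment_id: control_id or None}.
    """
    available = dict(pool_means)
    matches = {}
    for store, mean in sorted(treatment_means.items()):
        if not available:
            matches[store] = None
            continue
        best = min(available, key=lambda s: abs(available[s] - mean))
        if abs(available[best] - mean) <= max_gap:
            matches[store] = best
            del available[best]  # each control is used at most once
        else:
            matches[store] = None
    return matches

pairs = match_controls({"A": 10.0, "B": 50.0}, {"x": 9.5, "y": 11.0, "z": 30.0})
print(pairs)  # -> {'A': 'x', 'B': None}
```

The additional checks from the slide (same production costs, same selling weekdays, no large training-period gaps) would act as filters on the candidate pool before this matching runs.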
23. Reporting
D3 Dashboard
Optimize for profit: ✔
Make stakeholders happy: ✗
Our profit comes at the expense of sales!
Sales matter beyond sales profit. Circulation numbers matter.
Hard to quantify that value!
24. Goal: Optimize for profit
... but don’t decrease sales “too much”
∴ Constrained optimization
25. Single copy newspaper distribution
1. people still buy physical newspapers?
2. algorithms
3. experiments to test the algorithms
4. ...we need to modify the algorithms
5. app architecture
26. Constrained newsvendor algorithm
Policy: Newsvendor Algorithm
• Profit = price × min(d, z) – cost × d
Maximize profit – λ × sales (negative λ to boost sales)
Effectively modifies the sales price of the paper
• (price – λ) × min(d, z) – cost × d
• Optimal draw: smallest integer such that
Probability(z <= d) >= (price-λ-cost)/(price-λ)
Negative λ → increase effective sales price → worth sending extra papers
z = demand
d = draw = # delivered
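The λ knob changes only the critical ratio, so the constrained policy is a one-line modification of the newsvendor draw. A sketch with illustrative values; the penalty argument is the slide's λ, renamed here to avoid clashing with the Poisson rate:

```python
from scipy.stats import poisson

def constrained_draw(lam, price, cost, penalty=0.0):
    """Newsvendor draw with the sales-penalty knob from the slide.

    penalty is the slide's lambda: a negative value raises the
    effective sales price, pushing the draw up and trading profit
    for circulation.
    """
    eff_price = price - penalty
    critical_ratio = (eff_price - cost) / eff_price
    d = 0
    while poisson.cdf(d, lam) < critical_ratio:
        d += 1
    return d

print(constrained_draw(10, 2.5, 1.0, penalty=0.0))   # pure profit: 11
print(constrained_draw(10, 2.5, 1.0, penalty=-1.0))  # sales-boosting: 12
```

Sweeping the penalty over a grid and reporting profit and sales for each value is what lets the stakeholders pick the trade-off themselves.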
27. The stakeholders choose λ
To our surprise, they chose λ such that the sales loss was ~0 and profit was suboptimal. But still much better than the original algorithm!!
This tuneable knob is very handy; we run experiments with different λs and the stakeholders can make the final decisions on which results are best.
28. Reporting: model comparison
Look at both profit and sales differences between treatment & control
Leave trade-off decisions to the stakeholders: better for everyone.
29. Single copy newspaper distribution
1. people still buy physical newspapers?
2. algorithms
3. experiments to test the algorithms
4. ...we need to modify the algorithms
5. app architecture
30. Current architecture: Google Cloud
App Engine: Web front end
App Engine Flex: Back ends for
reporting and predictions
BigQuery, Cloud Storage,
Cloud SQL: for hosting data and
configuration
Deployed via Drone
(github.com/NYTimes/drone-gae)
Github → Docker → GCR → AE Flex
Github → AE Standard
31. Architecture: Process
Data transfer
• Weekly cron job per distributor, on AE instance
• Taskqueue task: copy data from FTP to BQ, using config info in GCS
• Task fails if the data are not there
• The task queue retries every N minutes until the data show up
Logging
• Logs sent to Stackdriver, emails sent upon errors
• Quality checks and progress messages sent to Slack
32. Architecture: Process
Reporting
• Reads data from BQ
• Calculates aggregations & stats about algorithm experiments, using
config info from CloudSQL (BQ & pandas)
• Saves aggregated data back to BQ
• Runs statistical tests on data quality (e.g. last week’s total sales within
3σ of previous mean), aborts if failure
• Syncs the aggregated BQ tables with CloudSQL, for use in filtering the
front end UI
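The 3σ gate mentioned above is a simple check; a sketch under the assumption that it compares the latest weekly total against the historical mean and standard deviation (the real pipeline's exact thresholds are not shown in the talk):

```python
import numpy as np

def sales_within_3_sigma(weekly_totals, latest):
    """Data-quality gate: is the latest week's total sales within
    3 sigma of the historical mean? If not, the pipeline aborts
    rather than retrain on suspect data."""
    mean = np.mean(weekly_totals)
    std = np.std(weekly_totals)
    return abs(latest - mean) <= 3 * std

history = [1000, 1020, 980, 1010, 990]
print(sales_within_3_sigma(history, 1005))  # -> True
print(sales_within_3_sigma(history, 400))   # -> False, e.g. a partial data load
```

Aborting on failed checks matters here because a silently incomplete FTP transfer would otherwise look like a catastrophic sales drop and poison the retrained models.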
33. Architecture: Process
Predictions
• Reads data from BQ
• Retrains and predicts next week’s sales & how many papers to deliver
to each store each day (sklearn, scipy), using config info from CloudSQL
• Saves results to GCS
• Runs tests for unexpected changes in predictions, aborts if failure
Upload
• The front end copies the results from GCS back to the FTP site
34. A well-distributed project
• experiments: A/B testing algorithms with $ directly as a KPI
• communication: fold qualitative business concerns into the math
• engineering: Google Cloud Platform improves our process
• algorithms: sell-outs, costs directly incorporated