This document summarizes the key steps and outcomes of a project to build an end-to-end recommendation system for a power utility company. The system was designed to integrate machine learning models with mobile and call center systems to recommend ancillary products to customers. The project involved exploring customer data, developing machine learning models through an iterative process, and operationalizing the models by building APIs and automated workflows. The new system provided recommendations via microservices and represented an improvement over the utility's previous manual, less rigorous approach to data science and modeling.
4. Context
End-to-End Recommendation System, from Data to Insights

Customer
● Power utility company seeking to build an end-to-end recommendation system for ancillary products, integrating with its mobile app and call center systems

Solution
● Machine learning techniques and rich data to build models that recommend products
● Microservices-based architecture to integrate data science results into the mobile app and call center systems
● Agile development practices to build high-quality software

Outcome
✓ End-to-end product recommendation solution
✓ Model results exposed via API
✓ Enablement of the Data Science team
5. Technology and Data Overview

Data Sources
● Electric charges
● Account
● Demographic data (Acxiom)
● Product eligibility
● Product participation

Scale
● 6.5+ million customers
● 150+ million rows

Tools

Platform
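These sources ultimately feed one modeling table per eligible customer-product pair. A toy sketch of that join in plain Python (all column names and values are invented for illustration; the real joins ran in PySpark at the scale above):

```python
accounts = [
    {"customer_id": 1, "tenure_months": 24},
    {"customer_id": 2, "tenure_months": 6},
]
participation = [
    {"customer_id": 1, "product": "surge_protection", "enrolled": True},
]
eligibility = [
    {"customer_id": 1, "product": "surge_protection"},
    {"customer_id": 2, "product": "surge_protection"},
]

# Build one modeling row per eligible (customer, product) pair,
# labeled by whether the customer actually participates.
by_customer = {a["customer_id"]: a for a in accounts}
enrolled = {(p["customer_id"], p["product"]) for p in participation if p["enrolled"]}

rows = [
    {**by_customer[e["customer_id"]],
     "product": e["product"],
     "label": (e["customer_id"], e["product"]) in enrolled}
    for e in eligibility
]
```

At PySpark scale the same shape is a pair of joins (eligibility to account on `customer_id`, then a left join to participation) rather than in-memory dictionaries.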
6. Agile Data Science
● Pair Programming
● Test Driven Development
● Continuous Integration / Continuous Delivery
● API First
● Tracker
● Standups
● Retros
7. Agile Data Science

Discovery Phase
✓ Data exploration to understand the context of the data and its business implications
✓ Data cleansing, transformation and feature engineering
✓ Training, validation and evaluation of ML algorithms
✓ Multiple iterations of the above steps to reach the desired model performance

Operationalization (O16n) Phase
✓ Test-driven development of data cleansing and feature engineering scripts
✓ Automated data pipelines to cleanse, transform and score new data
✓ Monitoring code that checks incoming data to flag when remodeling is needed
✓ APIs to consume model output
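The monitoring step can be sketched as a simple statistical check on each incoming batch (plain Python; the 3-sigma threshold and the feature names in the usage example are illustrative assumptions, not the project's actual rule):

```python
import statistics

def drift_report(baseline, new_batch, threshold=3.0):
    """Flag features whose new-batch mean drifts more than `threshold`
    baseline standard deviations away from the baseline mean -- a signal
    that the model may need retraining on fresher data."""
    flagged = {}
    for feature, base_values in baseline.items():
        mu = statistics.mean(base_values)
        sigma = statistics.pstdev(base_values) or 1e-9  # avoid divide-by-zero
        new_mu = statistics.mean(new_batch[feature])
        z = abs(new_mu - mu) / sigma
        if z > threshold:
            flagged[feature] = round(z, 2)
    return flagged

# A feature whose distribution shifts gets flagged; a stable one does not.
report = drift_report(
    baseline={"kwh": [10, 11, 9, 10], "calls": [1, 2, 3]},
    new_batch={"kwh": [30, 31, 29], "calls": [2, 2, 2]},
)
```

In production this kind of check would run inside the scoring pipeline and write to the logging/validation layer rather than return a dict.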
8. End to End
1. Data exploration, feature generation and ad-hoc ML modeling
2. Test-driven development (TDD) to create production-quality PySpark scripts
3. An automated scoring workflow using the PySpark scripts to generate recommendations
4. A recommendation microservice on Pivotal Cloud Foundry to serve customer recommendations
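The final stage amounts to serving precomputed scores per customer. A framework-free sketch of that handler logic (customer ids, product names and response fields are invented for illustration; the real service ran as a microservice on Pivotal Cloud Foundry):

```python
import json

# Precomputed propensity scores from the scoring workflow (illustrative values).
SCORES = {
    "C001": {"surge_protection": 0.82, "appliance_plan": 0.41, "led_kit": 0.12},
}

def recommend(customer_id, top_n=2):
    """Return (JSON body, status code) with the top-N products for a customer."""
    scores = SCORES.get(customer_id)
    if scores is None:
        return json.dumps({"error": "unknown customer"}), 404
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    body = {
        "customer_id": customer_id,
        "recommendations": [{"product": p, "score": s} for p, s in ranked],
    }
    return json.dumps(body), 200
```

The mobile app and call center systems would call an HTTP route wrapping this lookup; the scoring workflow refreshes the score store on each run.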
9. Discovery Phase: Data Exploration
Worked with subject matter experts (SMEs) to understand how the data is generated, how it is used, and its business implications.

Takeaways
● The context and business impact of the data gained here is very valuable to the eventual success of the machine learning model
● There may be resistance from stakeholders to this activity (“not real work”)
● Mitigate this resistance by sharing the data exploration insights and their business implications
10. Discovery Phase: Feature Engineering
● Our goal was to predict the propensity of a customer to buy a particular ancillary product
● We only had information on when a customer bought the product
● We did not have any solicitation history
● We took all the buy events and, for each one, calculated features over a backward-looking window; these were our positive examples
● We sampled negative events randomly and calculated features using the same backward-looking window

[Figure: timeline showing a buy event and the backward-looking window used for features]

Takeaways
● Setting up data to run machine learning algorithms is more of an art than a science
● Balance positive and negative examples, especially for rare events
● Be aware of biases that may affect the data; these biases have modeling implications
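The labeling scheme above can be sketched as follows (plain Python; the 90-day window, the usage-based features and the fixed seed are illustrative assumptions, not the project's actual feature set):

```python
import random
from datetime import date, timedelta

def window_features(usage, end, days=90):
    """Aggregate usage readings over a backward-looking window ending at `end`."""
    window = [kwh for d, kwh in usage if end - timedelta(days=days) <= d < end]
    return {"window_kwh": sum(window), "window_reads": len(window)}

def build_examples(usage, buy_dates, n_negatives, seed=42):
    """Positives: features computed at each buy event. Negatives: features at
    randomly sampled non-buy dates, using the same backward-looking window."""
    rng = random.Random(seed)
    examples = [{**window_features(usage, d), "label": 1} for d in buy_dates]
    buys = set(buy_dates)
    candidates = [d for d, _ in usage if d not in buys]
    for d in rng.sample(candidates, n_negatives):
        examples.append({**window_features(usage, d), "label": 0})
    return examples

# Monthly readings for one customer, with a single buy event.
usage = [(date(2020, 1, 1) + timedelta(days=30 * i), 100) for i in range(6)]
examples = build_examples(usage, buy_dates=[date(2020, 5, 1)], n_negatives=2)
```

Controlling `n_negatives` is how the positive/negative balance mentioned in the takeaways is tuned for rare buy events.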
11. Discovery Phase: ML Modeling Iterations
● The figure alongside shows the ML model iteration process
● We tried many algorithms with various hyperparameters
● Elastic net models were the most viable and were chosen for deployment during the operationalization phase

Takeaways
● Getting feedback from SMEs on the model results is very important
● Sharing the most impactful features is a great way to get feedback and build SME trust in ML models
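Elastic net combines an L1 and an L2 penalty on the model weights. As a minimal illustration of the model class, here is a from-scratch logistic regression with an elastic net penalty, trained by gradient descent on toy data (the project would have used a library implementation, and the `alpha`/`l1_ratio` values here are arbitrary):

```python
import math

def train_elastic_net_logreg(X, y, alpha=0.1, l1_ratio=0.5, lr=0.1, epochs=500):
    """Minimize: log-loss + alpha * (l1_ratio * |w|_1 + (1-l1_ratio)/2 * |w|_2^2)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # prediction minus label
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        for j in range(d):
            # Subgradient of the L1 term plus gradient of the L2 term.
            sign = 1.0 if w[j] > 0 else -1.0 if w[j] < 0 else 0.0
            reg = alpha * (l1_ratio * sign + (1 - l1_ratio) * w[j])
            w[j] -= lr * (gw[j] / n + reg)
        b -= lr * gb / n
    return w, b

def predict_proba(w, b, x):
    """Propensity score for one example."""
    return 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, x)))))
```

The L1 term drives uninformative feature weights to zero, which is one reason elastic net models are attractive when sharing impactful features with SMEs.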
12. O16n: Production Scripts Using TDD
After the discovery phase, we used TDD to write production scripts for data cleansing, feature generation and model scoring.
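In that spirit, a small sketch of the test-first style for one hypothetical cleansing rule (plain asserts for brevity; the project's actual suites targeted its PySpark scripts):

```python
def cleanse_usage(raw):
    """Drop records with missing or negative kWh readings and
    normalize customer ids to upper case."""
    cleaned = []
    for record in raw:
        kwh = record.get("kwh")
        if kwh is None or kwh < 0:
            continue
        cleaned.append({"customer_id": record["customer_id"].upper(), "kwh": kwh})
    return cleaned

# In TDD, a test like this is written first and fails until the
# implementation above makes it pass.
def test_drops_bad_readings_and_normalizes_ids():
    raw = [
        {"customer_id": "c1", "kwh": 10.0},
        {"customer_id": "c2", "kwh": -5.0},   # negative reading: dropped
        {"customer_id": "c3", "kwh": None},   # missing reading: dropped
    ]
    assert cleanse_usage(raw) == [{"customer_id": "C1", "kwh": 10.0}]

test_drops_bad_readings_and_normalizes_ids()
```

The same pattern carries to PySpark by asserting on small local DataFrames in the test suite.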
13. Why Pairing and TDD?

Test Driven Development
“Time spent writing a test beforehand is rarely wasted. Code written to pass a test takes much less time to debug.” – client 1
“TDD gives me the confidence that I won’t commit code that breaks existing functionality, no matter what I change.” – client 2

Pair Programming
“Pairing instills critical thinking, builds confidence, distributes knowledge, and gets work done. Most methods of work only do one of those things.” – client 1
“Pairing was an educational experience for me, as well as a real-time validator. If my pair catches a problem with my code, I’ll know about it in real time.” – client 2
14. End to End
1. Data exploration, feature generation and ad-hoc ML modeling
2. Test-driven development (TDD) to create production-quality PySpark scripts
3. An automated scoring workflow using the PySpark scripts to generate recommendations
4. A recommendation microservice on Pivotal Cloud Foundry to serve customer recommendations
15. Summary of Enablement

Before
● Ad-hoc model building in SAS Enterprise Miner
● Minimal data science rigor
● Manual data upload to the SAS environment for modeling
● Model results shared using Excel
● Results used only for forecasting

After
✓ Data science on modern open source tools
✓ Data science rigor
✓ Automated workflow for data cleansing, feature generation and scoring
✓ Robust logging and validation of data and model results
✓ Recommendation microservice up in production to be consumed by app developers