This document summarizes a presentation given at PyCon Dublin 2016 about using factorisation machines for recommendations. Factorisation machines were chosen as they are accurate, fast, and can handle sparse data. The key steps discussed are: structuring user, content, and context data; modelling with factorisation machines; and deploying the model via a machine learning service. Further analysis areas mentioned include injecting new content and tracking consumption via A/B testing.
Advanced Machine Learning for Business Professionals
Content Recommendation using factorisation machines; PyCon Ireland 2016
1. Nov 2016- PyCon Dublin 2016 Fabrikatyr – Factorisation Machines
Fabrikatyr Analytics
Uncover tangible truths amidst the noise of modern media
Recommendation service using Factorisation Machines
PyCon Dublin - 2016
@Conr
@fabrikatyr
Agenda
The Problem
Factorisation machines as a method
Getting the data right
Modelling and Deployment
Further Research
Business Problem
Increase user engagement by displaying content which is both personalised and interesting
Content can be user-generated or taken from a 3rd party content provider
Users and communities
● A user can be in MANY communities
● Behaviours are consistent across communities; content consumption is not
Interesting content will generate
● Likes
● Comments
● Shares
Content can be ‘EverGreen’
Optimisation Problem
How to select a method to deploy which can answer the challenges?
● General recommender: accuracy is important, but the goal is to generate recommendations which are consumed
● Speed & scalability: the system needs to respond quickly to trends and topics across communities
● Sparse data set: there are a lot of ‘hidden’ behaviours; measurable engagement is low, so the data set is very sparse
Factorisation Machines appeared to be the method which answered the challenge

Method                   Accuracy          Speed       Sparsity
Factorisation Machines   General accuracy  Quick       Designed for it
Collaborative Filter     Too accurate      Suitable    Suitable
Support Vector Machines  Too accurate      Suitable    Unsuitable
Random Forest / CART     General accuracy  Unsuitable  Unsuitable
Factorisation machines as a method
Factorisation Machine - The Equation
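For reference, the standard second-order factorisation machine model is sketched here. This is assumed to be what the equation slides showed; the symbols are the conventional ones, not taken from the slides. Here n is the number of feature columns, k the number of latent factors, w_0 the global bias, w_i the per-feature weights and v_i the latent vector of feature i. The second line is the usual identity that lets the pairwise term be computed in time linear in n.

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j ,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2} \, x_i^{2} \right]
```

Because most x_i are zero in a sparse data set, only the non-zero columns contribute to either sum.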
Limits of Factorisation Machines
Need to understand your features as the model
Not good with ‘dense’ data with binary outcomes
Relatively new method, but supported by most languages
General model, so predictions are also general
Getting the data right
The data from the systems needs to be examined and structured before executing the model
3 groups of information
● Users
● Content
● Context
Time was NOT a feature
Not all USER behaviours are important when using a generalised model
Important
Unimportant
● Is the user an Admin / Moderator
● Has the user ‘logged in’
● User behaviour
● Engagement
● Count of community memberships
Content needs to be given ‘Context’ to be worked with effectively
Engagement
● Did the user ‘like’ the content?
● Did the user ‘comment’ on the content?
● Did the user ‘share’ the content?
Keywords
● Which keywords does the content have?
The general behaviour is that a small set of users and content generates most of the activity
Final result was a ‘wide’ dataset per user event with many columns
● Each time a user either saw content or engaged with it, a row must be added to the data set
● Keywords, likes, etc. all receive a 1 or a 0 for ALL the events
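The ‘wide’ layout described above can be sketched in plain Python. This is a minimal illustration, not the pipeline used in the talk: the event records, field names and the `kw_` column prefix are all hypothetical.

```python
# Hypothetical raw events: one record per (user, content) impression.
# Field names and values are illustrative, not taken from the talk.
RAW_EVENTS = [
    {"user": "u1", "content": "c1", "keywords": ["python", "data"], "actions": ["like"]},
    {"user": "u1", "content": "c2", "keywords": ["sport"], "actions": []},
    {"user": "u2", "content": "c1", "keywords": ["python", "data"], "actions": ["like", "share"]},
]

ACTIONS = ("like", "comment", "share")


def to_wide_rows(events):
    """Return one row per event with a 0/1 column for every action and keyword."""
    # Every keyword seen anywhere becomes its own column, so all rows share
    # the same set of columns.
    vocabulary = sorted({kw for event in events for kw in event["keywords"]})
    rows = []
    for event in events:
        row = {"user": event["user"], "content": event["content"]}
        for action in ACTIONS:
            row[action] = 1 if action in event["actions"] else 0
        for kw in vocabulary:
            row["kw_" + kw] = 1 if kw in event["keywords"] else 0
        rows.append(row)
    return rows


WIDE_ROWS = to_wide_rows(RAW_EVENTS)
```

An impression with no engagement (the second event) still produces a row, with zeros in every action column. Rows in this shape can be loaded into a dataframe for modelling.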
We used an application which could deploy our Python model at scale
Turi Predictive Services supports model predictions, hosting and managing machine learning models as low-latency RESTful services.
Turi was acquired by Apple Inc. for a reported $200 million
Domino Data Lab is an alternative
SFrame versus pandas - works for Factorisation Machines
SFrame is a scalable, out-of-core dataframe which allows you to work with datasets that are larger than the amount of RAM on your system.
Similar to Spark RDD
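The out-of-core idea can be illustrated without SFrame itself. The sketch below is not SFrame's API; it uses only the Python standard library to show the principle of streaming rows from storage and keeping just a small running aggregate in memory. The column names and sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def count_likes_per_user(lines):
    """Stream rows one at a time so memory use does not grow with file size."""
    likes = Counter()
    for row in csv.DictReader(lines):
        # Only the running totals are kept; each row is discarded after use.
        likes[row["user"]] += int(row["like"])
    return likes


# A small in-memory stand-in for a file too large to load at once.
SAMPLE = io.StringIO("user,content,like\nu1,c1,1\nu1,c2,0\nu2,c1,1\nu1,c3,1\n")
LIKES = count_likes_per_user(SAMPLE)
```

In practice `lines` would be an open file handle. An out-of-core dataframe applies the same principle behind a dataframe interface.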
Solution Architecture
● Content store
● Guest data
● Factorisation Machine model
● Scoring Engine & Recommendations
Machine Learning Service
Balance between online / offline calculations
Stages: (Offline/batch) and (Online)
● Content store
● URLs
● Guest behaviour
● Guest data
● Factorisation Machine model
● Content consumption
● Guest classification
● Scoring Engine & Recommendations
● Content URL
● C# content server
● Model weights
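One possible reading of this architecture, stated as an assumption: the offline/batch stage fits the model and exports its weights, and the online scoring engine applies those weights to rank content URLs for a guest. The sketch below shows only the online step, using the standard second-order factorisation machine score. The feature layout, weights and URLs are hypothetical.

```python
def fm_score(x, w0, w, V):
    """Second-order FM score, computing the pairwise term in linear time."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def recommend(guest_features, candidates, w0, w, V, top_n=2):
    """Score every candidate URL for one guest and return the best top_n."""
    scored = [
        (url, fm_score(guest_features + content_features, w0, w, V))
        for url, content_features in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [url for url, _ in scored[:top_n]]


# Hypothetical weights, as if exported by the offline/batch stage.
# Feature order: [guest_sporty, guest_techie, kw_python, kw_sport]
W0 = 0.0
W = [0.0, 0.0, 0.1, 0.1]
V = [[1, 0], [0, 1], [0, 1], [1, 0]]  # one latent vector (k = 2) per feature

# Hypothetical content URLs with their keyword indicators [kw_python, kw_sport].
CANDIDATES = {
    "/articles/python-tips": [1, 0],
    "/articles/match-report": [0, 1],
    "/articles/both": [1, 1],
}

TOP = recommend([0, 1], CANDIDATES, W0, W, V)  # a 'techie' guest
```

Keeping training offline and leaving only this arithmetic online is one way to strike the balance the slide refers to: each request then costs a small number of multiplications per candidate.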
Further analysis
Injecting content into the model
Consumption tracking using A/B testing
Presentation bias - does rank affect consumption?
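As a sketch of how consumption tracking with an A/B test might be evaluated, the function below compares the consumption rate of two variants with a two-proportion z-test. The talk does not specify a test; this choice, the function name and the counts are all assumptions for illustration.

```python
import math


def two_proportion_z(consumed_a, shown_a, consumed_b, shown_b):
    """Return (z, two-sided p-value) for the difference in consumption rates."""
    rate_a = consumed_a / shown_a
    rate_b = consumed_b / shown_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (consumed_a + consumed_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal, via the complementary
    # error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A is the existing ranking, variant B the new one.
Z, P_VALUE = two_proportion_z(consumed_a=100, shown_a=1000, consumed_b=130, shown_b=1000)
```

The same comparison could be run per rank position to probe presentation bias, for example comparing consumption of the same item shown in first versus third place.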
Thank you - Any Questions?