Why Neural Net Field Aware Factorization Machines are able to break ground in digital behaviours prediction
Presenter: Gunjan Sharma
Co-Author: Varun Kumar Modi
About the Authors
Presenter: Gunjan Sharma
System Architect @ InMobi (3 years)
SE @ Facebook (2.5 years)
DPE @ Google (1 year)
Twitter Handle: @gunjan_1409
LinkedIn: https://www.linkedin.com/in/gunjan-sharma-a6794414/
Co-author: Varun Kumar Modi
Sr Research Scientist @ InMobi (5 years)
LinkedIn: https://www.linkedin.com/in/varun-modi-33800652/
Content
1) The problem and context
2) The Motivation
3) Building the model theory: piece by piece
4) Results of the 2 use cases
5) Understanding exactly why it works
6) Implementation at InMobi scale
InMobi is one of the largest advertising platforms at scale globally
InMobi reaches >2 billion MAU across the world and specialises in mobile in-app advertising
[Figure: world map of InMobi's regional presence: North America, Latin America, EMEA, Africa, India + SEA, China, Japan, Korea, ANZ]
Map callouts (labelled North America (only Video), China and APAC on the slide):
● Consolidation has taken place to clean up the ecosystem; few advertising platforms at scale exist
● Very limited number of players have presence in Asia; InMobi is dominating
● Few players control each component of the chain; no presence of global players, except InMobi
Problem statement and why it matters
● What are the problems:
Use case 1 - Conversion ratio (CVR) prediction:
- CVR = Install rate of users = Probability of an install given a click
- Usage: CPM = CTR * CVR * CPI
Use case 2 - Video completion rate (VCR) prediction:
- Video completion rate of users watching advertising videos, given a click
● Why are they important:
○ Performance business - based on arbitrage, so the model directly determines the margin/profit of the business and the ability of the campaign to achieve significant scale => multi-million dollar businesses!
Existing context and challenges
● Models traditionally used were Linear/Logistic Regression (LR) and tree-based models
● Both have their strengths and weaknesses when used in production
● What we need is a model that sits somewhere in the middle and can bring in the best of both worlds
LR | Tree based
Generalises to unseen combinations | Could not, in our use cases
Can potentially underfit at times | Can potentially overfit at times
Requires less RAM | Can at times bloat RAM usage, especially with high-cardinality features
Why think of NN for CVR/VCR prediction
● Using cross features in LR wasn't cutting it for us
● Plus, at some point it becomes cumbersome at both training and prediction time
● All the major predictions noted here follow a complex curve
● LR left much to be desired compared to tree-based models, for example because interaction terms are limited
● We tried a couple of other models that were also not able to beat tree-based models
We all agreed that neural nets are a suitable technology for finding higher-order interactions between our features.
At the same time, they have the power of generalising to unseen combinations.
Challenges Involved
● Traditionally NNs are used more for classification problems
● We want to model our predictions as a regression problem
● Most of the features are categorical, which means we need to use one-hot encoding
● This causes NNs to produce very bad results, as they need a lot of data to train efficiently
● Plus, the cardinality of some features is very high, which makes life more troublesome
● The model should be easy to productionise for both training and serving
● Spark isn't suited for custom NN networks
● The model should be as debuggable as possible, to be able to explain the business changes
● The resistance to using NNs for a long time came from the lack of understanding of their internals
Consider the following dummy dataset
Publisher Advertiser Gender CVR
ESPN Nike Male 0.01
CNBC Nike Male 0.0004
ESPN Adidas Female 0.008
Sony Coke Female 0.0005
Sony P&G Male 0.002
Factorization Machine (FM) - What are those
Example row (Publisher = ESPN, Advertiser = Nike, Gender = Male, CVR = 0.01), one-hot encoded:
ESPN CNBC SONY Adidas Nike Coke P&G Male Female
1 0 0 0 1 0 0 1 0
Latent vectors of the active feature values (three components drawn on the slide):
Publisher Latent Vector (PV) = (X0, X1, X2)
Advertiser Latent Vector (AV) = (Y0, Y1, Y2)
Gender Latent Vector (GV) = (Z0, Z1, Z2)
$PV^T AV + AV^T GV + GV^T PV = pCVR$
NOTE: All vectors are K-dimensional, where K is a hyperparameter of the algorithm
Factorization Machine (FM) - What are those
● K-dimensional representation for every feature value
● Captures second-order interactions across all the features ($A^T B = |A| \, |B| \cos(\Theta)$)
● Essentially a combination of hyperbolas summed up to form the final prediction
● Works better than LR, but tree-based models are still more powerful
● E.g.: predicting a movie's revenue
Features: Movie, City, Gender
Latent features: Horror, Comedy, Action, Romance
Second-order intuition:
● For every latent feature
● For every pair of original features
● How much does this latent feature affect revenue when considering this pair?
Final predicted revenue is a linear sum over all latent features
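The pairwise term on the FM slide can be sketched in a few lines of plain Python. This is only an illustration of the formula PV^T*AV + AV^T*GV + GV^T*PV: the vector values are made up, the helper names `dot` and `fm_pairwise_score` are hypothetical, and the bias and linear terms of a full FM are left out because the slide does not show them.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fm_pairwise_score(latent_vectors):
    """Sum of inner products over every pair of active latent vectors."""
    return sum(dot(u, v) for u, v in combinations(latent_vectors, 2))


# Hypothetical K = 3 latent vectors (made-up values, not learned ones).
pv = [0.1, 0.2, 0.0]  # Publisher = ESPN
av = [0.3, 0.1, 0.2]  # Advertiser = Nike
gv = [0.0, 0.1, 0.1]  # Gender = Male

score = fm_pairwise_score([pv, av, gv])
print(score)
```

Because only the vectors of the active feature values take part, the cost per row depends on the number of fields and K, not on the cardinality of the features.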
Field aware Factorization Machine (FFM)
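Example row (Publisher = ESPN, Advertiser = Nike, Gender = Male, CVR = 0.01), one-hot encoded:
ESPN CNBC SONY Adidas Nike Coke P&G Male Female
1 0 0 0 1 0 0 1 0
Field-aware latent vectors of the active feature values (the suffix names the field the vector is used against; three components drawn on the slide):
PV_A = (XA0, XA1, XA2): Publisher vector used against Advertiser
PV_G = (XG0, XG1, XG2): Publisher vector used against Gender
AV_P = (YP0, YP1, YP2): Advertiser vector used against Publisher
AV_G = (YG0, YG1, YG2): Advertiser vector used against Gender
GV_P = (ZP0, ZP1, ZP2): Gender vector used against Publisher
GV_A = (ZA0, ZA1, ZA2): Gender vector used against Advertiser
$PV_A^T AV_P + AV_G^T GV_A + GV_P^T PV_G = pCVR$
NOTE: All vectors are K-dimensional, where K is a hyperparameter of the algorithm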
Field aware Factorization Machine (FFM)
● We have a K-dimensional vector for every feature value for every other feature type (field)
● Still second-order interactions, but with more degrees of freedom than FM
● Intuition: latent features interact with every other cross feature differently
● Works significantly better than FM, but at certain cuts it was still not able to beat the tree-based model
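A minimal sketch of the field-aware pairwise term, assuming the latent vectors are stored in a dictionary keyed by (field, value, other field). That storage layout, the function name `ffm_pairwise_score` and all numbers are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ffm_pairwise_score(active, vectors):
    """Field-aware pairwise score.

    active:  dict mapping field name -> active feature value
    vectors: dict mapping (field, value, other_field) -> latent vector
    """
    total = 0.0
    for f, g in combinations(list(active), 2):
        v_fg = vectors[(f, active[f], g)]  # vector of f's value, used against field g
        v_gf = vectors[(g, active[g], f)]  # vector of g's value, used against field f
        total += dot(v_fg, v_gf)
    return total


# Hypothetical K = 2 vectors (made-up values).
active = {"publisher": "ESPN", "advertiser": "Nike", "gender": "Male"}
vectors = {
    ("publisher", "ESPN", "advertiser"): [0.1, 0.2],  # PV_A
    ("publisher", "ESPN", "gender"): [0.0, 0.1],      # PV_G
    ("advertiser", "Nike", "publisher"): [0.3, 0.1],  # AV_P
    ("advertiser", "Nike", "gender"): [0.2, 0.2],     # AV_G
    ("gender", "Male", "publisher"): [0.1, 0.0],      # GV_P
    ("gender", "Male", "advertiser"): [0.1, 0.3],     # GV_A
}

score = ffm_pairwise_score(active, vectors)
print(score)
```

Compared with FM, each feature value now carries one vector per other field, which is where the extra degrees of freedom (and the extra memory) come from.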
Deep neural-net with Factorisation Machine: DeepFM
Sigmoid(FM + NeuralNet(PV :+ AV :+ GV)) = pCVR
DeepFM
● Now we are entering the neural net world
● This model is a combination of FM and NN, and the final prediction is the sum of the outputs of the two models
● Here we optimize the entire graph together
● It performs better than taking the latent vectors from FM and then running them through a neural net as a secondary optimization (FNN)
● It performs better than FM but not better than FFM
● Intuition: FM finds the second-order interactions while the neural net uses the latent vectors to find the higher-order nonlinear interactions
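The DeepFM formula, Sigmoid(FM + NeuralNet(PV :+ AV :+ GV)), can be sketched as a forward pass in plain Python. This sketch reads ":+" as concatenation, uses a single hidden ReLU layer, and uses made-up weights; the helper names and the layer sizes are assumptions for illustration only, and no training step is shown.

```python
import math
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense(inputs, weights, biases, relu=False):
    """Fully connected layer; each row of `weights` produces one output unit."""
    out = [dot(row, inputs) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def deepfm_predict(latent_vectors, hidden_w, hidden_b, out_w, out_b):
    # FM part: pairwise inner products of the latent vectors.
    fm = sum(dot(u, v) for u, v in combinations(latent_vectors, 2))
    # Deep part: the same latent vectors, concatenated, go through the net.
    concat = [x for vec in latent_vectors for x in vec]
    hidden = dense(concat, hidden_w, hidden_b, relu=True)
    deep = dense(hidden, [out_w], [out_b])[0]
    # Both parts share the latent vectors and are summed before the sigmoid.
    return sigmoid(fm + deep)


# Hypothetical K = 2 vectors and weights (made-up values, 2 hidden units).
pv, av, gv = [0.1, 0.2], [0.3, 0.1], [0.0, 0.1]
hidden_w = [[0.1] * 6, [-0.1] * 6]
hidden_b = [0.0, 0.0]
out_w, out_b = [0.5, 0.5], 0.0

p_cvr = deepfm_predict([pv, av, gv], hidden_w, hidden_b, out_w, out_b)
print(p_cvr)
```

Since both branches read the same vectors, a gradient step on the final loss updates the latent vectors from both the FM and the network side, which is the "optimize the entire graph together" point above.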
Neural Factorization Machine: NFM
NeuralNet((PV.*AV .+ AV.*GV .+ GV.*PV)T) = pCVR
NFM
● In this architecture you only run the second-order features through the NN instead of the raw latent vectors
● Intuition: the neural net takes the second-order interactions and uses them to find the higher-order nonlinear interactions
● Performs better than DeepFM, mostly attributed to two facts:
○ The net is smaller and hence converges faster
○ The neural net can take the second-order interactions and convert them easily to higher-order interactions
● Results were better than DeepFM, but still not better than FFM
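A sketch of the NFM forward pass, NeuralNet(PV.*AV + AV.*GV + GV.*PV): the element-wise products of every pair of latent vectors are summed into one K-dimensional vector, and only that vector is fed to the network. The single hidden layer, the linear output and all numbers here are assumptions for illustration.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense(inputs, weights, biases, relu=False):
    """Fully connected layer; each row of `weights` produces one output unit."""
    out = [dot(row, inputs) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def bi_interaction(latent_vectors):
    """Element-wise products of every pair of vectors, summed into one K-dim vector."""
    k = len(latent_vectors[0])
    pooled = [0.0] * k
    for u, v in combinations(latent_vectors, 2):
        for i in range(k):
            pooled[i] += u[i] * v[i]
    return pooled


def nfm_predict(latent_vectors, hidden_w, hidden_b, out_w, out_b):
    pooled = bi_interaction(latent_vectors)          # K inputs to the net
    hidden = dense(pooled, hidden_w, hidden_b, relu=True)
    return dense(hidden, [out_w], [out_b])[0]


# Hypothetical K = 3 vectors and weights (made-up values).
pv, av, gv = [0.1, 0.2, 0.0], [0.3, 0.1, 0.2], [0.0, 0.1, 0.1]
pooled = bi_interaction([pv, av, gv])
hidden_w = [[1.0, 1.0, 1.0], [-1.0, 0.0, 0.0]]
hidden_b = [0.0, 0.0]

p_cvr = nfm_predict([pv, av, gv], hidden_w, hidden_b, [0.5, 0.5], 0.0)
print(pooled, p_cvr)
```

The network input has K values regardless of the number of fields, while the DeepFM input grows with the number of fields times K; this is the "smaller net" argument in the bullets above.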
InMobi Spec: DeepFFM
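[Figure: DeepFFM architecture. Sparse features (Feature1, Feature2, Feature3) are mapped to dense field-aware embeddings (F2E and F3E for Feature1, F1E and F3E for Feature2, F1E and F2E for Feature3). The embeddings feed both the hidden layers and the FF Machine, whose outputs are combined through an activation (Act) into Ypred.]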
InMobi Spec: DeepFFM
● A simple upgrade to DeepFM
● Performs better than both DeepFM and FFM
● Training is slower
● The FFM part does the majority of the prediction heavy lifting, evidently due to faster gradient convergence
● Intuition: take the latent vectors, run them through the NN for higher-order interactions, and use FFM for second-order interactions
InMobi Spec: NFFM
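[Figure: NFFM architecture. Sparse features (Feature1, Feature2, Feature3) are mapped to dense field-aware embeddings (F2E and F3E for Feature1, F1E and F3E for Feature2, F1E and F2E for Feature3). The embeddings feed the FF Machine, whose K outputs are the inputs to the hidden layers, which produce Ypred.]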
InMobi Spec: NFFM
● A simple upgrade to NFM
● Performs significantly better than all the other models
● Converges faster than DeepFFM
● Intuition: take the second-order interactions from FFM and run them through a neural net to find higher-order nonlinear interactions
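The NFFM intuition can be sketched by combining the two previous ideas: element-wise products of field-aware vector pairs are summed into K values, which then pass through the hidden layers. The exact wiring is an assumption read off the slide ("K inputs" into the hidden layers); the dictionary layout, function names, layer sizes and numbers are all hypothetical.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense(inputs, weights, biases, relu=False):
    """Fully connected layer; each row of `weights` produces one output unit."""
    out = [dot(row, inputs) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def ffm_bi_interaction(active, vectors):
    """Sum of element-wise products of field-aware vector pairs (K values)."""
    k = len(next(iter(vectors.values())))
    pooled = [0.0] * k
    for f, g in combinations(list(active), 2):
        v_fg = vectors[(f, active[f], g)]  # f's value, used against field g
        v_gf = vectors[(g, active[g], f)]  # g's value, used against field f
        for i in range(k):
            pooled[i] += v_fg[i] * v_gf[i]
    return pooled


def nffm_predict(active, vectors, hidden_w, hidden_b, out_w, out_b):
    pooled = ffm_bi_interaction(active, vectors)
    hidden = dense(pooled, hidden_w, hidden_b, relu=True)
    return dense(hidden, [out_w], [out_b])[0]


# Hypothetical K = 2 vectors and weights (made-up values).
active = {"publisher": "ESPN", "advertiser": "Nike", "gender": "Male"}
vectors = {
    ("publisher", "ESPN", "advertiser"): [0.1, 0.2],
    ("publisher", "ESPN", "gender"): [0.0, 0.1],
    ("advertiser", "Nike", "publisher"): [0.3, 0.1],
    ("advertiser", "Nike", "gender"): [0.2, 0.2],
    ("gender", "Male", "publisher"): [0.1, 0.0],
    ("gender", "Male", "advertiser"): [0.1, 0.3],
}
pooled = ffm_bi_interaction(active, vectors)
p_cvr = nffm_predict(active, vectors, [[1.0, 1.0], [-1.0, 0.0]], [0.0, 0.0], [0.5, 0.5], 0.0)
print(pooled, p_cvr)
```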
Use case 1 - Results CVR
Accuracy function: $\frac{\sum_i W_i \, |Yact_i - Ypred_i|}{\sum_i W_i}$
Accuracy % improvement over linear model (small DS):
Model | FFM | DeepFM | DeepFFM | NFFM
Improvement | 44% | 35% | 48% | 64%
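The accuracy function above is a weighted mean absolute error, so a lower value is better. A minimal sketch follows; the function name and the sample numbers are made up, and what the weights represent (for example, clicks per row) is an assumption because the slide does not say.

```python
def weighted_abs_error(weights, actual, predicted):
    """Sum of W_i * |Yact_i - Ypred_i| divided by the sum of W_i."""
    numerator = sum(w * abs(a - p) for w, a, p in zip(weights, actual, predicted))
    return numerator / sum(weights)


# Hypothetical rows: weight, observed CVR, predicted CVR.
weights = [100, 300]
actual = [0.010, 0.002]
predicted = [0.012, 0.001]

error = weighted_abs_error(weights, actual, predicted)
print(error)
```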
Use case 1 - Results CVR
Training data dates | Test date | Accuracy % improvement over linear model
T1-T7 | T7 | 21%
T1-T7 | T8 | 14%
T2-T8 | T8 | 20%
T2-T8 | T9 | 14%
% improvement over tree model:
Cut1 | 21.7%
Cut2 | 18.5%
Use case 2 - Results VCR
Error function (AEPV - Absolute Error Per View): $\frac{\sum_i \left[ (Views_i - Cmpltd_i) \, |Ypred_i| + Cmpltd_i \, |1 - Ypred_i| \right]}{\sum_i Views_i}$
% AEPV improvement by Country OS cut over the last-7-day average model:
Cut | Logistic Reg | Logistic Reg (2nd order autoregressive features) | LR (GBT based feature engineering) | NFFM
Cut1 | -3.71% | 2.30% | 2.51% | 3.00%
Cut2 | -2.16% | 3.05% | 4.48% | 28.83%
Cut3 | -0.31% | -0.56% | 5.65% | 12.47%
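AEPV treats each view as a 0/1 outcome: a view that was not completed contributes |Ypred| and a completed view contributes |1 - Ypred|, averaged over all views. A minimal sketch with made-up numbers (the function name `aepv` and the sample rows are hypothetical):

```python
def aepv(views, completed, predicted):
    """Absolute Error Per View over aggregated rows.

    views[i]     : number of views in row i
    completed[i] : number of those views that were completed
    predicted[i] : predicted completion rate for row i
    """
    numerator = sum(
        (v - c) * abs(p) + c * abs(1 - p)
        for v, c, p in zip(views, completed, predicted)
    )
    return numerator / sum(views)


# Hypothetical rows.
views = [100, 50]
completed = [80, 10]
predicted = [0.7, 0.3]

error = aepv(views, completed, predicted)
print(error)
```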
Use case 2 - Results VCR
● LR with L2 regularisation
● 2nd order features were selected based on the Information Gain criterion
● The GBT package in Spark MLlib was used (numTrees = 400, maxDepth = 8, sampling = 0.5, minInstancePerNode = 10)
○ The training process was too slow, even with large enough resources
○ XGBoost with Spark (tried later) was faster and resulted in further improvements
● NFFM: increasing the number of layers up to 3 resulted in a further 20% improvement in the validation errors, with no significant improvement after that
Building the full intuition
Factorisation machine:
● Handles categorical features and a sparse data matrix
● Extracts latent variables, e.g., identifying non-explicit segment profiles in the population
Field-aware:
● Dimensionality reduction (high-cardinality features to a K-dimensional representation)
● Increases degrees of freedom (compared to FM, in terms of field-specific values) to enable an exhaustive set of second-order interactions
Neural network:
● Explores and weights higher-order interactions - went up to 3 layers of interaction successfully
● Generates the numerical prediction
● Trains the factors based on the performance of both the FM and the neural net (instead of training them separately, which limits the latent vectors to the power of FM)
Implementation details
● Hyperparameters are k, lambda, number of layers, number of nodes per layer, activation functions
● Implemented in TensorFlow
● Adam optimizer
● L2 regularization. No dropout
● No batch normalization
● 1 layer with 100 nodes performs well enough and saves compute
● ReLU activations (converge faster)
● k = 16 (try powers of 2)
● Weighted RMSE as loss function for both use cases
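The slides do not spell out the weighted RMSE. One common definition, assumed here, normalises the weighted squared errors by the sum of the weights before taking the square root. The sketch below is plain Python rather than TensorFlow, and the function name and numbers are made up.

```python
import math


def weighted_rmse(weights, actual, predicted):
    """sqrt( sum_i W_i * (Yact_i - Ypred_i)^2 / sum_i W_i ), an assumed definition."""
    numerator = sum(w * (a - p) ** 2 for w, a, p in zip(weights, actual, predicted))
    return math.sqrt(numerator / sum(weights))


# Hypothetical rows: weight, observed rate, predicted rate.
loss = weighted_rmse([1, 3], [0.0, 1.0], [0.5, 1.0])
print(loss)
```

Squaring penalises large misses on heavily weighted rows more strongly than the absolute-error metrics used for reporting.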
Predicting for unseen feature values
Learned field-aware latent vectors per publisher value (suffix A = used against Advertiser, G = used against Gender):
ESPN: (XA0, XA1, XA2) and (XG0, XG1, XG2)
CNBC: (YA0, YA1, YA2) and (YG0, YG1, YG2)
SONY: (ZA0, ZA1, ZA2) and (ZG0, ZG1, ZG2)
● Avg latent feature interactions per feature for unknown values
UNKNOWN: ((XA0+YA0+ZA0)/3, (XA1+YA1+ZA1)/3, (XA2+YA2+ZA2)/3) and ((XG0+YG0+ZG0)/3, (XG1+YG1+ZG1)/3, (XG2+YG2+ZG2)/3)
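The averaging rule for unseen values can be sketched as a lookup with a fallback. The table layout (one dictionary of vectors per field pair), the function names and the numbers are assumptions for illustration; in practice the average could be computed once after training instead of on every miss.

```python
def average_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    k = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]


def lookup(table, value):
    """Learned vector for `value`, or the mean of all learned vectors if unseen."""
    if value in table:
        return table[value]
    return average_vector(list(table.values()))


# Hypothetical publisher vectors used against the Advertiser field (K = 3).
publisher_vs_advertiser = {
    "ESPN": [0.3, 0.0, 0.6],
    "CNBC": [0.0, 0.3, 0.0],
    "SONY": [0.3, 0.3, 0.3],
}

known = lookup(publisher_vs_advertiser, "ESPN")
unknown = lookup(publisher_vs_advertiser, "SomeNewPublisher")
print(known, unknown)
```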
Implementing @ low-latency, high-scale
● MLeap: the MLeap framework supports models trained in both Spark and TensorFlow. It lets us train tree-based models in Spark and NN-based models in TF
● Offline training and challenges: we cannot train TF models on the YARN cluster, hence we use a GPU machine as a gateway to pull data from HDFS and train on the GPU
● Online serving challenges: TF Serving had pretty low throughput and wasn't scaling for our QPS. Hence we use a local LRU cache with a decent TTL to scale TF Serving
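A local LRU cache with a TTL in front of the model server could look like the sketch below. The class, the key construction and `predict_fn` (standing in for the remote call to the model server) are all hypothetical; the slides give neither the cache size nor the TTL used in production.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """LRU cache whose entries also expire after `ttl_seconds`."""

    def __init__(self, max_size, ttl_seconds, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = OrderedDict()  # key -> (expiry time, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]        # expired: drop and report a miss
            return None
        self._store.move_to_end(key)    # mark as most recently used
        return value

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used


def cached_predict(cache, features, predict_fn):
    """Return a cached prediction, calling `predict_fn` only on a miss."""
    key = tuple(sorted(features.items()))
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = predict_fn(features)
    cache.put(key, value)
    return value
```

The TTL bounds how stale a served prediction can be after a model refresh, while the size limit bounds memory; both trade hit rate against freshness and RAM.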
Future research that we are currently pursuing...
● Hybrid Binning NFFM
● Distributed training and serving
● Dropout & batch normalization
● Methods to interpret the latent vectors (using methods like t-Distributed Stochastic Neighbour Embedding (t-SNE), etc.)
References
FM: https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
FFM: http://research.criteo.com/ctr-prediction-linear-model-field-aware-factorization-machines/
DeepFM: https://arxiv.org/pdf/1703.04247.pdf
NFM: https://arxiv.org/pdf/1708.05027.pdf
GBT Based Feature Engg: http://quinonero.net/Publications/predicting-clicks-facebook.pdf
Thank You!

  • 2. About the Authors Presenter: Gunjan Sharma, System Architect @ InMobi (3 years), SE @ Facebook (2.5 years), DPE @ Google (1 year). Twitter handle: @gunjan_1409. LinkedIn: https://www.linkedin.com/in/gunjan-sharma-a6794414/ Co-author: Varun Kumar Modi, Sr Research Scientist @ InMobi (5 years). LinkedIn: https://www.linkedin.com/in/varun-modi-33800652/
  • 3. Content 1) The problem and context 2) The Motivation 3) Building the model theory: piece by piece 4) Results of the 2 use cases 5) Understanding exactly why it works 6) Implementation at InMobi scale
  • 5. InMobi is one of the largest advertising platforms at scale globally. InMobi reaches >2 billion MAU across the world, specialised in mobile in-app advertising. [Map of regions: Japan, India + SEA, China, Africa, ANZ, North America, Korea, EMEA, Latin America, APAC] Consolidation has taken place to clean up the ecosystem; few advertising platforms at scale exist. North America (only Video). Very limited number of players have presence in Asia; InMobi is dominating. Few players control each component of the chain; no presence of global players, except InMobi.
  • 6. Problem statement and why it matters ● What are the problems: Use case 1 - Conversion ratio (CVR) prediction: - CVR = install rate of users = probability of an install given a click - Usage: CPM = CTR * CVR * CPI Use case 2 - Video completion rate (VCR) prediction: - Video completion rate of users watching advertising videos, given a click ● Why are they important: ○ Performance business, based on arbitrage, so the model directly determines the margin/profit of the business and the ability of the campaign to achieve significant scale => multi-million dollar businesses!
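
To make the usage formula concrete, here is a minimal sketch of the arithmetic. The numbers are hypothetical, and the final scaling by 1000 (to get a per-thousand-impressions figure) is a common convention that we assume here; the slide only gives the product CTR * CVR * CPI.

```python
def expected_payout_per_impression(ctr: float, cvr: float, cpi: float) -> float:
    """Product from the slide: P(click) * P(install | click) * payout per install."""
    return ctr * cvr * cpi


# Hypothetical values, chosen only for illustration.
per_impression = expected_payout_per_impression(ctr=0.02, cvr=0.01, cpi=2.0)

# Assumption (not stated on the slide): scale by 1000 for a per-mille price.
per_thousand = 1000 * per_impression
```

Because the predicted CVR multiplies straight into the price, any relative error in the CVR model carries over one-to-one into the bid.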
  • 7. Existing context and challenges ● Models traditionally used: Linear/Logistic Regression (LR) and tree-based models ● Both have their strengths and weaknesses when used in production ● What we need is an awesome model that sits somewhere in the middle and can bring in the best of both worlds. LR vs. tree-based: ○ LR generalises to unseen combinations; in our use cases tree-based models could not ○ LR can potentially underfit at times; tree-based models can potentially overfit at times ○ LR requires less RAM; tree-based models can at times bloat RAM usage, especially with high-cardinality features
  • 8. Content 1) The problem and context 2) The Motivation 3) Building the model theory: piece by piece 4) Results of the 2 use cases 5) Understanding exactly why it works 6) Implementation at InMobi scale
  • 9. Why think of NN for CVR/VCR prediction ● Using cross features in LR wasn’t cutting it for us. ● Plus, at some point it starts to become cumbersome at both training and prediction time. ● All the major predictions noted here follow a complex curve. ● LR left much to be desired compared to tree-based models, for example because interaction terms are limited. ● We tried a couple of awesome models that were also not able to beat tree-based models. We all agreed that neural nets are a suitable technology to find higher-order interactions between our features. At the same time, they have the power of generalising to unseen combinations.
  • 10. Challenges Involved ● Traditionally NNs are more utilized for classification problems ● We want to model our predictions as a regression problem ● Most of the features are categorical, which means we need to use one-hot encoding ● This causes the NN to produce very bad results, as it needs a lot of data to train efficiently ● Plus, the cardinality of some features is very high, which makes life more troublesome ● Model should be easy to productionise for both training and serving ● Spark isn’t suited for custom NN networks ● Model should be as debuggable as possible, to be able to explain the business changes ● The resistance to using NN for a long time came from the lack of understanding of their internals
  • 11. Content 1) The problem and context 2) The Motivation 3) Building the model theory: piece by piece 4) Results of the 2 use cases 5) Understanding exactly why it works 6) Implementation at InMobi scale
  • 12. Consider the following dummy dataset:
      Publisher | Advertiser | Gender | CVR
      ESPN      | Nike       | Male   | 0.01
      CNBC      | Nike       | Male   | 0.0004
      ESPN      | Adidas     | Female | 0.008
      Sony      | Coke       | Female | 0.0005
      Sony      | P&G        | Male   | 0.002
  • 13. Factorization Machine (FM) - What are those ● Example row: Publisher = ESPN, Advertiser = Nike, Gender = Male, CVR = 0.01 ● One-hot encoding: Publisher [ESPN, CNBC, SONY] = [1, 0, 0]; Advertiser [Adi, Nike, Coke, P&G] = [0, 1, 0, 0]; Gender [Male, Female] = [1, 0] ● $(X_0, X_1, X_2)$ = Publisher Latent Vector (PV); $(Y_0, Y_1, Y_2)$ = Advertiser Latent Vector (AV); $(Z_0, Z_1, Z_2)$ = Gender Latent Vector (GV) ● $PV^\top AV + AV^\top GV + GV^\top PV = pCVR$ NOTE: All vectors are K-dimensional, where K is a hyperparameter of the algorithm
  • 14. Factorization Machine (FM) - What are those ● K-dimensional representation for every feature value ● Captures second-order interactions across all the features ($A^\top B = |A|\,|B|\cos\Theta$) ● Essentially a combination of hyperbolas summed up to form the final prediction ● Works better than LR, but tree-based models are still more powerful. ● E.g.: predict a movie’s revenue. Features: Movie, City, Gender. Latent features: Horror, Comedy, Action, Romance. Second-order intuition: for every latent feature, and for every pair of original features, how much does this latent feature affect revenue when considering this pair? The final predicted revenue is a linear sum over all latent features.
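
The pairwise formula on the FM slides can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions: the latent vectors are random rather than learned, K = 4 is arbitrary, and the bias and linear terms that FM implementations usually carry are left out because the slide shows only the pairwise part.

```python
import numpy as np

K = 4  # latent dimension (hyperparameter); arbitrary value for the sketch
rng = np.random.default_rng(0)

# One K-dimensional latent vector per feature value (random here, learned in practice).
latent = {
    "publisher": {v: rng.normal(size=K) for v in ["ESPN", "CNBC", "Sony"]},
    "advertiser": {v: rng.normal(size=K) for v in ["Adidas", "Nike", "Coke", "P&G"]},
    "gender": {v: rng.normal(size=K) for v in ["Male", "Female"]},
}


def fm_second_order(row):
    """Sum of dot products over every pair of fields: PV.AV + AV.GV + GV.PV."""
    vecs = [latent[field][value] for field, value in row.items()]
    total = 0.0
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            total += float(vecs[i] @ vecs[j])
    return total


row = {"publisher": "ESPN", "advertiser": "Nike", "gender": "Male"}
score = fm_second_order(row)
```

Only the vectors of the feature values present in the row take part, which is why the one-hot encoding never has to be materialised.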
  • 15. Field aware Factorization Machine (FFM) ● Example row: Publisher = ESPN, Advertiser = Nike, Gender = Male, CVR = 0.01 ● One-hot encoding: Publisher [ESPN, CNBC, SONY] = [1, 0, 0]; Advertiser [Adi, Nike, Coke, P&G] = [0, 1, 0, 0]; Gender [Male, Female] = [1, 0] ● Publisher vectors: $(X^A_0, X^A_1, X^A_2) = PV_A$ and $(X^G_0, X^G_1, X^G_2) = PV_G$ ● Advertiser vectors: $(Y^P_0, Y^P_1, Y^P_2) = AV_P$ and $(Y^G_0, Y^G_1, Y^G_2) = AV_G$ ● Gender vectors: $(Z^P_0, Z^P_1, Z^P_2) = GV_P$ and $(Z^A_0, Z^A_1, Z^A_2) = GV_A$ ● $PV_A^\top AV_P + AV_G^\top GV_A + GV_P^\top PV_G = pCVR$ NOTE: All vectors are K-dimensional, where K is a hyperparameter of the algorithm
  • 16. Field aware Factorization Machine (FFM) ● We have a K-dimensional vector for every feature value for every other feature type ● Still second-order interactions, but with more degrees of freedom than FM ● Intuition: latent features interact with every other cross feature differently ● Works significantly better than FM, but at certain cuts was still not able to beat the tree-based model
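
The field-aware variant differs from FM only in which vector each side of a pair uses. The sketch below assumes random (unlearned) vectors and an arbitrary K; the nested dictionary layout `ffm[field][value][other_field]` is our own illustrative choice, not something taken from the slides.

```python
import numpy as np

K = 4  # latent dimension; arbitrary value for the sketch
rng = np.random.default_rng(0)

fields = ["publisher", "advertiser", "gender"]
values = {
    "publisher": ["ESPN", "CNBC", "Sony"],
    "advertiser": ["Adidas", "Nike", "Coke", "P&G"],
    "gender": ["Male", "Female"],
}

# ffm[f][v][g]: vector of value v (field f) used when it interacts with field g.
ffm = {
    f: {v: {g: rng.normal(size=K) for g in fields if g != f} for v in values[f]}
    for f in fields
}


def ffm_second_order(row):
    """PV_A.AV_P + PV_G.GV_P + AV_G.GV_A for the values present in the row."""
    total = 0.0
    for i, f in enumerate(fields):
        for g in fields[i + 1:]:
            total += float(ffm[f][row[f]][g] @ ffm[g][row[g]][f])
    return total


row = {"publisher": "ESPN", "advertiser": "Nike", "gender": "Male"}
score = ffm_second_order(row)
```

With F fields, every feature value stores F - 1 vectors instead of one, which is where the extra degrees of freedom (and the extra memory) come from.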
  • 17. Deep neural-net with Factorisation Machine: DeepFM Sigmoid(FM + NeuralNet(PV :+ AV :+ GV)) = pCVR
  • 18. DeepFM ● Now we are entering the neural net world ● This model is a combination of FM and NN and the final prediction is sum of the output from the 2 models ● Here we optimize the entire graph together. ● It performs better than using the latent vectors from FM and then running them through neural net as a secondary optimization (FNN) ● It performs better than FM but not better than FFM ● Intuition: FM finds the second order interactions while neural net uses the latent vectors to find the higher order nonlinear interactions.
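
A forward pass of the DeepFM formula can be sketched as follows. We read `:+` on the slide as vector concatenation, which is an assumption; the single hidden layer, its width H, and the random weights are likewise illustrative only. In the real model the latent vectors and the network weights are trained jointly.

```python
import numpy as np

K, H = 4, 8  # latent dimension and hidden width; arbitrary values
rng = np.random.default_rng(0)

# Latent vectors of the active publisher, advertiser and gender values.
pv, av, gv = (rng.normal(size=K) for _ in range(3))

# One hidden layer over the concatenated latent vectors (random, untrained).
W1, b1 = 0.1 * rng.normal(size=(H, 3 * K)), np.zeros(H)
w2, b2 = 0.1 * rng.normal(size=H), 0.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def deepfm_forward(pv, av, gv):
    """Sigmoid(FM + NeuralNet(concat(PV, AV, GV)))."""
    fm = pv @ av + av @ gv + gv @ pv
    hidden = np.maximum(0.0, W1 @ np.concatenate([pv, av, gv]) + b1)  # ReLU
    deep = w2 @ hidden + b2
    return float(sigmoid(fm + deep))


p_cvr = deepfm_forward(pv, av, gv)
```

Both branches read the same latent vectors, so gradients from the FM term and from the network update one shared set of embeddings.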
  • 19. Neural Factorization Machine: NFM NeuralNet((PV.*AV .+ AV.*GV .+ GV.*PV)T) = pCVR
  • 20. NFM ● In this architecture you only run the second-order features through the NN instead of the raw latent vectors ● Intuition: the neural net takes the second-order interactions and uses them to find the higher-order nonlinear interactions ● Performs better than DeepFM, mostly attributable to two facts ○ The size of the net is smaller, hence it converges faster. ○ The neural net can take the second-order interactions and convert them easily to higher-order interactions. ● But still not better than FFM
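
The NFM formula replaces the dot products with element-wise products, so the network receives a K-dimensional vector instead of a scalar. A minimal sketch, reading `.*` as element-wise product and `.+` as element-wise sum (our interpretation), with random untrained weights and an arbitrary hidden width:

```python
import numpy as np

K, H = 4, 8  # latent dimension and hidden width; arbitrary values
rng = np.random.default_rng(0)

vectors = [rng.normal(size=K) for _ in range(3)]  # PV, AV, GV of the active values
W1, b1 = 0.1 * rng.normal(size=(H, K)), np.zeros(H)
w2, b2 = 0.1 * rng.normal(size=H), 0.0


def pooled_interactions(vectors):
    """PV.*AV + AV.*GV + GV.*PV: one K-dimensional vector of pairwise products."""
    pooled = np.zeros(K)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            pooled += vectors[i] * vectors[j]
    return pooled


def nfm_forward(vectors):
    """NeuralNet(pooled second-order interactions) with one ReLU hidden layer."""
    hidden = np.maximum(0.0, W1 @ pooled_interactions(vectors) + b1)
    return float(w2 @ hidden + b2)


pooled = pooled_interactions(vectors)
prediction = nfm_forward(vectors)
```

Summing the entries of `pooled` recovers the plain FM score, and the network input has K entries instead of the 3K used in the DeepFM sketch, which is consistent with the smaller net the slide mentions.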
  • 21. InMobi Spec: DeepFFM [Architecture diagram: Sparse Features (Feature1, Feature2, Feature3) → Dense Embeddings (Feature1: F2E, F3E; Feature2: F1E, F3E; Feature3: F1E, F2E) → Hidden Layers and FF Machine in parallel → Act → Ypred]
  • 22. InMobi Spec: DeepFFM ● A simple upgrade to DeepFM ● Performs better than both DeepFM and FFM ● Training is slower ● The FFM part does the majority of the prediction heavy lifting, evidently due to faster gradient convergence. ● Intuition: take the latent vectors, run them through the NN for higher-order interactions, and use FFM for second-order interactions.
  • 23. InMobi Spec: NFFM [Architecture diagram: Sparse Features (Feature1, Feature2, Feature3) → Dense Embeddings (Feature1: F2E, F3E; Feature2: F1E, F3E; Feature3: F1E, F2E) → FF Machine → K inputs → Hidden Layers → Ypred]
  • 24. InMobi Spec: NFFM ● A simple upgrade to NFM ● Significantly outperforms all the other models. ● Converges faster than DeepFFM ● Intuition: take the second-order interactions from FFM and run them through the neural net to find higher-order nonlinear interactions.
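
The NFFM intuition can be sketched by combining the field-aware vectors with the NFM-style pooling. This is only one possible reading of the diagram: we assume the pairwise element-wise products are summed into a single K-dimensional vector (the "K inputs"), and that the output is a linear unit since the model is trained as a regression. Weights are random and untrained, and all names are hypothetical.

```python
import numpy as np

K, H = 4, 8  # latent dimension and hidden width; arbitrary values
rng = np.random.default_rng(0)
fields = ["publisher", "advertiser", "gender"]

# emb[f][g]: vector of the row's value in field f, used when interacting with field g.
emb = {f: {g: rng.normal(size=K) for g in fields if g != f} for f in fields}

W1, b1 = 0.1 * rng.normal(size=(H, K)), np.zeros(H)
w2, b2 = 0.1 * rng.normal(size=H), 0.0


def ffm_pooled(emb):
    """Field-aware pairwise element-wise products, summed into K inputs."""
    pooled = np.zeros(K)
    for i, f in enumerate(fields):
        for g in fields[i + 1:]:
            pooled += emb[f][g] * emb[g][f]
    return pooled


def nffm_forward(emb):
    """Hidden layer (ReLU) over the pooled FFM interactions, linear output."""
    hidden = np.maximum(0.0, W1 @ ffm_pooled(emb) + b1)
    return float(w2 @ hidden + b2)


pooled = ffm_pooled(emb)
y_pred = nffm_forward(emb)
```

Under this reading the network input stays at K values regardless of the number of fields, while DeepFFM would feed every field-aware embedding into the net.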
  • 25. Content 1) The problem and context 2) The Motivation 3) Building the model theory: piece by piece 4) Results of the 2 use cases 5) Understanding exactly why it works 6) Implementation at InMobi scale
  • 26. Use case 1 - Results CVR Accuracy function: $\frac{\sum_i W_i \, |Y^{act}_i - Y^{pred}_i|}{\sum_i W_i}$
      Model                                                | FFM | DeepFM | DeepFFM | NFFM
      Accuracy % improvement over linear model (small DS)  | 44% | 35%    | 48%     | 64%
  • 27. Use case 1 - Results CVR
      Training data dates | Test date | Accuracy % improvement over linear model
      T1-T7               | T7        | 21%
      T1-T7               | T8        | 14%
      T2-T8               | T8        | 20%
      T2-T8               | T9        | 14%
      % improvement over tree model: Cut1 21.7%, Cut2 18.5%
  • 28. Use case 2 - Results VCR Error function (AEPV - Absolute Error Per View): $\frac{\sum_i \left[ (Views_i - Cmpltd_i) \, |Y^{pred}_i| + Cmpltd_i \, |1 - Y^{pred}_i| \right]}{\sum_i Views_i}$ % AEPV improvement by Country OS cut over the last-7-day average model:
      Cut  | Logistic Reg | Logistic Reg (2nd order autoregressive features) | LR (GBT based feature engineering) | NFFM
      Cut1 | -3.71%       | 2.30%                                            | 2.51%                              | 3.00%
      Cut2 | -2.16%       | 3.05%                                            | 4.48%                              | 28.83%
      Cut3 | -0.31%       | -0.56%                                           | 5.65%                              | 12.47%
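
The accuracy function on this slide is a weighted mean absolute error. A minimal sketch with hypothetical inputs follows; the slides do not say what the weights are or how the percentage improvement over the linear model is derived from this quantity, so neither is modelled here.

```python
import numpy as np


def weighted_abs_error(y_act, y_pred, w):
    """sum_i W_i * |Yact_i - Ypred_i| / sum_i W_i"""
    y_act, y_pred, w = (np.asarray(a, dtype=float) for a in (y_act, y_pred, w))
    return float(np.sum(w * np.abs(y_act - y_pred)) / np.sum(w))


# Hypothetical actual CVRs, predictions and weights, for illustration only.
error = weighted_abs_error(
    y_act=[0.01, 0.0004],
    y_pred=[0.008, 0.0006],
    w=[100, 300],
)
```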
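
The AEPV error function can be written directly from its definition: every view that was not completed contributes |Ypred| and every completed view contributes |1 - Ypred|. The sketch below uses hypothetical counts; the function and argument names are ours.

```python
import numpy as np


def aepv(views, completed, y_pred):
    """Absolute Error Per View over aggregated rows of (views, completed, prediction)."""
    views, completed, y_pred = (
        np.asarray(a, dtype=float) for a in (views, completed, y_pred)
    )
    err = (views - completed) * np.abs(y_pred) + completed * np.abs(1.0 - y_pred)
    return float(err.sum() / views.sum())


# Hypothetical aggregates: two segments with 100 and 50 views.
value = aepv(views=[100, 50], completed=[60, 10], y_pred=[0.5, 0.2])
```

For these inputs the per-segment errors are 50 and 16, giving 66 / 150 = 0.44.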
  • 29. Use case 2 - Results VCR ● LR with L2 regularisation ● 2nd-order features were selected based on the Information Gain criterion ● The GBT package in Spark MLlib was used (numTrees = 400, maxDepth = 8, sampling = 0.5, minInstancePerNode = 10). ○ The training process was too slow, even with large enough resources. ○ XGBoost with Spark (tried later) was faster and resulted in further improvements ● NFFM: increasing the number of layers up to 3 resulted in a further 20% improvement in the validation errors, with no significant improvement after that
  • 30. Content 1) The problem and context 2) The Motivation 3) Building the model theory: piece by piece 4) Results of the 2 use cases 5) Understanding exactly why it works 6) Implementation at InMobi scale
  • 31. Building the full intuition Factorisation machine: ● Handling categorical features and a sparse data matrix ● Extracting latent variables, e.g., identifying non-explicit segment profiles in the population Field-aware: ● Dimensionality reduction (high-cardinality features to a K-dimensional representation) ● Increases degrees of freedom (compared to FM, in terms of field-specific values) to enable an exhaustive set of second-order interactions Neural network: ● Explores and weights higher-order interactions; went up to 3 layers of interaction successfully ● Generates the numerical prediction ● Training the factors based on the performance of both the FM and the neural net (instead of training them separately, which limits the latent vectors to the power of FM)
  • 32. Content 1) The problem and context 2) The Motivation 3) Building the model theory: piece by piece 4) Results of the 2 use cases 5) Understanding exactly why it works 6) Implementation at InMobi scale
  • 33. Implementation details ● Hyperparameters are k, lambda, number of layers, number of nodes per layer, activation functions ● Implemented in TensorFlow ● Adam optimizer ● L2 regularization. No dropout ● No batch normalization ● 1 layer with 100 nodes performs well enough and saves compute ● ReLU activations (converge faster) ● k=16 (try with powers of 2) ● Weighted RMSE as loss function for both use cases
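
The slide names weighted RMSE as the loss but does not spell out the formula. One common definition, which we assume here, normalises the weighted squared errors by the sum of the weights before taking the square root. The sketch is in NumPy for brevity, not the authors' TensorFlow code.

```python
import numpy as np


def weighted_rmse(y_act, y_pred, w):
    """sqrt( sum_i w_i * (y_act_i - y_pred_i)^2 / sum_i w_i ); normalisation is assumed."""
    y_act, y_pred, w = (np.asarray(a, dtype=float) for a in (y_act, y_pred, w))
    return float(np.sqrt(np.sum(w * (y_act - y_pred) ** 2) / np.sum(w)))


# Hypothetical targets, predictions and weights.
loss = weighted_rmse(y_act=[1.0, 0.0], y_pred=[0.5, 0.5], w=[1.0, 3.0])
```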
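  • 34. Predicting for unseen feature values ● Publisher values: ESPN, CNBC, SONY, UNKNOWN? ● Field-aware latent vectors: ESPN = $(X^A_0, X^A_1, X^A_2)$, $(X^G_0, X^G_1, X^G_2)$; CNBC = $(Y^A_0, Y^A_1, Y^A_2)$, $(Y^G_0, Y^G_1, Y^G_2)$; SONY = $(Z^A_0, Z^A_1, Z^A_2)$, $(Z^G_0, Z^G_1, Z^G_2)$ ● Avg latent feature interactions per feature for unknown values: UNKNOWN = $\left(\frac{X^A_0+Y^A_0+Z^A_0}{3}, \frac{X^A_1+Y^A_1+Z^A_1}{3}, \frac{X^A_2+Y^A_2+Z^A_2}{3}\right)$, $\left(\frac{X^G_0+Y^G_0+Z^G_0}{3}, \frac{X^G_1+Y^G_1+Z^G_1}{3}, \frac{X^G_2+Y^G_2+Z^G_2}{3}\right)$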
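
The averaging rule for unseen values can be sketched as a lookup with a fallback. The table layout, the numbers, and the function names below are hypothetical; only the rule itself (average the known values' vectors per interacting field) comes from the slide.

```python
import numpy as np

# Field-aware publisher vectors: one per interacting field (made-up numbers, K = 3).
publisher_vectors = {
    "ESPN": {"advertiser": np.array([1.0, 2.0, 3.0]), "gender": np.array([0.0, 1.0, 0.0])},
    "CNBC": {"advertiser": np.array([4.0, 5.0, 6.0]), "gender": np.array([3.0, 1.0, 0.0])},
    "SONY": {"advertiser": np.array([7.0, 8.0, 9.0]), "gender": np.array([0.0, 1.0, 6.0])},
}


def unknown_vectors(table):
    """Average the known values' vectors, separately for each interacting field."""
    other_fields = next(iter(table.values())).keys()
    return {
        g: np.mean([vecs[g] for vecs in table.values()], axis=0)
        for g in other_fields
    }


def lookup(table, value):
    """Return the learned vectors for a known value, else the averaged fallback."""
    if value in table:
        return table[value]
    return unknown_vectors(table)


fallback = lookup(publisher_vectors, "BBC")  # "BBC" was never seen in training
```

In a serving system the averaged vectors would typically be computed once after training and stored under a reserved key instead of being recomputed on every miss.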
  • 35. Implementing @ low-latency, high-scale ● MLeap: the MLeap framework provides support for models trained in both Spark and TensorFlow. It lets us train tree-based models in Spark and NN-based models in TF ● Offline training and challenges: we cannot train TF models on the YARN cluster, hence we use a GPU machine as a gateway to pull data from HDFS and train on the GPU ● Online serving challenges: TF Serving has pretty low throughput and wasn’t scaling for our QPS. Hence we are using a local LRU cache with a decent TTL to scale the TF serving
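
A local LRU cache with a TTL in front of the model server could look roughly like the sketch below. This is not InMobi's implementation: the class, the cache-key construction, and `predict_fn` (a stand-in for the remote serving call) are all hypothetical, and the size and TTL values are placeholders.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Least-recently-used cache whose entries also expire after ttl_seconds."""

    def __init__(self, max_size, ttl_seconds, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (value, expires_at)
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # stale prediction: drop it
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least recently used


def cached_predict(cache, features, predict_fn):
    """Serve from the cache when possible, otherwise call the model and store the result."""
    key = tuple(sorted(features.items()))
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = predict_fn(features)
    cache.put(key, value)
    return value
```

The TTL bounds how long a prediction can outlive a model refresh, while the size limit bounds memory; both trade freshness against the load placed on the model server.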
  • 36. Future research that we are currently pursuing... ● Hybrid Binning NFFM ● Distributed training and serving ● Dropouts & Batch Normalization ● Methods to interpret the latent-vector (Using methods like t-Distributed Stochastic Neighbour Embedding (t-SNE) etc)
  • 37. References ● FM: https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf ● FFM: http://research.criteo.com/ctr-prediction-linear-model-field-aware-factorization-machines/ ● DeepFM: https://arxiv.org/pdf/1703.04247.pdf ● NFM: https://arxiv.org/pdf/1708.05027.pdf ● GBT-based feature engineering: http://quinonero.net/Publications/predicting-clicks-facebook.pdf