SlideShare a Scribd company logo
1 of 36
Download to read offline
Customer Churn Analytics
using Microsoft R Open
Malaysia R User Group Meet Up
16th February 2017
Poo Kuan Hoong
https://github.com/kuanhoong/churn-r
Disclaimer: The views and opinions expressed in this slides are those of
the author and do not necessarily reflect the official policy or position
of Nielsen Malaysia. Examples of analysis performed within this slides
are only examples. They should not be utilized in real-world analytic
products as they are based only on very limited and dated open source
information. Assumptions made within the analysis are not reflective of
the position of Nielsen Malaysia.
Agenda
• Introduction
• Customer Churn Analytics
• Machine Learning Framework
• Microsoft R Open and Visual Studio
• Model Performance Comparison
• Demo
Malaysia R User Group (MyRUG)
• The Malaysia R User Group (MyRUG) was formed on June 2016.
• It is a diverse group that come together to discuss anything related to
the R programming language.
• The main aim of MyRUG is to provide members ranging from
beginners to R professionals and experts to share and learn about R
programming and gain competency as well as share new ideas or
knowledge.
https://www.meetup.com/MY-RUserGroup/
https://www.facebook.com/rusergroupmalaysia/
Introduction
• Customer churn can be defined
simply as the rate at which a
company is losing its customers
• Imagine the business as a bucket
with holes, the water flowing from
the top is the growth rate, while the
holes at the bottom is churn
• While a certain level of churn is
unavoidable, it is important to keep
it under control, as high churn rate
can potentially kill your business
Churn analytics
• Predicting who will switch mobile operator
Customer churn - who do customers change
operators?
• The top 3 reasons why
subscribers change providers:
• They want a new handset
• They believe they pay too
much for calls/data
• Providers do not offer
additional loyalty benefits
Data Collection
Data Preprocessing
Attributes selection
• Attribute 1
• Attribute 2
• Attribute 3
Algorithm
Training Model Score Model
Apply Data
/Test Data
Predicting Output
Initialization Step Learn Step Apply Step
Machine Learning Framework
Correlation Matrix
• correlation matrix, which
is used to investigate the
dependence between
multiple variables at the
same time.
Microsoft R Open
• Microsoft R Open, formerly known as Revolution R Open (RRO), is
the enhanced distribution of R from Microsoft Corporation.
• It is a complete open source platform for statistical analysis and
data science.
Key enhancement
• Multi-threaded math libraries that brings multi-threaded
computations to R.
• A high-performance default CRAN repository that provide a
consistent and static set of packages to all Microsoft R Open users.
• The checkpoint package that make it easy to share R code and
replicate results using specific R package versions.
R Tools for Visual Studio
• Turn Visual Studio into a powerful R development environment
• Download R Tools for Visual Studio
R Tools for Visual Studio
• Visual Studio IDE
• Intellisense
• Enhanced multi-threaded math libs, cluster scale computing, and a
high performance CRAN repo with checkpoint capabilities.
• Learn more about R Tools from here:
https://microsoft.github.io/RTVS-docs/
Data Collection
Data Preprocessing
Attributes selection
• Attribute 1
• Attribute 2
• Attribute 3
Algorithm
Training Model Score Model
Apply Data
/Test Data
Predicting Output
Initialization Step Learn Step Apply Step
Machine Learning Framework
Data Preprocessing
• Assign missing values
as zero
• Detect outliers
• Remove unwanted
variables
• Recode variables
• Convert categorical
variables
Data Collection
Data Preprocessing
Attributes selection
• Attribute 1
• Attribute 2
• Attribute 3
Algorithm
Training Model Score Model
Apply Data
/Test Data
Predicting Output
Initialization Step Learn Step Apply Step
Machine Learning Framework
Features selection
• The process of selecting a subset of relevant features (variables,
predictors) for use in model construction.
• Feature selection techniques are used for three reasons:
• simplification of models to make them easier to interpret by researchers/users,
• shorter training times,
• enhanced generalization by reducing overfitting
Correlation Matrix
Models Performance Comparison
• Logistic Regression
• is a regression model where the dependent variable (DV) is categorical.
• Support Vector Machine
• SVM is a supervised learning model with associated learning algorithms that
analyze data used for classification and regression analysis.
• RandomForest
• is an ensemble learning method for classification, regression and other tasks,
that operate by constructing a multitude of decision trees at training time and
outputting the class that is the mode of the classes (classification) or mean
prediction (regression) of the individual trees.
Data Collection
Data Preprocessing
Attributes selection
• Attribute 1
• Attribute 2
• Attribute 3
Algorithm
Training Model Score Model
Apply Data
/Test Data
Predicting Output
Initialization Step Learn Step Apply Step
Machine Learning Framework
Training Model and Algorithm
• Split the data set into 80:20
using library(caret)
• Apply the algorithms: GLM,
SVM and RF
Data Collection
Data Preprocessing
Attributes selection
• Attribute 1
• Attribute 2
• Attribute 3
Algorithm
Training Model Score Model
Apply Data
/Test Data
Predicting Output
Initialization Step Learn Step Apply Step
Machine Learning Framework
Score Model
• Confusion Matrix: a table that is often used to describe the
performance of a classification model (or "classifier") on a set of test
data for which the true values are known.
• true positives (TP): These are cases in which we predicted yes (they have the
disease), and they do churn.
• true negatives (TN): We predicted no, and they don't churn.
• false positives (FP): We predicted yes, but they don't actually churn. (Also
known as a "Type I error.")
• false negatives (FN): We predicted no, but they actually do churn. (Also
known as a "Type II error.")
Confusion Matrix: Generalized Linear Model
(glm)
n=1407
Predicted:
NO
Predicted:
YES
Actual:
NO
TN = 919
(0.653)
FP = 115
(0.082)
1034
Actual:
YES
FN = 167
(0.119)
TP = 206
(0.146)
373
1086 321
Confusion Matrix: Support Vector Machine
(SVM)
n=1407
Predicted:
NO
Predicted:
YES
Actual:
NO
TN= 929
(0.660)
FP= 105
(0.075)
1034
Actual:
YES
FN= 183
(0.130)
TP= 190
(0.135)
373
1112 295
Confusion Matrix: RandomForest
n=1407
Predicted:
NO
Predicted:
YES
Actual:
NO
TN= 939
(0.667)
FP= 95
(0.068)
1034
Actual:
YES
FN= 181
(0.129)
TP= 192
(0.136)
373
1120 287
Receiver Operating Characteristic (ROC) curve
• ROC curve is a graphical plot that illustrates the performance of a
binary classifier system as its discrimination threshold is varied. The
curve is created by plotting the true positive rate (TPR) against the
false positive rate (FPR) at various threshold settings.
Models comparison
• ROC illustrates the
performance of a binary
classifier system as its
discrimination threshold is
varied.
Microsoft R Open vs R
Data Collection
Data Preprocessing
Attributes selection
• Attribute 1
• Attribute 2
• Attribute 3
Algorithm
Training Model Score Model
Apply Data
/Test Data
Predicting Output
Initialization Step Learn Step Apply Step
Machine Learning Framework
Predict test data
• Based on the training
model, select the best
model to be used for test
data prediction
Thanks!
Questions?
@kuanhoong
https://www.linkedin.com/in/kuanhoong
kuanhoong@gmail.com
DEMO

More Related Content

What's hot

Churn Analysis in Telecom Industry
Churn Analysis in Telecom IndustryChurn Analysis in Telecom Industry
Churn Analysis in Telecom IndustrySatyam Barsaiyan
 
Customer churn prediction for telecom data set.
Customer churn prediction for telecom data set.Customer churn prediction for telecom data set.
Customer churn prediction for telecom data set.Kuldeep Mahani
 
churn prediction in telecom
churn prediction in telecom churn prediction in telecom
churn prediction in telecom Hong Bui Van
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaRahul Bhatia
 
Customer Churn, A Data Science Use Case in Telecom
Customer Churn, A Data Science Use Case in TelecomCustomer Churn, A Data Science Use Case in Telecom
Customer Churn, A Data Science Use Case in TelecomChris Chen
 
Churn customer analysis
Churn customer analysisChurn customer analysis
Churn customer analysisDr.Bechoo Lal
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPranov Mishra
 
Predicting Bank Customer Churn Using Classification
Predicting Bank Customer Churn Using ClassificationPredicting Bank Customer Churn Using Classification
Predicting Bank Customer Churn Using ClassificationVishva Abeyrathne
 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data ScienceCarolyn Knight
 
Customer Churn Prevention Powerpoint Presentation Slides
Customer Churn Prevention Powerpoint Presentation SlidesCustomer Churn Prevention Powerpoint Presentation Slides
Customer Churn Prevention Powerpoint Presentation SlidesSlideTeam
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptxAniket Patil
 
Churn in the Telecommunications Industry
Churn in the Telecommunications IndustryChurn in the Telecommunications Industry
Churn in the Telecommunications Industryskewdlogix
 
Customer attrition and churn modeling
Customer attrition and churn modelingCustomer attrition and churn modeling
Customer attrition and churn modelingMariya Korsakova
 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesSindhujanDhayalan
 
Information Management Strategy
Information Management StrategyInformation Management Strategy
Information Management StrategyDoug Devitre
 

What's hot (20)

Churn Analysis in Telecom Industry
Churn Analysis in Telecom IndustryChurn Analysis in Telecom Industry
Churn Analysis in Telecom Industry
 
Customer churn prediction for telecom data set.
Customer churn prediction for telecom data set.Customer churn prediction for telecom data set.
Customer churn prediction for telecom data set.
 
churn prediction in telecom
churn prediction in telecom churn prediction in telecom
churn prediction in telecom
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_Bhatia
 
Customer Churn, A Data Science Use Case in Telecom
Customer Churn, A Data Science Use Case in TelecomCustomer Churn, A Data Science Use Case in Telecom
Customer Churn, A Data Science Use Case in Telecom
 
Churn modelling
Churn modellingChurn modelling
Churn modelling
 
Churn customer analysis
Churn customer analysisChurn customer analysis
Churn customer analysis
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom Industry
 
Predicting Bank Customer Churn Using Classification
Predicting Bank Customer Churn Using ClassificationPredicting Bank Customer Churn Using Classification
Predicting Bank Customer Churn Using Classification
 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data Science
 
Customer Churn Prevention Powerpoint Presentation Slides
Customer Churn Prevention Powerpoint Presentation SlidesCustomer Churn Prevention Powerpoint Presentation Slides
Customer Churn Prevention Powerpoint Presentation Slides
 
Telecom customer churn prediction
Telecom customer churn predictionTelecom customer churn prediction
Telecom customer churn prediction
 
DTH Case Study
DTH Case StudyDTH Case Study
DTH Case Study
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Churn in the Telecommunications Industry
Churn in the Telecommunications IndustryChurn in the Telecommunications Industry
Churn in the Telecommunications Industry
 
Customer attrition and churn modeling
Customer attrition and churn modelingCustomer attrition and churn modeling
Customer attrition and churn modeling
 
Customer churn classification using machine learning techniques
Customer churn classification using machine learning techniquesCustomer churn classification using machine learning techniques
Customer churn classification using machine learning techniques
 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
 
Information Management Strategy
Information Management StrategyInformation Management Strategy
Information Management Strategy
 
Churn analysis presentation
Churn analysis presentationChurn analysis presentation
Churn analysis presentation
 

Similar to Customer Churn Prediction using R and Machine Learning

Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusoneDotNetCampus
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATADotNetCampus
 
FlorenceAI: Reinventing Data Science at Humana
FlorenceAI: Reinventing Data Science at HumanaFlorenceAI: Reinventing Data Science at Humana
FlorenceAI: Reinventing Data Science at HumanaDatabricks
 
Building largescalepredictionsystemv1
Building largescalepredictionsystemv1Building largescalepredictionsystemv1
Building largescalepredictionsystemv1arthi v
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptxDr.Shweta
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxnarmeen11
 
Understanding Mahout classification documentation
Understanding Mahout  classification documentationUnderstanding Mahout  classification documentation
Understanding Mahout classification documentationNaveen Kumar
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detectionjagan477830
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark MLAhmet Bulut
 
AI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptxAI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptxkprasad8
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 antimo musone
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerDatabricks
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
Testing of Object-Oriented Software
Testing of Object-Oriented SoftwareTesting of Object-Oriented Software
Testing of Object-Oriented SoftwarePraveen Penumathsa
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 

Similar to Customer Churn Prediction using R and Machine Learning (20)

Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 
FlorenceAI: Reinventing Data Science at Humana
FlorenceAI: Reinventing Data Science at HumanaFlorenceAI: Reinventing Data Science at Humana
FlorenceAI: Reinventing Data Science at Humana
 
Building largescalepredictionsystemv1
Building largescalepredictionsystemv1Building largescalepredictionsystemv1
Building largescalepredictionsystemv1
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
Understanding Mahout classification documentation
Understanding Mahout  classification documentationUnderstanding Mahout  classification documentation
Understanding Mahout classification documentation
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detection
 
Nose Dive into Apache Spark ML
Nose Dive into Apache Spark MLNose Dive into Apache Spark ML
Nose Dive into Apache Spark ML
 
AI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptxAI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptx
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
 
Experimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles BakerExperimental Design for Distributed Machine Learning with Myles Baker
Experimental Design for Distributed Machine Learning with Myles Baker
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
Testing of Object-Oriented Software
Testing of Object-Oriented SoftwareTesting of Object-Oriented Software
Testing of Object-Oriented Software
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 

More from Poo Kuan Hoong

Build an efficient Machine Learning model with LightGBM
Build an efficient Machine Learning model with LightGBMBuild an efficient Machine Learning model with LightGBM
Build an efficient Machine Learning model with LightGBMPoo Kuan Hoong
 
Tensor flow 2.0 what's new
Tensor flow 2.0  what's newTensor flow 2.0  what's new
Tensor flow 2.0 what's newPoo Kuan Hoong
 
The future outlook and the path to be Data Scientist
The future outlook and the path to be Data ScientistThe future outlook and the path to be Data Scientist
The future outlook and the path to be Data ScientistPoo Kuan Hoong
 
Data Driven Organization and Data Commercialization
Data Driven Organization and Data CommercializationData Driven Organization and Data Commercialization
Data Driven Organization and Data CommercializationPoo Kuan Hoong
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewPoo Kuan Hoong
 
Explore and Have Fun with TensorFlow: Transfer Learning
Explore and Have Fun with TensorFlow: Transfer LearningExplore and Have Fun with TensorFlow: Transfer Learning
Explore and Have Fun with TensorFlow: Transfer LearningPoo Kuan Hoong
 
Explore and have fun with TensorFlow: An introductory to TensorFlow
Explore and have fun with TensorFlow: An introductory	to TensorFlowExplore and have fun with TensorFlow: An introductory	to TensorFlow
Explore and have fun with TensorFlow: An introductory to TensorFlowPoo Kuan Hoong
 
The path to be a Data Scientist
The path to be a Data ScientistThe path to be a Data Scientist
The path to be a Data ScientistPoo Kuan Hoong
 
Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenPoo Kuan Hoong
 
Microsoft APAC Machine Learning & Data Science Community Bootcamp
Microsoft APAC Machine Learning & Data Science Community BootcampMicrosoft APAC Machine Learning & Data Science Community Bootcamp
Microsoft APAC Machine Learning & Data Science Community BootcampPoo Kuan Hoong
 
Machine Learning and Deep Learning with R
Machine Learning and Deep Learning with RMachine Learning and Deep Learning with R
Machine Learning and Deep Learning with RPoo Kuan Hoong
 
The path to be a data scientist
The path to be a data scientistThe path to be a data scientist
The path to be a data scientistPoo Kuan Hoong
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 
Big Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningBig Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningPoo Kuan Hoong
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RPoo Kuan Hoong
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep LearningPoo Kuan Hoong
 
Machine learning and big data
Machine learning and big dataMachine learning and big data
Machine learning and big dataPoo Kuan Hoong
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learningPoo Kuan Hoong
 
Context Aware Road Traffic Speech Information System from Social Media
Context Aware Road Traffic Speech Information System from Social MediaContext Aware Road Traffic Speech Information System from Social Media
Context Aware Road Traffic Speech Information System from Social MediaPoo Kuan Hoong
 

More from Poo Kuan Hoong (20)

Build an efficient Machine Learning model with LightGBM
Build an efficient Machine Learning model with LightGBMBuild an efficient Machine Learning model with LightGBM
Build an efficient Machine Learning model with LightGBM
 
Tensor flow 2.0 what's new
Tensor flow 2.0  what's newTensor flow 2.0  what's new
Tensor flow 2.0 what's new
 
The future outlook and the path to be Data Scientist
The future outlook and the path to be Data ScientistThe future outlook and the path to be Data Scientist
The future outlook and the path to be Data Scientist
 
Data Driven Organization and Data Commercialization
Data Driven Organization and Data CommercializationData Driven Organization and Data Commercialization
Data Driven Organization and Data Commercialization
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
 
Explore and Have Fun with TensorFlow: Transfer Learning
Explore and Have Fun with TensorFlow: Transfer LearningExplore and Have Fun with TensorFlow: Transfer Learning
Explore and Have Fun with TensorFlow: Transfer Learning
 
Deep Learning with R
Deep Learning with RDeep Learning with R
Deep Learning with R
 
Explore and have fun with TensorFlow: An introductory to TensorFlow
Explore and have fun with TensorFlow: An introductory	to TensorFlowExplore and have fun with TensorFlow: An introductory	to TensorFlow
Explore and have fun with TensorFlow: An introductory to TensorFlow
 
The path to be a Data Scientist
The path to be a Data ScientistThe path to be a Data Scientist
The path to be a Data Scientist
 
Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R Open
 
Microsoft APAC Machine Learning & Data Science Community Bootcamp
Microsoft APAC Machine Learning & Data Science Community BootcampMicrosoft APAC Machine Learning & Data Science Community Bootcamp
Microsoft APAC Machine Learning & Data Science Community Bootcamp
 
Machine Learning and Deep Learning with R
Machine Learning and Deep Learning with RMachine Learning and Deep Learning with R
Machine Learning and Deep Learning with R
 
The path to be a data scientist
The path to be a data scientistThe path to be a data scientist
The path to be a data scientist
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Big Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningBig Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep Learning
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with R
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Machine learning and big data
Machine learning and big dataMachine learning and big data
Machine learning and big data
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learning
 
Context Aware Road Traffic Speech Information System from Social Media
Context Aware Road Traffic Speech Information System from Social MediaContext Aware Road Traffic Speech Information System from Social Media
Context Aware Road Traffic Speech Information System from Social Media
 

Recently uploaded

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 

Recently uploaded (20)

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 

Customer Churn Prediction using R and Machine Learning

  • 1. Customer Churn Analytics using Microsoft R Open Malaysia R User Group Meet Up 16th February 2017 Poo Kuan Hoong https://github.com/kuanhoong/churn-r
  • 2. Disclaimer: The views and opinions expressed in this slides are those of the author and do not necessarily reflect the official policy or position of Nielsen Malaysia. Examples of analysis performed within this slides are only examples. They should not be utilized in real-world analytic products as they are based only on very limited and dated open source information. Assumptions made within the analysis are not reflective of the position of Nielsen Malaysia.
  • 3. Agenda • Introduction • Customer Churn Analytics • Machine Learning Framework • Microsoft R Open and Visual Studio • Model Performance Comparison • Demo
  • 4. Malaysia R User Group (MyRUG) • The Malaysia R User Group (MyRUG) was formed on June 2016. • It is a diverse group that come together to discuss anything related to the R programming language. • The main aim of MyRUG is to provide members ranging from beginners to R professionals and experts to share and learn about R programming and gain competency as well as share new ideas or knowledge.
  • 7.
  • 8. Introduction • Customer churn can be defined simply as the rate at which a company is losing its customers • Imagine the business as a bucket with holes, the water flowing from the top is the growth rate, while the holes at the bottom is churn • While a certain level of churn is unavoidable, it is important to keep it under control, as high churn rate can potentially kill your business
  • 9.
  • 10. Churn analytics • Predicting who will switch mobile operator
  • 11. Customer churn - who do customers change operators? • The top 3 reasons why subscribers change providers: • They want a new handset • They believe they pay too much for calls/data • Providers do not offer additional loyalty benefits
  • 12. Data Collection Data Preprocessing Attributes selection • Attribute 1 • Attribute 2 • Attribute 3 Algorithm Training Model Score Model Apply Data /Test Data Predicting Output Initialization Step Learn Step Apply Step Machine Learning Framework
  • 13. Correlation Matrix • correlation matrix, which is used to investigate the dependence between multiple variables at the same time.
  • 14. Microsoft R Open • Microsoft R Open, formerly known as Revolution R Open (RRO), is the enhanced distribution of R from Microsoft Corporation. • It is a complete open source platform for statistical analysis and data science. Key enhancement • Multi-threaded math libraries that brings multi-threaded computations to R. • A high-performance default CRAN repository that provide a consistent and static set of packages to all Microsoft R Open users. • The checkpoint package that make it easy to share R code and replicate results using specific R package versions.
  • 15. R Tools for Visual Studio • Turn Visual Studio into a powerful R development environment • Download R Tools for Visual Studio
  • 16. R Tools for Visual Studio • Visual Studio IDE • Intellisense • Enhanced multi-threaded math libs, cluster scale computing, and a high performance CRAN repo with checkpoint capabilities. • Learn more about R Tools from here: https://microsoft.github.io/RTVS-docs/
  • 17. Data Collection Data Preprocessing Attributes selection • Attribute 1 • Attribute 2 • Attribute 3 Algorithm Training Model Score Model Apply Data /Test Data Predicting Output Initialization Step Learn Step Apply Step Machine Learning Framework
  • 18. Data Preprocessing • Assign missing values as zero • Detect outliers • Remove unwanted variables • Recode variables • Convert categorical variables
  • 19. Data Collection Data Preprocessing Attributes selection • Attribute 1 • Attribute 2 • Attribute 3 Algorithm Training Model Score Model Apply Data /Test Data Predicting Output Initialization Step Learn Step Apply Step Machine Learning Framework
  • 20. Features selection • The process of selecting a subset of relevant features (variables, predictors) for use in model construction. • Feature selection techniques are used for three reasons: • simplification of models to make them easier to interpret by researchers/users, • shorter training times, • enhanced generalization by reducing overfitting
  • 22. Models Performance Comparison • Logistic Regression • is a regression model where the dependent variable (DV) is categorical. • Support Vector Machine • SVM is a supervised learning model with associated learning algorithms that analyze data used for classification and regression analysis. • RandomForest • is an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
  • 23. Data Collection Data Preprocessing Attributes selection • Attribute 1 • Attribute 2 • Attribute 3 Algorithm Training Model Score Model Apply Data /Test Data Predicting Output Initialization Step Learn Step Apply Step Machine Learning Framework
  • 24. Training Model and Algorithm • Split the data set into 80:20 using library(caret) • Apply the algorithms: GLM, SVM and RF
  • 25. Data Collection Data Preprocessing Attributes selection • Attribute 1 • Attribute 2 • Attribute 3 Algorithm Training Model Score Model Apply Data /Test Data Predicting Output Initialization Step Learn Step Apply Step Machine Learning Framework
  • 26. Score Model • Confusion Matrix: a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. • true positives (TP): These are cases in which we predicted yes (they have the disease), and they do churn. • true negatives (TN): We predicted no, and they don't churn. • false positives (FP): We predicted yes, but they don't actually churn. (Also known as a "Type I error.") • false negatives (FN): We predicted no, but they actually do churn. (Also known as a "Type II error.")
  • 27. Confusion Matrix: Generalized Linear Model (glm) n=1407 Predicted: NO Predicted: YES Actual: NO TN = 919 (0.653) FP = 115 (0.082) 1034 Actual: YES FN = 167 (0.119) TP = 206 (0.146) 373 1086 321
  • 28. Confusion Matrix: Support Vector Machine (SVM) n=1407 Predicted: NO Predicted: YES Actual: NO TN= 929 (0.660) FP= 105 (0.075) 1034 Actual: YES FN= 183 (0.130) TP= 190 (0.135) 373 1112 295
  • 29. Confusion Matrix: RandomForest n=1407 Predicted: NO Predicted: YES Actual: NO TN= 939 (0.667) FP= 95 (0.068) 1034 Actual: YES FN= 181 (0.129) TP= 192 (0.136) 373 1120 287
  • 30. Receiver Operating Characteristic (ROC) curve • ROC curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
  • 31. Models comparison • ROC illustrates the performance of a binary classifier system as its discrimination threshold is varied.
  • 33. Data Collection Data Preprocessing Attributes selection • Attribute 1 • Attribute 2 • Attribute 3 Algorithm Training Model Score Model Apply Data /Test Data Predicting Output Initialization Step Learn Step Apply Step Machine Learning Framework
  • 34. Predict test data • Based on the training model, select the best model to be used for test data prediction
  • 36. DEMO