This study set out to build a predictive model that identifies whether a customer is likely to switch telecommunications providers (churn) or stay with the company. We started with a Logistic Regression classifier and moved on to methods such as Decision Tree, Random Forest, XGBoost, AdaBoost, SVM, KNN and Naive Bayes. We concluded that the best predictive model was XGBoost, which correctly identifies almost all the non-churners and the vast majority of the churners. Trailing closely was the Decision Tree model, which is more easily interpretable and more readily applicable to real business problems.
Cluster analysis, on the other hand, was more challenging. The hierarchical clustering methods we used were not very effective: with the Mahalanobis distance and the Gower distance, we produced two clusterings with silhouette values of 0.2. With the K-Means method the results improved somewhat, especially when using principal components and creating 4 clusters.
Predicting Customer Churn in Telecom (Corporate Presentation)
1. SOTIRIOS BARATSAS
MSc in Business Analytics
sotbaratsas@gmail.com
PREDICTING CUSTOMER CHURN USING CLASSIFICATION & CLUSTERING
2. The Problem
Over the previous period: ~15% churn rate*
*based on a sample of 3,333 customers
How can we predict which customers are likely to churn?
3. Classifying Churners & Non-Churners
METHODS COMPARISON
We attempted to formulate a model with good predictive ability that identifies both churners and non-churners.
| Model ID | Description | McFadden's R^2 | Cox & Snell R^2 | Nagelkerke R^2 | Hosmer-Lemeshow p-value |
|----------|-------------|----------------|-----------------|----------------|-------------------------|
| Model15  | Unifying the charges under one variable "Domestic.Charge" > Stepwise with AIC (multicollinearity fixed) | 0.257 | 0.185 | 0.338 | 0.247 |
| Model2   | Starting with all the variables and performing Stepwise Selection with AIC (multicollinearity fixed) | 0.258 | 0.186 | 0.338 | 0.117 |
| Model3   | Starting with all the variables and performing Stepwise Selection with BIC | 0.258 | 0.186 | 0.338 | 0.117 |
| Model9   | Unifying the number of calls under one variable "Domestic.Calls" > Stepwise with AIC (multicollinearity fixed) | 0.258 | 0.186 | 0.338 | 0.117 |
| Model6   | Unifying the minutes under one variable "Domestic.Mins" > Stepwise Selection with AIC (multicollinearity fixed) | 0.258 | 0.186 | 0.338 | 0.108 |
| Model12  | Domestic.Calls + Domestic.Mins (aggregates) > Stepwise with AIC (multicollinearity fixed) | 0.258 | 0.186 | 0.338 | 0.108 |
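For illustration, a minimal Python sketch of fitting one such candidate model (the original analysis may have used different tooling); the file name churn.csv and the column names Churn, Domestic.Charge and Customer.Service.Calls are assumptions based on the slide text:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("churn.csv")                    # hypothetical file name
y = df["Churn"]                                  # assumed 0/1 churn flag
X = sm.add_constant(df[["Domestic.Charge", "Customer.Service.Calls"]])

model = sm.Logit(y, X).fit(disp=0)
print(f"McFadden's R^2: {model.prsquared:.3f}")  # statsmodels reports it directly
print(f"AIC: {model.aic:.1f}")                   # the stepwise selection criterion
```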
4. Classifying Churners & Non-Churners
METHODS COMPARISON
We attempted to formulate a model with good predictive ability that identifies both churners and non-churners.
| Classification Model | Accuracy | Sensitivity (Recall) | Specificity | AUROC | Kappa Value |
|----------------------|----------|----------------------|-------------|-------|-------------|
| XGBoost              | 0.977 | 0.849 | 0.997 | 0.922 | 0.89 |
| AdaBoost             | 0.975 | 0.833 | 0.997 | 0.915 | 0.88 |
| Decision Tree        | 0.974 | 0.826 | 0.998 | 0.912 | 0.88 |
| Random Forest        | 0.969 | 0.788 | 0.997 | 0.892 | 0.85 |
| Support Vector Machines (SVM) | 0.893 | 0.235 | 0.994 | 0.614 | 0.33 |
| Logistic Regression Classifier | 0.873 | 0.265 | 0.966 | 0.616 | 0.26 |
| K-Means (6 clusters) | 0.837 | 0.083 | 0.954 | 0.518 | 0.05 |
| Naive Bayes          | 0.273 | 0.765 | 0.197 | 0.481 | -0.01 |
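The five comparison metrics can be computed from any model's validation output; a sketch with scikit-learn, where y_true, y_pred and y_score are placeholder arrays of true labels, predicted labels and predicted churn probabilities:

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, recall_score, roc_auc_score)

def churn_metrics(y_true, y_pred, y_score):
    """Return the five comparison metrics used in the table above."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "Accuracy":    accuracy_score(y_true, y_pred),
        "Sensitivity": recall_score(y_true, y_pred),    # recall on churners
        "Specificity": tn / (tn + fp),                  # recall on non-churners
        "AUROC":       roc_auc_score(y_true, y_score),  # needs churn probabilities
        "Kappa":       cohen_kappa_score(y_true, y_pred),
    }
```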
5. XGBoost was the best performing model in 4 out of 5 key metrics
Best Performing Model
✓ It classified 97.7% of the validation customers correctly
✓ It managed to classify almost all the non-churners correctly (99.7%)
✓ It was the best performer at classifying churners correctly (84.9%)

Confusion matrix (rows = prediction, columns = reference):

| Prediction | Not Churn | Churn |
|------------|-----------|-------|
| Not Churn  | 855       | 20    |
| Churn      | 3         | 112   |
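A minimal sketch of this step, again assuming a hypothetical churn.csv; the split and the hyperparameters are illustrative, not the tuned values behind the slide's numbers:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from xgboost import XGBClassifier

df = pd.read_csv("churn.csv")                    # hypothetical file name
X, y = df.drop(columns="Churn"), df["Churn"]
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                    eval_metric="logloss")       # illustrative hyperparameters
clf.fit(X_train, y_train)
print(confusion_matrix(y_valid, clf.predict(X_valid)))  # rows = actual class
```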
6. Using the Decision Tree classifier we can have equally good performance and great interpretability
Good performance & interpretability
✓ It classified 97.4% of the validation customers correctly
✓ It was the best model at classifying non-churners correctly (99.8%)
✓ It also performed well at classifying churners correctly (82.6%)

Confusion matrix (rows = prediction, columns = reference):

| Prediction | Not Churn | Churn |
|------------|-----------|-------|
| Not Churn  | 856       | 23    |
| Churn      | 2         | 109   |
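A sketch of the decision-tree alternative, reusing the hypothetical X_train/y_train split from the previous sketch; the depth and leaf-size settings are assumptions, and export_text is one way to surface the plain-language rules that make this model interpretable:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20,  # assumed settings
                              random_state=42)
tree.fit(X_train, y_train)

# Print the learned splits as plain if/else rules for business stakeholders.
print(export_text(tree, feature_names=list(X_train.columns)))
```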
7. Decision Tree Example
Q1: "Are his/her total charges over the previous period less than 72?"
Answer: YES
Q2: "Did he/she perform less than 4 customer service calls during the previous period?"
Answer: NO
Q3: "Are his/her total charges over the previous period greater than or equal to 54?"
Answer: YES
We can predict that this customer is likely to churn.
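Read as code, the path above is simple business logic. The thresholds (72, 4 calls, 54) come from the slide; the function and its off-path outcomes are illustrative placeholders, not the full fitted tree:

```python
def predict_churn(total_charges: float, service_calls: int) -> str:
    """Trace the slide's example path; other leaves are placeholders."""
    if total_charges < 72:                    # Q1 -> YES for this customer
        if service_calls < 4:                 # Q2 -> NO for this customer
            return "Not Churn"                # placeholder leaf
        if total_charges >= 54:               # Q3 -> YES for this customer
            return "Churn"                    # the slide's predicted leaf
        return "Not Churn"                    # placeholder leaf
    return "Not Churn"                        # placeholder leaf

print(predict_churn(total_charges=60, service_calls=5))  # -> Churn
```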
8. Hierarchical Clustering
We can identify 2 or 3 clusters of customers, but the
separation between them is not very distinct.
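A sketch of the Gower-distance variant mentioned in the summary, assuming the same hypothetical customer frame df; it uses the third-party gower package, and the average linkage and the 3-cluster cut are assumptions:

```python
import gower                                        # pip install gower
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_score

dist = gower.gower_matrix(df)                       # mixed-type distance matrix
link = linkage(squareform(dist, checks=False), method="average")  # assumed linkage
labels = fcluster(link, t=3, criterion="maxclust")  # cut into 3 clusters

print(silhouette_score(dist, labels, metric="precomputed"))  # ~0.2 per the summary
```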
9. A Better Approach: Principal Components & K-Means
Transform the customer data using principal components, then perform K-Means clustering to identify 4 clusters.
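A sketch of this pipeline, reusing the hypothetical frame df; k = 4 comes from the slide, while the two retained components are an assumption:

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

X_num = df.select_dtypes("number")                 # numeric features only
pcs = PCA(n_components=2).fit_transform(           # number of PCs is an assumption
    StandardScaler().fit_transform(X_num))

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(pcs)
print(silhouette_score(pcs, km.labels_))           # compare against the 0.2 above
```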
10. Thank You
for your attention
Sotiris Baratsas
sotbaratsas@gmail.com
MSc in Business Analytics