CAPSTONE PROJECT TITLE: Customer Churn Analysis
Presented by: PALLAVI MOHANTY
PROJECT CONTENT
I. Introduction and Problem Statement
II. Data Loading
III. Data Exploring
IV. Data Cleaning
IV.1. Binning
V. Data Visualization
V.1. Univariate Analysis
V.2. Bivariate Analysis
VI. Feature Engineering
VII. Data Preprocessing
VIII. Train – Test Split
IX. Feature Scaling
X. SMOTEENN
XI. Model Building and Evaluation
XII. Model Comparison
CUSTOMER
CHURN
I. INTRODUCTION
Q. What is Customer Churn?
• Customer churn occurs when customers or subscribers stop doing
business with a firm or service.
• Each row represents a customer, and each column contains a
customer attribute described in the column metadata.
The data set includes information about:
• Customers who left within the last month – the column is called
Churn.
• Services that each customer has signed up for – phone, multiple
lines, internet, online security, online backup, device protection,
tech support, and streaming TV and movies.
• Customer account information – how long they’ve been a
customer, contract, payment method, paperless billing, monthly
charges, and total charges.
• Demographic information about customers – Customer ID,
gender, and if they have partners and dependents.
THIS IS A CLASSIC TELECOM CHURN USECASE.
PROBLEM STATEMENT
The Telco Churn dataset is typically used to predict customer
churn. The target variable has only two possible outcomes, churn or
no churn, so this is a binary classification problem. "Churn" refers
to customers who are likely to cancel their contracts soon. In the
telecom industry, customer churn can be a significant issue, as it
can lead to revenue loss. If the company can predict churn, it can
reach out to at-risk customers before they leave.
APPROACH TO SOLVING THE PROBLEM STATEMENT
1. Exploratory Data Analysis (EDA) to understand data patterns
and relationships.
2. Data preprocessing, including handling missing values,
encoding categorical variables, and feature scaling.
3. Splitting the dataset into training and testing sets.
4. Building and training machine learning models for churn
prediction.
5. Evaluating model performance using metrics like accuracy,
precision, recall, and F1-score.
6. Selecting the model with the best accuracy.
7. Providing recommendations based on model insights.
The ultimate goal is to help the telecom company proactively
identify customers at risk of leaving, allowing them to implement
targeted retention strategies and improve customer satisfaction.
II. DATA LOADING
• Importing the necessary libraries for data analysis and visualization,
ensuring that visualizations are displayed inline.
• Reading a CSV file located at the specified path and assigning it to a
pandas DataFrame called ‘telco_churn’ for further analysis.
• This setup is commonly used at the beginning of a data analysis and
machine learning project to prepare the environment, load the
dataset, and get ready for exploration and visualization. It is
particularly useful for interactive data analysis.
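A minimal sketch of the loading step, assuming pandas. The slides do not give the file path, so a small in-memory CSV with made-up rows stands in for the real file; only the DataFrame name telco_churn comes from the slides.

```python
import io

import pandas as pd

# In a Jupyter notebook, the plotting libraries would typically be imported here
# as well, followed by the `%matplotlib inline` magic so plots render inline.

# Hypothetical stand-in for the CSV file on disk (three made-up rows).
csv_text = io.StringIO(
    "customerID,gender,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "0001-AAAAA,Female,1,29.85,29.85,No\n"
    "0002-BBBBB,Male,34,56.95,1889.5,No\n"
    "0003-CCCCC,Male,2,53.85,108.15,Yes\n"
)

# With a real file, the path to the CSV would be passed instead of csv_text.
telco_churn = pd.read_csv(csv_text)

print(telco_churn.head())   # first rows of the dataset
print(telco_churn.shape)    # (rows, columns)
```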
Displaying the “telco_churn” dataset
• The primary goal is to uncover patterns, relationships, anomalies, and
insights that can inform subsequent analysis.
• Looking at the dataset using head(), tail(), sample(), and the size attribute
III. DATA EXPLORING
• Checking the various attributes of the dataset, such as shape (total
number of rows and columns), column names, column data types,
dimensionality, info() (memory usage, data types, non-null counts), and
describe() (min, max, median, 25%, 75%, and so on).
• The describe() method is useful for quickly understanding the
distribution and central tendency of your numerical data.
We can see that TotalCharges
holds numeric values, but its
data type is shown as object.
• Checking value_counts(), nunique(), duplicated().sum(), isnull().sum()
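The checks listed above can be sketched as follows, assuming pandas and a made-up four-row sample. The last line illustrates why a blank string in TotalCharges is not counted by isnull(): it is a string, not NaN.

```python
import pandas as pd

# Made-up sample; note the blank TotalCharges entry in the last row.
telco_churn = pd.DataFrame({
    "gender": ["Female", "Male", "Male", "Female"],
    "SeniorCitizen": [0, 0, 1, 0],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": ["29.85", "1889.5", "108.15", " "],
    "Churn": ["No", "No", "Yes", "No"],
})

print(telco_churn.shape)                     # (rows, columns)
print(telco_churn.dtypes)                    # TotalCharges is stored as text, not float
print(telco_churn.describe())                # summary of the numeric columns only
print(telco_churn["Churn"].value_counts())   # frequency of each class
print(telco_churn.nunique())                 # unique values per column
print(telco_churn.duplicated().sum())        # number of duplicated rows
print(telco_churn.isnull().sum())            # NaN count per column

# Blank strings are not NaN, so isnull() does not report them.
blank_total = (telco_churn["TotalCharges"].str.strip() == "").sum()
print(blank_total)
```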
OBSERVATION - The output above shows
no issues with column names, but
'No internet service' and 'No phone service'
mean the same as 'No'.
nunique() - Returns a
Series that displays
the count of unique
values in each column.
OBSERVATION - isnull()
reports no missing values in
the above dataset.
1. TotalCharges should be float or int, but it is object, so there
might be some missing values in this column, i.e. we need to
change it to float or int.
• Because there are white spaces in the TotalCharges column,
we cannot see the missing values.
1. SeniorCitizen is actually a categorical column, so its
25%-50%-75% distribution is not meaningful.
2. In the MonthlyCharges column, the average monthly charge is USD
64.76, whereas 25% of customers pay more than USD 89.85 per
month (the 75th percentile).
3. No duplicated values.
OBSERVATION
1. Creating a copy of telco_churn for manipulation and processing, so
that the original data is not modified.
2. Churn Column (Target Column)
Converting the Churn column from a categorical value to a numerical value
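A minimal sketch of the copy and the target conversion, assuming pandas and a made-up sample. The mapping Yes to 1 and No to 0 is an assumption, chosen to match the later reference to churned customers as Class 1.

```python
import pandas as pd

telco_churn = pd.DataFrame({"Churn": ["No", "Yes", "No", "No"]})  # made-up sample

# Work on a copy so the original DataFrame stays untouched.
telco_copy = telco_churn.copy()

# Map the two categories to integers (assumed mapping: Yes -> 1, No -> 0).
telco_copy["Churn"] = telco_copy["Churn"].map({"Yes": 1, "No": 0})

print(telco_copy["Churn"].tolist())
```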
IV. DATA CLEANING
• Displaying the maximum and minimum values
• Finding the percentage of each class in the Churn column
OBSERVATION -
• The data is highly imbalanced, ratio = 73:27
• So we analyze the data with other features while taking the target values
separately to get some insights.
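The 73:27 ratio can be reproduced from the class counts quoted elsewhere in the slides (5174 "No" and 1869 "Yes"). The sketch below assumes pandas and rebuilds a Churn column from those two counts only.

```python
import pandas as pd

# Class counts quoted in the slides: 5174 "No" and 1869 "Yes".
churn = pd.Series(["No"] * 5174 + ["Yes"] * 1869, name="Churn")

# normalize=True returns fractions instead of raw counts.
churn_pct = churn.value_counts(normalize=True) * 100
print(churn_pct.round(2))
```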
3. TotalCharges Column
TotalCharges should be a numeric amount. Converting it to a numerical
data type.
OBSERVATION -
• top: " " (the most frequent value in the "TotalCharges" column is
white space)
• freq: 11 (the count of " " occurrences in the "TotalCharges" column)
Here we will be replacing the white spaces with NaN values.
Calculating the percentage of NaN values with respect to the total number
of rows.
As we can see, there are 11 missing
values in the TotalCharges column.
Let's check these records.
OBSERVATION - Since the share of these records in the total dataset is very low, i.e.
0.16%, it is safe to fill them with 0 for further processing.
Missing Value Treatment
Checking the data type of the 'TotalCharges' column
OBSERVATION – After filling the missing
values with 0, there are no missing
values left.
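A sketch of the TotalCharges treatment, assuming pandas and a made-up four-row sample. Here pd.to_numeric with errors="coerce" turns the blank strings into NaN and converts the column in a single step; the slides may have done this in separate steps.

```python
import pandas as pd

telco_copy = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5", "108.15"]})  # made-up

# Values that cannot be parsed as numbers (such as " ") become NaN.
total = pd.to_numeric(telco_copy["TotalCharges"], errors="coerce")

# Share of missing values relative to the number of rows in this sample.
missing_pct = total.isnull().mean() * 100
print(missing_pct)

# Fill the missing entries with 0, as the slides describe.
telco_copy["TotalCharges"] = total.fillna(0)
print(telco_copy["TotalCharges"].isnull().sum())

# The figure quoted in the slides: 11 blank rows in a dataset of 5174 + 1869 rows.
print(round(11 / (5174 + 1869) * 100, 2))
```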
4. Tenure Column
Dividing customers into bins based on tenure, e.g. for a tenure of up to
12 months, assign a tenure group of 1-12; for a tenure between 1 and 2
years, a tenure group of 13-24; and so on (i.e. grouping the tenure in
bins of 12 months).
Dropping the tenure column as we
have already created tenure_group.
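One way to build 12-month bins is pd.cut, sketched below with made-up tenure values. The upper edge of 72 months and the label format are assumptions; the slides only give "1-12" and "13-24" as examples.

```python
import pandas as pd

telco_copy = pd.DataFrame({"tenure": [1, 12, 13, 24, 45, 72]})  # made-up values

# Bin edges 0, 12, 24, ..., 72 with right-closed intervals: (0, 12], (12, 24], ...
edges = list(range(0, 73, 12))
labels = [f"{lo + 1}-{lo + 12}" for lo in edges[:-1]]   # "1-12", "13-24", ...

telco_copy["tenure_group"] = pd.cut(telco_copy["tenure"], bins=edges, labels=labels)

# The raw column is no longer needed once the group exists.
telco_copy = telco_copy.drop(columns="tenure")
print(telco_copy["tenure_group"].tolist())
```

With these edges a tenure of 0 falls outside every bin and becomes NaN, so such rows would need separate handling.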
IV.1. BINNING
5. Customer-ID Column
6. Modifying Column
'No internet service' and 'No phone service' are not different from No
and can be replaced with "No"
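A minimal sketch of this replacement, assuming pandas; the two column names and their values are a made-up sample.

```python
import pandas as pd

telco_copy = pd.DataFrame({
    "OnlineSecurity": ["Yes", "No internet service", "No"],
    "MultipleLines": ["No phone service", "Yes", "No"],
})  # made-up sample

# Collapse both special values into a plain "No" across all columns.
telco_copy = telco_copy.replace(
    {"No internet service": "No", "No phone service": "No"}
)
print(telco_copy)
```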
Data visualization is the representation of data in graphical or visual
formats to communicate information effectively. It involves using charts,
graphs, maps, and other visual elements to convey patterns, trends, and
insights present in the data. It is a powerful tool for exploring,
interpreting, and presenting data in a way that is easily understandable.
Types of Data Visualization:
1. Univariate Analysis: Univariate analysis involves the examination of a
single variable or feature in isolation.
2. Bivariate Analysis: Bivariate analysis helps uncover patterns,
correlations, and dependencies between two variables.
V. DATA VISUALIZATION
V.1. UNIVARIATE ANALYSIS
OBSERVATION - Customers with Fiber optic internet service have
churned more; DSL is the most popular internet service type.
OBSERVATION - Most customers have not churned (No: 5174);
fewer customers have churned (Yes: 1869).
OBSERVATION - Electronic check accounts for 33.58% of customers,
more than any other payment method.
OBSERVATION - Very few outliers in MonthlyCharges.
OBSERVATION - The distribution appears to be right-skewed, with a
longer tail on the right side. This indicates that there are fewer
senior citizens in the dataset.
OBSERVATION - Customers in the 1-12 tenure_group have
churned more.
OBSERVATION - Male customers make up 50.48%
and female customers 49.52%.
V.2. BIVARIATE ANALYSIS
OBSERVATION - Female customers in the tenure_group
within 12 months (i.e. 1 year) have churned the most.
OBSERVATION - The 'Month-to-month' contract has a
significantly higher bar, which suggests a higher churn rate,
mostly among female customers. With no contract term,
these customers are free to leave.
OBSERVATION - A surprising insight: churn is higher at
lower Total Charges.
OBSERVATION - Total Charges increase as Monthly Charges increase, as
expected.
OBSERVATION - Churn is high when Monthly Charges are high.
• Female, non-senior customers in the tenure_group within 12 months
(i.e. 1 year) have churned the most.
• The 'Month-to-month' contract has a higher churn rate, mostly among
female customers. With no contract term, these customers are free
to leave.
• Churn is high when Monthly Charges are high and Total Charges are low,
although Total Charges increase as Monthly Charges increase.
• Fewer customers have churned (Yes: 1869). Therefore the
data is highly imbalanced, with a ratio of 73:27.
• Electronic check is the most common payment method (33.58% of
customers), and more of its customers have churned.
• The gender distribution is roughly balanced.
• Customers with Fiber optic internet service have churned more; DSL
is the most popular internet service type.
• Phone service and paperless billing are chosen by a significant
number of customers; among them, fewer customers have churned than
have not.
CONCLUSION FOR DATA VISUALIZATION
1. Creating Binary Features: Converting categorical features like 'Partner',
'Dependents' into binary features (0 or 1).
2. Creating a Feature for Family Size: Combining information from
'Partner' and 'Dependents' to create a feature representing the size of the
customer's family.
VI. FEATURE ENGINEERING
3. Creating a plot : To see which family size has churned more.
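The feature engineering steps can be sketched as below, assuming pandas and a made-up sample. The slides do not define family size, so the formula used here (the customer plus one for a partner and one for dependents) and the column name family_size are assumptions.

```python
import pandas as pd

telco_copy = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes", "No"],
    "Dependents": ["No", "No", "Yes", "Yes"],
    "Churn": [1, 1, 0, 0],
})  # made-up sample

# Step 1: turn Yes/No into 1/0.
for col in ["Partner", "Dependents"]:
    telco_copy[col] = telco_copy[col].map({"Yes": 1, "No": 0})

# Step 2 (assumed definition): the customer, plus the partner and dependents flags.
telco_copy["family_size"] = 1 + telco_copy["Partner"] + telco_copy["Dependents"]

# Step 3: churn rate per family size, i.e. the numbers behind a bar plot.
churn_by_family = telco_copy.groupby("family_size")["Churn"].mean()
print(churn_by_family)
```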
The goal of data preprocessing is to enhance the quality of the data,
remove any inconsistencies or errors, and prepare it for further analysis
or modeling.
Two Techniques of Feature Encoding are:
1. One-Hot Encoding - One-hot encoding is a method used to convert
categorical variables into a binary matrix (0s and 1s).
2. Label Encoding - Label encoding is another technique for
converting categorical data into a numerical format.
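Both encodings can be sketched on a made-up sample, assuming pandas and scikit-learn. Which columns the project encoded with which technique is not stated in the slides, so the choice of Contract and gender here is only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "gender": ["Female", "Male", "Male", "Female"],
})  # made-up sample

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(df["Contract"], prefix="Contract", dtype=int)
print(one_hot)

# Label encoding: each category becomes an integer (classes are sorted alphabetically).
encoder = LabelEncoder()
gender_encoded = encoder.fit_transform(df["gender"])
print(list(encoder.classes_), gender_encoded.tolist())
```

One-hot encoding avoids implying an order between categories, while label encoding keeps a single column.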
VII. DATA PREPROCESSING
FEATURE ENCODING
1. One-Hot Encoding
2. Label Encoding
Data Displayed
4. Correlation of the features with 'Churn'
IDENTIFYING BEST FEATURE
The 'Month-to-Month Contract' feature has the greatest influence among all features.
5. Using a heatmap to show the correlation of the features with 'Churn'.
OBSERVATION -
• HIGH Churn seen in case of Month to month contracts.
• LOW Churn is seen in case of Long term contracts
• Factors like Gender, Availability of PhoneService and Number of multiple lines have
almost NO impact on Churn.
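Ranking features by their correlation with Churn can be sketched as follows, assuming pandas and a tiny, already-encoded, made-up sample. The values are invented purely to show the mechanics and do not reproduce the project's results.

```python
import pandas as pd

encoded = pd.DataFrame({
    "Contract_Month-to-month": [1, 1, 0, 0, 1, 0],
    "Contract_Two year":       [0, 0, 1, 1, 0, 0],
    "gender":                  [0, 1, 0, 1, 1, 0],
    "Churn":                   [1, 1, 0, 0, 1, 0],
})  # made-up, already-encoded sample

# Pearson correlation of every feature with the target, strongest positive first.
churn_corr = encoded.corr()["Churn"].drop("Churn").sort_values(ascending=False)
print(churn_corr)
```

The full matrix returned by encoded.corr() is what a heatmap, for example seaborn's heatmap function, would visualize.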
MULTIVARIATE ANALYSIS
This code randomly splits the dataset X (features) and y
(labels) into two separate sets: the training set (X_train and y_train) and the
testing set (X_test and y_test). The split is done with a test size of 0.2,
meaning that 20% of the data will be allocated for testing, while the
remaining 80% will be used for training. The random_state parameter is set
to ensure reproducibility of the split.
1. Splitting the telco_copy into X and y and then doing Train-Test Split.
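A minimal sketch of the split, assuming scikit-learn and a made-up ten-row sample. The slides state test_size=0.2 but not the random_state value, so 42 is an arbitrary choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

telco_copy = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70,
                       99.65, 89.10, 29.75, 104.80, 56.15],
    "Churn": [0, 0, 1, 0, 1, 1, 0, 0, 1, 0],
})  # made-up sample

X = telco_copy.drop(columns="Churn")   # features
y = telco_copy["Churn"]                # labels

# random_state=42 is an arbitrary value; any fixed integer makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```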
VIII. TRAIN – TEST SPLIT
Scaling is performed to ensure that all numerical features in a
dataset are on a similar scale, which avoids bias toward features with
large ranges, enables fair comparisons, and helps training converge. It is
a technique used in machine learning to standardize or normalize the range
of the independent variables or features of the dataset.
Methods of feature scaling
1. Standardization (Z-score Normalization): This code is an
implementation of the standardization (Z-score normalization) method
for feature scaling. Standardization scales the features so that they
have a mean of 0 and a standard deviation of 1.
IX. FEATURE SCALING
1. Standard Scaling Analysis
• Scaling the numerical features
• Extracting numerical features for scaling
2. Fitting and transforming the training data, saving the scaling
parameters for future use in test data.
• Display the scaled training and test sets
1. Before Scaling on Numerical_Features
2. After Scaling on Numerical_Features
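The scaling steps can be sketched with scikit-learn's StandardScaler on made-up values. The scaler is fitted on the training data only and its stored mean and standard deviation are reused for the test data; the list name numerical_features is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up training and test sets.
X_train = pd.DataFrame({
    "MonthlyCharges": [20.0, 60.0, 100.0],
    "TotalCharges": [100.0, 2000.0, 6000.0],
})
X_test = pd.DataFrame({"MonthlyCharges": [60.0], "TotalCharges": [2000.0]})

numerical_features = ["MonthlyCharges", "TotalCharges"]
scaler = StandardScaler()

# Learn mean and standard deviation from the training data only...
X_train[numerical_features] = scaler.fit_transform(X_train[numerical_features])
# ...and reuse those parameters for the test data.
X_test[numerical_features] = scaler.transform(X_test[numerical_features])

print(X_train)
print(X_test)
```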
• SMOTEENN is used to address imbalanced datasets, for instance a
binary classification problem in which one class has significantly
fewer instances than the other. It generates synthetic examples for
the minority class (SMOTE) and then cleans the dataset to remove
noise (ENN), ultimately leading to a more balanced and
representative dataset for model training.
X. SMOTEENN
XI. MODEL BUILDING & EVALUATION
Random Forest
XGBoost Classifier
K-Nearest Neighbors Classifier (KNN)
Decision Tree
Support Vector Classifier (SVC)
• On imbalanced data, accuracy is misleading.
• As you can see, the accuracy is quite low, and the dataset is
imbalanced. Hence, we need to check recall, precision and
F1-score for the minority class. It is quite evident that
precision, recall and F1-score are too low for Class 1, i.e. churned
customers. Hence, moving ahead to apply SMOTEENN
(oversampling + ENN).
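Why accuracy alone can mislead on imbalanced data is illustrated below with scikit-learn metrics on made-up labels: a model that predicts "no churn" for almost everyone still scores high accuracy while missing half of the churned customers.

```python
from sklearn.metrics import accuracy_score, classification_report, recall_score

# Made-up labels: 8 retained customers (0) and 2 churned customers (1).
y_test = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A model that predicts "no churn" for almost everyone.
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = accuracy_score(y_test, y_pred)                   # 9 of 10 correct
churn_recall = recall_score(y_test, y_pred, pos_label=1)    # 1 of 2 churners found

print(accuracy, churn_recall)
print(classification_report(y_test, y_pred, target_names=["No churn", "Churn"]))
```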
• After using SMOTEENN
XII. MODEL COMPARISON
• After evaluating different models for churn detection, including Decision Tree, Random
Forest, K-Nearest Neighbors, Naïve Bayes, XGBoost and SVC, it can be concluded that
the XGBoost model achieved the highest accuracy among the evaluated models, with
an accuracy score of 0.9689. XGBoost is an ensemble learning method that
combines the predictions of multiple weak learners (typically decision trees) to create a
strong learner. This helps capture complex relationships in the data.
• Its key strengths are its ability to handle complex relationships in data, limit
overfitting through regularization, handle missing values, and provide flexibility and
customization for various machine learning tasks.
• Combining XGBoost with SMOTEENN may enhance the model's performance on
imbalanced datasets. It helps the model better capture patterns in the minority class by
oversampling and cleaning the dataset.
CONCLUSION OF MODEL COMPARISON
The best model is the XGBoost Classifier with highest
accuracy score of 0.9689
• Finding the model names with the maximum and minimum
accuracy scores
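Collecting accuracy scores and picking the best and worst model can be sketched as below, assuming scikit-learn and a synthetic dataset. XGBoost is left out to keep the example free of extra dependencies, and GaussianNB is an assumed choice for the Naïve Bayes variant; the scores produced here are unrelated to the 0.9689 reported in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared churn data.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "SVC": SVC(),
}

# Fit every model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_model = max(scores, key=scores.get)
worst_model = min(scores, key=scores.get)
print(scores)
print("best:", best_model, "worst:", worst_model)
```

An XGBClassifier from the xgboost package could be added to the models dictionary in the same way.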
1. As MonthlyCharges increase, TotalCharges also increase.
2. Customers with a 'Month-to-month' contract have a higher churn
rate. With no contract term, these customers are free to
leave.
3. Churn is high when Monthly Charges are high and Total
Charges are low.
4. Electronic check is the most common payment method, and
more of its customers have churned.
5. Customers with Fiber optic internet service have churned
more; DSL is the most popular internet service type.
6. Phone service and paperless billing are chosen by a
significant number of customers, few of whom have churned.
7. The XGBoost model achieved the highest accuracy among the
evaluated models.
OVERALL CONCLUSION
DASHBOARD
THANK YOU

B2 Creative Industry Response Evaluation.docxStephen266013
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Servicejennyeacort
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

Decoding Patterns: Customer Churn Prediction Data Analysis Project

  • 2. CAPSTONE PROJECT TITLE: Customer Churn Analysis. Presented by: PALLAVI MOHANTY
  • 3. PROJECT CONTENT I. Introduction and Problem Statement II. Data Loading III. Data Exploring IV. Data Cleaning IV.1. Binning V. Data Visualization V.1. Univariate Analysis V.2. Bivariate Analysis VI. Feature Engineering VII. Data Preprocessing VIII. Train-Test Split IX. Feature Scaling X. SMOTEENN XI. Model Building and Evaluation XII. Model Comparison
  • 4. I. INTRODUCTION Q. What is customer churn? • Customer churn occurs when customers or subscribers stop doing business with a firm or service. • In the dataset, each row represents a customer and each column holds a customer attribute described in the column metadata. The dataset includes information about: • Customers who left within the last month (the column is called Churn). • Services that each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies. • Customer account information: how long they have been a customer, contract, payment method, paperless billing, monthly charges, and total charges. • Demographic information about customers: customer ID, gender, and whether they have partners and dependents. THIS IS A CLASSIC TELECOM CHURN USE CASE.
  • 5. PROBLEM STATEMENT The Telco Churn dataset is used to predict customer churn. The target variable has only two possible outcomes, churn or no churn, so this is a binary classification problem. "Churn" refers to customers who are likely to cancel their contracts soon. In the telecom industry, customer churn is a significant issue because it leads to revenue loss. If the company can predict churn, it can engage at-risk customers before they leave.
  • 6. APPROACH TO SOLVE PROBLEM STATEMENT 1. Exploratory Data Analysis (EDA) to understand data patterns and relationships. 2. Data preprocessing, including handling missing values, encoding categorical variables, and feature scaling. 3. Splitting the dataset into training and testing sets. 4. Building and training machine learning models for churn prediction. 5. Evaluating model performance using metrics like accuracy, precision, recall, and F1-score. 6. Selecting the model with the best accuracy. 7. Providing recommendations based on model insights. The ultimate goal is to help the telecom company proactively identify customers at risk of leaving, so that it can implement targeted retention strategies and improve customer satisfaction.
  • 7. II. DATA LOADING • Importing the necessary libraries for data analysis and visualization, and ensuring that visualizations are displayed inline. • Reading a CSV file from the specified path into a pandas DataFrame called ‘telco_churn’ for further analysis. • This step is common at the beginning of a data analysis or machine learning project: it sets up the environment, loads the dataset, and prepares for exploration and visualization. It is particularly useful for interactive data analysis.
  • 8. Displaying the “telco_churn” dataset
  • 9. III. DATA EXPLORING • The primary goal is to uncover patterns, relationships, anomalies, and insights that can inform subsequent analysis. • Looking at the dataset using head(), tail(), sample(), and size.
  • 10. • Checking the attributes of the dataset: shape (total number of rows and columns), column names, column datatypes, dimensionality, info (memory size, datatypes, NaN values), and describe (min, max, median, 25%, 75%, and so on). • The describe() method is useful for quickly understanding the distribution and central tendency of numerical data. We can see that TotalCharges holds numerical values, but its datatype is shown as object.
  • 11. • Checking value_counts(), nunique(), duplicated().sum(), and isnull().sum(). OBSERVATION - The outputs above show no issues with column names, but 'No internet service' and 'No phone service' mean the same as 'No'. nunique() returns a Series with the number of unique values in each column. OBSERVATION - No missing values are reported for the dataset.
  • 12. OBSERVATION 1. TotalCharges should be float or int, but it is object, so there might be some missing values in this column; we need to convert it to float or int. Because the TotalCharges column contains white spaces, the missing values are not visible. 2. SeniorCitizen is actually categorical, so its 25%-50%-75% distribution is not meaningful. 3. In the MonthlyCharges column, the average monthly charge is USD 64.76, whereas 25% of customers pay more than USD 89.85 per month (the 75th percentile). 4. There are no duplicated values.
  • 13. IV. DATA CLEANING 1. Creating a copy of telco_churn for manipulation and processing, so that the original data stays unchanged. 2. Churn Column (Target Column): converting the Churn column from a categorical value to a numerical value.
  • 14. • Displaying the maximum and minimum values. • Finding the percentage of each class in the Churn column. OBSERVATION - • The data is highly imbalanced, ratio = 73:27. • So we analyze the other features separately for each target value to get some insights.
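The target conversion and the 73:27 ratio above can be sketched as follows. This is a minimal illustration, not the presenter's code: the Series is rebuilt from the class counts reported in the univariate analysis (No = 5174, Yes = 1869), and the variable names are assumptions.

```python
import pandas as pd

# Rebuild the Churn column from the class counts reported in the deck
# (No = 5174, Yes = 1869); the real project reads these from the CSV.
churn = pd.Series(["No"] * 5174 + ["Yes"] * 1869, name="Churn")

# Convert the categorical target to a numerical one (No -> 0, Yes -> 1).
churn_numeric = churn.map({"No": 0, "Yes": 1})

# Percentage of each class in the target column.
churn_pct = churn.value_counts(normalize=True) * 100
print(churn_pct.round(2))
```

Rounded to whole percentages, the two shares give the 73:27 ratio quoted in the observation.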
  • 15. 3. TotalCharges Column: TotalCharges should be a numeric amount, so it is converted to a numerical data type. OBSERVATION - • top: " " (the most frequent value in the TotalCharges column is white space) • freq: 11 (the number of " " occurrences in the TotalCharges column)
  • 16. Here we fill the white spaces with NaN values and calculate the percentage of NaN values with respect to the total number of rows. As we can see, there are 11 missing values in the TotalCharges column. Let's check those records. OBSERVATION - Since these records make up a very small share of the total dataset (0.16%), it is safe to fill them with 0 for further processing.
  • 17. Missing Value Treatment: checking the data type of the TotalCharges column. OBSERVATION - The missing values are now filled with 0, and no missing values are left.
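The TotalCharges clean-up described in slides 15 to 17 can be sketched as follows. This is a minimal example on a small made-up frame (the five values are illustrative, not taken from the dataset); only the column name TotalCharges and the name telco_copy come from the slides.

```python
import numpy as np
import pandas as pd

# Toy stand-in for telco_copy: TotalCharges arrives as strings, and the
# missing entries are single white spaces rather than NaN.
telco_copy = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5", " ", "108.15"]})

# Step 1: replace the white spaces with NaN so the gaps become visible.
telco_copy["TotalCharges"] = telco_copy["TotalCharges"].replace(" ", np.nan)

# Step 2: share of missing values relative to the total number of rows.
missing_pct = telco_copy["TotalCharges"].isnull().mean() * 100
print(f"Missing: {missing_pct:.2f}%")

# Step 3: convert to a numeric dtype and fill the gaps with 0.
telco_copy["TotalCharges"] = pd.to_numeric(telco_copy["TotalCharges"]).fillna(0)
print(telco_copy["TotalCharges"].dtype, telco_copy["TotalCharges"].isnull().sum())
```

On the real dataset the same steps would report 11 missing rows, about 0.16% of the data, as stated in the observation.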
  • 18. IV.1. BINNING 4. Tenure Column: dividing customers into bins based on tenure. For example, a tenure of up to 12 months is assigned to tenure group 1-12, a tenure between 1 and 2 years to tenure group 13-24, and so on (i.e., grouping the tenure in bins of 12 months). The tenure column is then dropped, as tenure_group has already been created.
  • 19. 5. Customer-ID Column 6. Modifying Columns: 'No internet service' and 'No phone service' mean the same as 'No' and can be replaced with 'No'.
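One way to build the 12-month tenure bins is pandas' cut function. The sketch below assumes a maximum tenure of 72 months and uses a handful of sample values; neither is stated in the slides. Note that with these bin edges a tenure of 0 falls outside every bin and would become NaN.

```python
import pandas as pd

# Sample tenures in months (illustrative values only).
tenure = pd.Series([1, 12, 13, 24, 25, 60, 72], name="tenure")

# Bin edges every 12 months: 0, 12, 24, ..., 72 (72 is an assumed maximum).
bins = list(range(0, 73, 12))

# Labels in the "1-12", "13-24", ... style used in the slides.
labels = [f"{low + 1}-{low + 12}" for low in bins[:-1]]

# Intervals are right-closed by default, so 12 lands in "1-12" and 13 in "13-24".
tenure_group = pd.cut(tenure, bins=bins, labels=labels)
print(tenure_group.tolist())
```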
  • 20. V. DATA VISUALIZATION Data visualization is the representation of data in graphical or visual formats to communicate information effectively. It involves using charts, graphs, maps, and other visual elements to convey patterns, trends, and insights present in the data. It is a powerful tool for exploring, interpreting, and presenting data in a way that is easily understandable. Types of Data Visualization: 1. Univariate Analysis: the examination of a single variable or feature in isolation. 2. Bivariate Analysis: helps uncover patterns, correlations, and dependencies between two variables.
  • 21. V.1. UNIVARIATE ANALYSIS (plots 1-4) OBSERVATION - Customers with Fiber optic internet service have churned more; DSL is the most popular internet service type. OBSERVATION - Most customers have not churned (No: 5174), and fewer customers have churned (Yes: 1869). OBSERVATION - Electronic check accounts for 33.58%, more than any other payment method. OBSERVATION - There are very few outliers in MonthlyCharges.
  • 22. (plots 5-7) OBSERVATION - The distribution appears to be right-skewed, with a longer tail on the right side. This indicates that there are fewer senior citizens in the dataset. OBSERVATION - Customers in the 1-12 tenure_group have churned more. OBSERVATION - Males make up 50.48% and females 49.52%.
  • 23. V.2. BIVARIATE ANALYSIS (plots 1-2) OBSERVATION - Female customers with a tenure within 12 months (i.e., 1 year) have churned the most. OBSERVATION - The 'Month-to-month' contract has a significantly higher bar, which suggests a higher churn rate, mostly among female customers. With no contract term, these customers are free to leave.
  • 24. (plots 3-5) OBSERVATION - A surprising insight: churn is higher at lower Total Charges. OBSERVATION - Total Charges increase as Monthly Charges increase, as expected. OBSERVATION - Churn is high when Monthly Charges are high.
  • 25. CONCLUSION FOR DATA VISUALIZATION • Female, non-senior customers with a tenure within 12 months (i.e., 1 year) have churned the most. • The 'Month-to-month' contract has a higher churn rate, mostly among female customers. With no contract term, these customers are free to leave. • Churn is high when Monthly Charges are high and Total Charges are low, although Total Charges increase as Monthly Charges increase. • Fewer customers have churned (Yes: 1869), so the data is highly imbalanced, with a ratio of 73:27. • Electronic check, at 33.58%, is the most common payment method, and it is the method under which the most customers churn. • The gender distribution is roughly balanced. • Customers with Fiber optic internet service have churned more; DSL is the most popular internet service type. • Phone service and paperless billing are chosen by a significant number of customers; among them, fewer have churned than have not.
  • 26. VI. FEATURE ENGINEERING 1. Creating Binary Features: converting categorical features like 'Partner' and 'Dependents' into binary features (0 or 1). 2. Creating a Feature for Family Size: combining information from 'Partner' and 'Dependents' to create a feature representing the size of the customer's family.
  • 27. 3. Creating a plot to see which family size has churned more.
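A possible version of these two feature-engineering steps is sketched below. The slides do not show how the two columns are combined, so the formula (the customer counts as one, plus one for a partner, plus one if there are any dependents) and the column name FamilySize are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the relevant columns of telco_copy.
telco_copy = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes", "No"],
    "Dependents": ["No", "No", "Yes", "Yes"],
})

# Step 1: convert the Yes/No categories into binary features (0 or 1).
for column in ["Partner", "Dependents"]:
    telco_copy[column] = telco_copy[column].map({"No": 0, "Yes": 1})

# Step 2 (assumed formula): the customer, plus a partner, plus dependents.
# The dataset only records whether dependents exist, not how many.
telco_copy["FamilySize"] = 1 + telco_copy["Partner"] + telco_copy["Dependents"]
print(telco_copy)
```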
  • 28. VII. DATA PREPROCESSING The goal of data preprocessing is to enhance the quality of the data, remove any inconsistencies or errors, and prepare it for further analysis or modeling. FEATURE ENCODING Two techniques of feature encoding are: 1. One-Hot Encoding: a method used to convert categorical variables into a binary matrix (0s and 1s). 2. Label Encoding: another technique for converting categorical data into a numerical format, in which each category is mapped to an integer.
  • 29. 1. One-Hot Encoding 2. Label Encoding
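The two encoding techniques can be compared on a small example. This is a sketch using pandas' get_dummies and scikit-learn's LabelEncoder; which columns the project encodes with which technique is not stated in the slides, so the column choice here is an assumption.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with two categorical columns (illustrative values).
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "gender": ["Female", "Male", "Male", "Female"],
})

# One-hot encoding: one 0/1 column per category of Contract.
contract_one_hot = pd.get_dummies(df["Contract"], prefix="Contract")
print(contract_one_hot)

# Label encoding: each category becomes an integer (classes sorted alphabetically).
encoder = LabelEncoder()
gender_encoded = encoder.fit_transform(df["gender"])
print(list(encoder.classes_), gender_encoded.tolist())
```

One-hot encoding avoids implying an order between categories, at the cost of more columns; label encoding keeps a single column but introduces an arbitrary numeric order.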
  • 31. IDENTIFYING BEST FEATURE 4. Correlation of the features with 'Churn'. The 'Month-to-month contract' feature has the greatest influence among all features.
  • 32. MULTIVARIATE ANALYSIS 5. Correlation of the features with 'Churn', shown as a heatmap. OBSERVATION - • HIGH churn is seen for month-to-month contracts. • LOW churn is seen for long-term contracts. • Factors like gender, availability of phone service, and number of multiple lines have almost NO impact on churn.
  • 33. VIII. TRAIN-TEST SPLIT 1. Splitting telco_copy into X and y and then doing the train-test split. This code randomly splits the dataset X (features) and y (labels) into two separate sets: the training set (X_train and y_train) and the testing set (X_test and y_test). The split uses a test size of 0.2, meaning that 20% of the data is allocated for testing while the remaining 80% is used for training. The random_state parameter is set to ensure reproducibility of the split.
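The split described above corresponds to scikit-learn's train_test_split. The sketch below uses synthetic arrays in place of the real X and y, and the random_state value of 42 is an assumption; the slides only say that the parameter is set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the features X and the labels y (100 rows).
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 73 + [1] * 27)

# 80% of the rows go to training and 20% to testing; fixing random_state
# makes the shuffle reproducible (42 is an arbitrary, assumed value).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```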
  • 34. IX. FEATURE SCALING Feature scaling is a technique used in machine learning to standardize or normalize the range of the independent variables (features) of a dataset. It ensures that all numerical features are on a similar scale, which avoids bias toward features with large values, enables fair comparisons, and helps models converge. Methods of feature scaling: 1. Standardization (Z-score Normalization): the code implements standardization, which scales the features so that they have a mean of 0 and a standard deviation of 1.
  • 35. 1. Standard Scaling Analysis • Extracting the numerical features for scaling • Scaling the numerical features 2. Fitting and transforming the training data, and saving the scaling parameters for later use on the test data. • Displaying the scaled training and test sets
  • 36. 1. Numerical features before scaling 2. Numerical features after scaling
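Standardization replaces each value x with z = (x - mean) / std, where the mean and standard deviation are computed on the training data only. A minimal sketch with scikit-learn's StandardScaler follows; the two columns stand in for numerical features such as MonthlyCharges and TotalCharges, and all values are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up numerical features for the training and test sets.
X_train = np.array([[20.0, 100.0], [50.0, 1500.0], [80.0, 4000.0], [110.0, 8000.0]])
X_test = np.array([[65.0, 3400.0]])

scaler = StandardScaler()

# Fit on the training data (learns mean and std) and transform it.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the saved training parameters on the test data; never refit here.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
print(X_test_scaled)
```

Fitting only on the training set keeps information from the test set out of the scaling parameters.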
  • 37. X. SMOTEENN • In a binary classification problem, one class may have significantly fewer instances than the other. SMOTEENN addresses such imbalanced datasets by generating synthetic examples for the minority class (SMOTE, Synthetic Minority Over-sampling Technique) and then cleaning the dataset to remove noise (ENN, Edited Nearest Neighbours), leading to a more balanced and representative dataset for model training.
  • 38. XI. MODEL BUILDING & EVALUATION Random Forest XGBoost Classifier K-Nearest Neighbors Classifier (KNN) Decision Tree Support Vector Classifier (SVC)
  • 39. • On imbalanced data, accuracy is misleading. • As you can see, the accuracy is quite low, and the dataset is imbalanced. Hence, we need to check recall, precision, and F1-score for the minority class, and it is evident that precision, recall, and F1-score are too low for Class 1, i.e., churned customers. Hence, we move ahead and apply SMOTEENN (oversampling + ENN). • After using SMOTEENN
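Why accuracy alone is unreliable on a 73:27 split can be shown with a deliberately useless model. The sketch below is an illustration, not one of the project's models: it predicts "no churn" for everyone and is scored with scikit-learn's metrics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# 100 customers in the deck's 73:27 ratio (0 = not churned, 1 = churned).
y_true = np.array([0] * 73 + [1] * 27)

# A "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 avoids a warning when no positive predictions are made.
recall_churn = recall_score(y_true, y_pred, zero_division=0)
f1_churn = f1_score(y_true, y_pred, zero_division=0)

print(f"accuracy={accuracy:.2f} recall={recall_churn:.2f} f1={f1_churn:.2f}")
```

The accuracy matches the majority-class share even though not a single churned customer is found, which is why recall, precision, and F1-score for Class 1 are the metrics to watch.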
  • 41. CONCLUSION OF MODEL COMPARISON • After evaluating different models for churn detection, including Decision Tree, Random Forest, K-Nearest Neighbors, Naive Bayes, XGBoost, and SVC, it can be concluded that the XGBoost model achieved the highest accuracy among the evaluated models, with an accuracy score of 0.9689. XGBoost is an ensemble learning method that combines the predictions of multiple weak learners (typically decision trees) to create a strong learner. This helps capture complex relationships in the data. • Its key strengths are its ability to handle complex relationships in data, prevent overfitting, handle missing values, and provide flexibility and customization for various machine learning tasks. • Combining XGBoost with SMOTEENN may enhance the model's performance on imbalanced datasets. It helps the model better capture patterns in the minority class by oversampling and cleaning the dataset.
  • 42. The best model is the XGBoost Classifier, with the highest accuracy score of 0.9689.
  • 43. • Finding the models with the maximum and minimum accuracy scores.
  • 44. OVERALL CONCLUSION 1. As MonthlyCharges increase, TotalCharges also increase. 2. Customers with a 'Month-to-month' contract have a higher churn rate. With no contract term, these customers are free to leave. 3. Churn is high when Monthly Charges are high and Total Charges are low. 4. Electronic check is the most common payment method, and it is the method under which the most customers churn. 5. Customers with Fiber optic internet service have churned more; DSL is the most popular internet service type. 6. Phone service and paperless billing are chosen by a significant number of customers, and very few of them have churned. 7. The XGBoost model achieved the highest accuracy among the evaluated models.