Final Case Study Churn
FINAL CASE STUDY
12/08/2016
Pothireddy Marreddy
Mobicom is concerned that the market environment of rising churn rates and declining ARPU will hit
them even harder, as the churn rate at Mobicom is relatively high. Currently they focus on
retaining customers on a reactive basis, when the subscriber calls in to close the account.
Objective:
• What are the top five factors driving likelihood of churn at Mobicom?
• Roll out targeted proactive retention programs, which include usage enhancing marketing
programs to increase minutes of usage (MOU), rate plan migration, and a bundling strategy
among others.
Key Attributes
• The Internet and recommendation of family and friends.
• Falling ARPU
• Usage based promotions to increase minutes of usage (MOU) for both voice and data
• Bundling
• Optimal rate plan
• Artificial churn/spinners or serial churners
Top Line Questions of Interest to Senior Management:
1. What are the top five factors driving likelihood of churn at Mobicom?
2. Validation of survey findings. a) Whether “cost and billing” and “network and service quality”
are important factors influencing churn behavior. b) Are data usage connectivity issues turning
out to be costly? In other words, is it leading to churn?
3. Would you recommend rate plan migration as a proactive retention strategy?
4. What would be your recommendation on how to use this churn model to prioritize
customers for proactive retention campaigns in the future?
/* create a permanent library name */
Libname final "Y:ProgramesSAS Graded AssignmentFinal Case Study";
run;
/*Import the dataset*/
Proc import datafile = "Z:AssignmentsGraded AssignmentTopic 13 - Final Case Study Implementation/telecomfinal.csv"
DBMS = CSV OUT = final.telecom replace;
DATAROW = 2;
GUESSINGROWS = 2000;
GETNAMES = YES;
Mixed = yes;
SCANTEXT = Yes;
RUN;
DATA EXPLORATION:
/* inspect the contents and structure of the data */
Proc contents data = final.telecom;
run;
The data set FINAL.telecom has 66,297 observations and 79 variables: 45 are
character and 34 are numeric. The character variables need to be converted to
numeric by creating dummy variables, and the range variables need to be grouped
into buckets/bins.
/* check for missing values and review summary statistics */
proc means data = final.telecom N NMISS MEAN STD MIN MAX MODE MEDIAN;
run;
/* check how many customers are churners */
proc freq data = final.telecom;
table churn;
run;
The WPS System
The FREQ Procedure
churn
churn   Frequency   Percent   Cumulative Frequency   Cumulative Percent
0       50438       76.08     50438                  76.08
1       15859       23.92     66297                  100.00
The data contains 76.08% (50,438) non-churners and 23.92% (15,859) churners.
DATA PREPARATION:
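/* a few income values are missing ('NA'); rather than dropping those rows,
impute them with the mean income of the customer's credit class */
DATA final.telecom1;
SET final.telecom;
IF income NE 'NA'; /* keep only rows with a known income */
inc = income * 1; /* character to numeric */
RUN;
/* mean income per credit class; NWAY keeps only the class-level rows and
MEAN = inc writes a single mean per class to the output data set */
PROC MEANS MEAN NWAY DATA = final.telecom1;
CLASS CRCLSCOD;
VAR inc ;
OUTPUT OUT = final.PREP (drop = _type_ _freq_ ) MEAN = inc;
RUN ;
PROC SORT DATA = final.telecom;
BY CRCLSCOD;
RUN;
DATA final.telecom2;
MERGE final.telecom (in = a ) final.prep (in = b );
BY crclscod;
IF A and B ;
RUN;
DATA final.telecom2;
SET final.telecom2;
/* use the credit-class mean where income is 'NA' */
IF income = 'NA' THEN new_income = inc ;
ELSE new_income = income * 1 ;
RUN;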
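The same class-mean imputation can also be written as a single query. The following is a minimal sketch, not the code used for the results in this deck. The table name work.telecom_imputed and the column alias mean_inc are hypothetical. It assumes income is a character column whose missing values are coded as the text 'NA'.

```sas
/* Sketch only: impute missing income with the mean income of the same credit class. */
/* Assumes income is character and missing values are coded as 'NA'.                 */
PROC SQL;
  CREATE TABLE work.telecom_imputed AS
  SELECT t.*,
         CASE WHEN t.income = 'NA' THEN m.mean_inc
              ELSE INPUT(t.income, BEST12.)
         END AS new_income
  FROM final.telecom AS t
       LEFT JOIN
       (SELECT crclscod,
               MEAN(INPUT(income, BEST12.)) AS mean_inc
        FROM final.telecom
        WHERE income NE 'NA'
        GROUP BY crclscod) AS m
       ON t.crclscod = m.crclscod;
QUIT;
```

Unlike the MERGE with IF A and B, the LEFT JOIN keeps customers whose credit class has no known income at all; their new_income simply stays missing.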
/*converting character variables into numeric variables*/
DATA final.telecom2;
SET final.telecom2;
/* create income bucket based on quartile values */
IF new_income LE 4.9 THEN income_bkt = 1;
ELSE IF new_income LE 6 THEN income_bkt = 2;
ELSE IF new_income LE 7 THEN income_bkt = 3;
ELSE income_bkt = 4 ;
/* create bucket for callwait based on quartile values */
IF callwait_mean LE 0 then callwait_bkt = 1 ;
ELSE IF callwait_mean LE 0.334 THEN callwait_bkt = 2;
ELSE IF callwait_mean LE 1.67 THEN callwait_bkt = 3;
ELSE callwait_bkt = 4;
/* create an indicator variable for roaming */
IF ROAM_MEAN > 0 THEN roam_ind = 1 ;
ELSE roam_ind = 0 ;
/* create drop or block calls buckets using quartile values */
IF DROP_BLK_MEAN LE 1.67 THEN drop_blk_bkt = 1 ;
ELSE IF DROP_BLK_MEAN LE 5.34 THEN drop_blk_bkt = 2;
ELSE IF DROP_BLK_MEAN LE 12.67 THEN drop_blk_bkt = 3;
ELSE drop_blk_bkt = 4;
/* create buckets for placed voice calls using quartile values */
IF PLCD_VCE_MEAN LE 40.67 THEN PLCD_VCE_bkt = 1 ;
ELSE IF PLCD_VCE_MEAN LE 103.67 THEN PLCD_VCE_bkt = 2;
ELSE IF PLCD_VCE_MEAN LE 202.67 THEN PLCD_VCE_bkt = 3;
ELSE PLCD_VCE_bkt = 4;
/* create indicator variables for the customer's area */
area_iscity = 0;
area_ismount = 0;
area_isrural = 0;
IF INDEX(area, 'DALLAS') > 0 OR INDEX(area, 'YORK') > 0 OR INDEX(area,
'HOUSTON') > 0 OR INDEX(area, 'ANGELES') > 0 OR
INDEX(area, 'CHICAGO') > 0 OR INDEX(area, 'PHILA') > 0 THEN area_iscity = 1 ;
ELSE IF INDEX(area, 'ROCKY') > 0 THEN area_ismount = 1;
ELSE area_isrural = 1;
/* create indicator variable for asl_flag */
IF asl_flag = 'Y' THEN aslflag = 1 ;
ELSE aslflag = 0;
/* convert to numeric variables */
MRC = totmrc_mean * 1 ;
age = age1 * 1 ;
handset_price = hnd_price * 1 ;
mean_mou = mou_mean * 1;
MOU6AVG = avg6mou * 1 ;
changemou = change_mou * 1 ;
mean_ovrmou = ovrmou_mean * 1 ;
mean_roam = roam_mean * 1;
/* create numeric variable for ethnic */
isasian = 0;
ishisp = 0;
isgerman = 0;
isfrench = 0;
isafro = 0;
IF ETHNIC = 'O' THEN isasian = 1 ;
ELSE IF ETHNIC = 'H' THEN isHisp = 1;
ELSE IF ETHNIC = 'G' THEN isgerman = 1;
ELSE IF ETHNIC = 'F' THEN isfrench = 1;
ELSE IF ETHNIC = 'Z' THEN isafro = 1;
/* create numeric variable for working woman */
woman_ind = 0;
IF wrkwoman = 'Y' THEN woman_ind = 1 ;
/* create numeric variable for new handset */
hnd_new = 0 ;
IF refurb_new = 'N' THEN hnd_new = 1;
/* create numeric variable for new car buyer */
car_new = 0 ;
IF Car_buy = 'New' THEN car_new = 1;
/* create numeric variables for car types */
car_reg = 0 ;
car_up = 0 ;
IF CARTYPE = 'E' THEN car_reg = 1 ;
IF CARTYPE = 'F' THEN car_up = 1;
/* create numeric variable for no children */
no_child = 0 ;
IF children = 'N' THEN no_child = 1;
/* create numeric variables for credit class A or AA */
credclas_a = 0;
IF STRIP(crclscod) = 'A' THEN credclas_a = 1 ;
IF STRIP(crclscod) = 'AA' THEN credclas_a = 1 ;
/* create numeric variable for dwelling size A */
DWELL_a = 0 ;
IF STRIP(dwllsize) = 'A' THEN dwell_a = 1;
/* create numeric variable for web capable handset */
webcap_ind =0;
IF STRIP(hnd_webcap) = 'WCMB' THEN webcap_ind = 1;
/* create indicator variable for customers with only one handset model */
one_model = 0 ;
IF models = 1 THEN one_model = 1;
/* create indicator variable for retention call made or not */
retcall_ind = 1;
IF STRIP(RETDAYS) = 'NA' THEN retcall_ind = 0 ;
/* create buckets for age of equipment based on quartile values */
IF eqpdays LE 202 THEN eqp_age = 1;
ELSE IF eqpdays LE 326 THEN eqp_age = 2;
ELSE IF eqpdays LE 512 then EQP_AGE = 3 ;
ELSE eqp_age = 4;
/* create buckets for length of relationship based on quartile values */
IF months LE 11 THEN rship_age = 1;
ELSE IF months LE 16 THEN rship_age = 2;
ELSE IF months LE 24 then rship_AGE = 3 ;
ELSE rship_age = 4;
/* create buckets for average minutes of use based on quartile values */
IF avgmou LE 176.67 THEN avgmou_bkt = 1 ;
ELSE IF avgmou LE 362.5 THEN avgmou_bkt = 2;
ELSE IF avgmou LE 660.9 THEN avgmou_bkt = 3;
ELSE avgmou_bkt = 4;
/* create buckets for total calls based on quartile values */
IF TOTCALLS LE 860 THEN totcalls_bkt = 1 ;
ELSE IF TOTCALLS LE 1796 THEN totcalls_bkt = 2;
ELSE IF TOTCALLS LE 3508 THEN totcalls_bkt = 3;
ELSE totcalls_bkt = 4;
/* create buckets for total revenue based on quartile values */
IF TOTREV LE 860 THEN totrev_bkt = 1 ;
ELSE IF TOTREV LE 1796 THEN totrev_bkt = 2;
ELSE IF TOTREV LE 3508 THEN totrev_bkt = 3;
ELSE totrev_bkt = 4;
RUN;
Proc means data = final.telecom2 N NMISS;
run;
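The bucket thresholds above are described as quartile values. A minimal sketch of how such cut points could be obtained is shown below. It assumes the listed variables are numeric in final.telecom2. The output names work.telecom_ranked, eqp_age_rank and eqp_age_q are hypothetical.

```sas
/* Sketch only: list the quartile cut points that bucket thresholds can be read from. */
PROC MEANS DATA = final.telecom2 P25 MEDIAN P75 MAXDEC = 2;
  VAR new_income callwait_mean drop_blk_mean plcd_vce_mean
      eqpdays months avgmou totcalls totrev;
RUN;

/* Alternative: let PROC RANK assign quartile groups directly.        */
/* GROUPS = 4 yields ranks 0 to 3, so add 1 to match the 1-4 buckets. */
PROC RANK DATA = final.telecom2 OUT = work.telecom_ranked GROUPS = 4;
  VAR eqpdays;
  RANKS eqp_age_rank;
RUN;

DATA work.telecom_ranked;
  SET work.telecom_ranked;
  eqp_age_q = eqp_age_rank + 1;
RUN;
```

PROC RANK handles tied values in its own way, so its groups may differ slightly at the boundaries from buckets built with hard-coded thresholds.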
SPLITTING THE DATA
/* split the data into a training dataset and a validation dataset */
PROC SURVEYSELECT DATA = final.telecom2
METHOD = SRS
OUT = final.sample1
SAMPRATE = 0.5
OUTALL;
RUN;
DATA final.TRAINING final.VALIDATE;
SET final.sample1;
IF selected = 0 THEN OUTPUT final.TRAINING;
ELSE IF selected = 1 THEN OUTPUT final.VALIDATE;
RUN;
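Without a fixed seed, PROC SURVEYSELECT draws a different sample on every run, so the counts reported later would not be reproduced exactly. The sketch below shows a seeded variant of the same split. The seed value and the data set name work.sample_seeded are arbitrary choices for illustration. The PROC FREQ step checks that the churn rate is similar in both halves.

```sas
/* Sketch only: reproducible 50/50 split; the seed value is an arbitrary assumption. */
PROC SURVEYSELECT DATA = final.telecom2
  METHOD = SRS
  OUT = work.sample_seeded
  SAMPRATE = 0.5
  SEED = 20161208
  OUTALL;
RUN;

/* Compare churn frequencies and row percentages across the two halves. */
PROC FREQ DATA = work.sample_seeded;
  TABLES selected * churn / NOCOL NOPERCENT;
RUN;
```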
MODEL BUILDING: LOGISTIC REGRESSION
/* Model building: logistic regression, since the target variable is binary */
PROC LOGISTIC DATA = final.training DESCENDING;
MODEL churn = AVGMOU AVG3MOU plcd_vce_bkt drop_blk_bkt iwylis_vce_mean
changemou drop_vce_range DROP_VCE_MEAN
area_ismount aslflag mrc age isasian ishisp isafro handset_price mean_ovrmou
mean_roam hnd_new no_child
webcap_ind models actvsubs uniqsubs retcall_ind rship_age eqp_age
totcalls_bkt income_bkt callwait_bkt roam_ind
area_iscity area_isrural mou6avg changemou woman_ind car_new car_reg car_up
credclas_a dwell_a one_model
/ selection = forward ctable lackfit;
run;
OUTPUT
The WPS System
The LOGISTIC Procedure
Model Information
Data Set FINAL.training
Response Variable churn
Number of Response Levels 2
Model binary logit
Optimisation Technique Fisher's scoring
Number of Observations Read 33142
Number of Observations Used 31124
Response Profile
Ordered Value churn Total Frequency
1 1 7533
2 0 23591
Probability modeled is churn='1'.
Forward Selection Procedure
Step 0. Intercept entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
-2 Log L = 34448.7
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
1230.2406 40 <.0001
Step 1. Effect eqp_age entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 34053.568
SC 34459.057 34070.259
-2 Log L 34448.711 34049.568
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 399.1436 1 <.0001
Score 395.7067 1 <.0001
Wald 391.0589 1 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
869.4213 39 <.0001
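The Step 1 fit statistics can be checked by hand, which helps when reading the later steps. The formulas below are the standard definitions of AIC and the Schwarz criterion (SC). Here p = 2 is the number of estimated parameters (intercept plus eqp_age) and n = 31124 is the number of observations used.

```latex
\begin{align*}
\mathrm{AIC} &= -2\log L + 2p = 34049.568 + 2(2) = 34053.568 \\
\mathrm{SC}  &= -2\log L + p\ln n = 34049.568 + 2\ln(31124) \approx 34049.568 + 20.691 = 34070.259 \\
\chi^2_{\mathrm{LR}} &= (-2\log L)_{\text{intercept only}} - (-2\log L)_{\text{model}} = 34448.711 - 34049.568 = 399.143
\end{align*}
```

All three values agree with the listing up to rounding.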
Step 2. Effect retcall_ind entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33903.597
SC 34459.057 33928.634
-2 Log L 34448.711 33897.597
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 551.1139 2 <.0001
Score 562.4510 2 <.0001
Wald 544.8388 2 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
702.2174 38 <.0001
Step 3. Effect age entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33840.976
SC 34459.057 33874.359
-2 Log L 34448.711 33832.976
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 615.7356 3 <.0001
Score 624.9492 3 <.0001
Wald 604.6355 3 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
635.7610 37 <.0001
Step 4. Effect aslflag entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33768.823
SC 34459.057 33810.552
-2 Log L 34448.711 33758.823
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 689.8879 4 <.0001
Score 693.0572 4 <.0001
Wald 669.3955 4 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
565.7904 36 <.0001
Step 5. Effect handset_price entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33705.967
SC 34459.057 33756.041
-2 Log L 34448.711 33693.967
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 754.7446 5 <.0001
Score 757.9076 5 <.0001
Wald 731.6708 5 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
501.6692 35 <.0001
Step 6. Effect rship_age entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33660.584
SC 34459.057 33719.004
-2 Log L 34448.711 33646.584
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 802.1271 6 <.0001
Score 794.9744 6 <.0001
Wald 765.1791 6 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
458.2472 34 <.0001
Step 7. Effect totcalls_bkt entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33622.403
SC 34459.057 33689.169
-2 Log L 34448.711 33606.403
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 842.3080 7 <.0001
Score 834.0448 7 <.0001
Wald 802.5407 7 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
410.6622 33 <.0001
Step 8. Effect MRC entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33575.554
SC 34459.057 33650.665
-2 Log L 34448.711 33557.554
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 891.1575 8 <.0001
Score 880.3880 8 <.0001
Wald 846.4889 8 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
366.4575 32 <.0001
Step 9. Effect changemou entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33548.686
SC 34459.057 33632.143
-2 Log L 34448.711 33528.686
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 920.0252 9 <.0001
Score 906.2857 9 <.0001
Wald 870.2710 9 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
336.0364 31 <.0001
Step 10. Effect uniqsubs entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33527.000
SC 34459.057 33618.803
-2 Log L 34448.711 33505.000
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 943.7114 10 <.0001
Score 930.9318 10 <.0001
Wald 893.3402 10 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
311.0257 30 <.0001
Step 11. Effect PLCD_VCE_bkt entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33506.811
SC 34459.057 33606.960
-2 Log L 34448.711 33482.811
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 965.9002 11 <.0001
Score 952.5994 11 <.0001
Wald 913.4428 11 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
290.1271 29 <.0001
Step 12. Effect drop_vce_Range entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33474.223
SC 34459.057 33582.717
-2 Log L 34448.711 33448.223
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1000.4882 12 <.0001
Score 986.8861 12 <.0001
Wald 944.0560 12 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
252.7692 28 <.0001
Step 13. Effect mean_ovrmou entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33452.631
SC 34459.057 33569.471
-2 Log L 34448.711 33424.631
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1024.0805 13 <.0001
Score 1011.7401 13 <.0001
Wald 965.7763 13 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
226.4365 27 <.0001
Step 14. Effect avg3mou entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33432.064
SC 34459.057 33557.250
-2 Log L 34448.711 33402.064
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1046.6470 14 <.0001
Score 1032.0648 14 <.0001
Wald 983.6727 14 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
206.7282 26 <.0001
Step 15. Effect avgmou entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33377.519
SC 34459.057 33511.051
-2 Log L 34448.711 33345.519
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1103.1921 15 <.0001
Score 1083.4464 15 <.0001
Wald 1029.5633 15 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
151.0034 25 <.0001
Step 16. Effect isafro entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33356.790
SC 34459.057 33498.668
-2 Log L 34448.711 33322.790
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1125.9208 16 <.0001
Score 1103.2109 16 <.0001
Wald 1047.7040 16 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
129.2814 24 <.0001
Step 17. Effect hnd_new entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33344.370
SC 34459.057 33494.593
-2 Log L 34448.711 33308.370
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1140.3415 17 <.0001
Score 1116.1373 17 <.0001
Wald 1059.3577 17 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
114.7804 23 <.0001
Step 18. Effect area_ismount entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33333.877
SC 34459.057 33492.446
-2 Log L 34448.711 33295.877
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1152.8345 18 <.0001
Score 1129.0573 18 <.0001
Wald 1071.1611 18 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
101.9676 22 <.0001
Step 19. Effect isasian entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33323.823
SC 34459.057 33490.737
-2 Log L 34448.711 33283.823
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1164.8884 19 <.0001
Score 1141.2488 19 <.0001
Wald 1082.3016 19 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
89.7274 21 <.0001
Step 20. Effect drop_blk_bkt entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33313.558
SC 34459.057 33488.819
-2 Log L 34448.711 33271.558
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1177.1529 20 <.0001
Score 1153.5902 20 <.0001
Wald 1094.5598 20 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
77.6578 20 <.0001
Step 21. Effect mean_roam entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33307.704
SC 34459.057 33491.310
-2 Log L 34448.711 33263.704
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1185.0076 21 <.0001
Score 1161.6724 21 <.0001
Wald 1100.4173 21 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
66.8699 19 <.0001
Step 22. Effect actvsubs entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33299.275
SC 34459.057 33491.227
-2 Log L 34448.711 33253.275
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1195.4365 22 <.0001
Score 1173.2964 22 <.0001
Wald 1110.5652 22 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
56.4051 18 <.0001
Step 23. Effect ishisp entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33293.307
SC 34459.057 33493.604
-2 Log L 34448.711 33245.307
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1203.4046 23 <.0001
Score 1180.8883 23 <.0001
Wald 1117.5361 23 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
48.3565 17 <.0001
Step 24. Effect webcap_ind entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33287.710
SC 34459.057 33496.353
-2 Log L 34448.711 33237.710
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1211.0016 24 <.0001
Score 1192.3206 24 <.0001
Wald 1127.6673 24 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
40.7115 16 0.0006
Step 25. Effect models entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33281.750
SC 34459.057 33498.739
-2 Log L 34448.711 33229.750
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1218.9616 25 <.0001
Score 1200.0611 25 <.0001
Wald 1134.4058 25 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
32.6353 15 0.0053
Step 26. Effect car_reg entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33277.031
SC 34459.057 33502.366
-2 Log L 34448.711 33223.031
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1225.6798 26 <.0001
Score 1206.1531 26 <.0001
Wald 1140.2571 26 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
26.0477 14 0.0255
Step 27. Effect drop_vce_Mean entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33273.335
SC 34459.057 33507.015
-2 Log L 34448.711 33217.335
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1231.3766 27 <.0001
Score 1211.3233 27 <.0001
Wald 1144.6124 27 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
20.1387 13 0.0918
Step 28. Effect no_child entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33270.459
SC 34459.057 33512.485
-2 Log L 34448.711 33212.459
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1236.2524 28 <.0001
Score 1216.0508 28 <.0001
Wald 1149.1918 28 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
15.3170 12 0.2246
Step 29. Effect income_bkt entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34450.711 33268.335
SC 34459.057 33518.707
-2 Log L 34448.711 33208.335
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1240.3764 29 <.0001
Score 1220.1970 29 <.0001
Wald 1153.1296 29 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
11.2005 11 0.4266
NOTE: No further effects met the 0.05 significance level for entry into the model.
Summary of Forward Selection
Step  Effect Entered  DF  Number of Effects In Model  Score Chi-Square  Pr > Chi-Square  Variable Label
1     eqp_age         1   1                           395.7067          <.0001
2     retcall_ind     1   2                           170.9108          <.0001
3     age             1   3                           64.8426           <.0001
4     aslflag         1   4                           71.1051           <.0001
5     handset_price   1   5                           64.1500           <.0001
6     rship_age       1   6                           47.0868           <.0001
7     totcalls_bkt    1   7                           39.8896           <.0001
8     MRC             1   8                           47.6251           <.0001
9     changemou       1   9                           28.8981           <.0001
10    uniqsubs        1   10                          24.2311           <.0001           uniqsubs
11    PLCD_VCE_bkt    1   11                          22.2594           <.0001
area_ismount aslflag mrc age isasian ishisp isafro handset_price mean_ovrmou
mean_roam hnd_new no_child
webcap_ind models actvsubs uniqsubs retcall_ind rship_age eqp_age
totcalls_bkt income_bkt callwait_bkt roam_ind
area_iscity area_isrural mou6avg changemou woman_ind car_new car_reg car_up
credclas_a dwell_a one_model
/ selection = forward ctable lackfit;
run;
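The step above reruns forward selection on the validation sample. Another common way to validate is to score the validation sample with the model fitted on the training data, which also yields the ranked list needed to prioritize customers for retention. The following is a minimal sketch under assumptions. The five covariates are simply the first five effects entered in the training run, used here for brevity. The data set names work.churn_model and work.validate_scored are hypothetical. It also assumes the environment supports OUTMODEL=, INMODEL= and the SCORE statement.

```sas
/* Sketch only: fit on the training sample and store the fitted model. */
PROC LOGISTIC DATA = final.training DESCENDING OUTMODEL = work.churn_model;
  MODEL churn = eqp_age retcall_ind age aslflag handset_price;
RUN;

/* Apply the stored model to the validation sample. */
PROC LOGISTIC INMODEL = work.churn_model;
  SCORE DATA = final.validate OUT = work.validate_scored;
RUN;

/* P_1 is the predicted probability of churn = 1; sort highest risk first. */
PROC SORT DATA = work.validate_scored;
  BY DESCENDING P_1;
RUN;
```

The customers at the top of work.validate_scored are the natural first candidates for a proactive retention campaign.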
The WPS System
The LOGISTIC Procedure
Model Information
Data Set FINAL.validate
Response Variable churn
Number of Response Levels 2
Model binary logit
Optimisation Technique Fisher's scoring
Number of Observations Read 33143
Number of Observations Used 31049
Response Profile
Ordered Value churn Total Frequency
1 1 7451
2 0 23598
Probability modeled is churn='1'.
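A quick check that the two halves are comparable, using only the counts in the two response profiles and the observations read and used:

```latex
\begin{align*}
\text{Training churn rate}   &= \frac{7533}{31124} \approx 24.20\% \\
\text{Validation churn rate} &= \frac{7451}{31049} \approx 24.00\% \\
\text{Training rows not used}   &= 33142 - 31124 = 2018 \\
\text{Validation rows not used} &= 33143 - 31049 = 2094
\end{align*}
```

The churn rates are close in both samples. The unused rows are presumably observations with a missing value in at least one model variable, which PROC LOGISTIC excludes.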
Forward Selection Procedure
Step 0. Intercept entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
-2 Log L = 34219.2
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
1177.5278 40 <.0001
Step 1. Effect eqp_age entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33920.358
SC 34229.553 33937.045
-2 Log L 34219.209 33916.358
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 302.8509 1 <.0001
Score 300.8859 1 <.0001
Wald 298.2020 1 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
910.4723 39 <.0001
Step 2. Effect retcall_ind entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33831.040
SC 34229.553 33856.070
-2 Log L 34219.209 33825.040
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 394.1691 2 <.0001
Score 399.1679 2 <.0001
Wald 391.3533 2 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
810.8576 38 <.0001
Step 3. Effect uniqsubs entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33767.651
SC 34229.553 33801.024
-2 Log L 34219.209 33759.651
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 459.5581 3 <.0001
Score 465.5142 3 <.0001
Wald 455.2268 3 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
741.9184 37 <.0001
Step 4. Effect hnd_new entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33710.449
SC 34229.553 33752.166
-2 Log L 34219.209 33700.449
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 518.7601 4 <.0001
Score 524.4062 4 <.0001
Wald 512.1788 4 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
682.1505 36 <.0001
Step 5. Effect age entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33652.194
SC 34229.553 33702.254
-2 Log L 34219.209 33640.194
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 579.0155 5 <.0001
Score 583.4429 5 <.0001
Wald 568.6121 5 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
620.2075 35 <.0001
Step 6. Effect isafro entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33610.162
SC 34229.553 33668.566
-2 Log L 34219.209 33596.162
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 623.0468 6 <.0001
Score 621.8572 6 <.0001
Wald 604.7424 6 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
580.9746 34 <.0001
Step 7. Effect MRC entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33571.264
SC 34229.553 33638.010
-2 Log L 34219.209 33555.264
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 663.9455 7 <.0001
Score 660.2436 7 <.0001
Wald 641.6473 7 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
549.5600 33 <.0001
Step 8. Effect totcalls_bkt entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33521.110
SC 34229.553 33596.200
-2 Log L 34219.209 33503.110
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 716.0991 8 <.0001
Score 712.0775 8 <.0001
Wald 691.3940 8 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
488.3136 32 <.0001
Step 9. Effect rship_age entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33482.650
SC 34229.553 33566.084
-2 Log L 34219.209 33462.650
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 756.5589 9 <.0001
Score 745.7204 9 <.0001
Wald 722.2831 9 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
448.0189 31 <.0001
Step 10. Effect aslflag entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33428.934
SC 34229.553 33520.710
-2 Log L 34219.209 33406.934
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 812.2756 10 <.0001
Score 795.7781 10 <.0001
Wald 770.1978 10 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
394.1559 30 <.0001
Step 11. Effect webcap_ind entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33394.309
SC 34229.553 33494.429
-2 Log L 34219.209 33370.309
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 848.9001 11 <.0001
Score 837.0318 11 <.0001
Wald 808.4814 11 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
357.0971 29 <.0001
Step 12. Effect isasian entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33364.242
SC 34229.553 33472.705
-2 Log L 34219.209 33338.242
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 880.9674 12 <.0001
Score 871.6385 12 <.0001
Wald 840.3767 12 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
323.6032 28 <.0001
Step 13. Effect mean_ovrmou entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33339.703
SC 34229.553 33456.510
-2 Log L 34219.209 33311.703
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 907.5062 13 <.0001
Score 898.1250 13 <.0001
Wald 863.6643 13 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
291.9235 27 <.0001
Step 14. Effect avg3mou entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33291.647
SC 34229.553 33416.797
-2 Log L 34219.209 33261.647
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 957.5622 14 <.0001
Score 945.0097 14 <.0001
Wald 905.9672 14 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
249.6822 26 <.0001
Step 15. Effect avgmou entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33225.748
SC 34229.553 33359.241
-2 Log L 34219.209 33193.748
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1025.4613 15 <.0001
Score 1006.0867 15 <.0001
Wald 960.8460 15 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
181.8526 25 <.0001
Step 16. Effect drop_vce_Range entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33202.698
SC 34229.553 33344.535
-2 Log L 34219.209 33168.698
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1050.5109 16 <.0001
Score 1030.8308 16 <.0001
Wald 982.0499 16 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
155.1997 24 <.0001
Step 17. Effect changemou entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33180.918
SC 34229.553 33331.098
-2 Log L 34219.209 33144.918
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1074.2908 17 <.0001
Score 1044.9558 17 <.0001
Wald 1000.0754 17 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
191.6930 23 <.0001
Step 18. Effect mean_roam entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33150.774
SC 34229.553 33309.297
-2 Log L 34219.209 33112.774
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1106.4351 18 <.0001
Score 1082.5402 18 <.0001
Wald 1025.8762 18 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
93.8315 22 <.0001
Step 19. Effect actvsubs entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33134.679
SC 34229.553 33301.545
-2 Log L 34219.209 33094.679
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1124.5304 19 <.0001
Score 1102.7041 19 <.0001
Wald 1042.4785 19 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
75.7138 21 <.0001
Step 20. Effect one_model entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33123.382
SC 34229.553 33298.591
-2 Log L 34219.209 33081.382
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1137.8275 20 <.0001
Score 1115.2578 20 <.0001
Wald 1053.8346 20 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
62.3528 20 <.0001
Step 21. Effect PLCD_VCE_bkt entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33113.067
SC 34229.553 33296.620
-2 Log L 34219.209 33069.067
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1150.1426 21 <.0001
Score 1131.0524 21 <.0001
Wald 1066.8021 21 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
50.0641 19 0.0001
Step 22. Effect drop_blk_bkt entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33102.376
SC 34229.553 33294.273
-2 Log L 34219.209 33056.376
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1162.8329 22 <.0001
Score 1143.5578 22 <.0001
Wald 1079.6047 22 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
37.4052 18 0.0046
Step 23. Effect area_ismount entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33094.139
SC 34229.553 33294.378
-2 Log L 34219.209 33046.139
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1173.0707 23 <.0001
Score 1153.7958 23 <.0001
Wald 1089.2678 23 <.0001
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
26.8637 17 0.0601
Step 24. Effect handset_price entered:
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 34221.209 33089.320
SC 34229.553 33297.903
-2 Log L 34219.209 33039.320
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 1179.8891 24 <.0001
Score 1159.0711 24 <.0001
Wald 1094.2510 24 <.0001
1.000 0 23598 0 7451 76.0 0.0 100.0 . 24.0
/**Generating gain chart*/
PROC LOGISTIC DATA = final.telecom2 DESCENDING OUTMODEL = DMM ;
MODEL CHURN = AVGMOU AVG3MOU plcd_vce_bkt drop_blk_bkt changemou
drop_vce_range DROP_VCE_MEAN
area_ismount aslflag mrc age isasian ishisp isafro handset_price mean_ovrmou
mean_roam hnd_new no_child
webcap_ind actvsubs uniqsubs retcall_ind rship_age eqp_age totcalls_bkt
one_model / ctable lackfit;
SCORE OUT = DMP;
RUN;
The WPS System
The LOGISTIC Procedure
Model Information
Data Set FINAL.telecom2
Response Variable churn
Number of Response Levels 2
Model binary logit
Optimisation Technique Fisher's scoring
Model: A binary logit regression model was fitted to the data.
Optimization Technique: Fisher's scoring is the iterative method used to estimate the regression parameters.
Number of Observations Read 66285
Number of Observations Used 64123
Fewer observations are used than read because 2162 observations have missing values in one or more of the
variables used by the model.
Response Profile
Ordered Value churn Total Frequency
1 1 15317
2 0 48806
Probability modeled is churn='1'.
Ordered Value: Because the DESCENDING option is specified, the response levels are ordered from high to low,
so the probability of churn = 1 is modeled. A positive logit regression coefficient therefore corresponds to
a positive relationship with churn, and a negative coefficient to a negative relationship. Of the 64123
observations used, 15317 (24%) are churners and 48806 (76%) are non-churners.
Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
The default criterion used to assess convergence is the relative gradient convergence criterion (GCONV), and
the default precision is 10^-8. In this model the convergence criterion is satisfied.
Model Fit Statistics
Criterion Intercept only Intercept and Covariates
AIC 70508.162 68071.173
SC 70517.230 68325.093
-2 Log L 70506.162 68015.173
AIC: The Akaike Information Criterion is used to compare non-nested models fitted to the same sample. The
model with the lowest AIC is preferred.
SC: The Schwarz Criterion is read the same way: the smaller the value, the more desirable the model.
AIC and SC penalize the log-likelihood by the number of parameters in the model (SC also takes the sample
size into account).
-2 Log L: Negative two times the log-likelihood, which is used in hypothesis tests for nested models.
Intercept and Covariates: The fitted model, which includes the intercept and all independent variables. The
values in this column can be compared with the corresponding Intercept only values to assess model
fit/significance.
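The relation between the three fit statistics can be checked by hand. The sketch below is illustrative rather than part of the original analysis: it assumes the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k*ln(n), with k = 28 parameters (the intercept plus 27 predictors, matching the 27 DF of the global test) and n = 64123 observations used.

```python
import math

def aic(neg2_log_l: float, n_params: int) -> float:
    """AIC = -2 Log L + 2k, where k counts the intercept plus the predictors."""
    return neg2_log_l + 2 * n_params

def sc(neg2_log_l: float, n_params: int, n_obs: int) -> float:
    """Schwarz Criterion = -2 Log L + k * ln(n)."""
    return neg2_log_l + n_params * math.log(n_obs)

N_OBS = 64123        # observations used (from the output above)
K_NULL = 1           # intercept only
K_FULL = 28          # assumption: intercept + 27 predictors

NEG2LL_NULL = 70506.162
NEG2LL_FULL = 68015.173

aic_null = aic(NEG2LL_NULL, K_NULL)
aic_full = aic(NEG2LL_FULL, K_FULL)
sc_null = sc(NEG2LL_NULL, K_NULL, N_OBS)
sc_full = sc(NEG2LL_FULL, K_FULL, N_OBS)

print(f"AIC: {aic_null:.3f} -> {aic_full:.3f}")
print(f"SC : {sc_null:.3f} -> {sc_full:.3f}")
```

Under these assumptions the computed values agree with the Model Fit Statistics table to three decimals, which suggests the reported SC uses the number of observations used as n.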
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > Chi-Square
Likelihood Ratio 2490.9885 27 <.0001
Score 2453.9078 27 <.0001
Wald 2316.9133 27 <.0001
The global null hypothesis is that all of the predictors' regression coefficients are equal to zero; the
alternative is that at least one of them is not equal to zero.
The Likelihood Ratio, Score and Wald statistics all test this same hypothesis.
Calculation: the Likelihood Ratio chi-square is -2 Log L (null model, intercept only) minus -2 Log L (fitted
model, intercept and covariates) = 70506.162 - 68015.173 = 2490.989.
Pr > Chi-Square: The p-value is compared with a specified alpha level, the Type I error rate we are willing
to accept. The small p-value leads us to conclude that at least one of the regression coefficients in the
model is not equal to zero.
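As a small worked check of the likelihood ratio calculation, the sketch below recomputes the statistic from the -2 Log L values printed in the output, for both the final model and the last stepwise step (step 24). The function name is illustrative only.

```python
def likelihood_ratio(neg2ll_null: float, neg2ll_fitted: float) -> float:
    """Likelihood ratio chi-square: drop in -2 Log L from the null to the fitted model."""
    return neg2ll_null - neg2ll_fitted

# Final model: intercept only vs. intercept and covariates (27 DF)
lr_final = likelihood_ratio(70506.162, 68015.173)

# Stepwise selection, step 24 (24 DF)
lr_step24 = likelihood_ratio(34219.209, 33039.320)

print(f"final model LR chi-square: {lr_final:.3f}")
print(f"step 24 LR chi-square    : {lr_step24:.3f}")
```

The degrees of freedom equal the number of parameters added to the intercept-only model, which is why the DF column grows by one at each stepwise step.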
DF: Degrees of freedom corresponding to the parameter. Each parameter estimated in the model requires one DF
(every parameter in this model has one) and defines the chi-square distribution used to test whether the
individual regression coefficient is zero, given that the other variables are in the model.
Estimate: These are the binary logit regression estimates for the parameters in the model. The logistic
regression model expresses the log odds of a positive response (churn = 1) as a linear combination of the
predictor variables x1, x2, x3, ...
log(p/(1-p)) = b0 + b1*x1 + b2*x2 + b3*x3 + ...
log(p/(1-p)) = -1.1379 + (0.000602*avgmou) + (-0.00073*avg3mou) + ...
Interpretation: For a one-unit change in the predictor variable, the log odds of a positive outcome are
expected to change by the respective coefficient, given that the other variables in the model are held constant.
Intercept = -1.1379 (the logistic regression estimate when all variables in the model are evaluated at zero).
Standard Error: These are the standard errors of the individual regression coefficients.
Pr > ChiSq: Tests the null hypothesis that an individual predictor's regression coefficient is zero, given
that the other predictor variables are in the model. The chi-square test statistic is the squared ratio of
the estimate to the standard error of the respective predictor. In this model all variables are significant
at the 0.05 level.
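To make the log-odds scale concrete, the sketch below converts the intercept and the two coefficients quoted above into an odds ratio and a probability. It is illustrative only: the text gives just these two of the 27 coefficients, so the function deliberately covers only the intercept, avgmou and avg3mou and does not reproduce the model's scored probabilities.

```python
import math

INTERCEPT = -1.1379
COEF = {"avgmou": 0.000602, "avg3mou": -0.00073}  # the only coefficients quoted in the text

def odds_ratio(coef: float) -> float:
    """Exponentiating a logit coefficient gives the odds ratio per one-unit change."""
    return math.exp(coef)

def probability(log_odds: float) -> float:
    """Inverse logit: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def partial_log_odds(avgmou: float, avg3mou: float) -> float:
    """Linear predictor using ONLY the intercept and the two quoted terms (illustration)."""
    return INTERCEPT + COEF["avgmou"] * avgmou + COEF["avg3mou"] * avg3mou

or_avgmou = odds_ratio(COEF["avgmou"])
or_avg3mou = odds_ratio(COEF["avg3mou"])
baseline_p = probability(INTERCEPT)  # all predictors evaluated at zero

print(f"odds ratio avgmou : {or_avgmou:.3f}")
print(f"odds ratio avg3mou: {or_avg3mou:.3f}")
print(f"probability at intercept only: {baseline_p:.4f}")
```

Rounded to three decimals, the two exponentiated coefficients match the point estimates reported for avgmou and avg3mou in the Odds Ratio Estimates table.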
Odds Ratio Estimates
Effect Point Estimate Lower 95% Wald Confidence Limit Upper 95% Wald Confidence Limit
avgmou 1.001 1.000 1.001
avg3mou 0.999 0.999 0.999
PLCD_VCE_bkt 0.891 0.863 0.919
drop_blk_bkt 1.068 1.037 1.099
changemou 1.000 1.000 1.000
drop_vce_Range 1.004 1.001 1.007
drop_vce_Mean 1.005 1.001 1.009
area_ismount 1.257 1.149 1.375
aslflag 0.666 0.625 0.710
MRC 0.997 0.996 0.998
age 0.996 0.995 0.997
isasian 1.389 1.270 1.519
ishisp 1.071 1.013 1.131
isafro 0.689 0.625 0.760
handset_price 0.999 0.998 0.999
mean_ovrmou 1.002 1.001 1.002
mean_roam 1.004 1.003 1.006
hnd_new 0.861 0.809 0.917
no_child 0.925 0.867 0.987
webcap_ind 0.857 0.806 0.910
actvsubs 0.882 0.841 0.925
uniqsubs 1.167 1.128 1.207
retcall_ind 2.028 1.842 2.231
rship_age 0.860 0.838 0.883
eqp_age 1.288 1.254 1.323
totcalls_bkt 1.300 1.230 1.373
one_model 0.858 0.808 0.910
Point Estimate: The odds ratio is obtained by exponentiating the estimate. The difference in the logs of two
odds is equal to the log of the ratio of these two odds, and the log of the ratio of the two odds is the log
odds ratio.
Interpretation: For a one-unit change in the predictor variable, the odds of a positive outcome are expected
to be multiplied by the respective odds ratio, given that the other variables in the model are held constant.
For example, a one-unit increase in retcall_ind (odds ratio 2.028) roughly doubles the odds of churn.
95% Wald Confidence Limits: This is the Wald confidence interval (CI) of an individual odds ratio, given that
the other predictor variables are in the model. At the 95% confidence level, upon repeated sampling 95% of
such CIs would include the true population odds ratio. If the CI includes one, we fail to reject the null
hypothesis that the particular regression coefficient equals zero (equivalently, that the odds ratio equals
one), given that the other predictors are in the model.
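Because a Wald interval is symmetric on the log-odds scale, the coefficient and its standard error can be recovered from the printed confidence limits. The sketch below does this for retcall_ind (limits 1.842 and 2.231); the helper names are hypothetical, and the result is approximate because the printed limits are rounded to three decimals.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def wald_ci_to_log_scale(lower: float, upper: float) -> tuple[float, float]:
    """Recover (log odds ratio, standard error) from 95% Wald limits of an odds ratio."""
    log_or = (math.log(lower) + math.log(upper)) / 2.0
    std_err = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    return log_or, std_err

def excludes_one(lower: float, upper: float) -> bool:
    """True when the interval does not contain 1, i.e. the effect is significant at 0.05."""
    return lower > 1.0 or upper < 1.0

log_or, std_err = wald_ci_to_log_scale(1.842, 2.231)
point_estimate = math.exp(log_or)          # geometric mean of the two limits
wald_chi_square = (log_or / std_err) ** 2  # squared ratio of estimate to standard error

print(f"point estimate  : {point_estimate:.3f}")
print(f"standard error  : {std_err:.4f}")
print(f"Wald chi-square : {wald_chi_square:.1f}")
print(f"excludes one    : {excludes_one(1.842, 2.231)}")
```

The recovered point estimate should sit close to the 2.028 in the table; any small difference comes from rounding in the printed limits.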
Association of Predicted Probabilities and Observed Responses
Percent Concordant 63.0 Somer's D 0.267
Percent Discordant 36.3 Gamma 0.268
Percent Tied 0.7 Tau-a 0.097
Pairs 7.4756E8 c 0.633
Percent Concordant: A pair of observations with different observed responses is said to be concordant if
the observation with churn = 0 has a lower predicted probability than the observation with churn = 1. The
concordant percent is 63%. The higher the concordant percent, the better the model. Here the concordant
percent is higher than the discordant percent, so the model ranks churners above non-churners more often
than not.
Percent Discordant: A pair is discordant if the observation with churn = 0 has a higher predicted
probability than the observation with churn = 1. The discordant percent is 36.3%.
Percent Tied: A pair of observations is tied if it is neither concordant nor discordant. The tied percent is
0.7%.
Pairs: The total number of distinct pairs in which one case has an observed outcome different from the other
member of the pair.
In the response profile table we have 15317 observations with churn = 1 and 48806 observations with
churn = 0, so the total number of pairs with different outcomes is 15317*48806 = 747,561,502
(7.4756E8).
Somers' D: Used to determine the strength and direction of the relation between pairs of variables. Its
values range from -1 (all pairs disagree) to 1 (all pairs agree). It is defined as (concordant - discordant)
divided by the total number of pairs with different responses.
Somers' D = (63-36.3)/100 = 0.267
Gamma: The Goodman-Kruskal Gamma does not penalize for ties on either variable. Its values range from -1
(perfect negative association) through 0 (no association) to 1 (perfect positive association). It is
generally greater than Somers' D.
Tau-a: Kendall's Tau-a is a modification of Somers' D that takes into account the difference between the
number of possible paired observations and the number of paired observations with a different response. It
is defined as the ratio of the difference between the number of concordant pairs and the number of
discordant pairs to the number of possible pairs. Tau-a is usually much smaller than Somers' D since there
are many paired observations with the same response.
c: This is equivalent to the area under the ROC curve. c ranges from 0.5 to 1, where 0.5 corresponds to
the model randomly predicting the response and 1 corresponds to the model perfectly discriminating the
response. Here c = 0.633.
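The four association measures can be reproduced from the concordant and discordant percentages and the response counts. The sketch below is a minimal check, assuming the standard definitions (ties counted as half for c, and N(N-1)/2 possible pairs for Tau-a); small differences from the printed values arise because the percentages are rounded to one decimal.

```python
def association_stats(pct_concordant: float, pct_discordant: float,
                      n_events: int, n_nonevents: int) -> dict:
    """Rank-association measures from concordance percentages and response counts."""
    pairs = n_events * n_nonevents            # pairs with different observed responses
    n_total = n_events + n_nonevents
    n_conc = pct_concordant / 100.0 * pairs
    n_disc = pct_discordant / 100.0 * pairs
    n_tied = pairs - n_conc - n_disc
    return {
        "pairs": pairs,
        "somers_d": (n_conc - n_disc) / pairs,
        "gamma": (n_conc - n_disc) / (n_conc + n_disc),
        "tau_a": (n_conc - n_disc) / (n_total * (n_total - 1) / 2.0),
        "c": (n_conc + 0.5 * n_tied) / pairs,
    }

stats = association_stats(63.0, 36.3, n_events=15317, n_nonevents=48806)
for name, value in stats.items():
    print(f"{name:9s} {value}")
```

Note that Somers' D and c are linked by c = (1 + D)/2 when computed from the same pairs, which is a quick way to sanity-check a printed table.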
Partition for the Hosmer and Lemeshow Test
Group Total Observed Events Expected Events Observed Nonevents Expected Nonevents
1 6413 750 731.23 5663 5681.77
2 6412 905 980.93 5507 5431.07
3 6413 1117 1143.90 5296 5269.10
4 6412 1228 1284.09 5184 5127.91
5 6412 1423 1416.34 4989 4995.66
6 6412 1555 1549.39 4857 4862.61
7 6412 1702 1691.44 4710 4720.56
8 6412 1914 1854.05 4498 4557.95
9 6412 2107 2064.37 4305 4347.63
10 6413 2616 2601.28 3797 3811.72
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square DF Pr > Chi-Square
15.6395 8 0.0478
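The test statistic can be recomputed from the partition table as the sum of (observed - expected)^2 / expected over events and nonevents in each group. The sketch below does this and derives the p-value with a closed-form chi-square tail that is valid only for even degrees of freedom (here DF = 10 groups - 2 = 8). The figures are copied from the table; the function names are illustrative.

```python
import math

# (total, observed events, expected events) for groups 1..10, from the partition table
PARTITION = [
    (6413, 750, 731.23), (6412, 905, 980.93), (6413, 1117, 1143.90),
    (6412, 1228, 1284.09), (6412, 1423, 1416.34), (6412, 1555, 1549.39),
    (6412, 1702, 1691.44), (6412, 1914, 1854.05), (6412, 2107, 2064.37),
    (6413, 2616, 2601.28),
]

def hosmer_lemeshow(partition) -> float:
    """Sum of (O - E)^2 / E over events and nonevents in every group."""
    stat = 0.0
    for total, obs_events, exp_events in partition:
        obs_non = total - obs_events
        exp_non = total - exp_events
        stat += (obs_events - exp_events) ** 2 / exp_events
        stat += (obs_non - exp_non) ** 2 / exp_non
    return stat

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable; this closed form holds for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

hl_stat = hosmer_lemeshow(PARTITION)
hl_df = len(PARTITION) - 2
hl_p = chi2_upper_tail_even_df(hl_stat, hl_df)
print(f"chi-square = {hl_stat:.4f}, DF = {hl_df}, p = {hl_p:.4f}")
```

The recomputed statistic differs slightly from the printed one only because the expected counts in the table are rounded to two decimals.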
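1. Hosmer and Lemeshow Goodness-of-Fit Test:
Pr > Chi-Square is 0.0478, which is higher than for the training and validation datasets (0.0430 and 0.0428
respectively). The null hypothesis of this test is that the model fits the data, so a p-value below 0.05
indicates some evidence of lack of fit rather than a good fit; here the value is only marginally below 0.05.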
2. Association of Predicted Probabilities and Observed Responses
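/* Check accuracy */
Data final.testacc;
set DMP;
/* Declare the length first; otherwise the first assignment fixes it at 13 characters
   and the 14-character labels are truncated ("False Positiv") */
length out $ 14;
/* F_churn is the observed response level, I_churn the level the observation is classified into */
if f_churn = 0 and i_churn = 0 then out = "True Negative";
else if f_churn = 1 and i_churn = 1 then out = "True Positive";
else if f_churn = 0 and i_churn = 1 then out = "False Positive";
else if f_churn = 1 and i_churn = 0 then out = "False Negative";
run;
/* Name the dataset explicitly instead of relying on the last dataset created */
Proc freq data = final.testacc;
tables out;
run;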
Cut off probability = 0.5
out Frequency Percent Cumulative Frequency Cumulative Percent
False Negative 15055 23.48 15055 23.48
False Positive 293 0.46 15348 23.94
True Negative 48513 75.66 63861 99.59
True Positive 262 0.41 64123 100.00
Frequency Missing = 2162
Confusion matrix (N = 64123; rows are predicted, columns are actual)
Predicted Churn = 0 Churn = 1
Churn = 0 48511 15057
Churn = 1 260 295
Shares of all 64123 observations in the confusion matrix:
P_Churn = 0 and A_Churn = 0 then True Negative: 76% (% of nos captured correctly)
P_Churn = 1 and A_Churn = 1 then True Positive: 0.46% (% of yes captured correctly)
P_Churn = 0 and A_Churn = 1 then False Negative: 23.48% (% of actual yes misclassified as no)
P_Churn = 1 and A_Churn = 0 then False Positive: 0.41% (% of actual nos misclassified as yes)
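The four cell counts from the PROC FREQ output translate directly into the usual classification rates. The sketch below is an illustration, assuming the standard definitions of accuracy, sensitivity, specificity and precision; the function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard rates derived from a 2x2 confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of actual churners captured
        "specificity": tn / (tn + fp),     # share of actual non-churners captured
        "precision": tp / (tp + fp),       # share of predicted churners who churned
    }

# Counts at the 0.5 cut-off, taken from the PROC FREQ table
metrics = classification_metrics(tp=262, fp=293, tn=48513, fn=15055)
for name, value in metrics.items():
    print(f"{name:12s} {value:.4f}")
```

Reading sensitivity and specificity side by side makes it clear how much of the overall accuracy comes from the non-churn class, which matters when the campaign goal is to find churners.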
Gain Chart:
Sum of churn | Average of P_1 | Count of P_1 | Sum of P_1_2 | Expected churn | Cumulative expected churn | Predicted churn | Cumulative predicted churn | Lift by model
2616 | 0.405634825 | 6412 | 2600.930497 | 1531.7 | 10% | 0.169807 | 17% | 1.707906
2107 | 0.321959928 | 6412 | 2064.407059 | 1531.7 | 20% | 0.134779 | 30% | 1.375596
1914 | 0.289155258 | 6413 | 1854.352672 | 1531.7 | 30% | 0.121065 | 43% | 1.249592
1702 | 0.26379269 | 6412 | 1691.43873 | 1531.7 | 40% | 0.110429 | 54% | 1.111184
1555 | 0.241637261 | 6413 | 1549.619755 | 1531.7 | 50% | 0.10117 | 64% | 1.015212
1423 | 0.220886158 | 6412 | 1416.322048 | 1531.7 | 60% | 0.092467 | 73% | 0.929033
1228 | 0.200260533 | 6412 | 1284.070538 | 1531.7 | 70% | 0.083833 | 81% | 0.801724
1117 | 0.178368073 | 6413 | 1143.874452 | 1531.7 | 80% | 0.07468 | 89% | 0.729255
906 | 0.152979779 | 6412 | 980.9063447 | 1531.7 | 90% | 0.06404 | 95% | 0.5915
749 | 0.114019516 | 6412 | 731.0931359 | 1531.7 | 100% | 0.047731 | 100% | 0.488999
Total: 15317 | 0.238869286 | 64123 | 15317.01523 | 15317
Cumulative Expected Cumulative Predicted
10% 17%
20% 30%
30% 43%
40% 54%
50% 64%
60% 73%
70% 81%
80% 89%
90% 95%
100% 100%
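From the gain chart, the model is a better predictor than selecting customers at random (the average
probability): the top decile contains 17% of churners against the 10% expected, a lift of 1.71, and the top
three deciles contain 43% of churners.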
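The lift and cumulative columns of the gain table can be recomputed from the decile totals. The sketch below is illustrative: it takes lift as observed churn in a decile divided by the churn expected under random selection (total churn / 10), and cumulative predicted churn as the running share of the summed probabilities, which appears to be how the table was built.

```python
# Decile totals copied from the gain table (highest predicted probability first)
OBSERVED_CHURN = [2616, 2107, 1914, 1702, 1555, 1423, 1228, 1117, 906, 749]
SUM_P1 = [2600.930497, 2064.407059, 1854.352672, 1691.43873, 1549.619755,
          1416.322048, 1284.070538, 1143.874452, 980.9063447, 731.0931359]

def lift_by_decile(observed: list) -> list:
    """Observed churn per decile relative to an equal share of total churn."""
    expected = sum(observed) / len(observed)
    return [count / expected for count in observed]

def cumulative_share(values: list) -> list:
    """Running total of the values as a fraction of their overall sum."""
    total = sum(values)
    running, shares = 0.0, []
    for value in values:
        running += value
        shares.append(running / total)
    return shares

lift = lift_by_decile(OBSERVED_CHURN)
gains = cumulative_share(SUM_P1)
for decile, (l, g) in enumerate(zip(lift, gains), start=1):
    print(f"decile {decile:2d}: lift {l:.3f}, cumulative predicted churn {g:.0%}")
```

A lift above 1 in the first five deciles and below 1 afterwards is what one expects when customers are ranked usefully by predicted churn probability.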
At the 0.5 probability cut-off, the model has correctly predicted 262 churners out of 15317 actual churners,
which is only a 1.7% success rate. It has correctly predicted 48513 out of 48806 non-churners, which is a
99.4% success rate. The overall accuracy of about 76% is therefore driven almost entirely by non-churners;
capturing more churners would require a probability cut-off lower than 0.5.