Final Case Study Churn

FINAL CASE STUDY
12/08/2016
Pothireddy Marreddy
Mobicom is concerned that the market environment of rising churn rates and declining ARPU will hit
them even harder, because the churn rate at Mobicom is already relatively high. Currently the company focuses on
retaining its customers on a reactive basis, when the subscriber calls in to close the account.
Objective:
• What are the top five factors driving likelihood of churn at Mobicom?
• Roll out targeted proactive retention programs, which include usage enhancing marketing
programs to increase minutes of usage (MOU), rate plan migration, and a bundling strategy
among others.
Key Attributes
• The Internet and recommendation of family and friends.
• falling ARPU
• Usage based promotions to increase minutes of usage (MOU) for both voice and data
• Bundling
• Optimal rate plan
• Artificial churn/spinners or serial churners
Top Line Questions of Interest to Senior Management:
1. What are the top five factors driving likelihood of churn at Mobicom?
2. Validation of survey findings. a) Whether “cost and billing” and “network and service quality”
are important factors influencing churn behavior. b) Are data usage connectivity issues turning
out to be costly? In other words, is it leading to churn?
3. Would you recommend rate plan migration as a proactive retention strategy?
4. What would be your recommendation on how to use this churn model to prioritize
customers for proactive retention campaigns in the future?

/* create a permanent library name*/
Libname final "Y:ProgramesSAS Graded AssignmentFinal Case Study";
run;
/*Import the dataset*/
Proc import datafile = "Z:AssignmentsGraded AssignmentTopic 13 - Final Case Study Implementation/telecomfinal.csv"
DBMS = CSV OUT = final.telecom replace;
DATAROW = 2;
GUESSINGROWS = 2000;
GETNAMES = YES;
Mixed = yes;
SCANTEXT = Yes;
RUN;
DATA EXPLORATION:
/* inspect the contents of the data set */
Proc contents data = final.telecom;
run;
The data set FINAL.telecom has 66297 observations and 79 variables, of which 45
are character variables and 34 are numeric variables. The character variables
need to be converted into numeric variables by creating dummy variables, and
the range variables need to be grouped into buckets/bins.
/*check if there are any missing values and statistical analysis*/
proc means data = final.telecom N NMISS MEAN STD MIN MAX MODE MEDIAN;
run;
/* check how many customers churned */
proc freq data = final.telecom;
table churn;
run;
The WPS System
The FREQ Procedure
churn
churn   Frequency   Percent   Cumulative Frequency   Cumulative Percent
0       50438       76.08     50438                  76.08
1       15859       23.92     66297                  100.00
The data contain 76% (50438) non-churners (churn = 0) and 24% (15859)
churners (churn = 1).
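The percentages above follow directly from the frequency counts. A minimal cross-check, written in Python purely for illustration (the analysis itself was run in WPS/SAS):

```python
# Frequency counts copied from the PROC FREQ output above
non_churners = 50438
churners = 15859

total = non_churners + churners
churn_rate = 100 * churners / total  # share of customers with churn = 1, in percent

print(total)
print(round(churn_rate, 2))
```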
DATA PREPARATION:
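/* a few values are missing; they can be dropped or given a missing value
treatment. Here missing income is replaced by the mean income of the
customer's credit class */
/* keep customers with a recorded income and convert it to numeric */
DATA final.telecom1;
SET final.telecom;
IF income NE 'NA';
inc = income * 1;
RUN;
/* mean income per credit class, one row per class */
PROC MEANS MEAN NWAY DATA = final.telecom1;
CLASS crclscod;
VAR inc;
OUTPUT OUT = final.prep (DROP = _type_ _freq_) MEAN = inc;
RUN;
PROC SORT DATA = final.telecom;
BY crclscod;
RUN;
/* attach the class mean to every customer */
DATA final.telecom2;
MERGE final.telecom (IN = a) final.prep (IN = b);
BY crclscod;
IF a AND b;
RUN;
/* impute missing income with the class mean */
DATA final.telecom2;
SET final.telecom2;
IF income = 'NA' THEN new_income = inc;
ELSE new_income = income * 1;
RUN;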
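The imputation step can also be sketched outside SAS. The following Python version is an illustration only: the field names mirror the data set, but the toy records and the helper name `impute_income_by_class` are hypothetical.

```python
from collections import defaultdict

def impute_income_by_class(rows):
    """Replace an income of 'NA' with the mean income of the row's credit class."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["income"] != "NA":
            totals[row["crclscod"]] += float(row["income"])
            counts[row["crclscod"]] += 1
    class_mean = {cls: totals[cls] / counts[cls] for cls in counts}

    result = []
    for row in rows:
        cls = row["crclscod"]
        if row["income"] != "NA":
            new_income = float(row["income"])
        elif cls in class_mean:
            new_income = class_mean[cls]
        else:
            continue  # class has no observed income: row is dropped, like IF A and B
        result.append({**row, "new_income": new_income})
    return result

# Toy records with hypothetical values
customers = [
    {"crclscod": "A", "income": "4"},
    {"crclscod": "A", "income": "6"},
    {"crclscod": "A", "income": "NA"},
    {"crclscod": "B", "income": "7"},
]
imputed = impute_income_by_class(customers)
print([r["new_income"] for r in imputed])
```

The third customer has no recorded income and receives the mean of the other two class A customers.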
/* converting character variables into numeric variables */
DATA final.telecom2;
SET final.telecom2;
/* create income bucket based on quartile values */
IF new_income LE 4.9 THEN income_bkt = 1;
ELSE IF new_income LE 6 THEN income_bkt = 2;
ELSE IF new_income LE 7 THEN income_bkt = 3;
ELSE income_bkt = 4 ;
/* create bucket for callwait based on quartile values */
IF callwait_mean LE 0 then callwait_bkt = 1 ;
ELSE IF callwait_mean LE 0.334 THEN callwait_bkt = 2;
ELSE IF callwait_mean LE 1.67 THEN callwait_bkt = 3;
ELSE callwait_bkt = 4;
/* create an indicator variable for roaming */
IF ROAM_MEAN > 0 THEN roam_ind = 1 ;
ELSE roam_ind = 0 ;
/* create drop or block calls buckets using quartile values */
IF DROP_BLK_MEAN LE 1.67 THEN drop_blk_bkt = 1 ;
ELSE IF DROP_BLK_MEAN LE 5.34 THEN drop_blk_bkt = 2;
ELSE IF DROP_BLK_MEAN LE 12.67 THEN drop_blk_bkt = 3;

ELSE drop_blk_bkt = 4;
/* create buckets for placed voice calls using quartile values */
IF PLCD_VCE_MEAN LE 40.67 THEN PLCD_VCE_bkt = 1 ;
ELSE IF PLCD_VCE_MEAN LE 103.67 THEN PLCD_VCE_bkt = 2;
ELSE IF PLCD_VCE_MEAN LE 202.67 THEN PLCD_VCE_bkt = 3;
ELSE PLCD_VCE_bkt = 4;
/* create indicator variables for area of customer */
area_iscity = 0;
area_ismount = 0;
area_isrural = 0;
IF INDEX(area, 'DALLAS') > 0 OR INDEX(area, 'YORK') > 0 OR INDEX(area,
'HOUSTON') > 0 OR INDEX(area, 'ANGELES') > 0 OR
INDEX(area, 'CHICAGO') > 0 OR INDEX(area, 'PHILA') > 0 THEN area_iscity = 1 ;
ELSE IF INDEX(area, 'ROCKY') > 0 THEN area_ismount = 1;
ELSE area_isrural = 1;
/* create indicator variable for asl_flag */
IF asl_flag = 'Y' THEN aslflag = 1 ;
ELSE aslflag = 0;
/* convert to numeric variables */
MRC = totmrc_mean * 1 ;
age = age1 * 1 ;
handset_price = hnd_price * 1 ;
mean_mou = mou_mean * 1;
MOU6AVG = avg6mou * 1 ;
changemou = change_mou * 1 ;
mean_ovrmou = ovrmou_mean * 1 ;
mean_roam = roam_mean * 1;
/* create numeric variable for ethnic */
isasian = 0;
ishisp = 0;
isgerman = 0;
isfrench = 0;
isafro = 0;
IF ETHNIC = 'O' THEN isasian = 1 ;
ELSE IF ETHNIC = 'H' THEN isHisp = 1;
ELSE IF ETHNIC = 'G' THEN isgerman = 1;
ELSE IF ETHNIC = 'F' THEN isfrench = 1;
ELSE IF ETHNIC = 'Z' THEN isafro = 1;
/* create numeric variable for working woman */

woman_ind = 0;
IF wrkwoman = 'Y' THEN woman_ind = 1 ;
/* create numeric variable for new handset */
hnd_new = 0 ;
IF refurb_new = 'N' THEN hnd_new = 1;
/* create numeric variable for new car buyer */
car_new = 0 ;
IF Car_buy = 'New' THEN car_new = 1;
/* create numeric variables for car types */
car_reg = 0 ;
car_up = 0 ;
IF CARTYPE = 'E' THEN car_reg = 1 ;
IF CARTYPE = 'F' THEN car_up = 1;
/* create numeric variable for no children */
no_child = 0 ;
IF children = 'N' THEN no_child = 1;
/* create numeric variables for credit class A or AA */
credclas_a = 0;
IF STRIP(crclscod) = 'A' THEN credclas_a = 1 ;
IF STRIP(crclscod) = 'AA' THEN credclas_a = 1 ;
/* create numeric variable for dwelling size A */
DWELL_a = 0 ;
IF STRIP(dwllsize) = 'A' THEN dwell_a = 1;
/* create numeric variable for web capable handset */
webcap_ind =0;
IF STRIP(hnd_webcap) = 'WCMB' THEN webcap_ind = 1;
/* create indicator variable for if model used is only 1 */
one_model = 0 ;
IF models = 1 THEN one_model = 1;
/* create indicator variable for retention call made or not */
retcall_ind = 1;
IF STRIP(RETDAYS) = 'NA' THEN retcall_ind = 0 ;
/* create buckets for age of equipment based on quartile values */
IF eqpdays LE 202 THEN eqp_age = 1;

ELSE IF eqpdays LE 326 THEN eqp_age = 2;
ELSE IF eqpdays LE 512 then EQP_AGE = 3 ;
ELSE eqp_age = 4;
/* create buckets for length of relationship based on quartile values */
IF months LE 11 THEN rship_age = 1;
ELSE IF months LE 16 THEN rship_age = 2;
ELSE IF months LE 24 then rship_AGE = 3 ;
ELSE rship_age = 4;
/* create buckets for average minutes of use based on quartile values */
IF avgmou LE 176.67 THEN avgmou_bkt = 1 ;
ELSE IF avgmou LE 362.5 THEN avgmou_bkt = 2;
ELSE IF avgmou LE 660.9 THEN avgmou_bkt = 3;
ELSE avgmou_bkt = 4;
/* create buckets for total calls based on quartile values */
IF TOTCALLS LE 860 THEN totcalls_bkt = 1 ;
ELSE IF TOTCALLS LE 1796 THEN totcalls_bkt = 2;
ELSE IF TOTCALLS LE 3508 THEN totcalls_bkt = 3;
ELSE totcalls_bkt = 4;
/* create buckets for total revenue based on quartile values */
IF TOTREV LE 860 THEN totrev_bkt = 1 ;
ELSE IF TOTREV LE 1796 THEN totrev_bkt = 2;
ELSE IF TOTREV LE 3508 THEN totrev_bkt = 3;
ELSE totrev_bkt = 4;
RUN;
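Each IF / ELSE IF ladder above assigns a bucket number from three cut points. A compact Python sketch of the same rule, using the income cut points from the code (the function name `quartile_bucket` is an illustration and not part of the original program):

```python
from bisect import bisect_left

def quartile_bucket(value, cut_points):
    """Return a bucket from 1 to len(cut_points) + 1.

    A value equal to a cut point falls in the lower bucket,
    matching the LE comparisons in the SAS code.
    """
    return bisect_left(cut_points, value) + 1

income_cuts = [4.9, 6, 7]
buckets = [quartile_bucket(v, income_cuts) for v in (3, 4.9, 5.5, 7, 9)]
print(buckets)
```

In SAS a missing numeric value compares lower than any number, so missing values land in bucket 1 in the original ladders. The sketch does not model missing values.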
Proc means data = final.telecom2 N NMISS;
run;
SPLITTING THE DATA
/* split the data into a training dataset and a validation dataset */
PROC SURVEYSELECT DATA = final.telecom2
METHOD = SRS
OUT = final.sample1
SAMPRATE = 0.5
OUTALL;
RUN;
DATA final.TRAINING final.VALIDATE;
SET final.sample1;
IF selected = 0 THEN OUTPUT final.TRAINING;
ELSE IF selected = 1 THEN OUTPUT final.VALIDATE;
RUN;
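The split above is a simple random sample of half the records, with the unselected half used for training. A minimal Python sketch of the same idea follows. It fixes a seed so the split is reproducible; the SURVEYSELECT call above does not set one (PROC SURVEYSELECT accepts a SEED= option for that purpose). The ids and the seed value are hypothetical.

```python
import random

def split_half(ids, seed):
    """Simple random sample without replacement of half the ids.

    Returns (training, validation): unselected ids go to training,
    selected ids go to validation, as in the DATA step above.
    """
    rng = random.Random(seed)
    n_selected = round(len(ids) * 0.5)
    selected = set(rng.sample(ids, n_selected))
    training = [i for i in ids if i not in selected]    # selected = 0
    validation = [i for i in ids if i in selected]      # selected = 1
    return training, validation

ids = list(range(1, 11))                    # ten hypothetical customer ids
train, valid = split_half(ids, seed=2016)   # the seed value is arbitrary
print(len(train), len(valid))
```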
MODEL BUILDING: LOGISTIC REGRESSION

/*Model Building -Logistic regression since the target variable is a binary
variable*/
PROC LOGISTIC DATA = final.training DESCENDING;
MODEL churn = AVGMOU AVG3MOU plcd_vce_bkt drop_blk_bkt iwylis_vce_mean
changemou drop_vce_range DROP_VCE_MEAN
area_ismount aslflag mrc age isasian ishisp isafro handset_price mean_ovrmou
mean_roam hnd_new no_child
webcap_ind models actvsubs uniqsubs retcall_ind rship_age eqp_age
totcalls_bkt income_bkt callwait_bkt roam_ind
area_iscity area_isrural mou6avg changemou woman_ind car_new car_reg car_up
credclas_a dwell_a one_model
/ selection = forward ctable lackfit;
run;
OUTPUT
The LOGISTIC Procedure
Model Information
Data Set FINAL.training
Response Variable churn
Number of Response Levels 2
Model binary logit
Optimisation Technique Fisher's scoring
Number of Observations Read 33142
Number of Observations Used 31124
Response Profile
Ordered Value churn Total Frequency
1 1 7533
2 0 23591
Probability modeled is churn='1'.
Forward Selection Procedure
Step 0. Intercept entered:

Model Convergence Status
Convergence criterion (GCONV=1e-008) satisfied.
-2 Log L = 34448.7
Residual Chi-Square Test
Chi-Square DF Pr > Chi-Square
1230.2406 40 <.0001
Forward selection steps. Intercept only: AIC 34450.711, SC 34459.057, -2 Log L 34448.711.
AIC, SC and -2 Log L below are the Intercept and Covariates values after each step.
Score and Wald are the chi-squares for Testing Global Null Hypothesis: BETA=0; their DF equals the step number and Pr > Chi-Square is <.0001 at every step.
Residual, DF and Pr > ChiSq give the Residual Chi-Square Test after each step.
At Step 1 the Likelihood Ratio chi-square is 399.1436 (DF 1, <.0001).
Step  Effect Entered   AIC        SC         -2 Log L   Score      Wald       Residual  DF  Pr > ChiSq
1     eqp_age          34053.568  34070.259  34049.568  395.7067   391.0589   869.4213  39  <.0001
2     retcall_ind      33903.597  33928.634  33897.597  562.4510   544.8388   702.2174  38  <.0001
3     age              33840.976  33874.359  33832.976  624.9492   604.6355   635.7610  37  <.0001
4     aslflag          33768.823  33810.552  33758.823  693.0572   669.3955   565.7904  36  <.0001
5     handset_price    33705.967  33756.041  33693.967  757.9076   731.6708   501.6692  35  <.0001
6     rship_age        33660.584  33719.004  33646.584  794.9744   765.1791   458.2472  34  <.0001
7     totcalls_bkt     33622.403  33689.169  33606.403  834.0448   802.5407   410.6622  33  <.0001
8     MRC              33575.554  33650.665  33557.554  880.3880   846.4889   366.4575  32  <.0001
9     changemou        33548.686  33632.143  33528.686  906.2857   870.2710   336.0364  31  <.0001
10    uniqsubs         33527.000  33618.803  33505.000  930.9318   893.3402   311.0257  30  <.0001
11    PLCD_VCE_bkt     33506.811  33606.960  33482.811  952.5994   913.4428   290.1271  29  <.0001
12    drop_vce_Range   33474.223  33582.717  33448.223  986.8861   944.0560   252.7692  28  <.0001
13    mean_ovrmou      33452.631  33569.471  33424.631  1011.7401  965.7763   226.4365  27  <.0001
14    avg3mou          33432.064  33557.250  33402.064  1032.0648  983.6727   206.7282  26  <.0001
15    avgmou           33377.519  33511.051  33345.519  1083.4464  1029.5633  151.0034  25  <.0001
16    isafro           33356.790  33498.668  33322.790  1103.2109  1047.7040  129.2814  24  <.0001
17    hnd_new          33344.370  33494.593  33308.370  1116.1373  1059.3577  114.7804  23  <.0001
18    area_ismount     33333.877  33492.446  33295.877  1129.0573  1071.1611  101.9676  22  <.0001
19    isasian          33323.823  33490.737  33283.823  1141.2488  1082.3016  89.7274   21  <.0001
20    drop_blk_bkt     33313.558  33488.819  33271.558  1153.5902  1094.5598  77.6578   20  <.0001
21    mean_roam        33307.704  33491.310  33263.704  1161.6724  1100.4173  66.8699   19  <.0001
22    actvsubs         33299.275  33491.227  33253.275  1173.2964  1110.5652  56.4051   18  <.0001
23    ishisp           33293.307  33493.604  33245.307  1180.8883  1117.5361  48.3565   17  <.0001
24    webcap_ind       33287.710  33496.353  33237.710  1192.3206  1127.6673  40.7115   16  0.0006
25    models           33281.750  33498.739  33229.750  1200.0611  1134.4058  32.6353   15  0.0053
26    car_reg          33277.031  33502.366  33223.031  1206.1531  1140.2571  26.0477   14  0.0255
27    drop_vce_Mean    33273.335  33507.015  33217.335  1211.3233  1144.6124  20.1387   13  0.0918
28    no_child         33270.459  33512.485  33212.459  1216.0508  1149.1918  15.3170   12  0.2246
29    income_bkt       33268.335  33518.707  33208.335  1220.1970  1153.1296  11.2005   11  0.4266
NOTE: No further effects met the 0.05 significance level for entry into the model.
Summary of Forward Selection
Step  Effect Entered  DF  Number of Effects In Model  Score Chi-Square  Pr > Chi-Square  Variable Label
1 eqp_age 1 1 395.7067 <.0001
2 retcall_ind 1 2 170.9108 <.0001
3 age 1 3 64.8426 <.0001
4 aslflag 1 4 71.1051 <.0001
5 handset_price 1 5 64.1500 <.0001
6 rship_age 1 6 47.0868 <.0001
7 totcalls_bkt 1 7 39.8896 <.0001
8 MRC 1 8 47.6251 <.0001
9 changemou 1 9 28.8981 <.0001
10 uniqsubs 1 10 24.2311 <.0001 uniqsubs
11 PLCD_VCE_bkt 1 11 22.2594 <.0001

12 drop_vce_Range 1 12 36.9328 <.0001 drop_vce_Range
13 mean_ovrmou 1 13 24.9213 <.0001
14 avg3mou 1 14 22.1009 <.0001 avg3mou
15 avgmou 1 15 57.5227 <.0001 avgmou
16 isafro 1 16 21.6597 <.0001
17 hnd_new 1 17 14.5750 0.0001
18 area_ismount 1 18 12.9007 0.0003
19 isasian 1 19 12.4371 0.0004
20 drop_blk_bkt 1 20 12.2865 0.0005
21 mean_roam 1 21 10.8246 0.0010
22 actvsubs 1 22 10.4765 0.0012 actvsubs
23 ishisp 1 23 8.0629 0.0045
24 webcap_ind 1 24 7.6003 0.0058
25 models 1 25 8.0903 0.0045 models
26 car_reg 1 26 6.5676 0.0104
27 drop_vce_Mean 1 27 5.8710 0.0154 drop_vce_Mean
28 no_child 1 28 4.8187 0.0282
29 income_bkt 1 29 4.1281 0.0422
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error Wald Chi-Square Pr > ChiSq
Intercept 1 -1.3167 0.1085 147.3225 <.0001
avgmou 1 0.000567 0.000078 53.2636 <.0001
avg3mou 1 -0.00068 0.000075 82.8997 <.0001
PLCD_VCE_bkt 1 -0.1141 0.0230 24.5171 <.0001
drop_blk_bkt 1 0.0611 0.0215 8.0988 0.0044
changemou 1 -0.00038 0.000059 42.5105 <.0001
drop_vce_Range 1 0.00392 0.00245 2.5628 0.1094
drop_vce_Mean 1 0.00604 0.00253 5.7054 0.0169
area_ismount 1 0.2440 0.0659 13.6847 0.0002
aslflag 1 -0.4300 0.0478 80.7744 <.0001
MRC 1 -0.00317 0.000745 18.0453 <.0001
age 1 -0.00447 0.000658 46.0351 <.0001

isasian 1 0.2457 0.0663 13.7231 0.0002
ishisp 1 0.1124 0.0402 7.8147 0.0052
isafro 1 -0.2919 0.0699 17.4148 <.0001
handset_price 1 -0.00135 0.000337 15.9370 <.0001
mean_ovrmou 1 0.00144 0.000178 66.1632 <.0001
mean_roam 1 0.00334 0.00128 6.7700 0.0093
hnd_new 1 -0.1426 0.0445 10.2935 0.0013
no_child 1 -0.0995 0.0476 4.3705 0.0366
webcap_ind 1 -0.1293 0.0436 8.8068 0.0030
models 1 0.0647 0.0228 8.0781 0.0045
actvsubs 1 -0.1140 0.0353 10.4459 0.0012
uniqsubs 1 0.1315 0.0252 27.1989 <.0001
retcall_ind 1 0.7776 0.0678 131.5201 <.0001
rship_age 1 -0.1470 0.0187 61.9290 <.0001
eqp_age 1 0.2469 0.0192 165.0991 <.0001
totcalls_bkt 1 0.2419 0.0401 36.3146 <.0001
income_bkt 1 0.0285 0.0140 4.1276 0.0422
car_reg 1 -0.1449 0.0612 5.6024 0.0179
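One rough way to approach the "top five factors" question is to rank the predictors by their Wald chi-square in the table above. The Python sketch below does only that, with the values copied from the table. It is one possible reading, not necessarily the ranking intended by the analysis: Wald statistics depend on the other variables in the model, and the forward-selection entry order gives a somewhat different list.

```python
# Wald chi-square per predictor, copied from the maximum likelihood table (training data)
wald = {
    "avgmou": 53.2636, "avg3mou": 82.8997, "PLCD_VCE_bkt": 24.5171,
    "drop_blk_bkt": 8.0988, "changemou": 42.5105, "drop_vce_Range": 2.5628,
    "drop_vce_Mean": 5.7054, "area_ismount": 13.6847, "aslflag": 80.7744,
    "MRC": 18.0453, "age": 46.0351, "isasian": 13.7231, "ishisp": 7.8147,
    "isafro": 17.4148, "handset_price": 15.9370, "mean_ovrmou": 66.1632,
    "mean_roam": 6.7700, "hnd_new": 10.2935, "no_child": 4.3705,
    "webcap_ind": 8.8068, "models": 8.0781, "actvsubs": 10.4459,
    "uniqsubs": 27.1989, "retcall_ind": 131.5201, "rship_age": 61.9290,
    "eqp_age": 165.0991, "totcalls_bkt": 36.3146, "income_bkt": 4.1276,
    "car_reg": 5.6024,
}

# Sort predictors from the largest to the smallest Wald chi-square and keep five
top_five = sorted(wald, key=wald.get, reverse=True)[:5]
print(top_five)
```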
Odds Ratio Estimates
Effect  Point Estimate  Lower 95% Wald Confidence Limit  Upper 95% Wald Confidence Limit
avgmou 1.001 1.000 1.001
avg3mou 0.999 0.999 0.999
PLCD_VCE_bkt 0.892 0.853 0.933
drop_blk_bkt 1.063 1.019 1.109
changemou 1.000 1.000 1.000
drop_vce_Range 1.004 0.999 1.009
drop_vce_Mean 1.006 1.001 1.011
area_ismount 1.276 1.122 1.452
aslflag 0.650 0.592 0.714
MRC 0.997 0.995 0.998
age 0.996 0.994 0.997
isasian 1.279 1.123 1.456

ishisp 1.119 1.034 1.211
isafro 0.747 0.651 0.857
handset_price 0.999 0.998 0.999
mean_ovrmou 1.001 1.001 1.002
mean_roam 1.003 1.001 1.006
hnd_new 0.867 0.795 0.946
no_child 0.905 0.825 0.994
webcap_ind 0.879 0.807 0.957
models 1.067 1.020 1.116
actvsubs 0.892 0.833 0.956
uniqsubs 1.141 1.086 1.198
retcall_ind 2.176 1.905 2.486
rship_age 0.863 0.832 0.896
eqp_age 1.280 1.233 1.329
totcalls_bkt 1.274 1.177 1.378
income_bkt 1.029 1.001 1.058
car_reg 0.865 0.767 0.975
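Each odds ratio above is the exponential of the corresponding coefficient, and the Wald limits are the exponentials of the estimate plus or minus 1.96 standard errors. A short Python check for the retcall_ind row (the helper name `odds_ratio` is illustrative):

```python
import math

def odds_ratio(estimate, std_error, z=1.96):
    """Odds ratio point estimate with Wald confidence limits (z = 1.96 for about 95%)."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )

# retcall_ind in the maximum likelihood table: estimate 0.7776, standard error 0.0678
point, lower, upper = odds_ratio(0.7776, 0.0678)
print(round(point, 3), round(lower, 3), round(upper, 3))
```

Read this way, customers with a recorded retention call have roughly twice the odds of churning compared with those without one, holding the other predictors fixed.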
Association of Predicted Probabilities and Observed Responses
Percent Concordant 63.1 Somers' D 0.268
Percent Discordant 36.3 Gamma 0.269
Percent Tied 0.7 Tau-a 0.098
Pairs 1.7771E8 c 0.634
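Somers' D and the c statistic can be recovered from the concordant, discordant and tied percentages. A minimal Python sketch using the rounded percentages printed above, so small rounding differences are expected:

```python
pct_concordant = 63.1
pct_discordant = 36.3
pct_tied = 0.7

# Somers' D: concordant share minus discordant share
somers_d = (pct_concordant - pct_discordant) / 100
# c statistic: concordant share plus half of the tied share
c_stat = (pct_concordant + 0.5 * pct_tied) / 100

print(round(somers_d, 3), round(c_stat, 4))
```

A c value of about 0.63 indicates modest discrimination: the model ranks a randomly chosen churner above a randomly chosen non-churner a little under two times out of three.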
Partition for the Hosmer and Lemeshow Test
Group  Total  Observed Events  Expected Events  Observed Nonevents  Expected Nonevents
1 3112 364 356.00 2748 2756.00
2 3112 453 479.31 2659 2632.69
3 3112 559 560.51 2553 2551.49
4 3112 598 632.22 2514 2479.78
5 3112 679 698.03 2433 2413.97
6 3112 781 764.87 2331 2347.13

7 3112 825 834.14 2287 2277.86
8 3112 947 913.01 2165 2198.99
9 3112 1028 1016.43 2084 2095.57
10 3116 1299 1278.48 1817 1837.52
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square  DF  Pr > ChiSq
8.0413  8  0.4294
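The Hosmer and Lemeshow statistic is the sum of (observed - expected)^2 / expected over the event and nonevent cells of the partition table, with degrees of freedom equal to the number of groups minus 2. The Python sketch below recomputes it from the printed training partition. The p-value helper uses a closed form that is valid only for even degrees of freedom, which applies here.

```python
import math

# (observed events, expected events, observed nonevents, expected nonevents)
# per group, copied from the partition table for the training data
partition = [
    (364, 356.00, 2748, 2756.00),
    (453, 479.31, 2659, 2632.69),
    (559, 560.51, 2553, 2551.49),
    (598, 632.22, 2514, 2479.78),
    (679, 698.03, 2433, 2413.97),
    (781, 764.87, 2331, 2347.13),
    (825, 834.14, 2287, 2277.86),
    (947, 913.01, 2165, 2198.99),
    (1028, 1016.43, 2084, 2095.57),
    (1299, 1278.48, 1817, 1837.52),
]

chi_square = sum(
    (obs_e - exp_e) ** 2 / exp_e + (obs_n - exp_n) ** 2 / exp_n
    for obs_e, exp_e, obs_n, exp_n in partition
)
df = len(partition) - 2

def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_upper_tail_even_df(chi_square, df)
print(round(chi_square, 2), df, round(p_value, 2))
```

A p-value well above 0.05 means the test does not reject the hypothesis that the fitted probabilities agree with the observed churn rates across the ten groups.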
Classification Table
Prob Level  Correct Events  Correct Non-events  Incorrect Events  Incorrect Non-events  Percentage Correct  Sensitivity  Specificity  False POS  False NEG
0.020 7533 0 23591 0 24.2 100.0 0.0 75.8 .
0.040 7531 2 23589 2 24.2 100.0 0.0 75.8 50.0
0.060 7526 31 23560 7 24.3 99.9 0.1 75.8 18.4
0.080 7510 172 23419 23 24.7 99.7 0.7 75.7 11.8
0.100 7456 589 23002 77 25.8 99.0 2.5 75.5 11.6
0.120 7340 1438 22153 193 28.2 97.4 6.1 75.1 11.8
0.140 7147 2811 20780 386 32.0 94.9 11.9 74.4 12.1
0.160 6842 4615 18976 691 36.8 90.8 19.6 73.5 13.0
0.180 6436 6644 16947 1097 42.0 85.4 28.2 72.5 14.2
0.200 5921 8809 14782 1612 47.3 78.6 37.3 71.4 15.5
0.220 5330 11159 12432 2203 53.0 70.8 47.3 70.0 16.5
0.240 4690 13448 10143 2843 58.3 62.3 57.0 68.4 17.5
0.260 3950 15571 8020 3583 62.7 52.4 66.0 67.0 18.7
0.280 3231 17517 6074 4302 66.7 42.9 74.3 65.3 19.7
0.300 2575 19105 4486 4958 69.7 34.2 81.0 63.5 20.6
0.320 1970 20406 3185 5563 71.9 26.2 86.5 61.8 21.4
0.340 1457 21401 2190 6076 73.4 19.3 90.7 60.0 22.1
0.360 1097 22124 1467 6436 74.6 14.6 93.8 57.2 22.5
0.380 780 22570 1021 6753 75.0 10.4 95.7 56.7 23.0
0.400 586 22897 694 6947 75.4 7.8 97.1 54.2 23.3

0.420 431 23091 500 7102 75.6 5.7 97.9 53.7 23.5
0.440 315 23219 372 7218 75.6 4.2 98.4 54.1 23.7
0.460 245 23305 286 7288 75.7 3.3 98.8 53.9 23.8
0.480 189 23375 216 7344 75.7 2.5 99.1 53.3 23.9
0.500 152 23421 170 7381 75.7 2.0 99.3 52.8 24.0
0.520 109 23465 126 7424 75.7 1.4 99.5 53.6 24.0
0.540 81 23501 90 7452 75.8 1.1 99.6 52.6 24.1
0.560 62 23525 66 7471 75.8 0.8 99.7 51.6 24.1
0.580 38 23537 54 7495 75.7 0.5 99.8 58.7 24.2
0.600 23 23554 37 7510 75.8 0.3 99.8 61.7 24.2
0.620 17 23563 28 7516 75.8 0.2 99.9 62.2 24.2
0.640 8 23571 20 7525 75.8 0.1 99.9 71.4 24.2
0.660 6 23578 13 7527 75.8 0.1 99.9 68.4 24.2
0.680 5 23579 12 7528 75.8 0.1 99.9 70.6 24.2
0.700 4 23580 11 7529 75.8 0.1 100.0 73.3 24.2
0.720 2 23582 9 7531 75.8 0.0 100.0 81.8 24.2
0.740 2 23584 7 7531 75.8 0.0 100.0 77.8 24.2
0.760 2 23586 5 7531 75.8 0.0 100.0 71.4 24.2
0.780 1 23587 4 7532 75.8 0.0 100.0 80.0 24.2
0.800 1 23590 1 7532 75.8 0.0 100.0 50.0 24.2
0.820 0 23590 1 7533 75.8 0.0 100.0 100.0 24.2
0.840 0 23590 1 7533 75.8 0.0 100.0 100.0 24.2
0.860 0 23590 1 7533 75.8 0.0 100.0 100.0 24.2
0.880 0 23590 1 7533 75.8 0.0 100.0 100.0 24.2
0.900 0 23590 1 7533 75.8 0.0 100.0 100.0 24.2
0.920 0 23590 1 7533 75.8 0.0 100.0 100.0 24.2
0.940 0 23590 1 7533 75.8 0.0 100.0 100.0 24.2
0.960 0 23590 1 7533 75.8 0.0 100.0 100.0 24.2
0.980 0 23591 0 7533 75.8 0.0 100.0 . 24.2
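The percentage columns of the classification table are ratios of the four count columns. A Python sketch that reproduces the row for probability level 0.260 (the function name is illustrative):

```python
def classification_metrics(correct_events, correct_nonevents,
                           incorrect_events, incorrect_nonevents):
    """Percentages as laid out in the classification table.

    incorrect_events    = non-churners predicted as churners (false positives)
    incorrect_nonevents = churners predicted as non-churners (false negatives)
    """
    total = (correct_events + correct_nonevents
             + incorrect_events + incorrect_nonevents)
    return {
        "percent_correct": 100 * (correct_events + correct_nonevents) / total,
        "sensitivity": 100 * correct_events / (correct_events + incorrect_nonevents),
        "specificity": 100 * correct_nonevents / (correct_nonevents + incorrect_events),
        "false_pos": 100 * incorrect_events / (correct_events + incorrect_events),
        "false_neg": 100 * incorrect_nonevents / (correct_nonevents + incorrect_nonevents),
    }

# Counts from the 0.260 row of the training classification table
row = classification_metrics(3950, 15571, 8020, 3583)
rounded = {name: round(value, 1) for name, value in row.items()}
print(rounded)
```

At the conventional 0.500 level the table shows a sensitivity of only 2.0%, so a cutoff closer to the overall churn rate is likely to be more useful when the goal is to find churners.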
VALIDATING THE MODEL ON THE VALIDATION DATASET
/* check how well the model performs on the validation dataset */
Proc logistic data = final.validate descending;
MODEL churn = AVGMOU AVG3MOU plcd_vce_bkt drop_blk_bkt iwylis_vce_mean
changemou drop_vce_range DROP_VCE_MEAN
area_ismount aslflag mrc age isasian ishisp isafro handset_price mean_ovrmou
mean_roam hnd_new no_child
webcap_ind models actvsubs uniqsubs retcall_ind rship_age eqp_age
totcalls_bkt income_bkt callwait_bkt roam_ind
area_iscity area_isrural mou6avg changemou woman_ind car_new car_reg car_up
credclas_a dwell_a one_model
/ selection = forward ctable lackfit;
run;
Model Information
Data Set FINAL.validate
Model binary logit
Response Profile
1 1 7451
2 0 23598
Forward Selection Procedure
Step 0. Intercept entered:
-2 Log L = 34219.2

1177.5278 40 <.0001
Forward selection steps. Intercept only: AIC 34221.209, SC 34229.553, -2 Log L 34219.209.
AIC, SC and -2 Log L below are the Intercept and Covariates values after each step.
Score and Wald are the chi-squares for Testing Global Null Hypothesis: BETA=0; their DF equals the step number and Pr > Chi-Square is <.0001 at every step.
Residual, DF and Pr > ChiSq give the Residual Chi-Square Test after each step.
Step  Effect Entered   AIC        SC         -2 Log L   Score      Wald       Residual  DF  Pr > ChiSq
1     eqp_age          33920.358  33937.045  33916.358  300.8859   298.2020   910.4723  39  <.0001
2     retcall_ind      33831.040  33856.070  33825.040  399.1679   391.3533   810.8576  38  <.0001
3     uniqsubs         33767.651  33801.024  33759.651  465.5142   455.2268   741.9184  37  <.0001
4     hnd_new          33710.449  33752.166  33700.449  524.4062   512.1788   682.1505  36  <.0001
5     age              33652.194  33702.254  33640.194  583.4429   568.6121   620.2075  35  <.0001
6     isafro           33610.162  33668.566  33596.162  621.8572   604.7424   580.9746  34  <.0001
7     MRC              33571.264  33638.010  33555.264  660.2436   641.6473   549.5600  33  <.0001
8     totcalls_bkt     33521.110  33596.200  33503.110  712.0775   691.3940   488.3136  32  <.0001
9     rship_age        33482.650  33566.084  33462.650  745.7204   722.2831   448.0189  31  <.0001
10    aslflag          33428.934  33520.710  33406.934  795.7781   770.1978   394.1559  30  <.0001
11    webcap_ind       33394.309  33494.429  33370.309  837.0318   808.4814   357.0971  29  <.0001
12    isasian          33364.242  33472.705  33338.242  871.6385   840.3767   323.6032  28  <.0001
13    mean_ovrmou      33339.703  33456.510  33311.703  898.1250   863.6643   291.9235  27  <.0001
14    avg3mou          33291.647  33416.797  33261.647  945.0097   905.9672   249.6822  26  <.0001
15    avgmou           33225.748  33359.241  33193.748  1006.0867  960.8460   181.8526  25  <.0001
16    drop_vce_Range   33202.698  33344.535  33168.698  1030.8308  982.0499   155.1997  24  <.0001
17    changemou        33180.918  33331.098  33144.918  1044.9558  1000.0754  191.6930  23  <.0001
18    mean_roam        33150.774  33309.297  33112.774  1082.5402  1025.8762  93.8315   22  <.0001
19    actvsubs         33134.679  33301.545  33094.679  1102.7041  1042.4785  75.7138   21  <.0001
20    one_model        33123.382  33298.591  33081.382  1115.2578  1053.8346  62.3528   20  <.0001
21    PLCD_VCE_bkt     33113.067  33296.620  33069.067  1131.0524  1066.8021  50.0641   19  0.0001
22    drop_blk_bkt     33102.376  33294.273  33056.376  1143.5578  1079.6047  37.4052   18  0.0046
23    area_ismount     33094.139  33294.378  33046.139  1153.7958  1089.2678  26.8637   17  0.0601
24    handset_price    33089.320  33297.903  33039.320  1159.0711  1094.2510  20.0882   16  0.2163
NOTE: No further effects met the 0.05 significance level for entry into the model.
Summary of Forward Selection
Step  Effect Entered  DF  Number of Effects In Model  Score Chi-Square  Pr > Chi-Square  Variable Label
1 eqp_age 1 1 300.8859 <.0001
2 retcall_ind 1 2 101.1738 <.0001
3 uniqsubs 1 3 68.1115 <.0001 uniqsubs
4 hnd_new 1 4 61.1777 <.0001
5 age 1 5 60.4609 <.0001
6 isafro 1 6 40.8613 <.0001
7 MRC 1 7 39.8038 <.0001
8 totcalls_bkt 1 8 51.5586 <.0001
9 rship_age 1 9 40.1530 <.0001
10 aslflag 1 10 54.0726 <.0001
11 webcap_ind 1 11 36.9455 <.0001
12 isasian 1 12 33.6742 <.0001
13 mean_ovrmou 1 13 28.5869 <.0001
14 avg3mou 1 14 48.8272 <.0001 avg3mou
15 avgmou 1 15 69.2579 <.0001 avgmou
16 drop_vce_Range 1 16 25.9556 <.0001 drop_vce_Range
17 changemou 1 17 19.6575 <.0001
18 mean_roam 1 18 99.2732 <.0001
19 actvsubs 1 19 18.1593 <.0001 actvsubs
20 one_model 1 20 13.3775 0.0003
21 PLCD_VCE_bkt 1 21 12.3672 0.0004
22 drop_blk_bkt 1 22 12.7122 0.0004
23 area_ismount 1 23 10.5404 0.0012
24 handset_price 1 24 6.7774 0.0092

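Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error Wald Chi-Square Pr > ChiSq
Intercept 1 -1.1731 0.1031 129.5428 <.0001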
avgmou 1 0.000630 0.000077 66.5896 <.0001
avg3mou 1 -0.00079 0.000074 114.0789 <.0001
PLCD_VCE_bkt 1 -0.1139 0.0229 24.7906 <.0001
drop_blk_bkt 1 0.0767 0.0209 13.4598 0.0002
changemou 1 -0.00036 0.000057 40.3457 <.0001
drop_vce_Range 1 0.00729 0.00186 15.3337 <.0001
area_ismount 1 0.2089 0.0650 10.3340 0.0013
aslflag 1 -0.3740 0.0468 63.9150 <.0001
MRC 1 -0.00311 0.000745 17.4200 <.0001
age 1 -0.00415 0.000640 42.0982 <.0001
isasian 1 0.3933 0.0638 37.9813 <.0001
isafro 1 -0.4568 0.0741 38.0207 <.0001
handset_price 1 -0.00086 0.000332 6.7756 0.0092
mean_ovrmou 1 0.00169 0.000178 90.5169 <.0001
mean_roam 1 0.00456 0.00108 17.9054 <.0001
hnd_new 1 -0.1723 0.0453 14.4472 0.0001
webcap_ind 1 -0.1666 0.0441 14.2509 0.0002
actvsubs 1 -0.1461 0.0351 17.2755 <.0001
uniqsubs 1 0.1731 0.0252 47.1136 <.0001
retcall_ind 1 0.6046 0.0718 70.9121 <.0001
rship_age 1 -0.1571 0.0189 69.4638 <.0001
eqp_age 1 0.2441 0.0200 149.6371 <.0001
totcalls_bkt 1 0.3069 0.0401 58.5628 <.0001
one_model 1 -0.1718 0.0439 15.3481 <.0001
Odds Ratio Estimates
Effect  Point Estimate  Lower 95% Wald Confidence Limit  Upper 95% Wald Confidence Limit
avgmou 1.001 1.000 1.001
avg3mou 0.999 0.999 0.999

PLCD_VCE_bkt 0.892 0.853 0.933
drop_blk_bkt 1.080 1.036 1.125
changemou 1.000 1.000 1.000
drop_vce_Range 1.007 1.004 1.011
area_ismount 1.232 1.085 1.400
aslflag 0.688 0.628 0.754
MRC 0.997 0.995 0.998
age 0.996 0.995 0.997
isasian 1.482 1.308 1.679
isafro 0.633 0.548 0.732
handset_price 0.999 0.998 1.000
mean_ovrmou 1.002 1.001 1.002
mean_roam 1.005 1.002 1.007
hnd_new 0.842 0.770 0.920
webcap_ind 0.847 0.776 0.923
actvsubs 0.864 0.807 0.926
uniqsubs 1.189 1.132 1.249
retcall_ind 1.830 1.590 2.107
rship_age 0.855 0.824 0.887
eqp_age 1.277 1.228 1.327
totcalls_bkt 1.359 1.256 1.470
one_model 0.842 0.773 0.918
Pairs 1.7583E8 c 0.632
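Partition for the Hosmer and Lemeshow Test
Group  Total  Observed Events  Expected Events  Observed Nonevents  Expected Nonevents
1 3105 365 356.19 2740 2748.81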

2 3105 451 484.85 2654 2620.15
3 3105 511 562.65 2594 2542.35
4 3105 632 628.34 2473 2476.66
5 3106 722 691.36 2384 2414.64
6 3105 734 752.62 2371 2352.38
7 3105 816 820.63 2289 2284.37
8 3105 952 897.77 2153 2207.23
9 3105 1004 999.40 2101 2105.60
10 3103 1264 1257.20 1839 1845.80
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square  DF  Pr > ChiSq
15.9554  8  0.0430
Classification Table
Prob Level  Correct Events  Correct Non-events  Incorrect Events  Incorrect Non-events  Percentage Correct  Sensitivity  Specificity  False POS  False NEG
0.020 7451 0 23598 0 24.0 100.0 0.0 76.0 .
0.040 7447 5 23593 4 24.0 99.9 0.0 76.0 44.4
0.060 7440 47 23551 11 24.1 99.9 0.2 76.0 19.0
0.080 7425 215 23383 26 24.6 99.7 0.9 75.9 10.8
0.100 7376 595 23003 75 25.7 99.0 2.5 75.7 11.2
0.120 7262 1403 22195 189 27.9 97.5 5.9 75.3 11.9
0.140 7091 2659 20939 360 31.4 95.2 11.3 74.7 11.9
0.160 6799 4430 19168 652 36.2 91.2 18.8 73.8 12.8
0.180 6408 6537 17061 1043 41.7 86.0 27.7 72.7 13.8
0.200 5863 8921 14677 1588 47.6 78.7 37.8 71.5 15.1
0.220 5210 11324 12274 2241 53.3 69.9 48.0 70.2 16.5
0.240 4481 13762 9836 2970 58.8 60.1 58.3 68.7 17.8
0.260 3770 15918 7680 3681 63.4 50.6 67.5 67.1 18.8
0.280 3047 17835 5763 4404 67.3 40.9 75.6 65.4 19.8
0.300 2358 19399 4199 5093 70.1 31.6 82.2 64.0 20.8

0.320 1784 20681 2917 5667 72.4 23.9 87.6 62.1 21.5
0.340 1346 21538 2060 6105 73.7 18.1 91.3 60.5 22.1
0.360 985 22179 1419 6466 74.6 13.2 94.0 59.0 22.6
0.380 719 22609 989 6732 75.1 9.6 95.8 57.9 22.9
0.400 525 22893 705 6926 75.4 7.0 97.0 57.3 23.2
0.420 383 23125 473 7068 75.7 5.1 98.0 55.3 23.4
0.440 281 23256 342 7170 75.8 3.8 98.6 54.9 23.6
0.460 211 23364 234 7240 75.9 2.8 99.0 52.6 23.7
0.480 158 23436 162 7293 76.0 2.1 99.3 50.6 23.7
0.500 125 23479 119 7326 76.0 1.7 99.5 48.8 23.8
0.520 91 23503 95 7360 76.0 1.2 99.6 51.1 23.8
0.540 65 23528 70 7386 76.0 0.9 99.7 51.9 23.9
0.560 49 23549 49 7402 76.0 0.7 99.8 50.0 23.9
0.580 34 23562 36 7417 76.0 0.5 99.8 51.4 23.9
0.600 25 23571 27 7426 76.0 0.3 99.9 51.9 24.0
0.620 17 23580 18 7434 76.0 0.2 99.9 51.4 24.0
0.640 14 23581 17 7437 76.0 0.2 99.9 54.8 24.0
0.660 11 23586 12 7440 76.0 0.1 99.9 52.2 24.0
0.680 8 23588 10 7443 76.0 0.1 100.0 55.6 24.0
0.700 6 23592 6 7445 76.0 0.1 100.0 50.0 24.0
0.720 4 23592 6 7447 76.0 0.1 100.0 60.0 24.0
0.740 4 23592 6 7447 76.0 0.1 100.0 60.0 24.0
0.760 4 23592 6 7447 76.0 0.1 100.0 60.0 24.0
0.780 2 23593 5 7449 76.0 0.0 100.0 71.4 24.0
0.800 2 23594 4 7449 76.0 0.0 100.0 66.7 24.0
0.820 1 23594 4 7450 76.0 0.0 100.0 80.0 24.0
0.840 1 23595 3 7450 76.0 0.0 100.0 75.0 24.0
0.860 0 23595 3 7451 76.0 0.0 100.0 100.0 24.0
0.880 0 23595 3 7451 76.0 0.0 100.0 100.0 24.0
0.900 0 23597 1 7451 76.0 0.0 100.0 100.0 24.0
0.920 0 23597 1 7451 76.0 0.0 100.0 100.0 24.0
0.940 0 23597 1 7451 76.0 0.0 100.0 100.0 24.0
0.960 0 23597 1 7451 76.0 0.0 100.0 100.0 24.0
0.980 0 23597 1 7451 76.0 0.0 100.0 100.0 24.0

1.000 0 23598 0 7451 76.0 0.0 100.0 . 24.0
/**Generating gain chart*/
PROC LOGISTIC DATA = final.telecom2 DESCENDING OUTMODEL = DMM ;
MODEL CHURN = AVGMOU AVG3MOU plcd_vce_bkt drop_blk_bkt changemou
drop_vce_range DROP_VCE_MEAN
webcap_ind actvsubs uniqsubs retcall_ind rship_age eqp_age totcalls_bkt
one_model / ctable lackfit;
SCORE OUT = DMP;
RUN;
The WPS System
Model Information
Data Set FINAL.telecom2
Model binary logit
Model: A binary logit regression model was fit to the data.
Optimization Technique: Fisher's scoring is the iterative method used to estimate the regression parameters.
The number of observations used is less than the number of observations read because some of the model
variables have missing values.
Response Profile
Ordered Value | churn | Total Frequency
1 1 15317
2 0 48806

Ordered value: Because the DESCENDING option is used, the response levels are ordered from high to low
and the model predicts the probability of churn = 1. A positive logit regression coefficient therefore
indicates a positive relationship with churn, and a negative coefficient a negative relationship. Of the
64123 observations used, 15317 (24%) are churners and 48806 (76%) are non-churners.
The default convergence criterion is the relative gradient convergence criterion (GCONV), with a default
precision of 10^-8. In this model the convergence criterion is satisfied.
Criterion | Intercept Only | Intercept and Covariates
AIC 70508.162 68071.173
SC 70517.230 68325.093
-2 Log L 70506.162 68015.173
AIC: The Akaike Information Criterion is used to compare non-nested models fit to the same sample.
The model with the lowest AIC is preferred.
SC: The Schwarz Criterion is read the same way; a smaller value is more desirable.
AIC and SC penalize the log-likelihood by the number of predictors in the model.
-2 Log L: Negative two times the log-likelihood, used in hypothesis tests for nested models.
Intercept and Covariates: The fitted model, which includes all independent variables and the intercept. The
values in this column can be compared with the corresponding Intercept Only values to assess model fit
and significance.
Score 2453.9078 27 <.0001
Wald 2316.9133 27 <.0001
The global null hypothesis is that all of the predictors' regression coefficients are equal to zero. The
Likelihood Ratio, Score and Wald tests each test it against the alternative that at least one of the
predictors' regression coefficients is not equal to zero.
Calculation: The Likelihood Ratio chi-square is -2 Log L (null model, intercept only) minus -2 Log L (fitted
model, intercept and covariates), which is 2490.9885.
Pr > ChiSq: This p-value is compared with the specified alpha level, the Type I error rate we are willing to
accept. The small p-values lead us to conclude that at least one of the regression coefficients in the model
is not equal to zero.
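The arithmetic behind the fit statistics can be checked with a short sketch. It is written in Python rather than SAS so it runs on its own; the variable names are illustrative, and the parameter count of 28 assumes the intercept plus the 27 covariates implied by the 27 degrees of freedom reported for the tests.

```python
import math

# Values copied from the fit statistics table above.
neg2_log_l_null = 70506.162   # -2 Log L, intercept only
neg2_log_l_full = 68015.173   # -2 Log L, intercept and covariates
n_obs = 15317 + 48806         # observations used (response profile)
n_params_full = 28            # assumed: intercept + 27 covariates

# Likelihood ratio chi-square: difference of the two -2 Log L values.
lr_chi_square = neg2_log_l_null - neg2_log_l_full

# AIC = -2 Log L + 2k and SC = -2 Log L + k * ln(n), with k parameters.
aic_full = neg2_log_l_full + 2 * n_params_full
sc_full = neg2_log_l_full + n_params_full * math.log(n_obs)

print(round(lr_chi_square, 3), round(aic_full, 3), round(sc_full, 3))
```

The results agree with the reported Likelihood Ratio chi-square, AIC and SC up to rounding of the printed inputs.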

Intercept 1 -1.1379 0.0715 253.0321 <.0001
avgmou 1 0.000602 0.000054 123.5508 <.0001
avg3mou 1 -0.00073 0.000053 193.1890 <.0001
PLCD_VCE_bkt 1 -0.1156 0.0160 52.0034 <.0001
drop_blk_bkt 1 0.0654 0.0149 19.2666 <.0001
changemou 1 -0.00035 0.000039 81.3616 <.0001
drop_vce_Range 1 0.00408 0.00166 6.0576 0.0138
drop_vce_Mean 1 0.00500 0.00182 7.5236 0.0061
area_ismount 1 0.2289 0.0458 25.0235 <.0001
aslflag 1 -0.4061 0.0324 156.7115 <.0001
MRC 1 -0.00308 0.000518 35.4324 <.0001
age 1 -0.00426 0.000457 86.9453 <.0001
isasian 1 0.3287 0.0455 52.0675 <.0001
ishisp 1 0.0683 0.0280 5.9410 0.0148
isafro 1 -0.3722 0.0500 55.3905 <.0001
handset_price 1 -0.00107 0.000233 21.1778 <.0001
mean_ovrmou 1 0.00155 0.000124 155.9362 <.0001
mean_roam 1 0.00421 0.000725 33.8267 <.0001
hnd_new 1 -0.1494 0.0319 21.9503 <.0001
no_child 1 -0.0777 0.0331 5.5112 0.0189
webcap_ind 1 -0.1548 0.0308 25.2265 <.0001
actvsubs 1 -0.1258 0.0243 26.7277 <.0001
uniqsubs 1 0.1546 0.0173 79.6071 <.0001
retcall_ind 1 0.7068 0.0489 209.1117 <.0001
rship_age 1 -0.1507 0.0132 130.8075 <.0001
eqp_age 1 0.2531 0.0137 340.3424 <.0001
totcalls_bkt 1 0.2621 0.0279 88.1317 <.0001
one_model 1 -0.1533 0.0304 25.4935 <.0001
Parameter: The predictor variables in the model and the intercept.

DF: Degrees of freedom corresponding to the parameter. Each parameter estimated in the model requires
one DF, which defines the chi-square distribution used to test whether the individual regression
coefficient is zero, given that the other variables are in the model.
Estimate: These are the binary logit regression estimates for the parameters in the model. The logistic
regression model expresses the log odds of a positive response (churn = 1) as a linear combination of
the predictor variables.
log(p/(1-p)) = b0 + b1*x1 + b2*x2 + b3*x3 + …
log(p/(1-p)) = -1.1379 + (0.000602*avgmou) + (-0.00073*avg3mou) + …
Interpretation: For a unit change in the predictor variable, the log-odds of a positive outcome is expected
to change by the respective coefficient, given the other variables in the model are held constant.
Intercept = -1.1379 (the logistic regression estimate when all variables in the model are evaluated at zero).
Standard Error: These are the standard errors of the individual regression coefficients.
Pr > ChiSq: Tests the null hypothesis that an individual predictor's regression coefficient is zero, given the
other predictor variables are in the model. The chi-square test statistic is the squared ratio of the estimate
to the standard error of the respective predictor. In this model all variables are significant at the 0.05 level.
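As a rough check of the definitions above, the sketch below recomputes the Wald chi-square for three rows of the parameter table and converts the intercept into a baseline probability. It is Python rather than SAS, the helper name `wald_chi_square` is illustrative, and small differences from the printed values are expected because the estimates and standard errors are rounded in the output.

```python
import math

# (estimate, standard error) copied from the parameter table above.
params = {
    "retcall_ind": (0.7068, 0.0489),
    "eqp_age": (0.2531, 0.0137),
    "rship_age": (-0.1507, 0.0132),
}

def wald_chi_square(estimate, std_error):
    """Squared ratio of the estimate to its standard error."""
    return (estimate / std_error) ** 2

for name, (est, se) in params.items():
    print(name, round(wald_chi_square(est, se), 1))

# Inverse logit of the intercept: churn probability when every
# predictor is evaluated at zero.
intercept = -1.1379
p_baseline = 1.0 / (1.0 + math.exp(-intercept))
print(round(p_baseline, 4))
```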
Effect | Point Estimate | Lower 95% Wald Confidence Limit | Upper 95% Wald Confidence Limit
avgmou 1.001 1.000 1.001
avg3mou 0.999 0.999 0.999
PLCD_VCE_bkt 0.891 0.863 0.919
drop_blk_bkt 1.068 1.037 1.099
changemou 1.000 1.000 1.000
drop_vce_Range 1.004 1.001 1.007
drop_vce_Mean 1.005 1.001 1.009
area_ismount 1.257 1.149 1.375
aslflag 0.666 0.625 0.710
MRC 0.997 0.996 0.998
age 0.996 0.995 0.997
isasian 1.389 1.270 1.519
ishisp 1.071 1.013 1.131
isafro 0.689 0.625 0.760
handset_price 0.999 0.998 0.999
mean_ovrmou 1.002 1.001 1.002
mean_roam 1.004 1.003 1.006

hnd_new 0.861 0.809 0.917
no_child 0.925 0.867 0.987
webcap_ind 0.857 0.806 0.910
actvsubs 0.882 0.841 0.925
uniqsubs 1.167 1.128 1.207
retcall_ind 2.028 1.842 2.231
rship_age 0.860 0.838 0.883
eqp_age 1.288 1.254 1.323
totcalls_bkt 1.300 1.230 1.373
one_model 0.858 0.808 0.910
Point Estimate: The odds ratio is obtained by exponentiating the estimate. The difference in the logs of two
odds is equal to the log of the ratio of these two odds, and the log of the ratio of the two odds is the log
odds ratio.
Interpretation: For a one unit change in the predictor variable, the odds of a positive outcome are
expected to be multiplied by the respective odds ratio, given the other variables in the model are held
constant. For example, a one unit increase in retcall_ind multiplies the odds of churn by about 2.03.
95% Wald Confidence Limits: This is the Wald Confidence Interval (CI) of an individual odds ratio, given the
other predictor variables are in the model. At the 95% confidence level, upon repeated trials 95% of the
CIs would include the true population odds ratio. If the CI includes one, we would fail to reject the null
hypothesis that the regression coefficient equals zero and the odds ratio equals one, given the other
predictors are in the model.
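The odds ratio and its Wald limits follow directly from the estimate and standard error. The sketch below (Python, with an illustrative helper `odds_ratio_with_ci` and the usual approximation z = 1.96) reproduces the retcall_ind row to within rounding of the printed estimate.

```python
import math

Z_95 = 1.96  # approximate standard normal quantile for a 95% interval

def odds_ratio_with_ci(estimate, std_error, z=Z_95):
    """Exponentiate the estimate and the ends of its Wald interval."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper

# retcall_ind: estimate 0.7068, standard error 0.0489 (parameter table).
point, lower, upper = odds_ratio_with_ci(0.7068, 0.0489)
print(round(point, 3), round(lower, 3), round(upper, 3))
```

Because the interval lies entirely above one, the coefficient is significantly different from zero at the 0.05 level, which agrees with the p-value in the parameter table.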
Pairs 7.4756E8 c 0.633
Percent Concordant: A pair of observations with different observed responses is said to be concordant if
the observation with the lower ordered response value “0” has a lower predicted mean score than the
observation with the higher ordered response value “1”. The concordant percent is 63%. The higher the
concordant percent, the better the model. The model looks acceptable since the concordant percent is
higher than the discordant percent.
Percent Discordant: A pair is discordant if the observation with the lower ordered response value has a
higher predicted mean score than the observation with the higher ordered response value. The discordant
percent is 36.3%.
Percent Tied: A pair of observations with different responses is tied if it is neither concordant nor discordant.
Pairs: The total number of distinct pairs in which one case has an observed outcome different from the other
member of the pair.
In the response profile table we have 15317 observations with churn = 1 and 48806 observations with
churn = 0. Thus the total number of pairs with different outcomes is 15317*48806 = 747,561,502
(7.4756E8).

Somers' D: Used to determine the strength and direction of the relation between pairs of variables. Its
values range from -1 (all pairs disagree) to 1 (all pairs agree). It is defined as (concordant - discordant)
divided by the total number of pairs with different responses.
Somers' D = (63-36.3)/100 = 0.267
Gamma: The Goodman-Kruskal Gamma does not penalize for ties on either variable. Its value ranges from
-1 (perfect negative association) through 0 (no association) to 1 (perfect positive association). It is
generally greater than Somers' D.
Tau-a: Kendall's Tau-a is a modification of Somers' D that takes into account the difference between the
number of possible paired observations and the number of paired observations with a different response. It
is defined as the ratio of the difference between the number of concordant pairs and the number of
discordant pairs to the number of possible pairs. Usually Tau-a is much smaller than Somers' D since there
are many paired observations with the same response.
c: This is equivalent to the area under the ROC curve. c ranges from 0.5 to 1, where 0.5 corresponds to
the model randomly predicting the response and 1 corresponds to the model perfectly discriminating the
response. Here c = 0.633.
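The association measures described above can be recomputed from the rounded percentages and the response profile counts. This is a Python sketch with illustrative variable names; because it starts from rounded percentages, the results match the WPS output only approximately.

```python
# Counts from the response profile and percentages from the association table.
n_events = 15317
n_nonevents = 48806
pct_concordant = 63.0
pct_discordant = 36.3
pct_tied = 100.0 - pct_concordant - pct_discordant

# Pairs with different observed responses, and all observations used.
pairs = n_events * n_nonevents
n_total = n_events + n_nonevents

# Somers' D: (concordant - discordant) over all pairs with different responses.
somers_d = (pct_concordant - pct_discordant) / 100.0
# Gamma: same numerator, but tied pairs are left out of the denominator.
gamma = (pct_concordant - pct_discordant) / (pct_concordant + pct_discordant)
# c: concordant share plus half of the tied share.
c_stat = (pct_concordant + 0.5 * pct_tied) / 100.0
# Tau-a: (concordant - discordant) pairs over all possible pairs.
tau_a = somers_d * pairs / (n_total * (n_total - 1) / 2)

print(pairs, round(somers_d, 3), round(gamma, 3), round(tau_a, 3), round(c_stat, 3))
```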
1 6413 750 731.23 5663 5681.77
2 6412 905 980.93 5507 5431.07
3 6413 1117 1143.90 5296 5269.10
4 6412 1228 1284.09 5184 5127.91
5 6412 1423 1416.34 4989 4995.66
6 6412 1555 1549.39 4857 4862.61
7 6412 1702 1691.44 4710 4720.56
8 6412 1914 1854.05 4498 4557.95
9 6412 2107 2064.37 4305 4347.63
10 6413 2616 2601.28 3797 3811.72
15.6395 8 0.0478
1. Hosmer and Lemeshow Goodness-of-Fit Test
The Pr > ChiSq value is 0.0478, which is higher than the values for the training and validation datasets
(0.0430 and 0.0428 respectively). For this test a large p-value indicates good fit, so a value just below
0.05 points to some lack of fit, although it is an improvement on the other two datasets.
2. Association of Predicted Probabilities and Observed Responses
Percent Concordant is 63 (62.8 for the training dataset) and Percent Discordant is 36.3 (36.5 for the
training dataset). This is an acceptable score and a slight improvement over the training dataset.
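The Hosmer and Lemeshow chi-square reported above can be reproduced from the ten-group partition table. The sketch below is Python, not the WPS procedure itself, and it assumes the table columns are observed and expected counts for churn = 1 followed by observed and expected counts for churn = 0.

```python
# (observed events, expected events, observed non-events, expected non-events)
# for the ten groups in the partition table above.
groups = [
    (750, 731.23, 5663, 5681.77),
    (905, 980.93, 5507, 5431.07),
    (1117, 1143.90, 5296, 5269.10),
    (1228, 1284.09, 5184, 5127.91),
    (1423, 1416.34, 4989, 4995.66),
    (1555, 1549.39, 4857, 4862.61),
    (1702, 1691.44, 4710, 4720.56),
    (1914, 1854.05, 4498, 4557.95),
    (2107, 2064.37, 4305, 4347.63),
    (2616, 2601.28, 3797, 3811.72),
]

# Sum of (observed - expected)^2 / expected over events and non-events.
hl_chi_square = sum(
    (obs_e - exp_e) ** 2 / exp_e + (obs_n - exp_n) ** 2 / exp_n
    for obs_e, exp_e, obs_n, exp_n in groups
)
degrees_of_freedom = len(groups) - 2  # number of groups minus two

print(round(hl_chi_square, 2), degrees_of_freedom)
```

The second and fourth groups, where the model over-predicts churn, and the eighth group, where it under-predicts, contribute most of the statistic.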
Prob Level | Correct Events | Correct Non-events | Incorrect Events | Incorrect Non-events | Percentage Correct | Sensitivity | Specificity | False POS | False NEG
0.020 15317 0 48806 0 23.9 100.0 0.0 76.1 .
0.040 15313 7 48799 4 23.9 100.0 0.0 76.1 36.4
0.060 15300 80 48726 17 24.0 99.9 0.2 76.1 17.5
0.080 15262 372 48434 55 24.4 99.6 0.8 76.0 12.9
0.100 15161 1195 47611 156 25.5 99.0 2.4 75.8 11.5
0.120 14921 3024 45782 396 28.0 97.4 6.2 75.4 11.6
0.140 14522 5955 42851 795 31.9 94.8 12.2 74.7 11.8
0.160 13870 9766 39040 1447 36.9 90.6 20.0 73.8 12.9
0.180 13020 14161 34645 2297 42.4 85.0 29.0 72.7 14.0
0.200 11967 18992 29814 3350 48.3 78.1 38.9 71.4 15.0
0.220 10673 23896 24910 4644 53.9 69.7 49.0 70.0 16.3
0.240 9246 28702 20104 6071 59.2 60.4 58.8 68.5 17.5
0.260 7758 33074 15732 7559 63.7 50.6 67.8 67.0 18.6
0.280 6338 36968 11838 8979 67.5 41.4 75.7 65.1 19.5
0.300 4929 40169 8637 10388 70.3 32.2 82.3 63.7 20.5
0.320 3733 42750 6056 11584 72.5 24.4 87.6 61.9 21.3
0.340 2770 44637 4169 12547 73.9 18.1 91.5 60.1 21.9
0.360 2057 45943 2863 13260 74.9 13.4 94.1 58.2 22.4
0.380 1521 46808 1998 13796 75.4 9.9 95.9 56.8 22.8
0.400 1122 47432 1374 14195 75.7 7.3 97.2 55.0 23.0
0.420 808 47806 1000 14509 75.8 5.3 98.0 55.3 23.3
0.440 605 48088 718 14712 75.9 3.9 98.5 54.3 23.4
0.460 451 48295 511 14866 76.0 2.9 99.0 53.1 23.5
0.480 340 48425 381 14977 76.0 2.2 99.2 52.8 23.6
0.500 260 48511 295 15057 76.1 1.7 99.4 53.2 23.7
0.520 193 48596 210 15124 76.1 1.3 99.6 52.1 23.7
0.540 140 48642 164 15177 76.1 0.9 99.7 53.9 23.8
0.560 93 48691 115 15224 76.1 0.6 99.8 55.3 23.8
0.580 67 48727 79 15250 76.1 0.4 99.8 54.1 23.8

0.600 47 48742 64 15270 76.1 0.3 99.9 57.7 23.9
0.620 30 48761 45 15287 76.1 0.2 99.9 60.0 23.9
0.640 22 48770 36 15295 76.1 0.1 99.9 62.1 23.9
0.660 16 48777 29 15301 76.1 0.1 99.9 64.4 23.9
0.680 11 48782 24 15306 76.1 0.1 100.0 68.6 23.9
0.700 7 48789 17 15310 76.1 0.0 100.0 70.8 23.9
0.720 6 48791 15 15311 76.1 0.0 100.0 71.4 23.9
0.740 5 48793 13 15312 76.1 0.0 100.0 72.2 23.9
0.760 3 48795 11 15314 76.1 0.0 100.0 78.6 23.9
0.780 2 48797 9 15315 76.1 0.0 100.0 81.8 23.9
0.800 2 48798 8 15315 76.1 0.0 100.0 80.0 23.9
0.820 1 48800 6 15316 76.1 0.0 100.0 85.7 23.9
0.840 1 48801 5 15316 76.1 0.0 100.0 83.3 23.9
0.860 1 48801 5 15316 76.1 0.0 100.0 83.3 23.9
0.880 1 48803 3 15316 76.1 0.0 100.0 75.0 23.9
0.900 1 48803 3 15316 76.1 0.0 100.0 75.0 23.9
0.920 0 48803 3 15317 76.1 0.0 100.0 100.0 23.9
0.940 0 48804 2 15317 76.1 0.0 100.0 100.0 23.9
0.960 0 48804 2 15317 76.1 0.0 100.0 100.0 23.9
0.980 0 48805 1 15317 76.1 0.0 100.0 100.0 23.9
1.000 0 48806 0 15317 76.1 0.0 100.0 . 23.9
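Each percentage column in the classification table is a ratio of the four count columns. The Python sketch below (the function name `classification_metrics` is illustrative) recomputes two rows, the 0.500 cut-off and the 0.240 cut-off, from their counts.

```python
def classification_metrics(correct_events, correct_nonevents,
                           incorrect_events, incorrect_nonevents):
    """Return the percentage columns of one classification table row."""
    total = (correct_events + correct_nonevents
             + incorrect_events + incorrect_nonevents)
    return {
        # Share of all observations classified correctly.
        "percent_correct": 100 * (correct_events + correct_nonevents) / total,
        # Share of actual churners predicted as churn.
        "sensitivity": 100 * correct_events / (correct_events + incorrect_nonevents),
        # Share of actual non-churners predicted as non-churn.
        "specificity": 100 * correct_nonevents / (correct_nonevents + incorrect_events),
        # Share of predicted churners that did not churn.
        "false_pos": 100 * incorrect_events / (correct_events + incorrect_events),
        # Share of predicted non-churners that did churn.
        "false_neg": 100 * incorrect_nonevents / (correct_nonevents + incorrect_nonevents),
    }

row_050 = classification_metrics(260, 48511, 295, 15057)
row_024 = classification_metrics(9246, 28702, 20104, 6071)
for label, row in (("0.500", row_050), ("0.240", row_024)):
    print(label, {name: round(value, 1) for name, value in row.items()})
```

Comparing the two rows shows the trade-off: lowering the cut-off from 0.500 to 0.240 raises sensitivity sharply while specificity falls.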
/*Ranking the data*/
PROC RANK DATA = DMP OUT = final.DECILE GROUPS = 10 TIES = MEAN ;
VAR p_1;
RANKS decile;
RUN;
PROC SORT DATA = final.DECILE ;
BY DESCENDING p_1 ;
RUN;
EXPORT THE DATA
PROC EXPORT DATA = final.DECILE
OUTFILE = 'Y:ProgramesSAS Graded AssignmentFinal Case StudyGain.xlsx'
DBMS = EXCEL;
RUN;
Accuracy

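/*check accuracy**/
Data final.testacc;
set DMP;
/*declare the length first; otherwise "True Negative" fixes it at 13 and longer labels are truncated*/
length out $14;
if f_churn = 0 and i_churn = 0 then out = "True Negative";
else if f_churn = 1 and i_churn = 1 then out = "True Positive";
else if f_churn = 0 and i_churn = 1 then out = "False Positive";
else if f_churn = 1 and i_churn = 0 then out = "False Negative";
run;
/*frequency of each outcome; f_churn is the actual value, i_churn the predicted value*/
Proc freq data = final.testacc;
tables out;
run;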
Cut off probability = 0.5
Confusion matrix from the classification table (N = 64123):
Predicted | Actual Churn = 0 | Actual Churn = 1
Churn = 0 48511 15057
Churn = 1 295 260
Share of scored observations, from the PROC FREQ output:
True Negative | True Positive | False Positive | False Negative
76% 0.41% 0.46% 23.48%
P_Churn = 0 and A_Churn = 0 then True Negative: % of nos captured correctly = 76%
P_Churn = 1 and A_Churn = 1 then True Positive: % of yes captured correctly = 0.41%
P_Churn = 1 and A_Churn = 0 then False Positive: % of actual nos misclassified as yes = 0.46%
P_Churn = 0 and A_Churn = 1 then False Negative: % of actual yes misclassified as nos = 23.48%
out Frequency Percent Cumulative Frequency Cumulative Percent
False Negativ 15055 23.48 15055 23.48
False Positiv 293 0.46 15348 23.94
True Negative 48513 75.66 63861 99.59
True Positive 262 0.41 64123 100.00
Frequency Missing = 2162
Gain Chart:

Sum of churn | Average of P_1 | Count of P_1 | Sum of P_1_2 | Expected Churn | Cumulative expected churn | Predicted churn | Cumulative predicted churn | Lift by model
2616 0.405634825 6412 2600.930497 1531.7 10% 0.169807 17% 1.707906
2107 0.321959928 6412 2064.407059 1531.7 20% 0.134779 30% 1.375596
1914 0.289155258 6413 1854.352672 1531.7 30% 0.121065 43% 1.249592
1702 0.26379269 6412 1691.43873 1531.7 40% 0.110429 54% 1.111184
1555 0.241637261 6413 1549.619755 1531.7 50% 0.10117 64% 1.015212
1423 0.220886158 6412 1416.322048 1531.7 60% 0.092467 73% 0.929033
1228 0.200260533 6412 1284.070538 1531.7 70% 0.083833 81% 0.801724
1117 0.178368073 6413 1143.874452 1531.7 80% 0.07468 89% 0.729255
906 0.152979779 6412 980.9063447 1531.7 90% 0.06404 95% 0.5915
749 0.114019516 6412 731.0931359 1531.7 100% 0.047731 100% 0.488999
15317 0.238869286 64123 15317.01523 15317
Cumulative Expected Cumulative Predicted
10% 17%
20% 30%
30% 43%
40% 54%
50% 64%
60% 73%
70% 81%
80% 89%
90% 95%
100% 100%
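The lift and cumulative columns of the gain chart can be rebuilt from the decile totals. The Python sketch below uses illustrative names and assumes, as the figures suggest, that lift is observed churn divided by the expected churn per decile and that the cumulative predicted share is based on the summed predicted probabilities (Sum of P_1_2).

```python
# Decile totals copied from the gain chart table, highest-scoring decile first.
observed_churn = [2616, 2107, 1914, 1702, 1555, 1423, 1228, 1117, 906, 749]
predicted_churn = [
    2600.930497, 2064.407059, 1854.352672, 1691.43873, 1549.619755,
    1416.322048, 1284.070538, 1143.874452, 980.9063447, 731.0931359,
]

total_churn = sum(observed_churn)
# With no model, each decile would be expected to hold one tenth of the churn.
expected_per_decile = total_churn / len(observed_churn)

# Lift: observed churn in the decile relative to the no-model expectation.
lift = [obs / expected_per_decile for obs in observed_churn]

# Cumulative share of churn captured, based on summed predicted probabilities.
cumulative_predicted_pct = []
running = 0.0
for value in predicted_churn:
    running += value
    cumulative_predicted_pct.append(round(100 * running / total_churn))

print([round(x, 2) for x in lift])
print(cumulative_predicted_pct)
```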
From the gain chart, the model appears to be a better predictor than a standard average probability
calculation: the top decile accounts for 17% of churn against an expected 10%, a lift of about 1.71, and
the top three deciles account for 43%.
At the 0.5 probability level, the classification table shows that the model has correctly predicted only
260 of the 15317 actual churners, a 1.7% success rate, while 295 non-churners are wrongly flagged as
churn. It has correctly predicted 48511 of the 48806 non-churners, which is a 99.4% success rate. This
probability level can be chosen for prediction when false positives must be kept low; a lower cut-off
captures more churners (at 0.24, for example, sensitivity is 60.4% and specificity is 58.8%).
