Data Analytics Course
Logistic regression (Version-1)
Venkat Reddy
Data Analysis Course
•   Data analysis design document
•   Introduction to statistical data analysis
•   Descriptive statistics
•   Data exploration, validation & sanitization
•   Probability distributions examples and applications
•   Simple correlation and regression analysis
•   Multiple linear regression analysis
•   Logistic regression analysis
•   Testing of hypothesis
•   Clustering and decision trees
•   Time series analysis and forecasting
•   Credit Risk Model building-1
•   Credit Risk Model building-2
Note
• This presentation is just class notes. The course notes for the Data
  Analysis Training were written by me, as an aid for myself.
• The best way to treat this is as a high-level summary; the
  actual session went more in depth and contained other
  information.
• Most of this material was written as informal notes, not
  intended for publication.
• Please send questions/comments/corrections to
  venkat@trenwiseanalytics.com or 21.venkat@gmail.com
• Please check my website for the latest version of this document.
                                         -Venkat Reddy
Contents
•   Need for logistic regression
•   The logistic regression model
•   Meaning of beta
•   Goodness of fit
•   Multicollinearity
•   Prediction
•   Stepwise regression
Why do we need logistic regression?
• Remember the burger example: number of burgers sold vs
  number of visitors?
• What if we are trying to find whether a person is going to buy
  a burger or not, based on the person's age?
• Download the data from here & fit a linear regression line.
• What is the R-squared?
• If age increases, what happens to burger sales?
• Does a 25-year-old person buy a burger?
• Can we fit a linear regression line to this data?
Real-life Challenges
•   Gaming - Win vs Loss
•   Sales - Buying vs Not buying
•   Marketing - Response vs No Response
•   Credit card & Loans - Default vs Non-Default
•   Operations - Attrition vs Retention
•   Websites - Click vs No click
•   Fraud identification - Fraud vs Non-Fraud
•   Healthcare - Cure vs No Cure
Why not linear?
[Figure: scatter plot of "Bought burger?" (Yes/No) against Age, with the age axis running from 0 to 80.]
Any better line?
[Figure: "Bought burger?" (Yes at 0, No at 1; vertical axis from 0 to 1.2) plotted against Age (0 to 80).]
Logistic function
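• We want a model that predicts probabilities between 0 and 1, that
  is, S-shaped.
• There are lots of S-shaped curves. We use the logistic model:
• Probability = $1/[1 + \exp(-(\beta_0 + \beta_1 X))]$ or $\log_e[P/(1-P)] = \beta_0 + \beta_1 X$
• The function on the left-hand side of the second form, $\log_e[P/(1-P)]$, is called the
  logit (log-odds) function. Its inverse, the S-shaped curve, is the logistic function.

[Figure: S-shaped logistic curve of P(y|x) against x, rising from 0.0 to 1.0]

$$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$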
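The link between the probability form and the log-odds form can be checked numerically. This is a minimal sketch; the function names and the coefficient values are assumptions chosen for illustration, not values from the course data.

```python
import math

def logistic(z):
    """Map any real number z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p (0 < p < 1); the inverse of logistic()."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = 4.0, -0.1

for x in (10, 40, 70):
    z = beta0 + beta1 * x          # linear predictor
    p = logistic(z)                # probability, always within (0, 1)
    print(x, round(p, 3), round(logit(p), 3))  # logit(p) recovers z
```

Whatever value the linear predictor takes, the probability stays between 0 and 1, which is what the straight line could not guarantee.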
Logistic regression function
Logistic regression models the logit of the outcome
= natural logarithm of the odds of the outcome
= ln(probability of the outcome (P) / probability of not having the outcome (1-P))

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$

$\beta$ = log odds ratio associated with predictors
$e^{\beta}$ = odds ratio
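Solving the logit equation for P shows how the linear combination of predictors turns into a probability. As a sketch, write $\eta$ (a shorthand symbol assumed here, not used in the slides) for $\alpha + \beta_1 x_1 + \dots + \beta_i x_i$:

```latex
\ln\left(\frac{P}{1-P}\right) = \eta
\;\Longrightarrow\;
\frac{P}{1-P} = e^{\eta}
\;\Longrightarrow\;
P = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}
```

The middle step gives the odds, $e^{\eta}$, so adding $\beta$ to $\eta$ multiplies the odds by $e^{\beta}$.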
Curve fitting using MLE
• Remember OLS for linear models?
• Maximum Likelihood Estimator:
  • Starts with arbitrary values of the regression coefficients and
    constructs an initial model for predicting the observed data.
  • Then evaluates errors in such prediction and changes the
    regression coefficients so as to make the likelihood of the observed
    data greater under the new model.
  • Repeats until the model converges, meaning the differences
    between the newest model and the previous model are trivial.
• The idea is that you "find and report as statistics" the
  parameters that are most likely to have produced your data.
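The iterative idea above can be sketched in a few lines. This is an illustration under stated assumptions, not the routine any particular package uses: it applies plain gradient ascent on the log-likelihood (statistical packages generally use faster Newton-type iterations), and the data, function names, step size and tolerance are all made up for the example.

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of 0/1 outcomes ys under P = logistic(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, rate=0.1, tol=1e-8, max_iter=10000):
    """Gradient ascent on the log-likelihood; returns (b0, b1, history)."""
    b0, b1 = 0.0, 0.0                      # arbitrary starting values
    history = [log_likelihood(b0, b1, xs, ys)]
    for _ in range(max_iter):
        # Prediction errors: observed outcome minus predicted probability.
        errors = [y - 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
                  for x, y in zip(xs, ys)]
        # Move the coefficients in the direction that raises the likelihood.
        b0 += rate * sum(errors) / len(xs)
        b1 += rate * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        history.append(log_likelihood(b0, b1, xs, ys))
        if abs(history[-1] - history[-2]) < tol:   # change is trivial: stop
            break
    return b0, b1, history

# Hypothetical toy data: age in decades (scaled to keep the steps stable)
# and a made-up 0/1 "bought" flag. Not the course's burger dataset.
decades = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
bought = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

b0, b1, history = fit_logistic(decades, bought)
print(round(b0, 2), round(b1, 2), len(history))
```

The `history` list records the log-likelihood after each update, so you can confirm that every step makes the observed data more likely than the step before.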
Meaning of beta
• The betas themselves are log-odds ratios. Negative values
  indicate a negative relationship between the probability of
  "success" and the independent variable; positive values
  indicate a positive relationship.
• Each beta is the increase in log-odds for a one-unit increase in xi, with all the
  other predictors held constant.
• It measures the association between xi and the log-odds, adjusted for all
  other predictors.

$$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$
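A short worked example may help with reading a beta as an odds ratio. The coefficients below are hypothetical, made up for illustration rather than estimated from any course dataset, and the single predictor is assumed to be age in years.

```python
import math

# Hypothetical fitted coefficients for logit(P) = alpha + beta_age * age.
# These numbers are made up for illustration, not estimated from course data.
alpha, beta_age = 4.0, -0.1

def odds(age):
    """Odds P/(1-P) implied by the model at a given age."""
    return math.exp(alpha + beta_age * age)

# One extra year multiplies the odds by exp(beta_age), whatever the age.
odds_ratio_1yr = math.exp(beta_age)

# A 20-year gap adds 20 * beta_age to the log-odds, so the odds are
# multiplied by exp(20 * beta_age).
odds_ratio_20yr = math.exp(20 * beta_age)

print(round(odds_ratio_1yr, 3), round(odds_ratio_20yr, 3))
print(round(odds(45) / odds(25), 3))   # same value as odds_ratio_20yr
```

Because the model is linear in the log-odds, the ratio depends only on the size of the gap and not on the ages at which it is measured.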
Goodness of fit for a logistic regression
Chi-Square
• The Chi-Square statistic and associated p-value (Sig.) test whether
  the model coefficients as a group equal zero.
• Larger Chi-squares and smaller p-values indicate greater confidence
  in rejecting the null hypothesis of no impact.
Percent Correct Predictions
• The "Percent Correct Predictions" statistic assumes that if the
  estimated p is greater than or equal to .5 then the event is expected
  to occur, and not to occur otherwise.
• By converting these probabilities to 0s and 1s and comparing them to
  the actual 0s and 1s, the % correct Yes, % correct No, and overall %
  correct scores are calculated.
• Note: the % correctly predicted within each subgroup is also important,
  especially if most of the data are 0s or 1s.
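The percent-correct calculation can be sketched as follows. The function name, the outcomes and the fitted probabilities are hypothetical, chosen only to show why the subgroup percentages matter when one class dominates.

```python
def percent_correct(actual, predicted_prob, cutoff=0.5):
    """Return (% correct Yes, % correct No, overall % correct).

    Assumes actual contains at least one 1 and at least one 0.
    """
    # Convert probabilities to 0/1 predictions using the cutoff.
    predicted = [1 if p >= cutoff else 0 for p in predicted_prob]
    pairs = list(zip(actual, predicted))
    yes_hits = sum(1 for a, p in pairs if a == 1 and p == 1)
    no_hits = sum(1 for a, p in pairs if a == 0 and p == 0)
    n_yes = sum(actual)
    n_no = len(actual) - n_yes
    return (100.0 * yes_hits / n_yes,
            100.0 * no_hits / n_no,
            100.0 * (yes_hits + no_hits) / len(actual))

# Hypothetical outcomes and fitted probabilities, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
fitted = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1]

pct_yes, pct_no, pct_all = percent_correct(actual, fitted)
print(pct_yes, pct_no, pct_all)
```

In this toy case the overall score is 80%, yet only two of the three Yes cases are predicted correctly, which is the kind of gap the subgroup percentages reveal.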
Goodness of fit for a logistic regression
• Hosmer and Lemeshow Goodness-of-Fit Test
  • Chi-square test comparing observed and expected "bad" counts after dividing the
    data into groups.
  • The test assesses whether or not the observed event rates match
    expected event rates in subgroups of the model population.
  • The Hosmer–Lemeshow test specifically identifies subgroups as the
    deciles of fitted risk values. Models for which expected and observed
    event rates in subgroups are similar are called well calibrated.
• Other methods
  •   ROC curves – How to interpret ROC curve?
  •   Somers' D
  •   Gamma
  •   Tau-a
  •   C
  •   More than a dozen "R2"-type summaries
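The Hosmer and Lemeshow comparison of observed and expected event counts can be sketched as below. This is a simplified illustration, not the exact routine of any package: the function name is an assumption, groups are formed as equal-sized slices of the data sorted by fitted risk (tied values are not treated specially), and only the statistic is returned. The data are hypothetical.

```python
def hosmer_lemeshow_statistic(actual, predicted_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over groups of fitted risk.

    Returns the statistic only; the p-value would come from a chi-square
    distribution with (groups - 2) degrees of freedom.
    """
    # Sort observations by fitted probability and cut into equal-sized groups.
    ordered = sorted(zip(predicted_prob, actual))
    n = len(ordered)
    statistic = 0.0
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            continue
        observed = sum(y for _, y in chunk)        # observed events
        expected = sum(p for p, _ in chunk)        # expected events
        mean_p = expected / size
        # Skip groups whose mean fitted risk is exactly 0 or 1.
        if 0.0 < mean_p < 1.0:
            statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic

# Hypothetical fitted risks with two made-up sets of outcomes.
fitted = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5 + [1] * 9 + [0]
miscalibrated = [1] * 10 + [1] * 5 + [0] * 5 + [0] * 10

stat_good = hosmer_lemeshow_statistic(calibrated, fitted, groups=3)
stat_bad = hosmer_lemeshow_statistic(miscalibrated, fitted, groups=3)
print(round(stat_good, 6), round(stat_bad, 6))
```

A statistic near zero means observed and expected counts agree in every group (well calibrated); a large value signals that the fitted risks do not match what happened.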
Lab-Logistic Regression
• Fit a logistic regression line for the burger example.
• If age increases, what happens to burger sales?
• Does a 25-year-old person buy a burger? What is the
  probability (or what are the odds) of him buying the burger?
• How good is the regression line for the burger example?
• If the age difference is 20 years, what is the difference in odds?
• Run the code below to see:
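/* Fit buy on Age; DESCENDING models the higher-ordered response level */
proc logistic data=burger2
     plots(only)=(roc(id=prob) effect) descending;
  /* LACKFIT requests the Hosmer and Lemeshow goodness-of-fit test */
  model buy=Age / lackfit;
run;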
Lab: Logistic regression
• Build a logistic regression line on the credit card
  data (cred.training).
• Problems with monthly income, number of open credit limits,
  number of dependents? Drop them & build for the rest of the
  variables.
• How good is the fit?
• What is the impact of each variable?
• Is there any interdependency between variables?
• Remove the 60-plus delinquency & 90-plus delinquency variables
  & build the model again.
• How to identify & use the variables with data-related issues?
  Data validation & cleaning
Venkat Reddy Konasani
Manager at Trendwise Analytics
venkat@TrendwiseAnalytics.com
21.venkat@gmail.com
www.TrendwiseAnalytics.com/venkat
+91 9886 768879

 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 

Logistic regression

  • 1. Data Analytics Course Logistic regression(Version-1) Venkat Reddy
  • 2. Data Analysis Course • Data analysis design document • Introduction to statistical data analysis • Descriptive statistics • Data exploration, validation & sanitization • Probability distributions examples and applications • Simple correlation and regression analysis • Multiple linear regression analysis • Logistic regression analysis • Testing of hypothesis • Clustering and decision trees • Time series analysis and forecasting • Credit Risk Model building-1 • Credit Risk Model building-2
  • 3. Note • This presentation is just class notes. The course notes for Data Analysis Training were written by me, as an aid for myself. • The best way to treat this is as a high-level summary; the actual session went more in depth and contained other information. • Most of this material was written as informal notes, not intended for publication. • Please send questions/comments/corrections to venkat@trenwiseanalytics.com or 21.venkat@gmail.com • Please check my website for the latest version of this document -Venkat Reddy
  • 4. Contents • Need of logistic regression? • The logistic regression model • Meaning of beta • Goodness of fit • Multicollinearity • Prediction • Stepwise regression
  • 5. What is the need of logistic regression? • Remember the burger example? Number of burgers sold vs number of visitors? • What if we are trying to find whether a person is going to buy a burger or not, based on the person’s age? • Download the data from here & fit a linear regression line. • What is R squared? • If age increases, what happens to burger sales? • Does a 25-year-old person buy a burger? • Can we fit a linear regression line to this data?
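The questions on this slide can be tried out with a short Python sketch. The course's burger data set is not reproduced here, so the ages and purchase flags below are made-up values used only for illustration, and `predict_linear` is a hypothetical helper name.

```python
# Hypothetical data: ages and whether each person bought a burger (1 = yes, 0 = no).
# These values are invented for illustration; they are not the course's data set.
ages = [18, 22, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
bought = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(bought) / n

# Ordinary least squares for a single predictor.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, bought))
sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in bought)

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)


def predict_linear(age):
    """Fitted value of the straight line at a given age."""
    return intercept + slope * age


for age in (10, 25, 80):
    print(age, round(predict_linear(age), 3))
print("R squared:", round(r_squared, 3))
```

On this made-up sample the fitted line gives a value above 1 at age 10 and below 0 at age 80. Neither can be read as a probability of buying, which is the difficulty the slide is pointing at.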
  • 6. Real-life Challenges • Gaming - Win vs Loss • Sales - Buying vs Not buying • Marketing - Response vs No Response • Credit card & Loans - Default vs Non Default • Operations - Attrition vs Retention • Websites - Click vs No click • Fraud identification - Fraud vs Non Fraud • Healthcare - Cure vs No Cure
  • 7. Why not linear? [Figure: Bought burger? (No/Yes) plotted against Age (0 to 80)]
  • 8. Any better line? [Figure: Bought burger? (No/Yes) plotted against Age (0 to 80), vertical axis from 0 to 1.2]
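  • 9. Logistic function • We want a model that predicts probabilities between 0 and 1, that is, S-shaped. • There are lots of S-shaped curves. We use the logistic model: • Probability $P = 1/[1 + \exp(-(\beta_0 + \beta_1 X))]$, or equivalently $\log_e[P/(1-P)] = \beta_0 + \beta_1 X$ • The function on the left of the second form, $\log_e[P/(1-P)]$, is called the logit (log-odds); its inverse, which maps $\beta_0 + \beta_1 X$ back to $P$, is the logistic function. [Figure: S-shaped curve of $P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$ against x, vertical axis from 0.0 to 1.0]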
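A minimal Python sketch of the two functions on this slide may make the relationship concrete. The coefficient values are hypothetical, chosen only so that the curve covers a useful range of ages.

```python
import math


def logistic(z):
    """Map a real-valued score z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability p; this is the inverse of logistic()."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, for illustration only (not fitted to any data).
beta0, beta1 = 4.0, -0.1

for age in (10, 25, 40, 60, 80):
    score = beta0 + beta1 * age       # linear part: beta0 + beta1 * X
    p = logistic(score)               # always between 0 and 1
    print(age, round(p, 3), round(logit(p), 3))
```

The last column reproduces the linear score, which shows that taking the logit of the predicted probability recovers the straight line in X.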
  • 10. Logistic regression function • Logistic regression models the logit of the outcome = natural logarithm of the odds of the outcome = ln(probability of the outcome (p) / probability of not having the outcome (1-p)) • $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$ • $\beta$ = log odds ratio associated with predictors • $e^{\beta}$ = odds ratio
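To see why the exponentiated coefficient is read as an odds ratio, it may help to compare the odds at $x$ and at $x+1$. The derivation below restricts the model to a single predictor, which is a simplifying assumption made for this sketch.

```latex
\begin{aligned}
\text{odds}(x) &= \frac{P}{1-P} = e^{\alpha + \beta x} \\
\frac{\text{odds}(x+1)}{\text{odds}(x)}
  &= \frac{e^{\alpha + \beta (x+1)}}{e^{\alpha + \beta x}} = e^{\beta}
\end{aligned}
```

The ratio does not depend on $x$, so a one-unit increase multiplies the odds by the same factor $e^{\beta}$ wherever it starts.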
  • 11. Curve fitting using MLE • Remember OLS for linear models? • Maximum Likelihood Estimator: • Starts with arbitrary values of the regression coefficients and constructs an initial model for predicting the observed data. • Then evaluates errors in such prediction and changes the regression coefficients so as to make the likelihood of the observed data greater under the new model. • Repeats until the model converges, meaning the differences between the newest model and the previous model are trivial. • The idea is that you “find and report as statistics” the parameters that are most likely to have produced your data.
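The iterative procedure described on this slide can be sketched in Python for one predictor. The slide does not say which update rule is used, so this sketch picks Newton-Raphson as one common choice. The data are made-up values, and `fit_logistic` is a hypothetical helper name.

```python
import math

# Hypothetical data (invented for illustration): age and bought-a-burger flag.
ages = [18, 22, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
bought = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Log of the probability of the observed 0/1 outcomes under (b0, b1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, tol=1e-8, max_iter=50):
    """Newton-Raphson for intercept b0 and slope b1, starting from zeros."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the (negated) Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system for the update step.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:   # change is trivial: converged
            break
    return b0, b1


b0_hat, b1_hat = fit_logistic(ages, bought)
print(round(b0_hat, 3), round(b1_hat, 4))
print(round(log_likelihood(b0_hat, b1_hat, ages, bought), 3))
```

Each pass changes the coefficients so that the observed data become more likely, and the loop stops once the change between successive models is negligible. This sketch has no safeguard for perfectly separable data, where the estimates would not converge.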
  • 12. Meaning of beta • The betas themselves are log-odds ratios. Negative values indicate a negative relationship between the probability of "success" and the independent variable; positive values indicate a positive relationship. • Each beta is the increase in log-odds for a one-unit increase in $x_i$ with all the other $x_i$s held constant. • It measures the association between $x_i$ and the log-odds, adjusted for all other $x_i$. • $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
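As a small worked example of reading a beta, the Python sketch below converts a coefficient into odds ratios for a 1-year and a 20-year age difference. The coefficient value is hypothetical, not an estimate from the course data.

```python
import math

# Hypothetical slope for age on the log-odds scale (assumed for illustration).
beta_age = -0.10

# A one-unit increase in age adds beta_age to the log-odds,
# which multiplies the odds by exp(beta_age).
odds_ratio_1yr = math.exp(beta_age)

# A 20-unit increase adds 20 * beta_age to the log-odds.
odds_ratio_20yr = math.exp(20 * beta_age)

print(round(odds_ratio_1yr, 3))    # 0.905
print(round(odds_ratio_20yr, 3))   # 0.135
```

With this assumed beta, each extra year multiplies the odds of buying by about 0.905, and a 20-year gap multiplies them by about 0.135, regardless of the starting age.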
  • 13. Goodness of fit for a logistic regression • Chi-Square • The Chi-Square statistic and associated p-value (Sig.) test whether the model coefficients as a group equal zero. • Larger Chi-squares and smaller p-values indicate greater confidence in rejecting the null hypothesis of no impact. • Percent Correct Predictions • The "Percent Correct Predictions" statistic assumes that if the estimated p is greater than or equal to .5 then the event is expected to occur, and not to occur otherwise. • By converting these probabilities to 0s and 1s and comparing them to the actual 0s and 1s, the % correct Yes, % correct No, and overall % correct scores are calculated. • Note: the % correctly predicted within each subgroup is also important, especially if most of the data are 0s or 1s.
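The "Percent Correct Predictions" calculation can be sketched in a few lines of Python. The predicted probabilities and outcomes are made-up values, and `percent_correct` is a hypothetical helper that assumes both classes are present in the data.

```python
def percent_correct(probs, actual, cutoff=0.5):
    """Percent correct for the Yes group, the No group, and overall."""
    correct_yes = correct_no = 0
    for p, y in zip(probs, actual):
        predicted = 1 if p >= cutoff else 0   # p >= .5 means the event is expected
        if predicted == y:
            if y == 1:
                correct_yes += 1
            else:
                correct_no += 1
    n_yes = sum(actual)
    n_no = len(actual) - n_yes
    return {
        "yes": 100.0 * correct_yes / n_yes,
        "no": 100.0 * correct_no / n_no,
        "overall": 100.0 * (correct_yes + correct_no) / len(actual),
    }


# Hypothetical balanced sample.
balanced = percent_correct(
    [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1],
    [1, 1, 1, 0, 1, 0, 0, 0],
)
print(balanced)

# Hypothetical sample that is mostly 0s, scored by a model that never predicts 1.
imbalanced = percent_correct([0.1] * 10, [0] * 9 + [1])
print(imbalanced)
```

The second case illustrates the slide's note: the overall score is 90% even though none of the actual Yes cases is predicted correctly, so the subgroup percentages need to be read alongside the overall figure.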
  • 14. Goodness of fit for a logistic regression • Hosmer and Lemeshow Goodness-of-Fit Test • Chi-square test for observed and expected bad by dividing the variable into groups • The test assesses whether or not the observed event rates match expected event rates in subgroups of the model population. • The Hosmer–Lemeshow test specifically identifies subgroups as the deciles of fitted risk values. Models for which expected and observed event rates in subgroups are similar are called well calibrated. • Other methods • ROC curves – How to interpret ROC curve? • Somers' D • Gamma • Tau-a • C • More than a dozen “R2”-type summaries
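A rough Python sketch of the Hosmer-Lemeshow statistic follows. The observations are simulated from an assumed model purely for illustration, `hosmer_lemeshow` is a hypothetical helper name, and the code assumes every fitted probability lies strictly between 0 and 1.

```python
import math
import random


def hosmer_lemeshow(probs, actual, groups=10):
    """Sum over groups of (observed - expected)^2 / (n_g * pbar_g * (1 - pbar_g))."""
    # Order observations by fitted risk, then cut into equal-sized groups
    # (deciles of fitted risk when groups=10).
    pairs = sorted(zip(probs, actual), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic


# Simulated example: outcomes are drawn from the same probabilities that are
# then scored, so the "model" is well calibrated by construction.
rng = random.Random(0)
probs, actual = [], []
for _ in range(200):
    age = rng.uniform(18, 70)
    p = 1.0 / (1.0 + math.exp(-(4.0 - 0.1 * age)))   # assumed coefficients
    probs.append(p)
    actual.append(1 if rng.random() < p else 0)

hl_statistic = hosmer_lemeshow(probs, actual)
print(round(hl_statistic, 2))
```

The statistic is usually compared with a chi-square distribution on (groups - 2) degrees of freedom. A small value, and hence a large p-value, suggests that observed and expected event rates agree across the groups.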
  • 15. Lab-Logistic Regression • Fit a logistic regression line for the burger example • If age increases, what happens to burger sales? • Does a 25-year-old person buy a burger? What is the probability (or what are the odds) of him buying the burger? • How good is the regression line for the burger example? • If the age difference is 20 years, what is the difference in odds? • Write the code below to see: Proc logistic data=burger2 PLOTS(ONLY)=(ROC(ID=prob) EFFECT) descending; model buy=Age / lackfit; run;
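The lab asks for a ROC plot. As a companion, the Python sketch below shows one way to compute the C statistic (the area under the ROC curve) and Somers' D by counting concordant and discordant pairs. It is a stand-in written in Python rather than SAS, it uses made-up predicted probabilities, and `concordance` is a hypothetical helper name.

```python
def concordance(probs, actual):
    """Return (c, somers_d) from all event / non-event pairs."""
    events = [p for p, y in zip(probs, actual) if y == 1]
    nonevents = [p for p, y in zip(probs, actual) if y == 0]
    concordant = discordant = tied = 0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1      # event case ranked above non-event case
            elif p_event < p_nonevent:
                discordant += 1
            else:
                tied += 1
    pairs = len(events) * len(nonevents)
    c = (concordant + 0.5 * tied) / pairs
    somers_d = (concordant - discordant) / pairs
    return c, somers_d


# Hypothetical predicted probabilities and actual outcomes.
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.1]
actual = [1, 1, 1, 0, 1, 0, 0, 0]

c_stat, somers_d = concordance(probs, actual)
print(c_stat, somers_d)   # 0.875 0.75
```

A C of 0.5 corresponds to a model that ranks buyers and non-buyers no better than chance, and 1.0 to perfect ranking. Somers' D is the same information rescaled as 2C - 1.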
  • 16. Lab: Logistic regression • Build a logistic regression line on credit card data (cred.training) • Problem with monthly income, number of open cred limits, number of dependents? Drop them & build for the rest of the variables • How good is the fit? • What is the impact of each variable? • Is there any interdependency between variables? • Remove the 60 plus delinquency & 90 plus delinquency variables & build the model again • How to identify & use the variables with data related issues? Data validation & cleaning
  • 17. Venkat Reddy Konasani • Manager at Trendwise Analytics • venkat@TrendwiseAnalytics.com • 21.venkat@gmail.com • www.TrendwiseAnalytics.com/venkat • +91 9886 768879