Correlation (Pearson & Spearman)
& Linear Regression

Azmi Mohd Tamil
Key Concepts


 Correlation as a statistic
 Positive and Negative Bivariate Correlation
 Range Effects
 Outliers
 Regression & Prediction
 Directionality Problem (& cross-lagged panel)
 Third Variable Problem (& partial correlation)
Assumptions

 Related pairs
 Scale of measurement. For Pearson, data
 should be interval or ratio in nature.
 Normality
 Linearity
 Homoscedasticity
Example of Non-Linear Relationship
Yerkes-Dodson Law – not for correlation

 [Figure: curve of Performance (vertical axis, Worse to Better) against Stress (horizontal axis, Low to High), illustrating a non-linear, inverted-U relationship]
Correlation

      X (Stress)   ↔   Y (Illness)
Correlation – parametric & non-parametric

  2 Continuous Variables - Pearson
      linear relationship
      e.g., association between height and weight

  1 Continuous, 1 Categorical Variable
  (Ordinal) - Spearman/Kendall
    – e.g., association between Likert Scale on work
      satisfaction and work output
    – pain intensity (no, mild, moderate, severe) and
      dosage of pethidine
Pearson Correlation

  2 Continuous Variables
  –   linear relationship
  –   e.g., association between height and weight

  measures the degree of linear association
  between two interval-scaled variables
  analysis of the relationship between two
  quantitative outcomes, e.g., height and weight
How to calculate r?

      r = a / √(b·c)

      where a = ∑xy - (∑x·∑y)/n
            b = ∑x² - (∑x)²/n
            c = ∑y² - (∑y)²/n

      t = r·√((n - 2)/(1 - r²)),  with df = n - 2  (n = number of pairs)
Example

•∑x = 4631            ∑x² = 688837
•∑y = 2863            ∑y² = 264527
•∑xy = 424780         n = 32

•a = 424780 - (4631*2863/32) = 10,450.22
•b = 688837 - 4631²/32 = 18,644.47
•c = 264527 - 2863²/32 = 8,377.969
•r = a/(b*c)^0.5
   = 10,450.22/(18,644.47*8,377.969)^0.5
   = 0.836144


•t = 0.836144*((32-2)/(1-0.836144²))^0.5
 t = 8.349436 and d.f. = n - 2 = 30,
 p < 0.001
We refer to Table A3.
Since d.f. = n - 2 = 30, we use df = 30.
t = 8.349436 > 3.65 (the critical value at p = 0.001)

Therefore, if t = 8.349436, then p < 0.001.
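
As a cross-check on this hand calculation, the totals above can be plugged straight into code. This is a minimal sketch (Python, standard library only), not part of the original slides; with the raw 32 pairs available, scipy.stats.pearsonr(x, y) would return r and the two-tailed p-value directly.

```python
# Sketch: reproduce the hand calculation of Pearson's r and its t statistic
# from the summary totals quoted above (the raw 32 pairs are not shown here).
from math import sqrt

n = 32
sum_x, sum_x2 = 4631, 688837
sum_y, sum_y2 = 2863, 264527
sum_xy = 424780

a = sum_xy - sum_x * sum_y / n          # ~ 10,450.22
b = sum_x2 - sum_x ** 2 / n             # ~ 18,644.47
c = sum_y2 - sum_y ** 2 / n             # ~ 8,377.97

r = a / sqrt(b * c)                     # ~ 0.836
t = r * sqrt((n - 2) / (1 - r ** 2))    # ~ 8.35, with df = n - 2 = 30
print(f"r = {r:.6f}, t = {t:.6f}, df = {n - 2}")
```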
Correlation


  Two pieces of information:
   The strength of the relationship
   The direction of the relationship
Strength of relationship

 r lies between -1 and +1. Values near 0
 mean no (linear) correlation and values
 near ±1 mean very strong correlation.



       -1.0             0.0            +1.0
     Strong Negative   No Rel.   Strong Positive
How to interpret the value of r?
Correlation (+ direction)

 Positive correlation: high values of one
 variable are associated with high values
 of the other.
 Example: Higher course entrance exam
 scores are associated with better course
 grades during the final exam.

 [Figure: scatterplot labelled "Positive and Linear"]
Correlation (- direction)

 Negative correlation: The negative sign means
 that the two variables are inversely related,
 that is, as one variable increases the other
 variable decreases.
 Example: Increase in body mass index is
 associated with reduced effort tolerance.

 [Figure: scatterplot labelled "Negative and Linear"]
Pearson’s r

 An r of 0.9 is a very strong positive association (as one
 variable rises, so does the other).
 An r of -0.9 is a very strong negative association
 (as one variable rises, the other falls).

  r = 0.9 has nothing to do with 90%
  r = correlation coefficient
Coefficient of Determination
Defined

 Pearson’s r can be squared, r², to derive the
 coefficient of determination.

 Coefficient of determination – the portion of
 variability in one of the variables that can be
 accounted for by variability in the second
 variable.
Coefficient of Determination

 Pearson’s r can be squared, r², to derive the
 coefficient of determination.

 Example of depression and CGPA
  –   Pearson’s r shows negative correlation, r = -0.5
  –   r² = 0.25

  –   In this example we can say that 1/4 or 0.25 of the variability
      in CGPA scores can be accounted for by depression
      (the remaining 75% of variability is due to other factors: habits,
      ability, motivation, courses studied, etc.)
Coefficient of Determination
and Pearson’s r

 Pearson’s r can be squared, r²

 If r = 0.5, then r² = 0.25
 If r = 0.7, then r² = 0.49

 Thus while r = 0.5 versus 0.7 might not look so
 different in terms of strength, r² tells us that r = 0.7
 accounts for about twice the variability relative to
 r = 0.5.
A study was done to find the association between
the mothers’ weight and their babies’ birth weight.
The following is the scatter diagram showing the
relationship between the two variables.

 The coefficient of correlation (r) is 0.452.
 The coefficient of determination (r²) is 0.204.
 Twenty percent of the variability of the babies’
 birth weight is determined by the variability of
 the mothers’ weight.

 [Figure: scatterplot of Baby's Birthweight (0.00 to 5.00) against Mother's Weight (0.0 to 100.0), with fitted line; R Sq Linear = 0.204]
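
A small sketch of how r and r² are obtained together in practice. The arrays are made-up illustrative pairs, not the study data; run on the real weight/birth-weight columns, the same two lines would yield the r = 0.452 and r² = 0.204 quoted above.

```python
# Sketch: Pearson's r and r-squared for paired measurements.
# These arrays are illustrative placeholders, not the study data.
import numpy as np
from scipy import stats

mother_weight = np.array([45.0, 52.0, 60.0, 48.0, 70.0, 55.0, 63.0, 58.0])
birth_weight  = np.array([2.6,  2.9,  3.3,  2.7,  3.6,  3.0,  3.2,  3.1])

r, p_value = stats.pearsonr(mother_weight, birth_weight)
r_squared = r ** 2   # proportion of birth-weight variability accounted for

print(f"r = {r:.3f}, r squared = {r_squared:.3f}, p = {p_value:.4f}")
```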
Causal Silence:
Correlation Does Not Imply Causality

Causality – must demonstrate that variance in one
  variable can only be due to influence of the other
  variable

   Directionality of Effect Problem


   Third Variable Problem
CORRELATION DOES NOT MEAN
CAUSATION

 A high correlation does not give us the evidence to make a cause-
 and-effect statement.
 A common example given is the high correlation between the cost of
 damage in a fire and the number of firemen helping to put out the
 fire.
 Does it mean that to cut down the cost of damage, the fire
 department should dispatch fewer firemen to a fire rescue?
 It is the intensity of the fire that is highly correlated with both the cost
 of damage and the number of firemen dispatched.
 Consider the high correlation between smoking and lung cancer. One
 may argue that both could be caused by stress, and that smoking
 does not cause lung cancer.
 In this case, a correlation between lung cancer and smoking may be
 a result of a cause-and-effect relationship (by clinical experience +
 common sense?). To establish this cause-and-effect relationship,
 controlled experiments should be performed.
      Big Fire  ---->  More Firemen Sent
      Big Fire  ---->  More Damage
Directionality of Effect Problem

          X   ---->   Y      (X causes Y)

          X   <----   Y      (Y causes X)

          X   <--->   Y      (each influences the other)
Directionality of Effect Problem

        X                            Y
   Class Attendance    ---->    Higher Grades

        X                            Y
   Class Attendance    <----    Higher Grades
Directionality of Effect Problem

          X                                  Y
   Aggressive Behavior    ---->    Viewing Violent TV

          X                                  Y
   Aggressive Behavior    <----    Viewing Violent TV

   Aggressive children may prefer violent programs, or
   violent programs may promote aggressive behavior.
Methods for Dealing with
Directionality

 Cross-Lagged Panel design
  –   A type of longitudinal design
  –   Investigate correlations at several points in time
  –   STILL NOT CAUSAL

  Example next page
Cross-Lagged Panel

 Correlations from the panel diagram:
  Pref for violent TV, 3rd grade ↔ Pref for violent TV, 13th grade:   .05
  Aggression, 3rd grade ↔ Aggression, 13th grade:                     .38
  Pref for violent TV ↔ Aggression, within 3rd grade:                 .21
  Pref for violent TV ↔ Aggression, within 13th grade:               -.05
  Cross-lag, TV (3rd grade) to later aggression (13th grade):         .31
  Cross-lag, Aggression (3rd grade) to later TV (13th grade):         .01
Third Variable Problem

      X  <----  Z  ---->  Y
      (Z influences both X and Y)
Class Exercise

      Identify the third variable
      that influences both X and Y.
Third Variable Problem

                        +
   Class Attendance -------- GPA

          Third variable: Motivation
Third Variable Problem

                        +
   Number of Mosques -------- Crime Rate

          Third variable: Size of Population
Third Variable Problem

                         +
   Ice Cream Consumed -------- Number of Drownings

          Third variable: Temperature
Third Variable Problem

                    +
   Reading Score -------- Reading Comprehension

          Third variable: IQ
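
The Key Concepts slide lists partial correlation as one remedy for the third-variable problem. As an aside, the sketch below applies the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz·r_yz) / √((1 - r_xz²)(1 - r_yz²)); the three input correlations for the attendance/GPA/motivation example are hypothetical values chosen for illustration.

```python
# Sketch: first-order partial correlation r_xy.z, holding a third variable z constant.
# The three input correlations are hypothetical illustrative values.
from math import sqrt

def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical example: x = class attendance, y = GPA, z = motivation
r_xy = 0.50   # attendance vs GPA
r_xz = 0.60   # attendance vs motivation
r_yz = 0.65   # GPA vs motivation

print(f"r_xy.z = {partial_correlation(r_xy, r_xz, r_yz):.3f}")
# A value much nearer 0 than r_xy suggests the third variable accounts
# for much of the apparent attendance-GPA association.
```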
Data Preparation - Correlation

 Screen data for outliers and ensure that
 there is evidence of a linear relationship, since
 correlation is a measure of linear
 relationship.
 Assumption is that each pair is bivariate
 normal.
 If not normal, then use Spearman.
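
A minimal sketch of that decision rule, assuming the paired data sit in two NumPy arrays: screen each variable for normality (here with a Shapiro-Wilk test, one reasonable choice among several) and report Pearson's r if both look roughly normal, Spearman's rho otherwise. The arrays and the 0.05 cut-off are illustrative assumptions.

```python
# Sketch: screen for normality, then choose Pearson or Spearman accordingly.
# The data arrays and the 0.05 threshold are illustrative assumptions.
import numpy as np
from scipy import stats

x = np.array([61.2, 55.4, 70.1, 48.9, 66.3, 58.7, 52.0, 63.5, 59.9, 68.4])
y = np.array([3.1,  2.8,  3.5,  2.6,  3.4,  3.0,  2.7,  3.2,  3.0,  3.3])

alpha = 0.05
_, p_x = stats.shapiro(x)   # Shapiro-Wilk normality test for each variable
_, p_y = stats.shapiro(y)

if p_x > alpha and p_y > alpha:          # both look roughly normal
    r, p = stats.pearsonr(x, y)
    print(f"Pearson r = {r:.3f}, p = {p:.4f}")
else:                                    # fall back to the rank-based measure
    rho, p = stats.spearmanr(x, y)
    print(f"Spearman rho = {rho:.3f}, p = {p:.4f}")
```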
Correlation In SPSS

 For this exercise, we will be
 using the data from the CD,
 under Chapter 8, korelasi.sav
 This data is a subset of a case-
 control study on factors
 affecting SGA in Kelantan.
 Open the data & select
    Analyse > Correlate > Bivariate…
Correlation in SPSS

 We want to see whether
 there is any association
 between the mothers’ weight
 and the babies’ weight. So
 select the variables (weight2
 & birthwgt) into ‘Variables’.
 Select ‘Pearson’ Correlation
 Coefficients.
 Click the ‘OK’ button.
Correlation Results

                                  Correlations

                                                 WEIGHT2 BIRTHWGT
         WEIGHT2       Pearson Correlation              1     .431*
                       Sig. (2-tailed)                  .     .017
                       N                               30       30
         BIRTHWGT      Pearson Correlation           .431*       1
                       Sig. (2-tailed)               .017        .
                       N                               30       30
          *. Correlation is significant at the 0.05 level (2-tailed).


 r = 0.431 and the p-value, 0.017, is significant
 (p < 0.05).
 The r value indicates a fair and positive linear
 relationship.
Scatter Diagram

 If the correlation is significant, it is best to
 include the scatter diagram.
 The r square indicates that the mothers’ weight
 contributes 19% of the variability of the babies’
 weight.

 [Figure: scatterplot of BIRTHWEIGHT (0.0 to 3.6) against MOTHERS' WEIGHT (0 to 100), with fitted line; Rsq = 0.1861]
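
One way such a scatter diagram with a fitted line and an R-square label could be produced is sketched below (matplotlib plus scipy.stats.linregress); the arrays are illustrative stand-ins for the weight2 and birthwgt columns of korelasi.sav.

```python
# Sketch: scatter diagram with least-squares line and an R-square label.
# The arrays are illustrative stand-ins for weight2 / birthwgt.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mother_weight = np.array([45.0, 52.0, 60.0, 48.0, 70.0, 55.0, 63.0, 58.0, 66.0, 50.0])
birth_weight  = np.array([2.6,  2.9,  3.3,  2.7,  3.6,  3.0,  3.2,  3.1,  3.4,  2.8])

fit = stats.linregress(mother_weight, birth_weight)
line_x = np.linspace(mother_weight.min(), mother_weight.max(), 100)
line_y = fit.intercept + fit.slope * line_x

plt.scatter(mother_weight, birth_weight)
plt.plot(line_x, line_y)
plt.xlabel("Mother's weight")
plt.ylabel("Baby's birth weight")
plt.title(f"R Sq Linear = {fit.rvalue ** 2:.4f}")
plt.show()
```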
Spearman/Kendall Correlation

 To find correlation between a related pair of
 continuous data (not normally distributed); or
 Between 1 Continuous, 1 Categorical
 Variable (Ordinal)
      e.g., association between Likert Scale on work
      satisfaction and work output.
Spearman's rank correlation coefficient

 In statistics, Spearman's rank correlation coefficient, named
 for Charles Spearman and often denoted by the Greek letter ρ
 (rho), is a non-parametric measure of correlation – that is, it
 assesses how well an arbitrary monotonic function could
 describe the relationship between two variables, without
 making any assumptions about the frequency distribution of the
 variables. Unlike the Pearson product-moment correlation
 coefficient, it does not require the assumption that the
 relationship between the variables is linear, nor does it require
 the variables to be measured on interval scales; it can be used
 for variables measured at the ordinal level.
Example
•Correlation between
sphericity and visual
acuity.
•Sphericity of the eyeball
is continuous data while
visual acuity is ordinal
data (6/6, 6/9, 6/12,
6/18, 6/24), therefore
Spearman correlation is
the most suitable.
•The Spearman rho
correlation coefficient is
-0.108 and p is 0.117. P
is larger than 0.05,
therefore there is no
significant association
between sphericity and
visual acuity.
Example 2
•Correlation between glucose level and systolic
blood pressure.
•Based on the data given, prepare the following
table;
•For every variable, sort the data by rank. For ties,
take the average rank.
•Calculate the difference of ranks, d, for every pair
and square it. Take the total.
•Include the values in the following formula:

      rs = 1 - (6·∑d²) / (n·(n² - 1))

•∑d² = 4921.5      n = 32
•Therefore rs = 1-((6*4921.5)/(32*(32²-1)))
              = 0.097966.
This is the value of the Spearman correlation
coefficient (ρ).
•Compare the value against the Spearman table;
•p is larger than 0.05.
•Therefore there is no association between systolic
BP and blood glucose level.
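
The same rank-based computation as a sketch in code, assuming the paired readings are available as arrays: rank each variable (ties get the average rank), apply rs = 1 - 6∑d²/(n(n² - 1)), and cross-check against scipy.stats.spearmanr. The short arrays shown are illustrative, not the original 32 pairs.

```python
# Sketch: Spearman's rho via the rank-difference formula, with a SciPy cross-check.
# glucose and systolic_bp are illustrative placeholders, not the original 32 pairs.
import numpy as np
from scipy import stats

glucose     = np.array([5.1, 6.3, 4.8, 7.2, 5.9, 6.8, 5.5, 4.9])
systolic_bp = np.array([128, 142, 119, 150, 135, 133, 127, 140])

rank_x = stats.rankdata(glucose)       # average ranks are used for ties
rank_y = stats.rankdata(systolic_bp)

d = rank_x - rank_y
n = len(glucose)
rs_formula = 1 - (6 * np.sum(d ** 2)) / (n * (n ** 2 - 1))

rs_scipy, p_value = stats.spearmanr(glucose, systolic_bp)
print(f"rs (formula) = {rs_formula:.4f}, rs (scipy) = {rs_scipy:.4f}, p = {p_value:.4f}")
```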
Spearman’s
table


•0.097966 is the value
of Spearman correlation
coefficient (or ρ).
•Compare the value
against the Spearman
table;
•0.098 < 0.364 (the critical value at p=0.05)
•Therefore p is larger than 0.05.
•Therefore there is no
association between
systolic BP and blood
glucose level.
SPSS Output

                            Correlations

                                                   GLU      BPS1
 Spearman's rho   GLU    Correlation Coefficient    1.000     .097
                         Sig. (2-tailed)                .     .599
                         N                             32       32
                  BPS1   Correlation Coefficient     .097    1.000
                         Sig. (2-tailed)             .599        .
                         N                             32       32
Linear Regression
Regression



     X                      Y
   Predictor              Criterion
   Variable               Variable



Predicting Y based on a given value of X
Regression and Prediction



    X                       Y
   Height               Weight
REGRESSION


 Regression
 –   one variable is a direct cause of the other
 –   or if the value of one variable is changed, then as
     a direct consequence the other variable also
     changes
 –   or if the main purpose of the analysis is prediction
     of one variable from the other
REGRESSION


 –   Regression: the dependence of the dependent
     variable (outcome) on the independent variable
     (risk factor)
 –   relationship is summarised by a regression
     equation.
 –   y = a + bx
REGRESSION


 Regression
 –   The regression line
        x - independent variable - horizontal axis
        y - dependent variable - vertical axis
 –   Regression equation
      y = a + bx
         – a = intercept at y axis
         – b = regression coefficient

 –   Test of significance - b is not equal to zero
REGRESSION


 Regression
The Least Squares (Regression)
Line

 A good line is one that minimizes
 the sum of squared differences between the
 points and the line.
The Least Squares (Regression)
Line

 Let us compare two lines through the points (1,2), (2,4), (3,1.5) and (4,3.2).
 The second line is horizontal at y = 2.5.

 Line 1: sum of squared differences = (2 - 1)² + (4 - 2)² + (1.5 - 3)² + (3.2 - 4)² = 7.89
 Line 2: sum of squared differences = (2 - 2.5)² + (4 - 2.5)² + (1.5 - 2.5)² + (3.2 - 2.5)² = 3.99

 The smaller the sum of squared differences,
 the better the fit of the line to the data.

 [Figure: the four data points with the two candidate lines plotted over x = 1 to 4]
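
A quick numerical check of the two sums above (a sketch; the first candidate line is taken to be y = x, which is what the listed differences imply, and the second is the horizontal line y = 2.5).

```python
# Sketch: sum of squared vertical differences for the two candidate lines above.
points = [(1, 2.0), (2, 4.0), (3, 1.5), (4, 3.2)]

def sse(predict, data):
    """Sum of squared vertical differences between the data points and a line."""
    return sum((y - predict(x)) ** 2 for x, y in data)

sse_line1 = sse(lambda x: x, points)     # line y = x         -> 7.89
sse_line2 = sse(lambda x: 2.5, points)   # horizontal y = 2.5 -> 3.99
print(f"Line 1 SSE = {sse_line1:.2f}, Line 2 SSE = {sse_line2:.2f}")
```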
Linear Regression

 Come up with a Linear Regression Model to predict
 a continuous outcome with a continuous risk factor, e.g.
 predict BP with age. Usually LR is the next step after
 correlation is found to be strongly significant.
 y = a + bx;  a = ȳ - b·x̄  (where ȳ and x̄ are the means of y and x)
 –    e.g. BP = constant (a) + regression coefficient (b) * age

 b = (∑xy - ∑x·∑y/n) / (∑x² - (∑x)²/n)
Testing for significance

Test whether the slope is significantly different from zero by:


                      t = b / SE(b)

 [Figure: scatterplot of the observed y values with the fitted line (y fit)]
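
A sketch of this slope test using scipy.stats.linregress, which returns the slope, its standard error, and the two-sided p-value for the hypothesis that the slope is zero; the age and blood-pressure arrays are illustrative placeholders.

```python
# Sketch: test whether the regression slope differs from zero (t = b / SE(b)).
# The age and systolic_bp arrays are illustrative placeholders.
import numpy as np
from scipy import stats

age         = np.array([35, 42, 50, 58, 63, 47, 55, 60, 38, 66])
systolic_bp = np.array([118, 125, 132, 140, 145, 128, 138, 142, 121, 150])

fit = stats.linregress(age, systolic_bp)
t_stat = fit.slope / fit.stderr          # the same t = b / SE(b) as above
print(f"b = {fit.slope:.3f}, SE(b) = {fit.stderr:.3f}, "
      f"t = {t_stat:.2f}, p = {fit.pvalue:.4g}")
```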
Regression Line

  In a scatterplot showing the association
  between 2 variables, the regression line is
  the “best-fit” line and has the formula
y = a + bx
a = place where the line crosses the Y axis
b = slope of the line (rise/run)
Thus, given a value of X, we can predict a
  value of Y.
Regression Graphic – Regression Line

 [Figure: scatterplot of Total ball toss points (0 to 120) against Distance from target (8 to 26 feet), with the fitted regression line; Rsq = 0.6031. Reading off the line: if x = 18 then y’ ≈ 47; if x = 24 then y’ ≈ 20.]
Regression Equation

 y’ = a + bx
  – y’ = predicted value of y
  – b = slope of the line
  – x = value of x that you enter
  – a = y-intercept (where the line crosses the y axis)
 In this case….
  – y’ = 125.401 - 4.263(x)


 So if the distance is 20 feet
  – y’ = -4.263(20) + 125.401
  – y’ = -85.26 + 125.401
  – y’ = 40.141
Regression Line
(Defined)

The regression line is the line for which the sum of the squared
  vertical distances between the points on the scatterplot and the
  line is a minimum (relative to other possible lines).


 [Figures: scatterplots labelled "Positive and Linear" and "Negative and Linear"]
Example

      b = (∑xy - ∑x·∑y/n) / (∑x² - (∑x)²/n)

∑x = 6426          ∑x² = 1338088
∑y = 4631          ∑xy = 929701
n = 32
b = (929701-(6426*4631/32))/
    (1338088-(6426²/32)) = -0.00549
Mean x = 6426/32 = 200.8125
Mean y = 4631/32 = 144.71875
y = a + bx
a = ȳ - b·x̄ (replace the x̄, ȳ & b values)
a = 144.71875+(0.00549*200.8125)
  = 145.8212106
Systolic BP = 145.82 - 0.00549·chol
Example
∑x = 6426        ∑x² = 1338088
∑y = 4631        ∑xy = 929701
n = 32
b = (929701-(6426*4631/32))/
    (1338088-(6426²/32)) = -0.00549
Mean x = 6426/32 = 200.8125
Mean y = 4631/32 = 144.71875
a = 144.71875+(0.00549*200.8125)
  = 145.8212106
Systolic BP = 145.82 - 0.005·chol
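
The same arithmetic as a short sketch in code, working directly from the summary totals quoted above rather than from raw data: compute the slope b, the intercept a, and assemble the regression equation.

```python
# Sketch: slope and intercept computed from the summary totals above.
n = 32
sum_x, sum_x2 = 6426, 1338088     # cholesterol
sum_y, sum_xy = 4631, 929701      # systolic BP and the cross-product

b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)   # ~ -0.00549
mean_x, mean_y = sum_x / n, sum_y / n
a = mean_y - b * mean_x                                        # ~ 145.82

print(f"Systolic BP = {a:.2f} + ({b:.5f}) * cholesterol")
```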
SPSS Regression Set-up

 • “Criterion”: the y-axis variable, what you’re trying to predict.
 • “Predictor”: the x-axis variable, what you’re basing the prediction on.
Getting Regression Info from
SPSS
                  Model Summary

                             Adjusted   Std. Error of
   Model     R      R Square R Square   the Estimate
   1       .777(a)     .603      .581        18.476
    a. Predictors: (Constant), Distance from target

                                              y’ = a + b(x)
                                              y’ = 125.401 - 4.263(20)

                                        Coefficients(a)

                               Unstandardized   Standardized
                                 Coefficients    Coefficients
     Model                       B     Std. Error   Beta          t       Sig.
     1     (Constant)         125.401    14.265                  8.791    .000
           Distance from target -4.263     .815       -.777     -5.230    .000
       a. Dependent Variable: Total ball toss points

 a is the constant (125.401); b is the coefficient of the predictor (-4.263).
Birthweight = 1.615 + 0.049·mBMI

                                    Coefficients(a)

                         Unstandardized        Standardized
                          Coefficients          Coefficients
Model                    B        Std. Error       Beta        t       Sig.
1       (Constant)       1.615          .181                   8.909     .000
        BMI               .049          .007            .410   6.605     .000
  a. Dependent Variable: Birth weight
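
As a usage note, a prediction can be read off that fitted equation; the sketch below uses an arbitrary illustrative BMI of 25.

```python
# Sketch: prediction from the fitted equation Birthweight = 1.615 + 0.049 * BMI.
a, b = 1.615, 0.049      # intercept and BMI coefficient from the output above
bmi = 25                 # arbitrary illustrative maternal BMI
predicted_birthweight = a + b * bmi
print(f"Predicted birth weight = {predicted_birthweight:.2f}")   # 2.84
```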
