6. Correlation
• Properties of r:
• r is unitless
• r does not depend on the location or scale of the data
• −1 ≤ r ≤ 1
• minimal value = −1: perfect negative linear association
• maximal value = 1: perfect positive linear association
• special value = 0: no linear association
• And remember: r measures linear association only!
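The properties above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name pearson_r and all data values are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (hypothetical values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)

# Location and scale: shifting x and multiplying it by a positive
# constant leaves r unchanged.
r_shifted = pearson_r([10.0 + 3.0 * a for a in x], y)

# Points exactly on a straight line give the extreme values.
r_max = pearson_r(x, [2.0 + 0.5 * a for a in x])  # rising line
r_min = pearson_r(x, [2.0 - 0.5 * a for a in x])  # falling line

# A perfect but non-linear relationship (y = x^2 on symmetric x)
# has r = 0, because r only measures linear association.
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_quad = pearson_r(x_sym, [a ** 2 for a in x_sym])
```

Here r_shifted matches r, r_max and r_min sit at the two ends of the range, and r_quad is zero even though y is completely determined by x.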
Biostatistics Workshop 6
11. Simple Linear Regression
• Model y = a + b x
• Simple: only 1 X
• Linear: straight line relationship
• Terminology:
• x: regressor (independent variable)
• y: response (dependent variable)
• intercept (a), slope (b): regression coefficients
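As an illustration of the model y = a + b x, the sketch below estimates a and b by ordinary least squares. The function name fit_line and the data are hypothetical; the data lie exactly on a line so the result is easy to verify by hand.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the model y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept
    return a, b

# Hypothetical data generated from y = 1 + 2x with no noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(x, y)  # recovers intercept 1 and slope 2
```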
12. Assumptions of Linear Regression
• Linearity
• Linear relationship between outcome and predictors
• E(Y|X=x) = β₀ + β₁x₁ + β₂x₂² is still a linear regression equation
because each of the β's is to the first power
• Normality of the residuals
• The residuals, εᵢ, are normally distributed, N(0, σ²)
• Homoscedasticity of the residuals
• The residuals, εᵢ, have the same variance
• Independence
• All of the data points are independent
• Correlated data points can be taken into account using
multivariate and longitudinal data methods
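To illustrate the linearity assumption: a model containing a squared predictor is still fitted by ordinary linear least squares, because it is linear in the coefficients. The sketch below is one way to show this with the standard library only. The helper names solve and ols, the data, and the chosen true coefficients (1, 2, 0.5) are all assumptions made for the example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical noise-free data from y = 1 + 2*x1 + 0.5*x2^2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, -1.0, 0.0, 3.0, 1.0, -2.0]
y = [1.0 + 2.0 * a + 0.5 * b ** 2 for a, b in zip(x1, x2)]

# The squared term is just another column of the design matrix.
design = [[1.0, a, b ** 2] for a, b in zip(x1, x2)]
beta0, beta1, beta2 = ols(design, y)
```

The fit recovers the three coefficients, which shows that transforming a predictor does not take the model outside linear regression.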
36. Biostatistics Workshop 36
The Multiple R for the relationship between the set of
independent variables and the dependent variable is 0.79,
which would be characterized as strong using the rule of
thumb that a correlation less than or equal to 0.20 is
very weak; greater than 0.20 and less than or equal to 0.40
is weak; greater than 0.40 and less than or equal to 0.60 is
moderate; greater than 0.60 and less than or equal to 0.80 is
strong; and greater than 0.80 is very strong.
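The rule of thumb can be written as a small lookup. This is only a sketch of the cut-offs quoted above; the function name correlation_strength is hypothetical, and taking the absolute value (so that negative correlations are classified by size) is an assumption.

```python
def correlation_strength(r):
    """Label a correlation by size using the rule-of-thumb cut-offs."""
    size = abs(r)  # assumption: the sign is ignored, only magnitude matters
    if size <= 0.20:
        return "very weak"
    if size <= 0.40:
        return "weak"
    if size <= 0.60:
        return "moderate"
    if size <= 0.80:
        return "strong"
    return "very strong"

label = correlation_strength(0.79)  # the Multiple R reported above
```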
R² is a statistic that gives some information
about the goodness of fit of a model.
The R² coefficient of determination is a
statistical measure of how well the regression
line approximates the real data points.
An R² of 1 indicates that the regression line
perfectly fits the data.
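One common way to compute R² is 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of y. The sketch below uses that definition; the function name r_squared and the data are hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values.
y = [3.0, 5.0, 7.0, 9.0, 11.0]

# Fitted values that reproduce the data exactly: perfect fit.
r2_perfect = r_squared(y, y)

# Fitted values that are just the mean of y: no better than no model.
mean_y = sum(y) / len(y)
r2_mean_only = r_squared(y, [mean_y] * len(y))
```

A perfect fit gives R² = 1, while always predicting the mean gives R² = 0.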
37. Biostatistics Workshop 37
The p-value of the F statistic (107.4) for the
overall regression relationship is <0.001, which is less than
the level of significance of 0.05. We reject
the null hypothesis that there is no relationship
between the set of independent variables and the
dependent variable (R² = 0). We support the
research hypothesis that there is a statistically
significant relationship between the set of
independent variables and the dependent variable.
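The overall F statistic is tied to R² through F = (R² / k) / ((1 − R²) / (n − k − 1)), where k is the number of independent variables and n the number of observations. The sketch below applies that formula. The function name f_statistic and the inputs are hypothetical, since the workshop output does not state n or k here.

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic for a regression with k predictors, n observations."""
    df_model = k
    df_resid = n - k - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)

# Hypothetical inputs for illustration; these are not the workshop's values.
f_value = f_statistic(r_squared=0.5, n=12, k=1)

# Under the null hypothesis R² = 0 the statistic is zero.
f_null = f_statistic(r_squared=0.0, n=12, k=1)
```

The p-value is then the upper-tail probability of an F distribution with k and n − k − 1 degrees of freedom, which needs a statistics library or table and is not computed here.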
38. Biostatistics Workshop 38
For the independent variable gender, the
probability of the t statistic (-3.672) for the b
coefficient is <0.001, which is less than the
level of significance of 0.05. We reject the null
hypothesis that the slope associated with gender
is equal to zero (b = 0) and conclude that
there is a statistically significant relationship between
gender and academic index.
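The t statistic for a coefficient is the estimate divided by its standard error, t = b / SE(b). The sketch below works this out for the simpler one-predictor case, not the multiple regression reported above. The function name slope_t_statistic and the data are hypothetical.

```python
import math

def slope_t_statistic(x, y):
    """Return (b, se_b, t) for the slope in the model y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ss_res / (n - 2) / sxx)
    return b, se_b, b / se_b

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b, se_b, t = slope_t_statistic(x, y)
```

For these data b = 0.6 and t is roughly 2.12; the test of b = 0 compares t with a t distribution on n − 2 degrees of freedom.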
39. Biostatistics Workshop 39
The b coefficient associated with gender (-5.8) is
negative, indicating that male students have a lower academic
index than female students.
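To see why the sign of b for a two-category variable reads as a group difference, the sketch below regresses an outcome on a 0/1 indicator alone. In that one-predictor case the slope equals the difference between the two group means. The coding (male = 1, female = 0), the function name fit_line and the data are all assumptions for illustration, not the workshop's data.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the model y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical coding and scores: 0 = female, 1 = male.
gender = [0, 0, 0, 1, 1, 1]
index = [60.0, 62.0, 64.0, 54.0, 56.0, 58.0]

a, b = fit_line(gender, index)

# Group means computed directly for comparison.
mean_female = sum(v for g, v in zip(gender, index) if g == 0) / gender.count(0)
mean_male = sum(v for g, v in zip(gender, index) if g == 1) / gender.count(1)
difference = mean_male - mean_female  # equals the slope b
```

The intercept a is the mean of the group coded 0, and a negative b means the group coded 1 has the lower mean. In a multiple regression the same reading holds with the other predictors held fixed.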
40. Biostatistics Workshop 40
The b coefficient associated with reading is positive,
indicating a positive relationship in which higher reading
(and writing) scores are associated with a higher academic index.