Bivariate

The Multiple Regression Model

Idea: Examine the linear relationship between
1 dependent (Y) & 2 or more independent variables (Xi)

Multiple Regression Model with k Independent Variables:

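$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.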

Assumptions of Regression
Use the acronym LINE:
• Linearity
– The underlying relationship between X and Y is linear

• Independence of Errors
– Error values are statistically independent

• Normality of Error
– Error values (ε) are normally distributed for any given value of X

• Equal Variance (Homoscedasticity)
– The probability distribution of the errors has constant variance

Regression Statistics
Multiple R          0.998368
R Square            0.996739
Adjusted R Square   0.995808
Standard Error      1.350151
Observations        28

r² = SSR / SST = 11701.72 / 11740 = 0.996739
99.674% of the variation in Y is explained by the independent variables.

ANOVA
             df   SS         MS         F          Significance F
Regression    6   11701.72   1950.286   1069.876   5.54E-25
Residual     21   38.28108   1.822908
Total        27   11740

Adjusted r2
• r2 never decreases when a new X variable is
added to the model
– This can be a disadvantage when comparing models
• What is the net effect of adding a new variable?
– We lose a degree of freedom when a new X variable
is added
– Did the new X variable add enough explanatory
power to offset the loss of one degree of freedom?

Adjusted r2
• Shows the proportion of variation in Y explained
by all X variables adjusted for the number of X
variables used
$$r_{adj}^2 = 1 - (1 - r^2)\,\frac{n - 1}{n - k - 1}$$
(where n = sample size, k = number of independent variables)

– Penalize excessive use of unimportant independent
variables
– Smaller than r2
– Useful in comparing among models
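
As a minimal sketch of the adjusted r² formula, the snippet below plugs in the values from the regression output in this deck (r² = 0.996739, n = 28 observations, k = 6 independent variables). The function name `adjusted_r2` is an illustrative choice, not part of the original slides.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Values taken from the regression output in this deck.
adj = adjusted_r2(0.996739, 28, 6)
print(adj)  # close to the reported Adjusted R Square of 0.995808
```

The small gap between r² and adjusted r² here reflects the penalty of (n – 1)/(n – k – 1) = 27/21 applied to the unexplained share.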

Error and coefficients relationship
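• b1 = Covar(y, x) / Varp(x)

Varp    419.28571   1103.4439   115902.4   1630165.82   36245060.6   706538.59   195.9184
Covar               662.14286   6862.5     25621.4286   120976.786   16061.643   257.1429
b1                  0.6000694   0.059209   0.01571707   0.00333775   0.0227329   1.3125

Each b1 is the Covar entry divided by the Varp entry in the same column, for example 662.14286 / 1103.4439 = 0.6000694. The first column has no Covar or b1 entry.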
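
The relationship b1 = Covar(y, x) / Varp(x) can be sketched in a few lines of plain Python. The helper names (`pop_cov`, `pop_var`, `simple_slope`) and the five-point toy data set are assumptions made for illustration; only the final check uses numbers from the table above.

```python
def pop_var(xs):
    """Population variance (divides by n, like Varp)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pop_cov(xs, ys):
    """Population covariance (divides by n, like Covar)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def simple_slope(xs, ys):
    """Slope of the one-variable least-squares line."""
    return pop_cov(xs, ys) / pop_var(xs)

# Hypothetical toy data, not from the slides.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
toy_b1 = simple_slope(toy_x, toy_y)

# Reproducing one slope from the table values.
table_b1 = 662.14286 / 1103.4439
print(toy_b1, table_b1)
```

Because both the covariance and the variance divide by n, the same slope results if both are computed with n – 1 instead.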

Is the Model Significant?
• F Test for Overall Significance of the Model
• Shows if there is a linear relationship between all of the
X variables considered together and Y
• Use F-test statistic
• Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent
variable affects Y)

F Test for Overall Significance
• Test statistic:
$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$

where F has k (numerator) and (n – k – 1) (denominator) degrees of freedom
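
As a quick check of the F statistic, the sketch below recomputes it from the ANOVA figures shown earlier in this deck (SSR = 11701.72, SSE = 38.28108, n = 28, k = 6). The function name `f_statistic` is hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F statistic: mean square regression over mean square error."""
    msr = ssr / k            # regression mean square, k degrees of freedom
    mse = sse / (n - k - 1)  # error mean square, n - k - 1 degrees of freedom
    return msr / mse

# Sums of squares from the ANOVA table in this deck.
f_stat = f_statistic(11701.72, 38.28108, 28, 6)
print(f_stat)  # close to the reported F of 1069.876
```

A value this large, with 6 and 21 degrees of freedom, is consistent with the very small Significance F in the output, so H0 would be rejected.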

Multiple Regression Assumptions
Errors (residuals) from the regression model:

$e_i = Y_i - \hat{Y}_i$

Assumptions:
• The errors are normally distributed
• Errors have a constant variance
• The model errors are independent

Error terms and coefficient estimates
• Once we treat the error term as a random
variable, it becomes clear that the estimates
b1, b2, … (as distinguished from the true
values β1, β2, …) are also random variables, because
the estimates produced by the least-squares
(minimum SSE) criterion depend on the particular
value of ε drawn by nature for each individual
in the data set.

Statistical Inference and Goodness of Fit
• The parameter estimates are themselves random
variables, dependent upon the random variables e.
• Thus, each estimate can be thought of as a draw
from some underlying probability distribution, the
nature of that distribution as yet unspecified.
• If we assume that the error terms e are all drawn
from the same normal distribution, it is possible to
show that the parameter estimates have a normal
distribution as well.

T Statistic and P value
• t = (b1 – b1 average) / (b1 std dev)

Can you hypothesize that the average of b1
equals the estimated b1
and then run the t test?

Are Individual Variables Significant?

• Use t tests of individual variable slopes
• Shows if there is a linear relationship between the
variable Xj and Y
• Hypotheses:
– H0: βj = 0 (no linear relationship)
– H1: βj ≠ 0 (linear relationship does exist
between Xj and Y)

Are Individual Variables Significant?

H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist
between xj and y)

Test Statistic:

$$t = \frac{b_j - 0}{S_{b_j}} \qquad (df = n - k - 1)$$

            Coefficients  Standard Error  t Stat     P-value    Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   -59.0661      11.28404        -5.23448   3.45E-05   -82.5325   -35.5996   -82.5325     -35.5996
OFF         -0.00696      0.04619         -0.15068   0.881663   -0.10302   0.089097   -0.10302     0.089097
BAR         0.041988      0.005271        7.966651   8.81E-08   0.031028   0.052949   0.031028     0.052949
YNG         0.002716      0.000999        2.717326   0.012904   0.000637   0.004794   0.000637     0.004794
VEH         0.00147       0.000265        5.540878   1.69E-05   0.000918   0.002021   0.000918     0.002021
INV         -0.00274      0.001336        -2.05135   0.052914   -0.00552   3.78E-05   -0.00552     3.78E-05
SPD         -0.2682       0.068418        -3.92009   0.000786   -0.41049   -0.12592   -0.41049     -0.12592

The t statistics have n – (k + 1) = 28 – 7 = 21 degrees of freedom.
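
The individual t tests can be sketched by dividing each coefficient by its standard error and comparing the absolute value with a critical t. The coefficients and standard errors are the rounded values from the output above, and 2.079614 is the two-sided 5% critical value for 21 degrees of freedom used in this deck's confidence interval example. The variable names are illustrative.

```python
# (coefficient, standard error) pairs, rounded as shown in the output table.
estimates = {
    "Intercept": (-59.0661, 11.28404),
    "OFF": (-0.00696, 0.04619),
    "BAR": (0.041988, 0.005271),
    "YNG": (0.002716, 0.000999),
    "VEH": (0.00147, 0.000265),
    "INV": (-0.00274, 0.001336),
    "SPD": (-0.2682, 0.068418),
}

T_CRIT = 2.079614  # two-sided 5% critical value, 21 degrees of freedom

# t = (b_j - 0) / S_bj for each term
t_stats = {name: b / se for name, (b, se) in estimates.items()}

# Reject H0: beta_j = 0 when |t| exceeds the critical value.
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]

for name, t in t_stats.items():
    print(f"{name:10s} t = {t:9.4f}  significant: {name in significant}")
```

Because the inputs are rounded, the computed t values differ slightly from the t Stat column, but the decisions agree with the P-values: OFF and INV are not significant at the 5% level.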

Confidence Interval Estimate for the Slope
• Confidence interval for the population slope βj

$$b_j \pm t_{n-k-1}\, S_{b_j}$$

where t has (n – k – 1) d.f.

Example: Form a 95% confidence interval for the effect of
changes in Bars on fatal accidents:
0.041988 ± (2.079614)(0.005271)
So the interval is (0.031028, 0.052949).
(This interval does not contain zero, so Bars has a significant
effect on accidents.)
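
The interval in this example can be reproduced with a short sketch. The function name `slope_ci` is an assumption; the inputs are the BAR coefficient, its standard error, and the critical t value quoted in the example.

```python
def slope_ci(b, se, t_crit):
    """Confidence interval b +/- t_crit * se for a slope coefficient."""
    margin = t_crit * se
    return b - margin, b + margin

# BAR coefficient, standard error, and t for 21 d.f. from the example.
lower, upper = slope_ci(0.041988, 0.005271, 2.079614)
print(lower, upper)  # close to (0.031028, 0.052949)

# The interval excludes zero, matching the conclusion in the example.
excludes_zero = lower > 0 or upper < 0
print(excludes_zero)
```

The last digits differ slightly from the reported bounds because the inputs are rounded.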

            Coefficients  Standard Error  t Stat     P-value    Lower 95%  Upper 95%
Intercept   -59.0661      11.28404        -5.23448   3.45E-05   -82.5325   -35.5996
OFF         -0.00696      0.04619         -0.15068   0.881663   -0.10302   0.089097
BAR         0.041988      0.005271        7.966651   8.81E-08   0.031028   0.052949
YNG         0.002716      0.000999        2.717326   0.012904   0.000637   0.004794
VEH         0.00147       0.000265        5.540878   1.69E-05   0.000918   0.002021
INV         -0.00274      0.001336        -2.05135   0.052914   -0.00552   3.78E-05
SPD         -0.2682       0.068418        -3.92009   0.000786   -0.41049   -0.12592

Using Dummy Variables

• A dummy variable is a categorical explanatory
variable with two levels:
– yes or no, on or off, male or female
– coded as 0 or 1
• Regression intercepts are different if the
variable is significant
• Assumes equal slopes for other variables

Interaction Between Independent Variables
• Hypothesizes interaction between pairs of X
variables
– Response to one X variable may vary at different
levels of another X variable

• Contains cross-product term
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 = b_0 + b_1 X_1 + b_2 X_2 + b_3 (X_1 X_2)$$

where $X_3 = X_1 X_2$ is the cross-product term

Effect of Interaction
• Given:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$

• Without interaction term, effect of X1 on Y is
measured by β1
• With interaction term, effect of X1 on Y is
measured by β1 + β3 X2
• Effect changes as X2 changes

Interaction Example
Suppose X2 is a dummy variable and the estimated regression equation is
$$\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1 X_2$$

X2 = 1: $\hat{Y} = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
X2 = 0: $\hat{Y} = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$

[Figure: the two lines plotted as Y against X1, for X1 from 0 to 1.5.]
Slopes are different if the effect of X1 on Y depends on X2 value
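
The interaction example can be checked with a small sketch that uses the estimated equation from this slide (coefficients 1, 2, 3, 4). The function names `predict` and `line_for` are illustrative choices.

```python
B0, B1, B2, B3 = 1.0, 2.0, 3.0, 4.0  # coefficients from the example equation

def predict(x1, x2):
    """Predicted Y including the cross-product term."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

def line_for(x2):
    """Intercept and slope of Y against X1 at a fixed value of X2."""
    intercept = B0 + B2 * x2
    slope = B1 + B3 * x2  # effect of X1 is b1 + b3 * X2
    return intercept, slope

print(line_for(0))  # intercept and slope when X2 = 0
print(line_for(1))  # intercept and slope when X2 = 1
print(predict(1.5, 1))
```

Setting B3 to zero would leave the slope at b1 for both groups, so only the intercept would shift with the dummy variable.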

Residual Analysis
$e_i = Y_i - \hat{Y}_i$
• The residual for observation i, ei, is the difference between
its observed and predicted value
• Check the assumptions of regression by examining the
residuals
– Examine for linearity assumption
– Evaluate independence assumption
– Evaluate normal distribution assumption
– Examine for constant variance for all levels of X (homoscedasticity)

• Graphical Analysis of Residuals
– Can plot residuals vs. X
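
To illustrate how residuals reveal a violated linearity assumption, the sketch below fits a straight line to deliberately curved data and lists the residuals against X. The data set (Y equal to X squared for X from 0 to 6) and the helper name `fit_line` are assumptions made for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single X variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical curved data: Y = X^2.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)

# Residual e_i = observed Y_i minus predicted Y_i.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(x, e)
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than a random scatter around zero, suggests that a straight line is the wrong functional form.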

Residual Analysis for Independence

[Figure: plots of residuals against X, with panels labelled "Not Independent" and "Independent".]

Residual Analysis for Equal Variance

[Figure: two pairs of plots, each showing Y against x and residuals against x. One pair is labelled "Non-constant variance" and the other "Constant variance".]

Linear vs. Nonlinear Fit

[Figure: two pairs of plots, each showing Y against X and residuals against X. The linear fit does not give random residuals; the nonlinear fit gives random residuals.]

Quadratic Regression Model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$$
Quadratic models may be considered when the scatter diagram takes on one of
the following shapes:

[Figure: four panels of Y against X1, one for each sign combination:
β1 < 0, β2 > 0;  β1 > 0, β2 > 0;  β1 < 0, β2 < 0;  β1 > 0, β2 < 0.]

β1 = the coefficient of the linear term
β2 = the coefficient of the squared term
