We present an overview of regression analysis and its theoretical construct, then provide a graphic representation before performing multiple regression analysis step by step using SPSS (audio files accompany the tutorial).
Regression analysis: Simple Linear Regression, Multiple Linear Regression
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
1. Regression Analysis
CMGT 587A
UNIVERSITY OF SOUTHERN CALIFORNIA
AL ARIZMENDEZ/CATHRYN LOTTIER
2. What is Regression Analysis?
The Regression Method, more commonly referred to
as Regression Analysis, is the assessment of the
relationship between a dependent variable and one or
more independent variables.
It involves techniques for measuring or analyzing
multiple variables and their relationships.
This technique is used to analyze variables with at
least one dependent variable (often y) and one or
more independent variables (often x) to
understand a phenomenon, make predictions, and/or
test hypotheses.
3. Assumptions Underlying the Method
The validity of regression analysis depends on four
assumptions:
Linearity: the relationship between the dependent and
independent variables is linear (it can be described by a straight line)
Independence: an independence of errors with no serial
correlation (a random value of Y is assumed to be independent
of any other value of Y)
Constant variance: the data values are scattered around the regression
line to the same extent at every level of the independent variable
Normality: the random variable of interest (in regression, the error
term) is distributed in a normal manner
4. When can you use Regression Analysis?
Regression Analysis is used to make predictions, so it can
be used by virtually anyone
Some reasons that you may want to use regression
analysis are:
To model a phenomenon to understand it better in order to make
decisions
To model a phenomenon to understand it better and predict its values
in other places or times (later in these slides, you will see an
example of this, where we forecast album sales)
To test hypotheses, but one should note that regression analysis gives
an estimate, not exact data (we will show an
example of this later in the slides with our test of life expectancy vs.
literacy rates)
5. Diving a Little Deeper…
Linear regression analysis begins by positing the
general form of the relationship in the following model:
Yi = β0 + β1X1i + εi
More simply put: Outcomei = (b0 + b1xi) + errori
Where Yi is the dependent variable, β0 is the intercept,
β1 is the slope, and X1i is the independent variable
The εi is the residual term, which expresses the
composite of all the other types of individual differences
that aren’t explicitly identified in the model (a.k.a. the
random error term): a reminder that the model will never be
perfect
6. What does that really mean?
That equation means that the “outcome” can be predicted
from a model and some error associated with that
prediction (εi)
The outcome variable is represented as yi, which is
predicted using a predictor variable (xi) and a parameter
(b1) associated with the predictor variable
b1 is the slope of the line: the direction and strength of the relationship or effect
b0 tells us what the value of the outcome is when the predictor is 0
(the intercept)
The betas tell us what the shape of the model is and what it
looks like
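As a rough illustration of "outcome = model + error", the sketch below fits b0 and b1 by least squares and then splits each outcome into its predicted part and its error. The data values are hypothetical and are not taken from the slides.

```python
from statistics import fmean

# Hypothetical data (not from the slides): x is the predictor, y is the outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = fmean(x), fmean(y)

# Least-squares slope: co-variation of x and y divided by the variation of x.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
# Intercept: the predicted outcome when the predictor is 0.
b0 = y_bar - b1 * x_bar

# Each outcome is the model's prediction plus an error term.
predicted = [b0 + b1 * xi for xi in x]
errors = [yi - pi for yi, pi in zip(y, predicted)]

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
for yi, pi, ei in zip(y, predicted, errors):
    print(f"outcome {yi:.2f} = predicted {pi:.2f} + error {ei:+.2f}")
```

A positive b1 means the line slopes upward, and the errors of a least-squares line sum to zero.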
7. Explanation of R Squared
R2 allows one to assess how well the model fits
If you square the differences between each observed value and the
mean of the outcome, the sum of all the squared
differences is known as the total sum of squares (SST)
If an optimal model is fitted to the data, the differences
between the observed data points and the values predicted by
the regression line can be squared and summed, which is
referred to as the sum of squared residuals (SSR)
The difference between SST and SSR is the model sum of
squares (SSM)
R2 is determined by dividing the model sum of squares by the
total sum of squares, which is used to describe how well the
regression line fits
An R2 near 1 indicates that a regression line fits the data well,
while an R2 closer to 0 indicates a regression line does not fit
the data very well
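The quantities on this slide can be computed by hand. The following is a minimal sketch using hypothetical data (not from the slides) that fits a least-squares line and then derives SST, SSR, SSM, and R2 exactly as described above.

```python
from statistics import fmean

# Hypothetical data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = fmean(x), fmean(y)

# Fit the least-squares line y = b0 + b1 * x.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * xi for xi in x]

# SST: squared differences between the observed values and their mean.
sst = sum((yi - y_bar) ** 2 for yi in y)
# SSR: squared differences between the observed values and the fitted line.
ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
# SSM: the improvement from using the model instead of the mean.
ssm = sst - ssr

r_squared = ssm / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSM = {ssm:.2f}, R2 = {r_squared:.2f}")
```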
8. Example of Regression Analysis
Regression Analysis can be used to forecast the trend of
album sales (shown on the y-axis) in relation to the
advertising budget (shown on the x-axis)
9. Adding Another Variable to the Equation
Now, taking it one step further
and adding amount of radio
play to the equation
This turns into multiple
regression analysis with
more predictors creating a
regression plane (or a 3d
model) with the line turning
into a plane
It looks more complicated, but
the principles remain the
same as linear regression
10. Explanation of Multiple Regression Analysis
Multiple Regression Analysis
Often referred to as OLS (Ordinary Least Squares) regression
“multiple regression can establish whether a set of
independent variables explains a proportion of the variance in
a dependent variable at a significant level (through a
significance test of R2)” (Garson, 2012, p. 10)
It can also determine the relative predictive importance of the
independent variables (by comparing regression weights, also
known as beta weights)
11. Multiple Regression Analysis
While the formula for linear regression analysis
looks like this:
Yi = β0 + β1X1i + εi
Multiple regression analysis looks more like this:
Yi = (β0 + β1X1i + β2X2i + … + βnXni) + εi
This shows that the principles are the same as
linear regression; there are just more predictors!
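To make the "same principles, more predictors" point concrete, here is a minimal sketch of ordinary least squares with two predictors. It solves the normal equations in plain Python. The helper names `solve` and `fit_ols` and the data are hypothetical illustrations, not SPSS's procedure.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(predictors, outcome):
    """Return [b0, b1, ..., bn] minimising the sum of squared errors."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 carries the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, outcome)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data (not from the slides): two predictors per case.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
# Outcome built as 1 + 2*x1 + 3*x2 with no error term, so the fit
# should recover coefficients close to 1, 2, and 3.
outcome = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

b0, b1, b2 = fit_ols(predictors, outcome)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Real data include an error term, so the recovered coefficients would be estimates rather than exact values.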
12. Talking About the Betas
The betas describe the relationship
between a particular predictor
and the outcome
The betas also define the shape
of the plane
In this instance:
b0 represents where the
plane hits the y-axis (value of the
outcome when both predictors are
zero)
b1 represents the slope of the side
associated with radio play
b2 represents the slope of the side
associated with advertising budget
This can go on for multiple
dimensions with each of the
predictors defining the shape
13. Simple Linear Regression w/ SPSS
Life Expectancy of Females (dependent variable)
Literacy of country in percent (independent variable)
23. Multiple Linear Regression w/SPSS
Top half of output; notice the multiple variables entered
and the single dependent variable (female life expectancy)
25. Multiple Linear Regression w/SPSS
Literacy is one variable, but it is the specific combination of
variables that Multiple Linear Regression tests for that makes MLR so
powerful
Editor's Notes
With a constant beta zero but different betas, each beta gives the direction of its line (up, horizontal, or down). With different beta zeros but a constant beta, the lines go in the same direction but sit on different parts of the graph (different intercepts).
Each beta tells us about the relationship of the predictor and the outcome.
Click on File > Open and open a data document. We are looking at Female Life Expectancy (column 6) and the Literacy Rate of the Country (column 8).
Before beginning any statistical procedure, it is always a good idea to run a scatter plot. First go to Graphs > Legacy Dialogs > Scatter/Dot and click Define.
Then add Female Life Expectancy on the Y axis and Literacy Rate on the X axis, and click OK.
There is a robust uphill pattern, with a big group of countries on the right that have high literacy rates and high average female life expectancy. Since we are running the regression numerically, we might as well add a graphical regression line.
Double-click the graph to open these windows, then click the button (see the pink arrow) to add the line and the R squared. Close the Chart Editor and both windows will close, leaving only the graph with the line and R squared. R squared is a measure of how closely the data fit the line. A value of 75% means that if we know the percentage of people who can read, we can predict 75 percent of the variance in female life expectancy.
From this output, we see that there was one dependent variable and one independent variable. The Model Summary shows how well this particular regression predicts the outcome variable. It has the R squared that we saw earlier, 75%, rounded up from .747, which is very good. Again, the closer to 1, the better. The analysis of variance, also known as ANOVA, shows how well the model (the slope and the intercept) fits the data. Again, it is a very good fit: the F value is 313 and the sig. value is less than .001.
Coefficients: the constant is the intercept for the regression line, 38.5. If the predictor were zero, women would have an average life expectancy of 38.5 years. The "people who read (percent)" row is the slope: for every percentage point increase in literacy, you can expect a .4 (4/10) of a year increase in women’s life expectancy, which is pretty large.
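As a worked example of reading these coefficients, the sketch below plugs the intercept (38.5) and the rounded slope (.4) from the notes into the regression line. The function name is a hypothetical illustration, and because the slope is rounded, the predictions are approximate.

```python
# Coefficients as reported in the notes for the simple regression.
INTERCEPT = 38.5  # predicted life expectancy (years) when literacy is 0%
SLOPE = 0.4       # years gained per percentage point of literacy (rounded)


def predict_life_expectancy(literacy_percent: float) -> float:
    """Predicted female life expectancy in years for a given literacy rate."""
    return INTERCEPT + SLOPE * literacy_percent


for literacy in (0, 50, 100):
    years = predict_life_expectancy(literacy)
    print(f"literacy {literacy:>3}% -> about {years:.1f} years")
```

Moving from 50% to 100% literacy adds 50 × .4 = 20 years to the predicted value, which is why the notes call the slope large.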
For Multiple Linear Regression, we want to look at the association of four variables TOGETHER to predict life expectancy for women. Use Analyze > Regression > Linear to run a simple multiple linear regression. Since we have already run a scatter plot on two variables, we don’t have to run another one. We’ll just add the additional independent variables, so that literacy rate (which was already an independent variable), GDP, daily caloric intake, and birth rate per 1000 are the predictor variables of female life expectancy, and click OK.
The text at the top is the command that specifies this particular regression run. Under Variables, note the multiple variables entered and the single dependent variable (female life expectancy).
The model summary shows these variables predict life expectancy VERY WELL. The capital R looks at the association of all the variables together. The max value is 1, and this value is .912. When squared, the number is .832: 83 percent of the variance in female average life expectancy can be predicted by these four variables. An especially important section is the Coefficients section. The constant, or the Y intercept, is the value when all the predictor variables are zero: the average life span is 43.7 years. For each percentage point increase in literacy, there is an increase of .226 years in average life expectancy. For each additional calorie, add .006 to the female average life expectancy. The exception is GDP, which is no longer significant. The variables predict this well only in combination with each other, so it is probably better to use the entire model to predict life expectancy.
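The step from the capital R to the share of variance explained is a single squaring. A minimal check of the figure quoted in the notes, using only the reported value of R (variable names are illustrative):

```python
multiple_r = 0.912           # capital R reported in the Model Summary
r_squared = multiple_r ** 2  # proportion of variance explained by the model

print(f"R squared = {r_squared:.3f} ({r_squared:.0%} of the variance)")
```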
We have seen in simple regression that literacy is an important variable in female life expectancy. In multiple linear regression, we see that additional variables can turn out to be key in combination with others, helping us get closer to the regression line and predict a more accurate outcome.