2. Here is an App that Predicts the Price Per Hour of Various Lawyers
Inputs: City, Firm Size, Partner, Experience (then Calculate)
Regression Analysis in Legal Procurement
http://tymetrix.com/mobile_apps/
3. Here is an App that Predicts the Price Per Hour of Various Lawyers
Our Independent Variables (i.e. X1 ... Xn): City, Firm Size, Partner, Experience
Our Dependent Variable (i.e. Y): Expected Hourly Rate
Regression Analysis in Legal Procurement
http://tymetrix.com/mobile_apps/
10. Here are the measures:
• academic performance of the school (api00)
• average class size in kindergarten through 3rd grade (acs_k3)
• percentage of students receiving free meals (meals), which is an
indicator of poverty
• percentage of teachers who have full teaching credentials (full)
Multiple Regression Analysis
Regression analysis using the variables:
Y (Dependent Variable): api00
X (Independent Variables): acs_k3, meals, full
18. Are the three predictors statistically significant, and what is the
direction of each relationship?
The coefficient on average class size (acs_k3, b=-2.68) is not significant
at the 0.05 level (p=0.055), but only just so.
The coefficient is negative, which would indicate that larger class size is
related to lower academic performance, which is what we would expect.
19. The effect of meals (b=-3.70, p<.001, reported as .000) is significant
and its coefficient is negative, indicating that the greater the proportion
of students receiving free meals, the lower the academic performance.
The meals variable is highly related to income level and functions more
as a proxy for poverty. Thus, higher levels of poverty are associated
with lower academic performance. This result also makes sense.
20. Finally, the percentage of teachers with full credentials (full, b=0.11,
p=.232) seems to be unrelated to academic performance.
This would seem to indicate that the percentage of teachers with full
credentials is not an important factor in predicting academic
performance -- this result was somewhat unexpected.
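The reading of the three coefficients above can be expressed as a small check. This is only an illustrative sketch: it hard-codes the b and p values reported on these slides, and the 0.05 threshold and the helper name `describe` are assumptions rather than part of the original analysis.

```python
# Coefficients and p-values as reported on the slides (api00 is the outcome).
# The slide's p=.000 is rounded output; it is stored here as 0.000.
REPORTED = {
    "acs_k3": {"b": -2.68, "p": 0.055},
    "meals": {"b": -3.70, "p": 0.000},
    "full": {"b": 0.11, "p": 0.232},
}

ALPHA = 0.05  # assumed conventional significance level


def describe(name, b, p, alpha=ALPHA):
    """Return (direction, significant) for one reported coefficient."""
    direction = "negative" if b < 0 else "positive"
    return direction, p < alpha


summary = {name: describe(name, **vals) for name, vals in REPORTED.items()}

for name, (direction, significant) in summary.items():
    label = "significant" if significant else "not significant"
    print(f"{name}: {direction} relationship, {label} at alpha={ALPHA}")
```

Note that acs_k3 lands on the "not significant" side only because 0.055 is just above the assumed 0.05 cutoff; a different threshold would change that label.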
22. “We use regression to estimate the unknown effect of changing
one variable over another.
Regression requires making two assumptions:
1) there is a linear relationship between two variables (i.e. X
and Y)
2) this relationship is additive
(i.e. Y= X1 + X2 + ...+ Xn)
(Note: Additivity applies across terms - as within terms there can be a square,
log, etc.)
Technically, linear regression estimates how much Y changes
when X changes one unit.”
http://dss.princeton.edu/training/
Regression Analysis
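To make the additivity note concrete, here is one possible worked form. The coefficients $\beta_0, \dots, \beta_3$ and the error term $\varepsilon$ are notation assumed for illustration; the quoted source writes the relationship without them.

```latex
% Additive across terms, even though individual terms are transformed:
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2^{2} + \beta_3 \log X_3 + \varepsilon

% Not additive as written (the predictors multiply each other):
Y = \beta_0 \, X_1^{\beta_1} X_2^{\beta_2}
```

The second form becomes additive after taking logs of both sides, which is one reason log transformations are common in practice.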
23. Example: After controlling for other factors, are SAT scores
higher in states that spend more money on education?*
Outcome (Y) variable = SAT scores --> variable csat in dataset
Predictor (X) variables
• Per Pupil Expenditures, Primary & Secondary (expense)
• % of HS graduates taking SAT (percent)
• Median Household Income (income)
• % adults with HS Diploma (high)
• % adults with College Degree (college)
• Region (region)
Regression Analysis
*Source: search for dataset at http://www.duxbury.com/highered/
Use the file states.dta (educational data for the U.S.).
24. Getting Started
Let's Begin by Loading It and Using the Head Command
https://s3.amazonaws.com/KatzCloud/states.dta
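A minimal Python sketch of this loading step, assuming pandas is installed and the URL above is still reachable. The slides do not show which tool was actually used, so `pandas.read_stata` is a swapped-in choice, and `load_states` is a hypothetical helper name.

```python
STATES_URL = "https://s3.amazonaws.com/KatzCloud/states.dta"  # URL from the slide


def load_states(source=STATES_URL):
    """Read the Stata file into a pandas DataFrame.

    `source` may be a URL or a local path to states.dta.
    """
    import pandas as pd  # imported lazily so this module loads without pandas

    return pd.read_stata(source)
```

Calling `load_states().head()` would then show the first five rows, which plays the role of the head command mentioned on the slide.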
26. Getting Started
Let's Start Simple:
We Might Hypothesize a Positive Relationship
As Expenditures Go Up
SAT Performance Also Goes Up
Relationship Between SAT Score and Expenditures?
27. Getting Started
It is Certainly NOT Definitive But a Scatterplot is a good
place to start ...
28. Getting Started
It is Certainly NOT Definitive But a Scatterplot is a good
place to start ...
Notice the Nature of the Relationship is not what we would naively anticipate
29. Getting Started
It is Certainly NOT Definitive But a Scatterplot is a good
place to start ...
It Appears to be a Negative Relationship
Notice the Nature of the Relationship is not what we would naively anticipate
30. Bivariate Regression
Notice the -.02228 for expense, which is the slope of the regression
line shown above.
We just fit the regression line to this bivariate relationship.
31. Bivariate Regression
Y = B0 + ( B1 * (X1) )
csat = 1060.7 - (0.022*expense)
For each one-unit increase in expense,
the predicted SAT score (csat) decreases by 0.022 points.
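The fitted line can be turned into a tiny prediction function to see what the slope means. This is a sketch using the rounded coefficients from the slide; the function name `predict_csat` and the expense values 5000 and 5001 are hypothetical, chosen only to illustrate a one-unit change.

```python
B0 = 1060.7  # intercept from the slide
B1 = -0.022  # slope on expense from the slide (rounded)


def predict_csat(expense):
    """Predicted csat from the fitted line csat = B0 + B1 * expense."""
    return B0 + B1 * expense


# Predictions one unit of expense apart differ by exactly the slope.
one_unit_change = predict_csat(5001) - predict_csat(5000)
print(round(one_unit_change, 3))
```

The starting value of expense does not matter here: on a straight line, any one-unit step changes the prediction by B1.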
32. Bivariate Regression
Y = B0 + ( B1 * (X1) )
csat = 1060.7 - (0.022*expense)
Look at the T Stats, P Values
With a T stat (approximately Z when N>30) greater than 1.96 in absolute
value, we can reject, at the 5% level, the notion that the coefficient is
equal to zero
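The rule above can be sketched in a few lines. The example uses the intercept estimate (1060.732) and its standard error (32.7) quoted elsewhere in this deck. The function names are hypothetical, and the p-value uses the normal (Z) approximation mentioned on the slide rather than an exact t distribution.

```python
import math


def t_statistic(estimate, std_error):
    """Test statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


def two_sided_p_normal(t):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


def rejects_zero(t, critical=1.96):
    """True when |t| exceeds the 5% two-sided critical value."""
    return abs(t) > critical


t_intercept = t_statistic(1060.732, 32.7)
print(round(t_intercept, 2), rejects_zero(t_intercept))
```

For small samples a t distribution with the appropriate degrees of freedom would give a slightly larger critical value than 1.96.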
33. A Brief Word about Standard Errors
Notice that the 95% Confidence Interval is the Beta Coefficient ~ Plus or
Minus Two Times the Standard Error
The standard error of the estimate tells us the accuracy to expect from our
prediction. The standard error of a correlation coefficient is used to
determine the confidence intervals around a true correlation of zero.
Look at the Standard Error and you can obtain the 95% Confidence Interval
1060.732 + 2(32.7) = ~1126.1
1060.732 - 2(32.7) = ~ 995.3
(The bounds of ~1126.4 and ~995.0 shown with the regression output are
slightly wider, consistent with an exact critical value a little above 2.)
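The plus-or-minus-two-standard-errors rule can be written as a short helper. This is a sketch of the approximation described on the slide, not of how any statistics package computes its intervals; `approx_ci95` and its `multiplier` parameter are hypothetical names.

```python
def approx_ci95(estimate, std_error, multiplier=2.0):
    """Approximate 95% interval: estimate +/- multiplier * standard error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


# Intercept and standard error from the slide.
lower, upper = approx_ci95(1060.732, 32.7)
print(round(lower, 1), round(upper, 1))
```

With a multiplier of exactly 2 the bounds come out near 995.3 and 1126.1; an exact critical value slightly above 2 would widen them a little.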
34. Daniel Martin Katz
@computational
computationallegalstudies.com
lexpredict.com
danielmartinkatz.com
Illinois Tech - Chicago-Kent College of Law