Simple (and Simplistic) Introduction to Econometrics and Linear Regression

What is econometrics? A simple, non-technical introduction to linear regression (OLS) as a technique

About this document…

“Econometrics? Isn’t that difficult?”

It’s full of formulas… and it can be complex

This is an attempt to present econometrics as simply as possible…

What’s required to learn a little bit of econometrics

… confidence in dealing with numbers

… a belief that numbers can tell stories

Let’s start with a definition: what is econometrics?

What is econometrics?

What is econometrics?
* … but not necessarily moot and unimportant. For those interested in the differences, see future tutorials…

What is econometrics?
In an equation such as y = m • x + b + u, we know the values of y and x. Econometrics helps us identify the values of m, b and u.

If we were interested in awareness and GRPs…

awareness = m • GRPs + b + u

NB. This drastically simplifies the relationship between GRPs (gross rating points) and awareness. The relationship is far more complex, of course, but let’s assume that this equation is true for now.

What econometrics does is “estimate” the values of “m”, “b” and “u” based on the available data on awareness and GRPs, so that we have an equation that relates awareness to GRPs.

Once m, b and u are identified and estimated, we can use the equation to explain the movements in awareness with respect to GRPs, and to predict how awareness is going to move in the future given different levels of GRPs.
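To make the idea of “estimating m, b and u” concrete, here is a minimal sketch in plain Python. The GRP and awareness figures are invented for illustration only, and the helper name `fit_line` is hypothetical; it uses the standard least-squares formulas for a straight line.

```python
# Hypothetical weekly data, made up purely for illustration.
grps = [50, 100, 150, 200, 250, 300]               # advertising weight (GRPs)
awareness = [22.0, 27.5, 31.0, 37.5, 41.0, 47.0]   # awareness, in percent


def fit_line(x, y):
    """Estimate slope m and intercept b of y = m*x + b + u by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


m, b = fit_line(grps, awareness)

# u is what the line leaves unexplained for each observation (the residuals).
u = [yi - (m * xi + b) for xi, yi in zip(grps, awareness)]

print(f"awareness = {m:.4f} * GRPs + {b:.2f} + u")
print("residuals u:", [round(ui, 2) for ui in u])
```

Note that m and b are single numbers, while u takes a different value for every observation: it is the part of awareness that the line does not account for.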

There are many econometric techniques…

What is linear regression?

Introduction to linear regression
Suppose we have 30 months of data on the number of product users, and we want to know how many users to expect over the next 12 months.

If we plotted the data, we would indeed see an upward trend…
[Chart: product users (‘000) against time t, in months]
In the 1st month, there are about 5’000 product users. By the 30th month, the number of users has increased to about 40’000.

To answer this question, we first need to understand the past relationship between the two variables: time and number of users. We will then use this understanding of the past to predict what is going to happen in the next 12 months.
[Diagram: the past and the future]

What bridges the gap between the past and the future…
Linear regression comes into the picture by bridging the gap between the past and the future. Once we have identified the equation or the model, we will have a better grasp of (1) the past trends and (2) the potential of the future.
[Diagram: the past, linked to the future by the linear regression equation]

With that in mind, let’s look at the chart again

From mere observation, we see an uptrend in users across time…
[Chart: product users (‘000) against time t, in months]

How do we quantify* that uptrend?
[Chart: product users (‘000) against time t, in months]
* Remember: in order to project into the future, we need to create a model that quantifies the relationship between time and number of users.

There are an infinite number of lines that we could use to characterize the uptrend…
[Chart: product users (‘000) against time t, in months, with several candidate trend lines]
Different people have different views, even when viewing the same set of data: I can argue that the best line is the grey line, another can argue that the blue line is best, and still another can argue that the best line is the pink line.

Linear regression insists that there is one (and only one) line that would best characterize the trend and the relationship between the two variables

Linear regression also insists that this equation be of the following form:
y = m • x + b + u

This one line that best describes the relationship between the two variables is derived through OLS (ordinary least squares)
Huh?

Let’s go back a few charts…
Remember: given any data set, there are an infinite number of lines that can be used to describe the trend. One person can choose the pink line as the best and rationalize it; another can argue that the yellow line is the best; and a third can defend the blue line. We could argue indefinitely about the merits of each of these lines.
What OLS does is objectively consider this infinite number of lines and find the best-fitting one: the line for which the sum of the squared vertical distances between the line and the original data points is at a minimum.
Think of OLS as a search algorithm that tries different combinations of m and b until it arrives at the line with the minimum distance to the original data; u is what is left over between that line and each data point. (In practice no trial and error is needed: the best-fitting m and b can be computed directly from the data with a formula.)
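The “best-fitting line” idea can be checked numerically. The sketch below uses invented data and three arbitrary candidate lines (labelled grey, blue and pink only for illustration), and compares them with the line from the standard least-squares formulas. The function names `sum_squared_errors` and `ols_line` are hypothetical.

```python
# Hypothetical data: month number and product users in thousands.
months = [1, 2, 3, 4, 5, 6, 7, 8]
users = [5.2, 6.1, 8.3, 9.0, 11.2, 11.9, 13.8, 15.1]


def sum_squared_errors(m, b, x, y):
    """Total squared vertical distance between the line y = m*x + b and the data."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))


def ols_line(x, y):
    """Slope and intercept that minimise the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


best_m, best_b = ols_line(months, users)

# Three lines someone might draw by eye, plus the OLS line.
candidates = {
    "grey": (1.0, 6.0),
    "blue": (1.8, 2.0),
    "pink": (1.4, 4.5),
    "OLS": (best_m, best_b),
}

errors = {name: sum_squared_errors(m, b, months, users)
          for name, (m, b) in candidates.items()}

for name, sse in errors.items():
    print(f"{name:>4} line: sum of squared errors = {sse:.3f}")
```

Whatever hand-drawn line is added to `candidates`, its sum of squared errors cannot fall below that of the OLS line; that is the sense in which OLS settles the argument.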

Going back to the data: the best-fitting regression line, after applying OLS, is…
[Chart: product users (‘000) against time t, in months, with the fitted regression line]

By applying OLS, the equation «y = 1.416x + 3.6329» is found to be the best-fitting regression line

Now comes the interesting part…

The story behind «y = 1.416x + 3.6329»
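One common way to read such an equation, with y in thousands of users and x in months, is that the slope (1.416) is the number of thousand users added per month and the intercept (3.6329) is the level the line implies at month 0. The short sketch below simply evaluates the equation; the function name `fitted_users` is hypothetical.

```python
SLOPE = 1.416       # thousand additional users per month, from the fitted equation
INTERCEPT = 3.6329  # thousand users implied by the line at month 0


def fitted_users(month):
    """Users (in thousands) on the fitted line y = 1.416x + 3.6329."""
    return SLOPE * month + INTERCEPT


for month in (1, 30):
    print(f"month {month}: about {fitted_users(month):.1f} thousand users on the line")

# Moving one month along the line always adds the same amount: the slope.
monthly_gain = fitted_users(2) - fitted_users(1)
print(f"gain per month: about {monthly_gain:.3f} thousand users")
```

These are points on the fitted line, not the observed data, so they need not match any single month’s actual figure exactly.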

OK, we have an equation. How do we know it’s the correct equation?

Let’s eyeball the model: there seem to be no data points that are significantly away from the line…
[Chart: product users (‘000) against time t, in months, with the fitted regression line]

Eyeballing the data, however, brings back subjective interpretations
[Chart: product users (‘000) against time t, in months, with the fitted regression line]
One can argue that the point at month 11 is significantly away from the line, and so is the point at month 24… We therefore need a more accurate, more objective measurement of “fit”.

How else do we know if the equation is valid or not?
The r-squared is only one of several measures of goodness-of-fit (GOF). Other measures include the adjusted r-squared, the Akaike information criterion (AIC), the root-mean-squared error (RMSE), and GLM-ANOVA. These will not be discussed here.
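As a rough illustration of how r-squared is usually computed, the sketch below uses the common definition: one minus the ratio of the squared deviations left over by the line to the total squared deviations of y around its mean. The data are invented and the function names are hypothetical.

```python
# Hypothetical data: month number and product users in thousands.
months = [1, 2, 3, 4, 5, 6, 7, 8]
users = [5.2, 6.1, 8.3, 9.0, 11.2, 11.9, 13.8, 15.1]


def ols_line(x, y):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def r_squared(x, y, m, b):
    """Share of the variation in y that the line y = m*x + b accounts for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))  # left unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                    # total variation
    return 1 - ss_res / ss_tot


m, b = ols_line(months, users)
r2 = r_squared(months, users, m, b)
print(f"r-squared = {r2:.3f}")
```

A value close to 1 means the line leaves little of the variation unexplained; a value close to 0 means the line explains hardly any of it.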

Will we ever have an r-squared of 1.00?

But there are deviations between the line and the data!

Deviations are not entirely bad…

Let’s go back to the original question: how many users can we expect in the next 12 months?

What have we done so far…?

Let’s now project what’s going to happen in the next 12 months…
[Chart: product users (‘000) against time t, in months, with the regression line extended to month 42]
At the end of the next 12 months (by month 42), we can expect to have 543’000 users, if all things remain equal.

Since we don’t really know what’s going to happen in the future, and we don’t have a perfect model, we can report ranges instead of just a line…
[Chart: the dashed lines indicate the range of expectations for the next 12 months]
We can expect that there will be about 470’000 to 616’000 users by month 42.
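The slides do not say how the dashed range was calculated. One rough way to build such a range, shown here only as an assumption, is to take the projected point and add or subtract two residual standard errors (the typical size of a deviation from the line). The data below are invented and are not the figures behind the charts; a formal prediction interval would also account for uncertainty in m and b.

```python
import math

# Hypothetical 30 months of data (thousands of users): a made-up trend plus a
# repeating wiggle. These are not the figures behind the charts.
wiggle = [0.8, -1.1, 0.3, -0.5, 0.6]
months = list(range(1, 31))
users = [1.4 * t + 3.6 + wiggle[t % 5] for t in months]


def ols_line(x, y):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


m, b = ols_line(months, users)
residuals = [yi - (m * xi + b) for xi, yi in zip(months, users)]

# Residual standard error: typical size of a deviation from the line.
s = math.sqrt(sum(r ** 2 for r in residuals) / (len(months) - 2))

horizon = 42  # 12 months beyond the last observed month
point = m * horizon + b
low, high = point - 2 * s, point + 2 * s

print(f"month {horizon}: about {point:.1f} thousand users "
      f"(rough range {low:.1f} to {high:.1f})")
```

The noisier the past data are around the line, the larger `s` becomes and the wider the reported range.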

Linear regression through OLS is just one of the many techniques in econometrics…

