You know that one friend in your group who never has a beef with anyone, who always remembers little details about everyone’s lives, who always has a kind word, and even volunteers to help you move?
A regression line is kinda like that for a set of linear-ish data, always trying to stay as close to every data point as possible, no matter how far away some data points try to get. In fact, it’s the one “line of best fit” that makes the sum of the squared vertical distances between the data points and the line as small as possible.
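To make “as close as possible” precise: the usual (least-squares) regression line is the slope m and intercept b that minimize the sum of squared vertical distances. The notation below, with n data points (x_i, y_i), is our own shorthand.

```latex
\[
\text{SSE}(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2,
\qquad
(\hat{m}, \hat{b}) = \arg\min_{m,\,b} \, \text{SSE}(m, b)
\]
```

Squaring keeps the points above the line and the points below it from canceling each other out, and it weighs far-away points more heavily than close ones.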
It’s important to remember that we only find regression lines for data that is probably linear. We say “probably,” because there’s no way to be sure that a data set must be linear...only a bunch of circumstantial evidence that it might be linear.
Polly, the CEO of the world-renowned plant distributor “Polly’s Pretty Plants,” sells vegetable plants she has grown in her vast network of greenhouses. Okay, you got us. She works out of her mom’s basement, and she’s 13. Still, Polly is a maven for experimentation and statistical data gathering. So much so that she’s gathered data on the different amounts of her special fertilizer/water mixture given to several plants, all grown from the same packet of tomato seeds and planted in soil from the same spot in her mom’s backyard.
Polly would like to be able to predict the amount of growth a seed will experience based on how much fertilizer she puts in the water. The whole point of the regression line is to allow her to do this...at least to a certain degree of accuracy. Regression lines give predictions, not guarantees.
Regression lines are the best fit lines to a set of data with a linear pattern. Here, the phrase “best fit” means the line makes the sum of the squared vertical distances between the points and the line as small as possible. We can find the slope and intercept of the regression line using the formulas, or we can just use tech to do everything for us.
And we can use the regression equation to help us predict either x or y values, with the expectation that the real result will probably be close to the value predicted by the regression equation.
Now we just need some regression to help us figure out if we were Albert Einstein or Fred Astaire in a previous life.
Related or Semi-related Video
Finance: What is a regression line?
Finance a la Shmoop: What is a regression line? You know that one friend in your group who never has a beef with anyone, who always remembers little details about everyone's lives, who always has a kind word, and even volunteers to help you move? All right, a regression line is kind of like that for a set of linear-ish data, always trying to stay as close to every data point as possible, no matter how far away some data points try to get. In fact, it's the one line of best fit that minimizes the distances between the data points and the line. That's the regression line.
Before we get too carried away with the ins and outs of finding these elusive regression lines, it's important to remember that we only find regression lines for data that is probably linear. We say “probably” because there's no way to be sure that a data set must be linear, only a bunch of circumstantial evidence that it might be linear. There are several bits that all work together to help us feel okay with saying data are linear-ish, but probably the most important, and the only one we really need to worry about here in finance land, is a linear pattern in the scatter plot. Okay, example. Here we go.
Polly, the CEO of the world-renowned plant distributor Polly's Pretty Plants, sells vegetable plants she has grown in her vast network of greenhouses. Okay, you got us. She works out of her mom's basement, and she's thirteen. Still, Polly is a maven for experimentation and statistical data gathering. So much so that she has gathered the following data on the different amounts of her special fertilizer/water mixture given to several plants from the same packet of tomato plant seeds, all planted in soil from the same spot in her mom's backyard.
Well, Polly would like to be able to predict the amount of growth a seed will experience based on how much fertilizer she puts in the water. The whole point of the regression line is to allow her to do this, at least to a certain degree of accuracy. Right? That's where we're going with this. But first she needs to know if the data points are linear, okay?
So Polly quickly whips up a scatter plot (that's what thirteen-year-old girls do, isn't it?) and sees what appears to be a roughly linear pattern, right? Well, now we could draw lines on that scatter plot until the cows come home, but only one is the line of best fit, the line that gets the smallest bunch of distances from each data point possible.
All right, well, what would this regression line look like? Okay, well, as an eyeball-only approach: a line of best fit does not have to hit any of the data points, but it should follow the slant, or slope, of the data and try to split the points so that the distances straight up from the line to the points are balanced out by the distances straight down from the line to the points.
Let's start with a line, any line. This one goes up from left to right. All right, this first line does nothing right. We usually try to fix the slope first, and what we need is to make the slope less steep. In other words, decrease the slope little by little until we have a pretty good match to the slope of the data. That last line looks like it's pretty close to the slant of the data. And again, this is just an eyeball approach, so yeah, it won't be perfect.
We've got the slope part pretty locked in, but we don't have the split-the-data-points-with-equal-distances thing going for us yet. We need to move the whole line straight up, little by little, until it kind of splits the data points and we get a good split. There don't have to be equal numbers of points above and below; it's more about equal distances. So we have three points, all about the same medium-ish distance above, with a very close point, a medium-close point, and a kind of far point below. Yeah, down there. So the total distance above is about equal to the total distance below. Well, we could find the equation of the eyeball line, but it definitely isn't the perfect best fit line.
It's just close. So how do we find the equation by hand? Well, the formula for the slope of the regression line is m equals the correlation coefficient r times the standard deviation of the y data, s sub y, divided by the standard deviation of the x data, s sub x. Right? We typically don't find the correlation coefficient or the standard deviations by hand, especially since they're super duper easy to get via technology like a graphing calculator, spreadsheet, or website. But, well, you can check out some of our other videos if you really want to.
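The slope formula in symbols (our notation: r is the correlation coefficient, s_x and s_y are the sample standard deviations, x̄ and ȳ are the means, and n is the number of points). The second form is an algebraically equivalent version that skips r and the standard deviations; the n − 1 factors inside s_x and s_y cancel out.

```latex
\[
m = r \cdot \frac{s_y}{s_x}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]
```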
So if we pop Polly's data from before into a TI graphing calculator, we can run a LinReg, a linear regression, to get the r value, then run 1-Var Stats on both the x and y data to get the standard deviations. We get the correlation coefficient r to be 0.8654, the standard deviation of the x data, s sub x, to be 1.8708, and the standard deviation of the y data, s sub y, to be 2.7053. Easy. Okay, that in turn gives a slope for our regression line of 0.8654 times 2.7053 divided by 1.8708, which is approximately 1.2514. So we did all the math there for you. What? We're still missing the y-intercept. So how do we find that for Polly?
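The slope arithmetic above, as a quick Python check. The three inputs are the rounded values quoted in the video; this only redoes the arithmetic and does not reproduce Polly’s raw data, which isn’t given here.

```python
# Summary statistics for Polly's data, as read off the calculator in the video
r = 0.8654     # correlation coefficient
s_x = 1.8708   # standard deviation of the x (fertilizer) data
s_y = 2.7053   # standard deviation of the y (growth) data

# Slope of the regression line: m = r * s_y / s_x
m = r * s_y / s_x
print(round(m, 4))  # prints 1.2514
```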
Well, the formula for the y-intercept of the regression line, b, is found by taking the mean of the y data, y bar, minus the product of the slope m we just calculated and the mean of the x data, x bar. Again, we usually get the two means, x bar and y bar, using tech. This is the twenty-first century, after all, people. Using the same 1-Var Stats from before on each data set, with our tricked-out, diamond-plate-covered, voice-activated TI graphing calculator, gives us x bar equal to 4.5 and y bar equal to 14.2667, to go along with our slope of 1.2514. Well, our y-intercept then is 14.2667 minus 1.2514 times 4.5, which is approximately 8.6354. That's a lot of numbers we've thrown at you, isn't it? Yeah.
But what's our regression equation, then? Well, we jam the slope and y-intercept into slope-intercept form, that y = mx + b thing, and we get y = 1.2514x + 8.6354. Plotted on the original scatter plot, this is how that line looks, right there.
Finding the slope and y-intercept using the formulas? Again, twenty-first century, people. Come on, ask your parents. No one, except maybe old man Hostetler, who still thinks calculators are a tool of the devil, does any of this by hand. And would any of you be really mad if we told you that we already had the equation on the screen way back at the beginning of the by-hand calculations? Yeah, we knew you'd be cool about it. Seems that when we found the r value, we also had the slope and y-intercept of the regression line staring us right in the face, just above the r there. See that value in the “a =” row? That's the slope of the regression line. See the “b =” row? Yeah, that's the y-intercept. The fact that the values are a wee bit different from ours, like four decimal places in, is just a rounding error.
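The intercept step (b = ȳ − m·x̄) in Python, again using only the rounded values quoted in the video. Because the inputs are rounded, results like this can drift slightly from the calculator’s own a and b values.

```python
# Values from the video: means from 1-Var Stats, slope from the formula step
x_bar = 4.5        # mean of the x (fertilizer) data
y_bar = 14.2667    # mean of the y (growth) data
m = 1.2514         # slope, rounded to four decimal places

# Intercept of the regression line: b = y_bar - m * x_bar
b = y_bar - m * x_bar
print(f"y = {m}x + {b:.4f}")  # prints y = 1.2514x + 8.6354
```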
So Polly's got a regression line. What can she do with it? Well, she could use it to predict the likely growth of another seed of that same variety based on how much fertilizer she adds. Let's say Polly uses 5.5 cubic centimeters of fertilizer. How tall might her plant be? Well, we just plug 5.5 in for x in the regression equation, giving us y = 1.2514 times 5.5 plus 8.6354, which is 15.5-and-change centimeters. She could expect the plant to be about 15.5-plus centimeters tall after the same time period passes. It probably won't be exactly that value, but it should at least be in the neighborhood. Regression lines give predictions, not guarantees.
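A minimal sketch of the prediction step. The function name `predict_height` is our own hypothetical choice, and the default slope and intercept are the rounded values from the video.

```python
def predict_height(fertilizer_cc, m=1.2514, b=8.6354):
    """Hypothetical helper: plug an x value into Polly's line y = m*x + b.

    fertilizer_cc: amount of fertilizer in cubic centimeters (the x value).
    Returns the predicted plant height in centimeters (the y value).
    """
    return m * fertilizer_cc + b

print(round(predict_height(5.5), 2))  # prints 15.52
```

The result is a best guess from the line, not a measurement; Polly's actual plant will land somewhere in the neighborhood.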
So, to recap: regression lines are the best fit lines to a set of data with a linear pattern. In this case, the phrase “best fit” means the line makes the vertical distances between the points and the line as small as possible. We can find the slope and intercept of the regression line using the formulas, or we could just use tech to do everything for us. And we can use the regression equation to help us predict either x or y values, with the expectation that the real result will probably be close to the value predicted by the regression equation. Now we just need some regression to help us figure out if we were Albert Einstein or Fred Flintstone in a previous life.
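For the “predict x” direction in the recap, one simple approach is to solve y = mx + b for x, sketched below. The helper name `fertilizer_for_height` and the 20 cm target are hypothetical, chosen only for illustration. Treat the answer as a rough estimate: inverting a y-on-x regression line is not the same as fitting a separate x-on-y regression (the two agree only when r = ±1). If the result lands outside the range of fertilizer amounts Polly actually tested, the estimate is shakier still.

```python
def fertilizer_for_height(target_cm, m=1.2514, b=8.6354):
    """Hypothetical helper: solve y = m*x + b for x, i.e. x = (y - b) / m."""
    return (target_cm - b) / m

# Hypothetical target: roughly how much fertilizer for a 20 cm plant?
print(round(fertilizer_for_height(20.0), 2))  # prints 9.08
```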