R-Squared
Categories: Metrics
R-squared is a measure of the percentage of change in one variable due exclusively to changes in another variable. Accidentally back into a car in the lot and need someone to blame it on? Late for work and need a scapegoat the boss will believe? R-squared is here to find you a patsy to take the blame for pretty much anything you can dream up.
While r-squared won’t take the fall for you, it will find the fall guy best suited for that job. Let’s say we looked at a set of bivariate (two-variable) data that compares the top speed of a remote control car to the size of the tires on the car, which ends up having an r-squared value of 0.79. That means 79% of the changes in the top speed of the car are due to changes in the size of the tires. More simply, the changes in tire size are the primary cause for changes in the top speed. Other factors do affect top speed, like wind resistance, battery power, etc., but changes in tire size are the primary cause for changes in top speed.
We need to be careful not to say that tire size is the primary cause of top speed. R-squared doesn’t tell us that. It just tells us that changes in the tire size are the primary cause for changes in the top speed. The difference is subtle but hugely important.
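For anyone who wants to see where a number like 0.79 comes from, here is one standard way to write r-squared for a straight-line fit. The symbols are our own labels, not anything from the tire data: each y_i is an observed top speed, each \hat{y}_i is the speed the fitted line predicts for that car's tire size, and \bar{y} is the average top speed.

```latex
r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

Read this way, an r-squared of 0.79 says the line's misses (the top sum) add up to 21% of the total spread in top speed (the bottom sum), leaving 79% of that spread matched by the line.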
Let’s walk through how we might really work with r-squared from beginning to end. We’ve got two variables, like the daily price of a gallon of gasoline and the average number of gallons purchased per customer on that same day. It’s not unreasonable to think that changes in the price of gas are a factor in how much gas people buy. But how much of a factor?
R-squared will tell us how much of the changes in how much people pump into their tanks is due to changes in the price of gas, and how much of those changes in the amount of gas purchased is due to other factors, like length of trip, amount of money on hand, how loud the kids are screaming in the back, butterflies flapping their wings in Indonesia, etc.
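Here is a rough sketch of what that calculation could look like in Python. The seven prices and purchase amounts are invented purely for illustration (they are not real gas station data), and the function names `fit_line` and `r_squared` are our own. The sketch fits a least-squares line and then compares the line's misses to the total spread in gallons purchased.

```python
# Hypothetical numbers, made up for illustration only.
prices = [3.10, 3.25, 3.40, 3.55, 3.70, 3.85, 4.00]   # dollars per gallon
gallons = [11.2, 9.8, 11.5, 10.1, 9.0, 10.6, 9.4]     # average gallons per customer


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line for y versus x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Return 1 - (squared misses of the fitted line) / (total squared spread in y)."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


r2 = r_squared(prices, gallons)
print(f"r-squared: {r2:.4f}")
print(f"share tied to price changes: {r2:.1%}")
print(f"share left to other factors: {1 - r2:.1%}")
```

The last two lines are the whole point: whatever r-squared turns out to be, the leftover share (one minus r-squared) is what gets chalked up to trip length, cash on hand, and screaming kids.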
A few important safety tips. We should only find r-squared for data that have a linear-ish pattern. We can find r-squared by hand, but that’s a sign of insanity, so use tech to do the grunt work of the actual calculations. R-squared is the percentage of change in one variable that is due strictly to changes in another variable.
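To see why the linear-ish rule matters, here is a small sketch with invented data where y depends completely on x, but along a curve instead of a line. The helper name `correlation` is our own, and the numbers are made up for the demonstration.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Invented example: y is exactly x squared, a perfect U-shaped relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r2_curved = correlation(x, y) ** 2
print(f"r-squared for the U-shaped data: {r2_curved}")
```

For this symmetric U shape, r-squared comes out to zero even though x determines y completely. That is the trouble with running the numbers on curved data: the answer describes how well a straight line fits, not how strongly the variables are tied together.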
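If we square root r-squared, we get the size of the correlation coefficient, r. The sign of r matches the direction of the trend: positive when the two variables rise together, negative when one falls as the other rises.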
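One wrinkle is that a square root alone can't tell an upward trend from a downward one, so the sign of r has to come from the slope of the fitted line. A minimal sketch, using the 0.79 from the tire example; the function name is our own, and the positive slope (bigger tires, faster car) is an assumption made for the illustration.

```python
import math


def correlation_from_r_squared(r2, slope):
    """Recover r from r-squared, borrowing the sign from the fitted line's slope."""
    magnitude = math.sqrt(r2)
    return magnitude if slope >= 0 else -magnitude


# Tire example: r-squared of 0.79, assuming speed goes up as tire size goes up.
r_tires = correlation_from_r_squared(0.79, slope=1.0)
print(f"r for the tire data: {r_tires:.3f}")

# The same r-squared with a downward-sloping line would give a negative r.
r_downhill = correlation_from_r_squared(0.79, slope=-1.0)
print(f"r if the trend ran downhill: {r_downhill:.3f}")
```

Only the sign of the slope matters here, not its size, which is why a placeholder like 1.0 or -1.0 is enough for the sketch.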
And finally, never, ever, ever suggest that 99% of the changes in a policeman's weight are due to changes in donut consumption. That last one's a freebie.
Related or Semi-related Video
Finance: What is r-squared?
Finance, a la Shmoop. What is r-squared? R-squared: it's a measure of the percentage of change in one variable due exclusively to changes in another variable. Accidentally back into a car in the lot and need someone to blame it on? Late for work and need a scapegoat the boss will believe? Well, r-squared is here to find you a patsy to take the blame for, well, pretty much anything you can dream up. While r-squared won't take the fall for you, it will find the fall guy best suited for that job.
So let's say we looked at a set of bivariate, or two-variable, data that compares the top speed of remote control cars to the size of the tires on the car, which ends up having an r-squared value of 0.79. That means 79% of the changes in the top speed of the car were due to changes in the size of the tires. More simply, the changes in tire size are the primary cause of the changes in the top speed. And you'd say in normal English, the r-squared between larger tires and faster speed was high, like 0.8. Other factors affect top speed, like wind resistance, battery power, and so on. But changes in tire size are the primary cause for changes in top speed.
We need to be careful not to say that tire size is the primary cause of top speed. See, that was a little error there. [unclear] R-squared doesn't tell us that. It just tells us that changes in the tire size are the primary cause for changes in the top speed, meaning they're just related. We don't know that one causes the other, and the difference is subtle but, well, hugely important here.
So let's walk through how we might really work with r-squared from beginning to end in a problem. We've got two variables, like the daily price of a gallon of gasoline and the average number of gallons purchased per customer on that same day. It's not unreasonable to think that changes in the price of gas are a factor in how much gas people buy. But how much of a factor? Like, if a gallon of gasoline costs, you know, eighteen dollars, there'd be fewer people buying than if it cost three bucks. So r-squared will tell us how much of the changes in how much people pump into their tank is due to changes in the price of gas, and how much of those changes in the amount of gas purchased is due to other factors, like time of season, or the length of trip they're taking, or the amount of money they happen to have on hand, or how loud the kids are screaming "Are we there yet? Are we there yet?" in the back seat. Yeah, okay.
Well, because we knew it would come in handy, we collected data from our local Gas and Sip on seven different days. Calculations like the coefficient of determination, r-squared, and/or the correlation coefficient, r, should only be attempted on data that has a linear-ish shape. It's always a good idea to whip up a scatter plot of the data just to make sure it's not obviously curved or has some other weird non-linear pattern that we can't then generalize from. Well, the pattern here is linear enough and doesn't show an obvious curve or other pattern, so we're good to go.
And we can calculate r-squared by hand, but almost nobody does, even for very small data sets. Graphing calculators, spreadsheets, and websites dedicated to finding r, and only r and r-squared, well, all do a dandy job of getting us the values we want. So let's do that. Well, after popping this data into our jailbroken, diamond-plated, solar-powered, voice-activated TI-84 Plus [unclear] Edition, we get an r-squared value of 0.1358 right there.
What does that mean for us and for our gas problem? Well, since r-squared is the percentage of change in the y variable that is due strictly to changes in the x variable, it means that only 13.58% of changes in the average amount of gas purchased are due to changes in the price of gas. It also means that 86.42% of the changes in the amount of gas purchased are due to other factors, like changes in how much money people have on hand. It could also have to do with, well, changes in how far they're planning on driving that day. It could mean many other things, but for now we're just focused on the numbers.
And again, we need to be very careful not to claim that the r-squared number tells us what percentage of one variable is caused by the other. That's a no-no. R-squared is always and only the percentage of changes in one variable due to changes in the other. Also, we can take r-squared out back behind the woodshed and square root the mess out of it to get the correlation coefficient, r. Got it? Okay. Some important safety tips.
We should only find r-squared for data that have a linear-ish pattern. We can find r-squared by hand, but that's a sign of insanity, so use tech to do the grunt work for the actual calculations. Well, r-squared is the percentage of change in one variable that is due strictly to changes in another variable. If we square root r-squared, well, we get the correlation coefficient, r, a positive number there. And never, never, ever suggest that 99% of the changes in a police person's weight is due to changes in doughnut consumption, either. Just an extra free warning there from your friends at Shmoop.