Two-Tailed Test

Categories: Metrics

We’ve all heard some talking head on TV spout some statistic we feel just has to be wrong. Stuff like, “75% of small businesses fail within the first year”...which obviously can’t be correct, because that seems like a crazy-high percentage.

A two-tailed test is a way to determine whether there's actually evidence that the true value is something other than the 75% the talking head quoted.

Two-tailed tests are a specific kind of hypothesis test that we run when we believe that a proportion (percentage), mean (average), or some other value we've heard or read about is different from what we've been quoted. We take a random sample, calculate a statistic from that sample, run it through a couple of formulas, and come up with a probability (a p-value) telling us how likely it would be to see a sample like ours if the quoted value really were true. When that probability is tiny, we have evidence the true value differs from the quoted one. Most people use a graphing calculator, spreadsheet, or website to do all the mathematical heavy lifting.
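For the curious, here's that recipe as a minimal Python sketch (standard library only) of a two-tailed z-test for one proportion. The sample numbers, 272 failures out of a random sample of 400 small businesses, are hypothetical, purely to show the mechanics:

from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(p_hat, p0, n):
    # Standard error of the sample proportion, assuming the quoted value p0 is true
    se = sqrt(p0 * (1 - p0) / n)
    # z-score: how many standard errors the sample proportion sits from the quoted value
    z = (p_hat - p0) / se
    # Two-tailed p-value: chance of a sample at least this extreme, in either direction
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 272 of 400 small businesses failed in year one (p-hat = 0.68)
z, p = two_tailed_z_test(0.68, 0.75, 400)
print(f"z = {z:.2f}, two-tailed p-value = {p:.4f}")   # z = -3.23, p = 0.0012

A p-value that tiny says a sample like this would almost never happen if 75% really were the true rate, which is exactly the kind of evidence a two-tailed test exists to surface.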

Related or Semi-related Video

Finance: What is the standard normal distribution?

Finance, a la Shmoop. What is the standard normal distribution? The standard normal distribution is the distribution of the z-scores of the data points from a normal distribution. Okay, but why do we need to create a new normal distribution? Like, "the new normal"? Isn't that a thing? Wasn't the normal distribution we already had good enough? We'll explain why the standard normal distribution is such a huge improvement on the plain old normal distribution, but first we need a quick recap of the original.

A normal distribution, or normal curve, is a continuous, bell-shaped distribution that follows the empirical rule, which says that 68% of the data is within one standard deviation on either side of the mean, 95% of the data is within two standard deviations on either side of the mean, and 99.7% of the data is within three standard deviations on either side of the mean. The regular normal curve has its peak located at the mean, x-bar, and is marked off in units of the standard deviation, s. Right there, that's what it looks like: adding the standard deviation over and over to the right, and subtracting the standard deviation over and over to the left. But what makes it "normal"? The fact that 68% of all the data sits within one standard deviation on each side of the mean; it's that 68% truism that makes it a normal distribution. Then 95% of the data is within two standard deviations on either side of the mean; that's another test for normalcy. And 99.7% of the data is within three standard deviations on either side; that's a third test. Pass all three, and you're normal.
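Those three percentages aren't arbitrary; they fall straight out of the normal curve's math. Here's a quick sanity check in Python (our sketch, not part of the video), using the standard library:

from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
for k in (1, 2, 3):
    # Area under the curve within k standard deviations of the mean
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {share:.1%}")   # 68.3%, 95.4%, 99.7%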

Well, tons of things in nature and from manufacturing and lots of other scenarios are normally distributed, like heights of adult males, or weights of Snickers bars, or the diameters of drink cup lids, or eleventy million other things. Okay: fun-size Snickers have a mean weight of 20.05 grams and a standard deviation of 0.72 grams, and the weights are normally distributed. That gives us this distribution of fun-size Snickers weights. The height of the graph at any point is the likelihood of us getting a candy bar of that specific weight: the higher the curve at a point, the greater the chance we get that exact weight. This means the fun-size Snickers weight we'll get most often is that 20.05-gram size smack dab in the middle, right there. Weights larger and smaller than that will be less common in our Halloween candy haul. Weights like 17.89 grams or 22.21 grams will be extremely rare, because they're so far from the middle and sit at a part of the curve where we have a very small likelihood of getting those weights.
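That "height of the curve = likelihood" idea is easy to poke at numerically. A small sketch using the video's Snickers numbers, evaluating the normal density at the mean and at one of those rare weights:

from statistics import NormalDist

snickers = NormalDist(mu=20.05, sigma=0.72)   # fun-size Snickers weights, in grams
peak = snickers.pdf(20.05)                    # density at the mean, the top of the curve
rare = snickers.pdf(17.89)                    # density three SDs below the mean
print(f"{peak:.4f} vs {rare:.4f}")            # 0.5541 vs 0.0062: roughly 90x less likely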

So why should we even mess with the normal distribution we already have by calculating z-scores to create a standard normal distribution? And, well, what the heck is a z-score, anyway? We'll answer the first question in just a sec, but a z-score is a value we calculate that tells us exactly how far a specific data point is from the mean, measured in units of standard deviation. Z-scores are a way to get an idea of how large or small a data point is compared to all the other data points in the distribution. It's like getting a measure of how fast a Formula One racecar is, compared not to the regular beaters on the road, but to other Formula One race cars. The Formula One car is obviously faster than the Shmoopmobile here, but is it faster than the other Formula One cars? That's what really matters. A z-score will tell us, effectively, where that one Formula One car ranks compared to all the other ones we can speed test. If it's got a large positive z-score, it's faster than many, if not most, of the cars. If it has a z-score close to zero, well, then it's right in the middle of the pack, speed-wise. If it's got a negative z-score, well, it's the tortoise to the other cars' hares.
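In symbols, that's all a z-score is: z = (x - mean) / standard deviation. As a one-liner, sketched here with the fun-size Snickers numbers from earlier:

def z_score(x, mean, sd):
    # Distance of x from the mean, measured in standard deviations
    return (x - mean) / sd

print(round(z_score(17.89, 20.05, 0.72), 2))   # -3.0: three SDs below the mean weight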

03:35

plot the Z scores instead of the scores themselves Well

03:38

because the process of standardizing or calculating the plotting of

03:41

the Z scores of the data points makes any work

03:44

we need to do with the distribution about ten thousand

03:46

times easier When we calculated plot the Z scores we

03:50

create a distribution that doesn't care anything about the context

03:53

of the problem or about the individual means or standard

03:56

deviations or whatever Effectively we create one single distribution that

04:01

works equally well for heights of people or weights of

04:04

candy bars or diameters of drink lids or lengths of

04:08

ring tailed Leamer taels If we don't standardize by working

04:12

with Z scores we must create a normal curve that

04:14

has different numbers for each different scenario And we have

04:17

to do new calculations for each scenario for each different

04:21

set of values So let's explore the important features of

04:24

the standard normal distribution and how it differs from all

04:27

the other regular normal distributions The standard normal curve and

04:31

the regular normal curve look identical in shape They just

04:36

differ in how the X axis this thing right here

04:38

is divided Let's walk through an example where we compare

04:41

how the normal distribution of the actual data and the

04:43

standard normal distribution for the sea Scores of the data

04:46

are created at the same time Okay What are we

04:48

gonna pick here Well let's pick narwhal tusks They're very

04:52

close to normal in their distribution with a mean length

04:55

of two point seven five meters and standard deviation of

04:57

point to three meters The regular normal distribution of Narwhal

05:01

Tusk links are narwhal distribution is that I think we'll

05:05

have the peak located above the mean of two point

05:07

seven five meters We'll need the Z score of a

05:09

data point representing the length of two point seven five

05:12

to start labeling the standard normal distribution the same way

05:15

we'll Z scores were found by subtracting the mean from

05:18

a data point and dividing that value by the standard

05:20

deviation of the data To find a Z score we

05:23

subtract the mean two point seven five from our data

05:25

point also two point seven five to get zero And

05:28

then we divide that by the standard deviation of point

05:30

two three while we get a Z score for that

05:32

middle value of zero Here's the same normal curve of

05:35

the Tusk clanks paired with the standard normal curve of

05:38

the Z scores Now for the tick marks on the

05:40

straight up Tusk link distribution Right there we add the

05:43

standard deviation of point two three three times to the

05:46

mean of two point seven five to get the tick

05:49

marks to the right of the meanwhile we just get

05:51

was that two point nine eight and then three point

05:53

two ones were adding point to three to it And

05:55

then another point that gets us three point four four

05:57

There we go and we repeat that procedure on the

06:00

left but subtracted three times So we get to point

06:02

five to two point two nine And then what is

06:05

that two point Oh six on the left Well to

06:07

get these same values on our standard normal curve we

06:10

need to find some more Z scores The first score

06:13

of the right of the mean is that a value

06:14

two point nine eight meters It Z score will be

06:16

found by taking two point nine eight and subtracting the

06:19

mean of two point seven five to get that point

06:20

to three and then dividing that by the standard deviation

06:23

of point two three while we get one See that's

06:25

kind of a little mini proof there The second take

06:28

mark to the right will be for data points at

06:30

three point two one meters Well when we subtract the

06:32

mean we get point four six which we divide by

06:35

point two three and get Z equals two and the

06:37

third take mark their works out similarly gets a C

06:40

equals three See there it is Things will work out

06:42

similarly but negatively on the other side on the laughed

06:44

when we do the same thing for tick marks Negative

06:47

one negative too And then there we go Negative three

06:50
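Here's that whole tick-mark-to-z-score correspondence in one sketch, leaning on the standard library's NormalDist (Python 3.9+ for its zscore method, which does the subtract-then-divide for us):

from statistics import NormalDist

tusks = NormalDist(mu=2.75, sigma=0.23)   # narwhal tusk lengths, in meters
for length in (2.06, 2.29, 2.52, 2.75, 2.98, 3.21, 3.44):
    print(f"{length:.2f} m -> z = {tusks.zscore(length):+.0f}")
# Prints z = -3, -2, -1, +0, +1, +2, +3: the standard normal tick marks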

Well, let's look at the two curves together. One is specific to the data of narwhal tusk lengths, while the other is standardized to represent the perfect normal curve, usable for all normal data regardless of context or the values of the means or standard deviations. So, after standardizing, does the standard normal curve follow the empirical rule? Yeah, it's a normal curve, after all; it's even in the name, standard normal curve. See, they kind of tip you off to these things. There's still 68% of the data points between -1 and 1 on the standard normal curve. There's still 95% of the data between -2 and 2 on the standard normal curve. And there's still 99.7% of the data between -3 and 3 on the standard normal curve.

So, getting back to that "ten thousand times easier" thing: well, it comes in when we try to answer questions like how many of the gummy-coated pretzel logs weigh between 12 and 15 grams. So here's the setup. Gummy-coated pretzel log weights are normally distributed with a mean of 13.2 grams and a standard deviation of 0.78 grams. We want to know what percentage of pretzel logs that come out of the gummy-bear coating machine weigh between 12 and 15 grams, which the company considers its ideal weight range: the range where customers likely won't complain and send them back for being too little or too big. If we don't standardize things by finding the z-scores of our boundary values of 12 and 15 grams, we'll need some kind of technology to interpret our mean, standard deviation, and boundary values in terms of the normal curve specific to this situation. And if we change anything about the problem, like the boundary values or the mean or the standard deviation, well, then we'll have to re-input all the new data and start completely over. And that would suck. On the other hand, since we know the data are already normally distributed, we can simply standardize the two boundary values by calculating their z-scores and use the majesty of the z-table, this thing, to answer our questions. It's a table telling us what percentage of data lies to the left or right of a z-score across the whole standard normal distribution. Many lives were lost and billions of dollars were spent to build this thing, so, you know, you gotta respect it. Not to put too fine a point on it, but if we don't standardize to z-scores, we need to use a unique normal curve and unique calculations every single time we work with one of these situations. But if we do standardize to z-scores, we just need to check the one table for every situation. It's like choosing to go to a different store every time we need a different product, or going to one store that has all of them in one place. Like, you'd rather go to Safeway than the broccoli store and then the egg store and then the milk store, right?

So let's calculate our two z-scores for our boundary values and then check the z-table to get our percentage of pretzel logs in the sweet spot, that 12-to-15-gram range. We'll take the first data point, 12, and subtract the mean weight of 13.2, giving us -1.2 grams, and then divide that by the standard deviation of 0.78, which gives us a z-score there of -1.538. Then we'll take the second data point, 15, subtract that mean of 13.2 to get 1.8, then divide that value by our standard deviation of 0.78 to get a z-score of 2.308. Well, there are two different kinds of z-tables. One shows the area to the left of a specific z-score; the other shows the area to the right. They both give the same info, so we'll use a left z-table. A series of z-scores accurate to the tenths place runs down the left-hand side, and the hundredths place for each of those z-scores runs across the top. The percentage of data to the left of a specific z-score can be found at the intersection of a row and a column. We'll round both our z-scores to the hundredths place, -1.54 and 2.31 respectively, in order to locate the percentage of data to the left of each one. Well, we'll go down to the -1.5 row, then across to the column here headed by -0.04, and where negative-1.5 Avenue intersects with negative-0.04 Street, we find the percentage of data to the left of z = -1.54: 0.06178, this thing. Then we'll head way down to 2.3 Boulevard, then across to 0.01 Road; they cross at 0.98956.
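No printed z-table handy? The same left-tail areas come straight from the standard normal CDF; here's a quick sketch that reproduces both lookups:

from statistics import NormalDist

phi = NormalDist().cdf        # area to the LEFT of a z-score, i.e., a left z-table
print(round(phi(-1.54), 5))   # 0.06178
print(round(phi(2.31), 5))    # 0.98956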

So now what? What do we do with these two percentages? Well, glad you asked. We know the percentage of data to the left of our 15-gram upper boundary, which is at a z-score of 2.31. We also know the area to the left of our 12-gram lower boundary, at a z-score of -1.54. Now it's time to merge those two areas. Check the area to the left of the z-score of 2.31 on the standard normal curve: this is the percentage of data to the left of that value. Now check the area to the left of the z-score of -1.54 on the same standard normal curve: well, this is the percentage of data to the left of that value. If we cut away the area to the left of z = -1.54, we're left with the area here between z = -1.54 and z = 2.31. This is the percentage of data between those two values, and you're looking at it really carefully to be sure you've got enough in that general sweet-spot range so you don't get a whole lot of returns from angry customers. Well, we just need to subtract the 0.06178 from the 0.98956 to get the percentage of data between those two values, which is, yes, about 93 percent.
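And the whole pretzel-log question, end to end, collapses to a few lines if we let the computer work with the curve directly. A sketch; the tiny gap from 93% is just the z-table's rounding of the z-scores:

from statistics import NormalDist

logs = NormalDist(mu=13.2, sigma=0.78)   # pretzel log weights, in grams
share = logs.cdf(15) - logs.cdf(12)      # area between the two boundary weights
print(f"{share:.1%}")                    # 92.8%: about 93%, matching the table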

So what does that mean? Well, it means 93 percent of the gummy-coated pretzel logs produced will be between 12 and 15 grams in weight. And that's either good news or not. Well, a couple of important safety tips, though, before you all head out to the store for some more gummy-coated pretzel logs. We should only try to standardize, i.e., do things with z-scores, if the data are normal in shape to begin with. If they're not, the data machinations here will be useless to you. And make sure you're paying attention to what kind of z-table you have; again, some show areas to the left, while others give areas to the right, of specific z-scores. Every time you've got a set of normally distributed data, you should standardize the situation by finding z-scores. And, well, you'll save yourself a ton of work in the long run. Well, at least tons of stats work. Life work? We can't help you there. Sorry.
