p-Values

Even vs Odd # of Letters in Last Name

You are given a data set with two variables: 1) whether a student's last name has an even or odd number of letters, which splits the students into two groups, and 2) their test score. Here is what this data might look like with only 3 rows:

id	num_letters	test_score
1	even	0.7
2	odd	0.6
3	odd	0.8

Think about how you would test:

\(H_0\): there is no difference in test scores between the odd and even groups
\(H_A\): there is a difference

Hints:

Think about what assuming \(H_0\) allows you to do to the data set
This is Question 1.b) from Problem Set 08

1) Assuming \(H_0\) Allows You To…

Every hypothesis test starts by assuming the null hypothesis is true. In our case:

We assume no difference in test scores between evens and odds
So for each student, it doesn’t matter whether their last name has an even or odd number of letters
In other words, the variable num_letters is meaningless
If num_letters is meaningless, then we can permute its values without consequence

Thus, assuming \(H_0\) is true, the observed data above is equivalent to the following permuted data, in which the num_letters values have been shuffled while each test score stays on its row:

id	num_letters	test_score
1	odd	0.7
2	even	0.6
3	odd	0.8

which is the same as the following permuted data

id	num_letters	test_score
1	odd	0.7
2	odd	0.6
3	even	0.8
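The permutation step can be sketched in a few lines. This is a minimal Python illustration, assuming the three example rows are stored as parallel lists; `permute_labels` is a hypothetical helper name, and the seed is arbitrary. Only the num_letters column is shuffled, so each test score stays attached to its row, exactly as in the permuted tables.

```python
import random

# The three example rows from the observed data table.
ids = [1, 2, 3]
num_letters = ["even", "odd", "odd"]
test_scores = [0.7, 0.6, 0.8]


def permute_labels(labels, rng):
    """Return a shuffled copy of the group labels; the scores are not touched."""
    shuffled = list(labels)  # copy so the observed labels stay intact
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(8)  # arbitrary seed, only for reproducibility
permuted = permute_labels(num_letters, rng)

# Each test score stays on its row; only the num_letters column is reshuffled.
for row_id, label, score in zip(ids, permuted, test_scores):
    print(row_id, label, score, sep="\t")
```

Note that a shuffle keeps the group sizes fixed: every permuted table still has one "even" and two "odd" labels.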

2) Test Statistic

We need a test statistic:

A statistic summarizes many numbers with a single value. Ex: sum, mean, min, max, median, etc.
A test statistic is simply a statistic used for the purpose of testing.
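To make "a summary of many values to a single value" concrete, here is a small Python sketch that applies each of the listed statistics to the three example test scores, using only the standard library. The dictionary name `summaries` is just for illustration.

```python
import statistics

scores = [0.7, 0.6, 0.8]  # test scores from the three example rows

# Each statistic collapses the whole list into one number.
summaries = {
    "sum": sum(scores),
    "mean": statistics.mean(scores),
    "min": min(scores),
    "max": max(scores),
    "median": statistics.median(scores),
}

for name, value in summaries.items():
    print(f"{name}: {value:.2f}")
```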

In our case:

We want to compare test scores for students with even vs odd numbers of letters in their last names
Choice of test statistic: the mean test score of the even group MINUS the mean test score of the odd group.
Using mathematical notation: \(\mu_E - \mu_O\), the difference in population mean test scores. More on this notation later.
What is the observed test statistic? \(\overline{x}_E - \overline{x}_O\): the observed difference in sample means.
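A minimal Python sketch of the observed test statistic \(\overline{x}_E - \overline{x}_O\) (even minus odd) for the three example rows. The helper name `diff_in_means` is hypothetical. In this tiny example the even mean is 0.7 and the odd mean is (0.6 + 0.8) / 2 = 0.7, so the observed difference is 0, up to floating-point rounding.

```python
import statistics

num_letters = ["even", "odd", "odd"]
test_scores = [0.7, 0.6, 0.8]


def diff_in_means(labels, scores):
    """Mean score of the 'even' group minus mean score of the 'odd' group."""
    even = [s for g, s in zip(labels, scores) if g == "even"]
    odd = [s for g, s in zip(labels, scores) if g == "odd"]
    return statistics.mean(even) - statistics.mean(odd)


observed = diff_in_means(num_letters, test_scores)
print("observed difference in means:", observed)
```

The same function can be reused on any permuted version of num_letters, which is what the null distribution needs.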

3) Null Distribution in General

We need a null distribution: the typical behavior of the test statistic assuming \(H_0\) is true. That way we can say how likely/unlikely the observed test statistic is.
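Once we have a collection of test statistics simulated under \(H_0\), the p-value is the proportion of them that are at least as extreme as the observed one. The Python sketch below uses one common two-sided convention (distance from the null center, which is 0 for a difference in means); other tools may define the two-sided p-value slightly differently. The function name and the `null_stats` values are hypothetical, made up purely to exercise the function.

```python
def two_sided_p_value(null_stats, observed, center=0.0):
    """Share of null statistics at least as far from `center` as `observed`.

    Assumes the null distribution is centered at `center`
    (0 for a difference in means when H0 is true).
    """
    if not null_stats:
        raise ValueError("need at least one null statistic")
    obs_dist = abs(observed - center)
    extreme = sum(1 for s in null_stats if abs(s - center) >= obs_dist)
    return extreme / len(null_stats)


# Hypothetical null statistics, for illustration only (not real data).
null_stats = [-0.3, -0.1, 0.0, 0.05, 0.1, 0.2, 0.25, -0.2]
print(two_sided_p_value(null_stats, observed=0.2))  # 4 of 8 are as extreme -> 0.5
```

A small p-value means the observed statistic would rarely occur if \(H_0\) were true.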

Think back to the Lady Tasting Tea example: 8 correct guesses (the red line) was unlikely relative to the typical number of correct guesses if she were guessing at random, i.e., relative to the null distribution shown in the bar plot.
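As a sketch of what such a null distribution looks like, the Python code below computes it exactly under one simple model of random guessing: each of the 8 cups is guessed independently with a 50% chance of being right, so the number correct is Binomial(8, 0.5). This model is an assumption; the setup used in class may differ (for example, Fisher's original design fixed 4 cups of each kind).

```python
from math import comb

N_CUPS = 8  # assumption: 8 independent guesses, each right with probability 0.5

# P(k correct) for k = 0..8 under random guessing: Binomial(8, 0.5).
null_pmf = [comb(N_CUPS, k) / 2**N_CUPS for k in range(N_CUPS + 1)]

for k, p in enumerate(null_pmf):
    print(f"{k} correct: {p:.4f}")

print("P(8 correct) =", null_pmf[N_CUPS])  # 1/256 = 0.00390625
```

Under this model, guessing all 8 cups correctly happens only about 0.4% of the time, which is why the red line sits far out in the tail of the bar plot.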

4) Null Distribution in our Case

How do we construct the null distribution in our case? By permuting the values of num_letters, which we are allowed to do because we assume \(H_0\) is true. What is a synonym for permute?
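Putting the pieces together, a permutation test repeatedly shuffles num_letters, recomputes the difference in means each time, and compares the observed statistic to that simulated null distribution. Below is a minimal Python sketch; the function names, the 1000 repetitions, and the seed are arbitrary choices, and the two-sided p-value uses the "at least as far from 0" convention. With only 3 rows there are just three ways to place the single "even" label, so the simulated statistics only take the values 0 and about ±0.15, and the p-value here is 1. On a full data set the same code would yield a much richer null distribution.

```python
import random
import statistics


def diff_in_means(labels, scores):
    """Mean score of the 'even' group minus mean score of the 'odd' group."""
    even = [s for g, s in zip(labels, scores) if g == "even"]
    odd = [s for g, s in zip(labels, scores) if g == "odd"]
    return statistics.mean(even) - statistics.mean(odd)


def permutation_null(labels, scores, reps=1000, seed=None):
    """Simulate the null distribution by repeatedly shuffling the group labels."""
    rng = random.Random(seed)
    null_stats = []
    for _ in range(reps):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # shuffle num_letters; test scores stay on their rows
        null_stats.append(diff_in_means(shuffled, scores))
    return null_stats


def two_sided_p_value(null_stats, observed):
    """Share of null statistics at least as far from 0 as the observed one."""
    extreme = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return extreme / len(null_stats)


num_letters = ["even", "odd", "odd"]
test_scores = [0.7, 0.6, 0.8]

observed = diff_in_means(num_letters, test_scores)
null_stats = permutation_null(num_letters, test_scores, reps=1000, seed=1)
p_value = two_sided_p_value(null_stats, observed)
print("p-value:", p_value)
```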