You are given a data set with two variables: 1) whether the number of letters in a person's last name is even or odd, i.e. two groups, and 2) test score. For example, here is what this data would look like with only 3 rows:
id | num_letters | test_score |
---|---|---|
1 | even | 0.7 |
2 | odd | 0.6 |
3 | odd | 0.8 |
Think about how you would test:
Hints:
All hypothesis testing assumes the null hypothesis is true. In our case, this means num_letters is meaningless. If num_letters is meaningless, then we can permute its values to no consequence.

Thus assuming \(H_0\) is true, the above observed data is the same as the following permuted data:
id | num_letters | test_score |
---|---|---|
1 | odd | 0.7 |
2 | even | 0.6 |
3 | odd | 0.8 |
which is in turn the same as the following permuted data:
id | num_letters | test_score |
---|---|---|
1 | odd | 0.7 |
2 | odd | 0.6 |
3 | even | 0.8 |
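
As a minimal sketch of this idea, the snippet below lists every distinct way of reassigning the three num_letters values while the test scores stay fixed. The variable names are illustrative only. With one "even" and two "odd" labels there are exactly three distinct arrangements: the observed data and the two permuted tables above.

```python
from itertools import permutations

# The three observed rows from the example table.
num_letters = ["even", "odd", "odd"]
test_score = [0.7, 0.6, 0.8]

# permutations() treats the two "odd" labels as different items,
# so we collapse duplicate arrangements with a set.
distinct_arrangements = sorted(set(permutations(num_letters)))

for arrangement in distinct_arrangements:
    print("id | num_letters | test_score |")
    for row_id, (label, score) in enumerate(zip(arrangement, test_score), start=1):
        print(f"{row_id} | {label} | {score} |")
    print()
```

Under \(H_0\), each of these arrangements is treated as equally plausible, which is what makes permuting the labels a valid way to mimic the null hypothesis.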
We need a test statistic:
In our case
We need a null distribution: the typical behavior of the test statistic assuming \(H_0\) is true. That way we can say how likely/unlikely the observed test statistic is.
Think back to the Lady Tasting Tea example. 8 correct guesses (red line) is unlikely relative to the typical number correct if she were guessing at random, i.e. relative to the null distribution shown in the bar plot.
How do we construct the null distribution in our case? Using permutations assuming \(H_0\) is true. What is a synonym for permute?
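
One possible way to build such a null distribution is sketched below. It assumes the test statistic is the difference in mean test score between the even and odd groups; that choice is an assumption made for illustration, not something stated above. The function names, the number of replicates, and the seed are likewise hypothetical, and the data are just the 3 example rows, so the resulting distribution is only a toy.

```python
import random
from statistics import mean

def diff_in_means(labels, scores):
    """Assumed test statistic: mean score of 'even' minus mean score of 'odd'."""
    even = [s for label, s in zip(labels, scores) if label == "even"]
    odd = [s for label, s in zip(labels, scores) if label == "odd"]
    return mean(even) - mean(odd)

def null_distribution(labels, scores, n_reps, seed=0):
    """Recompute the statistic on n_reps random reorderings of the labels."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    stats = []
    for _ in range(n_reps):
        # Drawing all labels without replacement gives a random reordering.
        permuted = rng.sample(labels, k=len(labels))
        stats.append(diff_in_means(permuted, scores))
    return stats

num_letters = ["even", "odd", "odd"]
test_score = [0.7, 0.6, 0.8]

observed = diff_in_means(num_letters, test_score)
null_stats = null_distribution(num_letters, test_score, n_reps=1000)

# Share of permuted statistics at least as extreme as the observed one.
p_value = sum(abs(s) >= abs(observed) for s in null_stats) / len(null_stats)
print(observed, p_value)
```

Comparing the observed statistic with the spread of null_stats plays the same role as comparing the red line with the bar plot in the Lady Tasting Tea example.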