Hypothesis Testing Part III

library(dplyr)
library(mosaic)
library(ggplot2)
library(readr)

# Set seed value
set.seed(76)

# Note you will need to change this line to whatever loads the data for you:
grades <- read_csv("../PS/grades.csv")

LC1: Visualization

Both of the following plots get the job done. Compared with the faceted histograms, the boxplot has one pro and one con:

Pro: You can compare both groups with a single horizontal line
Con: You lose information about the shape of the distribution

ggplot(data=grades, aes(x=even_vs_odd, y=final)) +
  geom_boxplot()

ggplot(data=grades, aes(x=final)) +
  geom_histogram(binwidth = 0.25) + 
  facet_wrap(~even_vs_odd)

No clear “slam-dunk” winner in my opinion! But really? The number of letters in your last name?!?

LC2: Setting the Seed Value

To demonstrate that the long code and the wrapper function code do the same thing, we need to set the seed value: because randomization is involved, this is what makes the random results replicable. 76 was an arbitrary choice of seed; choose your favorite number.
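As a quick illustration in base R (not part of the problem set code), resetting the same seed before a random operation reproduces the same result. The names `first_draw` and `second_draw` are made up for this sketch:

```r
# Set the seed, then draw a random permutation of 1 to 10
set.seed(76)
first_draw <- sample(1:10)

# Reset to the same seed and repeat the exact same call
set.seed(76)
second_draw <- sample(1:10)

# The two "random" permutations match element for element
identical(first_draw, second_draw)
```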

LC3: Perform Hypothesis Test

# The true observed difference in averages i.e. the observed test statistic
observed_diff <- mean(final ~ even_vs_odd, data=grades) %>% diff()

# Simulate the null distribution of the test statistic:
simulations <- do(10000) * mean(final ~ shuffle(even_vs_odd), data=grades)
simulations <- simulations %>%
  # as_tibble() replaces the deprecated as_data_frame()
  as_tibble() %>% 
  mutate(difference=odd-even)

# Compare what we observed (red) to what happens if we assume no difference:
ggplot(data=simulations , aes(x=difference)) +
  geom_histogram() +
  geom_vline(xintercept = observed_diff, col="red") +
  geom_vline(xintercept = 0, linetype="dashed") +
  labs(x="Difference: Odd Avg - Even Avg")

Observing a difference in means of -0.073 still seems somewhat plausible under the assumption of no difference. Also, note where the simulated distribution is centered: 0, i.e. no difference!
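To see what a single iteration of the simulation is doing, here is a minimal base-R sketch of the same idea. It is not the course code: the `toy` data values are made up for illustration, `diff_in_means()` is a hypothetical helper, and `sample()` stands in for the label shuffling that `shuffle()` performs.

```r
# Toy data, made up for illustration only (NOT the course's grades.csv)
toy <- data.frame(
  final = c(3.1, 3.4, 2.9, 3.7, 3.3, 3.0),
  even_vs_odd = c("even", "odd", "even", "odd", "even", "odd")
)

# Hypothetical helper: difference in group means, odd minus even
diff_in_means <- function(values, groups) {
  mean(values[groups == "odd"]) - mean(values[groups == "even"])
}

set.seed(76)

# One shuffle: permute the labels, which breaks any link between
# the even/odd label and the grade (this is what H_0 assumes)
shuffled_labels <- sample(toy$even_vs_odd)
one_simulated_diff <- diff_in_means(toy$final, shuffled_labels)

# Many shuffles: the collection of differences approximates the null distribution
null_diffs <- replicate(1000, diff_in_means(toy$final, sample(toy$even_vs_odd)))
```

Because the labels are assigned at random in every iteration, neither group is favored, which is why the simulated differences pile up around 0.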

p-Value

The p-value is the probability, assuming \(H_0\) is true, of observing a test statistic at least as extreme as the one observed. In our case, since \(H_A: \mu_{odd} - \mu_{even} \neq 0\), i.e. there is a difference, “more extreme” can mean either

A more extreme negative difference
A more extreme positive difference

simulations %>%
  mutate(more_extreme_left = difference <= observed_diff) %>% 
  summarise(more_extreme_left = sum(more_extreme_left))

## # A tibble: 1 × 1
##   more_extreme_left
##               <int>
## 1              1248

There are 1248 simulated differences less than or equal to the observed difference of -0.073. So the p-value is

\[ \frac{2 \times 1248 + 1}{10000 + 1} = 0.250 \]

Notes:

Note we doubled the 1248 because we have a two-sided alternative \(H_A: \mu_{odd} - \mu_{even} \neq 0\)
We add 1 to both the numerator and denominator since we also need to account for the observed test statistic itself
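The arithmetic can be checked directly in R. This is only a sketch of the calculation; the variable names `n_sims`, `n_left`, and `p_two_sided` are made up here, and the counts are copied from the output above:

```r
n_sims <- 10000  # number of simulated differences
n_left <- 1248   # simulated differences <= the observed difference

# Double the left-tail count for the two-sided alternative,
# and add 1 to numerator and denominator for the observed statistic itself
p_two_sided <- (2 * n_left + 1) / (n_sims + 1)
round(p_two_sided, 3)
```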

Other Alternatives

Say we had \(H_A: \mu_{odd} - \mu_{even} < 0\). Then more extreme would mean more negative, i.e. more to the left of the observed test statistic, so the p-value would be \[ \frac{1248 + 1}{10000 + 1} = 0.125 \]
Say we had \(H_A: \mu_{odd} - \mu_{even} > 0\). Then more extreme would mean more positive, i.e. more to the right of the observed test statistic, so the p-value would be \[ \frac{(10000-1248) + 1}{10000 + 1} = 0.875 \]
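The two one-sided calculations follow the same pattern and can be sketched in R as below. The names `n_sims`, `n_left`, `p_left`, and `p_right` are made up for this sketch, and the right-tail count is taken as everything not in the left tail, as in the formula above:

```r
n_sims <- 10000  # number of simulated differences
n_left <- 1248   # simulated differences <= the observed difference

# H_A: mu_odd - mu_even < 0, so only the left tail counts as more extreme
p_left <- (n_left + 1) / (n_sims + 1)

# H_A: mu_odd - mu_even > 0, so only the right tail counts as more extreme
p_right <- ((n_sims - n_left) + 1) / (n_sims + 1)

round(c(left = p_left, right = p_right), 3)
```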

Crucial Concept: Conclusion

We can only falsify the null hypothesis, never prove that it’s true.

We are not
- Statistically: Saying \(H_0\) is true
- Conceptually: We haven’t proven that odds and evens perform equally well.
Rather, we are
- Statistically: Failing to reject \(H_0\): it might still be false, we just don’t have the evidence here.
- Conceptually: Odds and evens might still perform differently, we just don’t have the evidence here to suggest so.

Analogy: The criminal justice system has two possible verdicts:

Guilty
Not guilty. This is NOT the same as saying the defendant is innocent: they might still be guilty, but we can’t prove beyond a reasonable doubt that they are.