Finishing Confidence Intervals

Load the familiar data again, removing individuals with no listed height.

library(mosaic)
library(dplyr)
library(ggplot2)
library(okcupiddata)
data(profiles)
# Keep only users who listed a height
profiles <- profiles %>% 
  filter(!is.na(height))

Learning Checks

Let’s pretend we don’t know the true mean height $\mu = 68.3$ inches.
Using the resample() function, take a single sample (without replacement) of size $n=50$ from the OkCupid users’ heights profiles$height. Assign this to an object sample_50.
Using the mean(), sd(), length(), and sqrt() functions, compute one 95% confidence interval for $\mu$.
Did your net catch the fish, i.e., does your interval contain $\mu$?

LC1: A Single Sample

set.seed(76)  # make the random sample reproducible
sample_50 <- resample(profiles$height, size = 50, replace = FALSE)

LC2: A 95% CI

# Sample mean, sample standard deviation, and sample size
xbar <- mean(sample_50)
s <- sd(sample_50)
n <- length(sample_50)
c(xbar, s, n)

## [1] 68.340000  4.688719 50.000000

# Lower and upper bounds: xbar minus/plus 2 standard errors
c(xbar - 2*s/sqrt(n), xbar + 2*s/sqrt(n))

## [1] 67.01383 69.66617

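Using 2 as a rounded version of 1.96 (the 95% multiplier from the normal distribution), our 95% CI is \[ \left(\overline{x} - 2 SE, \overline{x} + 2 SE\right) =\left(\overline{x} - 2 \frac{s}{\sqrt{n}}, \overline{x} + 2 \frac{s}{\sqrt{n}}\right) = \left(67.01, 69.67\right) \]

Since $67.01 < 68.3 < 69.67$, our interval contains $\mu$: the net caught the fish.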
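As a minimal sketch, the ±2 SE formula can be wrapped in a small function that works from summary statistics alone. The helper name `ci_2se()` is hypothetical, and the inputs are simply the values printed above for sample_50; the last line checks whether the interval contains $\mu = 68.3$.

```r
# Hypothetical helper: 95% CI via the "plus or minus 2 standard errors" rule
ci_2se <- function(xbar, s, n) {
  se <- s / sqrt(n)                      # standard error of the sample mean
  c(lower = xbar - 2 * se, upper = xbar + 2 * se)
}

# Summary statistics printed above for sample_50
ci <- ci_2se(xbar = 68.34, s = 4.688719, n = 50)
round(ci, 2)                             # lower 67.01, upper 69.67

# Did the net catch the fish? (true mean mu = 68.3)
mu <- 68.3
unname(ci["lower"] <= mu & mu <= ci["upper"])
```

Working from summary statistics makes it easy to check the arithmetic by hand; applied to a raw vector `x`, you would pass `mean(x)`, `sd(x)`, and `length(x)`.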

Long-Run Performance of CI

Back to theoretical/rhetorical land. Let’s repeat the following procedure 100 times:

Take a sample of size $n=50$
Compute the 95% CI based on $\overline{x}$, $s$, $\sqrt{n}$
See if we caught the fish, i.e., whether the CI contains $\mu = 68.3$

Here are the (random) results:

Our net missed the fish 3 times out of 100! In the long run, it will miss about 5% of the time.
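The repeated procedure (sample, compute the CI, check for the fish) can be sketched in R as follows. This is a minimal sketch under stated assumptions: `coverage_sim()` is a hypothetical helper, it uses base R's `sample()` rather than mosaic's `resample()`, and the demo population is a simulated stand-in (roughly normal, mean 68.3, SD 4), not the OkCupid data.

```r
# Hypothetical helper: repeat "sample, build +/- 2 SE CI, check" reps times
coverage_sim <- function(population, n = 50, reps = 100) {
  mu <- mean(population)               # the "fish": the true population mean
  caught <- logical(reps)
  for (i in seq_len(reps)) {
    x <- sample(population, size = n, replace = FALSE)
    se <- sd(x) / sqrt(n)
    caught[i] <- (mean(x) - 2 * se <= mu) && (mu <= mean(x) + 2 * se)
  }
  caught                               # TRUE where the interval caught mu
}

set.seed(76)
# Stand-in population of heights (assumption: normal, mean 68.3, SD 4)
population <- rnorm(50000, mean = 68.3, sd = 4)
caught <- coverage_sim(population, n = 50, reps = 100)
sum(!caught)    # number of misses out of 100
mean(caught)    # proportion of intervals that caught the fish
```

With the data loaded at the top, `coverage_sim(profiles$height, n = 50, reps = 100)` runs the same experiment on the real heights. The number of misses changes from run to run, so it need not match the 3 misses above.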

Correct Interpretation of Original CI

So the correct interpretation of our original 95% confidence interval (67.01, 69.67) is as follows.

IS: the procedure that generated the CI (67.01, 69.67) is 95% reliable, i.e., in repeated sampling, about 95% of the intervals it produces will contain the true mean $\mu$.
IS NOT: the probability that (67.01, 69.67) contains the true mean height is 95%.
i.e., the probability is either 1 or 0: either it does or it doesn’t.
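To illustrate this distinction, here is a minimal sketch assuming a hypothetical normal population with known $\mu = 68.3$ and SD 4 (not the OkCupid data). Each individual interval's "did it catch $\mu$?" result is TRUE or FALSE, never 0.95. Only the running proportion across many intervals settles near 0.95.

```r
set.seed(1)
mu <- 68.3                       # known population mean (simulation assumption)
n <- 50
reps <- 10000
caught <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = 4)        # hypothetical normal population
  se <- sd(x) / sqrt(n)
  (mean(x) - 2 * se <= mu) && (mu <= mean(x) + 2 * se)
})

caught[1:5]                      # each interval: TRUE (1) or FALSE (0)

# Running proportion of intervals that caught mu, after 10, 100, ... intervals
running <- cumsum(caught) / seq_len(reps)
running[c(10, 100, 1000, 10000)]
```

The 95% describes the last column of that running proportion, which is a property of the procedure, not of any single interval.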