Load the familiar data again, removing individuals with no listed height:

library(mosaic)
library(dplyr)
library(ggplot2)
library(okcupiddata)
data(profiles)
profiles <- profiles %>% 
  filter(!is.na(height))

Learning Checks

  • Let’s pretend we don’t know the true mean \(\mu = 68.3\) inches.
  • Using the resample() function, take a single sample (without replacement) of \(n=50\) OkCupid users’ heights from profiles$height. Assign this to an object sample_50.
  • Using the mean(), sd(), and sqrt() functions, compute one 95% confidence interval for \(\mu\).
  • Did your net catch the fish?

LC1: A Single Sample

set.seed(76)
sample_50 <- resample(profiles$height, size=50, replace = FALSE)

LC2: A 95% CI

xbar <- mean(sample_50)
s <- sd(sample_50)
n <- length(sample_50)
c(xbar, s, n)
## [1] 68.340000  4.688719 50.000000
c(xbar - 2*s/sqrt(n), xbar + 2*s/sqrt(n))
## [1] 67.01383 69.66617

Our 95% CI is \[ \left(\overline{x} - 2 SE, \overline{x} + 2 SE\right) =\left(\overline{x} - 2 \frac{s}{\sqrt{n}}, \overline{x} + 2 \frac{s}{\sqrt{n}}\right) = \left(67.01, 69.67\right) \]

Long-Run Performance of CI

Back to theoretical/rhetorical land. Let’s repeat the following procedure 100 times:

  1. Take a sample of size \(n=50\)
  2. Compute the 95% CI based on \(\overline{x}\), \(s\), and \(n\)
  3. See if we caught the fish

Here are the (random) results:

Our net missed the fish 3 times out of 100! In the long run, it will miss about 5% of the time.
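
The three-step procedure above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `heights_pop` is a simulated stand-in population (not the OkCupid data), and `catches_fish()` is a hypothetical helper name. With the data loaded earlier, `profiles$height` could be passed in place of `heights_pop`. The number of misses depends on the random seed, so it will not necessarily be 3.

```r
# Assumption: a simulated stand-in population of heights (in inches).
# With the real data loaded, use heights_pop <- profiles$height instead.
set.seed(76)
heights_pop <- rnorm(50000, mean = 68.3, sd = 4)
mu <- mean(heights_pop)  # the "fish" we are trying to catch

# Hypothetical helper: one repetition of the procedure.
# Returns TRUE if the interval xbar +/- 2*SE contains mu.
catches_fish <- function(pop, mu, n = 50) {
  x <- sample(pop, size = n, replace = FALSE)  # step 1: take a sample
  se <- sd(x) / sqrt(n)                        # step 2: standard error
  lower <- mean(x) - 2 * se
  upper <- mean(x) + 2 * se
  lower <= mu && mu <= upper                   # step 3: did we catch the fish?
}

# Repeat the procedure 100 times
caught <- replicate(100, catches_fish(heights_pop, mu, n = 50))

sum(!caught)  # number of intervals that missed mu
mean(caught)  # proportion of intervals that caught mu
```

Raising the number of repetitions from 100 to, say, 10000 should bring the proportion of catches closer to the long-run rate of roughly 95%.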

Correct Interpretation of Original CI

So the correct interpretation of our original 95% confidence interval (67.01, 69.67):

  • IS: the procedure that generated the CI (67.01, 69.67) is 95% reliable, i.e. 95% of the intervals it produces will contain the true mean.
  • IS NOT: the probability that (67.01, 69.67) contains the true mean height is 95%.
    i.e. for this particular interval the probability is either 1 or 0: either it does or it doesn’t.
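
To make the "either 1 or 0" point concrete, here is a small check using only the numbers quoted above (the true mean of 68.3 inches and the interval endpoints). The variable names are illustrative. Once the interval is computed, nothing random is left: the comparison gives a single fixed answer.

```r
mu <- 68.3              # true mean height, as stated earlier
ci <- c(67.01, 69.67)   # our one computed 95% CI

# A fixed interval either contains mu or it does not
contains_mu <- ci[1] <= mu && mu <= ci[2]
contains_mu
```

The 95% describes how often the procedure succeeds across many samples, not this one comparison.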