Confidence Intervals

Mon Nov 21, 2016

Recall: Point of Statistics

We take a sample in order to make inferences about the population:

[Figure: drawing]

Sample statistics estimate unknown population parameters. From last lecture:

Population Parameter	Sample Statistic
Mean \(\mu\)	Sample Mean \(\overline{x}\)
Proportion \(p\)	Sample Proportion \(\widehat{p}\)
Difference of Means \(\mu_1 - \mu_2\)	Difference of Sample Means \(\overline{x}_1 - \overline{x}_2\)
Difference of Proportions \(p_1 - p_2\)	Difference of Sample Proportions \(\widehat{p}_1 - \widehat{p}_2\)

Example:

You are interested in the average height of Midd Kids, so you take a random sample of 10 students and find an average of 66 inches.

\(\mu\): the true mean height of all 2400 students. You don't know this.
\(\overline{x} = 66\): the sample mean. This is a point estimate of \(\mu\).
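
A minimal sketch of this example in base R, under a loud assumption: the 2400 heights below are simulated from a normal distribution (the mean of 67 and standard deviation of 4 are illustrative values, not real Middlebury data). Because the population is made up, we can peek at \(\mu\), which is impossible in practice.

```r
set.seed(76)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 2400 simulated heights in inches (not real data)
population <- rnorm(2400, mean = 67, sd = 4)
mu <- mean(population)          # population parameter, unknown in practice

# Random sample of 10 students, drawn without replacement
heights_sample <- sample(population, size = 10)
x_bar <- mean(heights_sample)   # sample statistic: point estimate of mu

c(mu = mu, x_bar = x_bar)
```

Rerunning with a different seed gives a different \(\overline{x}\) each time, while \(\mu\) for a fixed population does not change.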

Confidence Intervals

Instead of guessing \(\mu\) with a single value, why not a range of plausible values?
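
One way to produce such a range, sketched under assumptions: the ten heights below are invented so that their mean is 66, and the interval is the t-based 95% interval returned by base R's `t.test()`. This is one common construction, not necessarily the one developed later in the course.

```r
# Hypothetical sample of 10 heights in inches (invented values; mean is 66)
heights_sample <- c(62, 64, 65, 65, 66, 66, 67, 67, 68, 70)
x_bar <- mean(heights_sample)   # point estimate of mu

# 95% t-based confidence interval for the population mean
ci <- t.test(heights_sample, conf.level = 0.95)$conf.int
ci
```

The point estimate sits in the middle of the interval; the interval's endpoints give the range of plausible values for \(\mu\).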

Imagine that \(\mu\) is a fish:

Point Estimate	Confidence Interval

Learning Check

Let's revisit the OkCupid profile data. Run the following in your console:

library(mosaic)
library(dplyr)
library(ggplot2)

library(okcupiddata)
data(profiles)

Discuss with your seatmates what the following code does.
Try varying n. What does this correspond to doing?
How does the histogram change?

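n <- 5  # sample size; try other values

# Resample n heights 10000 times, recording the mean each time
samples <- do(10000) *
  mean(resample(profiles$height, size = n, replace = TRUE))
# as_tibble() replaces the deprecated as_data_frame()
samples <- samples %>%
  as_tibble()

# Histogram of the 10000 sample means, stored in the column `mean`
ggplot(samples, aes(x = mean)) +
  geom_histogram(binwidth = 1) +
  xlim(c(40, 90))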
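
To compare several values of n side by side, here is a base-R sketch that does not need the OkCupid data. The `heights` vector is a simulated stand-in (an assumption for illustration, not `profiles$height`), and `spread_of_means()` is a hypothetical helper that reports the standard deviation of 10000 resampled means for a given n.

```r
set.seed(76)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in for profiles$height: simulated heights in inches
heights <- rnorm(50000, mean = 68, sd = 4)

# Standard deviation of `reps` resampled means, each based on n values
spread_of_means <- function(n, reps = 10000) {
  means <- replicate(reps, mean(sample(heights, size = n, replace = TRUE)))
  sd(means)
}

# Compare the spread for a small, medium, and large sample size
spreads <- sapply(c(5, 50, 500), spread_of_means)
spreads
```

The spread shrinks as n grows, which corresponds to a narrower histogram of sample means.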