Taking a sample in order to infer about the population:
Mon Nov 21, 2016
Sample statistics estimate unknown population parameters. From last lecture:
Population Parameter | Sample Statistic |
---|---|
Mean \(\mu\) | Sample Mean \(\overline{x}\) |
Proportion \(p\) | Sample Proportion \(\widehat{p}\) |
Diff of Means \(\mu_1 - \mu_2\) | \(\overline{x}_1 - \overline{x}_2\) |
Diff of Proportions \(p_1 - p_2\) | \(\widehat{p}_1 - \widehat{p}_2\) |
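As a minimal sketch of the table above, the following base R code builds a made-up population and compares two of the parameters with the statistics computed from one random sample. The simulated values and the names `population_heights` and `population_is_tall` are hypothetical and do not come from the lecture.

```r
set.seed(116)  # arbitrary seed so the sketch is reproducible

# Hypothetical population of 10,000 heights in inches (simulated, not real data)
population_heights <- rnorm(10000, mean = 67, sd = 4)
# Hypothetical yes/no trait: is the person taller than 70 inches?
population_is_tall <- population_heights > 70

# Population parameters: usually unknown in practice
mu <- mean(population_heights)
p  <- mean(population_is_tall)

# Draw one random sample of 50 people (without replacement)
sampled_rows <- sample(length(population_heights), size = 50)

# Sample statistics: our estimates of mu and p
x_bar <- mean(population_heights[sampled_rows])
p_hat <- mean(population_is_tall[sampled_rows])

# Print parameters next to their estimates
c(mu = mu, x_bar = x_bar, p = p, p_hat = p_hat)
```

Here we can compute \(\mu\) and \(p\) only because the population is simulated; in a real study we would observe just \(\overline{x}\) and \(\widehat{p}\).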
You are interested in the average height of Midd Kids, so you take a random sample of 10 students and find a sample mean of \(\overline{x} = 66\) inches.
Instead of guessing \(\mu\) with a single value, why not a range of plausible values?
Imagine \(\mu\) is a fish:
Point Estimate | Confidence Interval |
---|---|
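To make the contrast between a point estimate and a confidence interval concrete, here is a rough sketch in base R. The ten heights are invented for illustration (chosen so that they average 66 inches, as in the example above), and the 95% t-interval from `t.test()` is one standard way to build such a range, not necessarily the method this lecture develops.

```r
# Hypothetical sample of 10 student heights in inches (made-up values)
sample_heights <- c(62, 64, 65, 66, 66, 67, 68, 63, 69, 70)

# Point estimate: a single best guess for mu
point_estimate <- mean(sample_heights)

# Confidence interval: a range of plausible values for mu
# (95% t-interval; assumes a random sample of roughly normal heights)
interval <- t.test(sample_heights, conf.level = 0.95)$conf.int

point_estimate
interval
```

The point estimate is a single number, while the interval is a pair of endpoints that brackets it.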
Let's revisit the OkCupid profile data. Run the following in your console:
    library(mosaic)
    library(dplyr)
    library(ggplot2)
    library(okcupiddata)
    data(profiles)
Try the following for different values of `n`. What does this correspond to doing?

    n <- 5  # sample size; try other values
    samples <- do(10000) * mean(resample(profiles$height, size=n, replace=TRUE))
    samples <- samples %>% as_data_frame()
    ggplot(samples, aes(x=mean)) +
      geom_histogram(binwidth = 1) +
      xlim(c(40,90))
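One way to see what changing `n` does is to repeat the simulation for several sample sizes and compare the spread of the resulting sample means. The sketch below uses base R's `replicate()` and a simulated stand-in for `profiles$height`, so it runs without the packages above. The vector `heights`, its assumed mean and standard deviation, and the helper `sample_means()` are all hypothetical.

```r
set.seed(2016)  # arbitrary seed for reproducibility

# Simulated stand-in for profiles$height (assumed mean 68 in, sd 4 in)
heights <- rnorm(50000, mean = 68, sd = 4)

# Draw `reps` samples of size n with replacement; return each sample's mean
sample_means <- function(n, reps = 10000) {
  replicate(reps, mean(sample(heights, size = n, replace = TRUE)))
}

# Standard deviation of the sample means for several sample sizes
sizes <- c(5, 50, 500)
spread <- sapply(sizes, function(n) sd(sample_means(n)))
names(spread) <- paste0("n=", sizes)

spread                      # simulated spread of the sample means
sd(heights) / sqrt(sizes)   # sigma / sqrt(n), for comparison
```

The simulated spread should shrink as `n` grows, roughly like \(\sigma / \sqrt{n}\): larger samples give sample means that cluster more tightly around \(\mu\).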