Taking a sample in order to infer about the population:
Thu Dec 01, 2016
Imagine \(\mu\) is a fish:
Sample Mean \(\overline{x}\) | Confidence Interval |
---|---|
Let's focus on the case of taking many, many, many samples of size n=5 and n=50 from the population of OkCupid users:
    # 10000 sample means, each from a sample of size n=5
    n <- 5
    samples_5 <- do(10000) * mean(resample(profiles$height, size=n, replace=TRUE))
    samples_5 <- samples_5 %>% as_data_frame()

    # 10000 sample means, each from a sample of size n=50
    n <- 50
    samples_50 <- do(10000) * mean(resample(profiles$height, size=n, replace=TRUE))
    samples_50 <- samples_50 %>% as_data_frame()
We computed the following standard errors:

Sample Size | Standard Error |
---|---|
5 | 1.768 |
50 | 0.561 |
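
The two values are roughly in line with the usual \(\sigma/\sqrt{n}\) rule: going from n=5 to n=50 multiplies the sample size by 10, so the standard error should shrink by a factor of about \(\sqrt{10} \approx 3.16\), and \(1.768 / 0.561 \approx 3.15\). Here is a minimal base-R sketch of the same idea. Note that `population` is a simulated stand-in (its mean of 68 and standard deviation of 4 are assumptions chosen for illustration, not the OkCupid data), and `sample_means()` is a helper made up for this example.

```r
set.seed(76)

# Hypothetical stand-in for the population of heights (NOT the OkCupid data)
population <- rnorm(60000, mean = 68, sd = 4)

# Draw many samples of size n (with replacement) and return their means
sample_means <- function(n, reps = 10000) {
  replicate(reps, mean(sample(population, size = n, replace = TRUE)))
}

means_5  <- sample_means(5)
means_50 <- sample_means(50)

# The standard error is the standard deviation of the sampling distribution
se_5  <- sd(means_5)
se_50 <- sd(means_50)

# Compare each simulated value with the sigma / sqrt(n) approximation
c(simulated = se_5,  formula = sd(population) / sqrt(5))
c(simulated = se_50, formula = sd(population) / sqrt(50))

# Ratio of the two standard errors, which should be close to sqrt(10)
se_5 / se_50
```

The simulated and formula-based values should agree to within simulation noise.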
Let's construct 95% confidence intervals via two methods:
quantile()
Recall from Lec28 Chalk Talk: quantiles. Run the following in your console:
    x <- c(0:12)
    x
    quantile(x, probs=c(0.25, 0.5, 0.75))
This is saying that 25% of the x values are less than 3, 50% are less than 6, and 75% are less than 9.

Behold: the normal distribution, i.e. the bell curve.
Properties of the Normal Distribution:
Below we have a Normal Distribution with \(\mu=5\) and \(\sigma=2\)…
… Interval \([\mu -2\sigma, \mu +2\sigma] = [5 -2\times 2, 5 +2\times 2] = [1, 9]\) contains about 95% of values (in purple).
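
The "plus or minus two standard deviations" rule can be checked numerically with base R's `pnorm()` and `qnorm()`. This is a small sketch using the same \(\mu=5\) and \(\sigma=2\) as above; the variable names are just for illustration.

```r
mu    <- 5
sigma <- 2

# Endpoints of the interval [mu - 2*sigma, mu + 2*sigma]
lower <- mu - 2 * sigma   # 1
upper <- mu + 2 * sigma   # 9

# Probability that a Normal(mu, sigma) value lands inside the interval
coverage <- pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)
coverage                  # about 0.954, i.e. roughly 95%

# The multiplier that gives exactly 95% is slightly less than 2
qnorm(0.975)              # about 1.96
```

So "2 standard deviations" is a convenient rounding of 1.96.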
Construct 95% confidence intervals for \(\mu\), the true average height of all 60K OkCupid users, for samples of size n=5 and n=50.
Hint: Look at the histograms of the 10000 simulations.
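
One possible way to approach this, sketched in base R for n=50 only. The `population` vector here is a simulated stand-in (mean 68 and standard deviation 4 are assumptions for illustration, not the OkCupid data), so the numbers it produces are not the answer for the real `profiles$height`; the same steps should carry over to the simulated sample means computed earlier.

```r
set.seed(2016)

# Hypothetical stand-in for the population of heights (NOT the OkCupid data)
population <- rnorm(60000, mean = 68, sd = 4)

# 10000 simulated sample means for samples of size n = 50
n <- 50
sample_means <- replicate(10000, mean(sample(population, size = n, replace = TRUE)))

# Method 1: the middle 95% of the simulated sample means, via quantile()
ci_quantile <- quantile(sample_means, probs = c(0.025, 0.975))

# Method 2: normal approximation, center plus or minus 2 standard errors
center <- mean(sample_means)
se     <- sd(sample_means)
ci_normal <- c(lower = center - 2 * se, upper = center + 2 * se)

ci_quantile
ci_normal
```

When the histogram of the sample means looks bell-shaped, the two intervals should come out very close to each other; setting `n <- 5` gives a noticeably wider interval.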