Let’s revisit the OkCupid profile data. Run the following in your console:

library(mosaic)
library(dplyr)
library(ggplot2)

library(okcupiddata)
data(profiles)

# Remove individuals with no listed height
profiles <- profiles %>%
filter(!is.na(height))

We take many, many, many samples of size 5 and then take the sample mean:

n <- 5
samples_5 <- do(10000) *
mean(resample(profiles$height, size=n, replace=TRUE))
samples_5 <- samples_5 %>%
as_data_frame()

We take many, many, many samples of size 50 and then take the sample mean:

n <- 50
samples_50 <- do(10000) *
mean(resample(profiles$height, size=n, replace=TRUE))
samples_50 <- samples_50 %>%
as_data_frame()

## Learning Checks

1. Explicitly compute the standard error when taking samples of size n=5 and n=50.
2. Discuss with your peers why standard errors matter in any study that involves some kind of sampling.

#### LC1

In General: The standard error is the standard deviation of the point estimate. In this case, it quantifies how much the sample means vary from one sample to the next.

In Our Case: If we take a sample of 5 OkCupid users and compute the (sample) mean height, are we going to get the same value each time? No. The SE measures how much this sample mean varies.

Mathematically: You can derive the standard error mathematically, but this is for a more advanced class in Probability/Statistics. See Advanced section below.

Computationally: It is the standard deviation of our 10000 sample means:

samples_5 %>%
summarise(SE = sd(mean))
## # A tibble: 1 × 1
##         SE
##      <dbl>
## 1 1.803266
samples_50 %>%
summarise(SE = sd(mean))
## # A tibble: 1 × 1
##          SE
##       <dbl>
## 1 0.5730157
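
As a rough cross-check on the simulated values, the standard formula for the standard error of a sample mean is the population standard deviation divided by the square root of n. The sketch below compares that formula with a resampling estimate in base R. It is only an illustration under stated assumptions: `pop_heights` is a simulated stand-in population (not the OkCupid data), and `simulated_se()` and `formula_se()` are hypothetical helper names introduced here.

```r
# Hypothetical stand-in population (an assumption, not the OkCupid data);
# replace pop_heights with profiles$height to try this on the real heights.
set.seed(76)
pop_heights <- rnorm(50000, mean = 68, sd = 4)

# Simulated SE: standard deviation of many resampled means of size n
simulated_se <- function(x, n, reps = 10000) {
  sample_means <- replicate(reps, mean(sample(x, size = n, replace = TRUE)))
  sd(sample_means)
}

# Formula-based SE: standard deviation of the population divided by sqrt(n)
formula_se <- function(x, n) {
  sd(x) / sqrt(n)
}

# Put both estimates side by side for n = 5 and n = 50
se_comparison <- data.frame(
  n = c(5, 50),
  simulated = c(simulated_se(pop_heights, 5), simulated_se(pop_heights, 50)),
  formula = c(formula_se(pop_heights, 5), formula_se(pop_heights, 50))
)
print(se_comparison)
```

The two columns should be close but not identical, since the simulated column is itself an estimate based on a finite number of resamples.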

Results:

1. The SE with n=50 is smaller, i.e.
2. The sample means $$\overline{x}$$ are less variable when n=50, i.e.
3. The sample mean $$\overline{x}$$ is a more precise estimate when n=50
4. Our estimates are on average better when n=50

Visualization: Recall, the sampling distribution is the distribution of the point estimate. We see that for n=50

• the distribution is narrower i.e.
• it has a smaller standard deviation i.e.
• the standard error is smaller
• Our estimates are on average better when n=50. A bigger sample size is better.
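
One possible way to draw the comparison described in this list is to stack the two sampling distributions on a shared x-axis with ggplot2. This is a minimal sketch rather than the plot used in these notes: `pop_heights` is a simulated stand-in population (an assumption, not the OkCupid data), `resample_means()` is a hypothetical helper, and the bin width and axis labels are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in population (an assumption, not the OkCupid data);
# replace pop_heights with profiles$height to plot the real heights.
set.seed(76)
pop_heights <- rnorm(50000, mean = 68, sd = 4)

# Draw `reps` resamples of size n and return the mean of each one
resample_means <- function(x, n, reps = 10000) {
  replicate(reps, mean(sample(x, size = n, replace = TRUE)))
}

# One row per sample mean, labelled by the sample size that produced it
sampling_dists <- rbind(
  data.frame(n = "n = 5", mean = resample_means(pop_heights, 5)),
  data.frame(n = "n = 50", mean = resample_means(pop_heights, 50))
)

# Stacked histograms on a shared x-axis make the difference in spread visible
sampling_plot <- ggplot(sampling_dists, aes(x = mean)) +
  geom_histogram(binwidth = 0.25) +
  facet_wrap(~n, ncol = 1) +
  labs(x = "Sample mean height", y = "Number of samples")

print(sampling_plot)
```

Keeping both panels on the same x-axis is a deliberate choice: with free scales each histogram would fill its own panel and the narrower spread for n=50 would be harder to see.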