Populations, samples, and generalizability

For each of the following 4 scenarios

Identify
- The population of interest and if applicable the population parameter
- The sample used and if applicable the statistic
Comment on the representativeness/generalizability of the results of the sample to the population.

Scenario 1

The Royal Air Force wants to study how resistant their airplanes are to bullets. They study the bullet holes on all the airplanes on the tarmac after an air battle against the Luftwaffe (German Air Force).

Statements about population
- population: ALL Royal Air Force airplanes
- population parameter: It wasn’t explicitly defined here, but imagine some aircraft engineering measure of resistance/strength.
Statements about sample
- sample: Only the airplanes that returned from an air battle
- statistic: The same measure above, but applied only to the returning aircraft:
Representativeness/generalizability:
- The sample suffers from survivor’s bias i.e. only planes that didn’t get shot down are included in your sample. You don’t have information on the more important cases of when planes do get shot down. Wald, an American statistician during World War II, suggested that they reinforce parts of the planes where bullet holes were not present.
- Also, this was a fight only against the German Air Force. Perhaps the Italian and Japanese Air Forces used different bullets, but we don’t have a sample representative of these groups.

Scenario 2

You want to know the average income of Middlebury graduates in the last 10 years. So you get the records of 10 randomly chosen Midd Kids. They all answer and you take the average.

Statements about population
- population: All Middlebury graduates in the last ten years.
- population parameter: The population mean \(\mu\) (greek letter “mu”): their average income of all these graduates.
Statements about sample
- sample: Then 10 chosen Midd Kids
- statistic: The sample mean \(\overline{x} = \frac{1}{n}\sum_{i=1}^n x_i\) of their incomes.
Representativeness/generalizability:
- While the sample size is small (i.e. our estimate won’t be very precise and highly variable), it is still representative (i.e. still accurate). We’ll see that accuracy and precision are different concepts.

Scenario 3

Imagine it’s 1993 i.e. almost all households have landlines. You want to know the average number of people in each household in Middlebury. You randomly pick out 500 phone numbers from the phone book and conduct a phone survey.

Statements about population
- population: All Middlebury households
- population parameter: the population mean \(\mu\): average number of people in a household
Statements about sample
- sample: Of the 500 households chosen, those who answer the phone
- statistic: The sample mean \(\overline{x} = \frac{1}{n}\sum_{i=1}^n x_i\) of the number of people in the households.
Representativeness/generalizability:
- We assumed that all households have landlines, so the real issue of poorer individuals not having phones is not in question here.
- Rather, households with larger numbers of people are more likely to have at least one person at home, and thus someone able to pick up the phone. Our results will be biased towards larger households.
- One way to address this is to keep trying until every house on your list picks up. But especially for single young professionals, this might be very hard to do.

Scenario 4

You want to know the prevalence of illegal downloading of TV shows among Middlebury students. You get the emails of 100 randomly chosen Midd Kids and ask them ``How many times did you download a pirated TV show last week?’’

Statements about population
- population: Current Midd Kids
- population parameter: the population proportion \(p\) of Midd Kids who downloaded a pirated TV show last week.
Statements about sample
- sample: The 100 randomly chosen Midd kids
- statistic: The sample proportion \(\widehat{p}\) of Midd Kids who self-report to have done so
Representativeness/generalizability:
- This study could suffer from volunteer bias, where different people might have different probabilities of willingness to report the truth. Since we are asking Midd Kids to fess up to illegal activity, your results might get skewed.

Populations, samples, and generalizability

Albert Y. Kim

Wed Oct 26, 2016

Scenario 1

Scenario 2

Scenario 3

Scenario 4