Please Indicate

Instructions/Hints:

Scenario:

You are charged with studying the average age of members of Congress (House of Representatives and Senate) at the start of their term for all sessions of Congress between the 80th session (starting in 1947) and the 113th session (starting in 2013).

In particular, you are interested in answering the following three questions:

  1. What is the average age of all members of Congress between the 80th and 113th sessions? Is it 30?
  2. Comparing just the 80th and the 113th sessions, are members of Congress on average younger now (in 2013) or then (in 1947)?
  3. Considering just the 113th session, who is younger on average: Democrats or Republicans?

However, researching birthdays and ages at the start of the term for all 18635 members of Congress between the 80th and 113th sessions is painstakingly boring work. So you decide to compute ages only for a random sample of members of Congress instead. Run the following lines of code in your console before answering the three questions:

source("https://rudeboybert.github.io/MATH116/assets/PS/raw_data/sampling.R")
# Change this seed value to be your favorite number:
seed_value <- 76

Question 1: CI for Population Mean

Run these two lines to get a random sample of size 100 from the population of all members of Congress between the 80th and 113th sessions.

set.seed(seed_value)
congress <- get_sample_of_congress(n = 100)

a) EDA

Create one plot of exploratory data analysis.
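
One possible exploratory plot is a histogram of the sampled ages. The sketch below runs on a simulated stand-in so that it works on its own: the data frame `congress_sim`, its arbitrary simulated values, and the assumption that the real `congress` sample has a numeric column named `age` are all illustrative, not taken from the sampling script.

```r
# Simulated stand-in for the sample. The real `congress` data frame is
# ASSUMED to have a numeric column `age`; the values here are arbitrary.
set.seed(76)
congress_sim <- data.frame(age = rnorm(100, mean = 50, sd = 10))

# Histogram of ages: shows center, spread, and skew at a glance.
age_hist <- hist(congress_sim$age,
                 breaks = 15,
                 main = "Age at start of term (simulated sample)",
                 xlab = "Age (years)")
```

With the real sample you would replace `congress_sim$age` with the age column of `congress`.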

b) 95% CI with n=100

Just as in the LC for Lec31, create a 95% confidence interval for \(\mu\), the true population mean age of all members of Congress between the 80th and 113th sessions.
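
As a rough sketch of the arithmetic, a normal-approximation interval is the sample mean plus or minus about 1.96 standard errors, where the standard error is \(s/\sqrt{n}\). The code below uses simulated ages (arbitrary values, not real data), and the multiplier used in lecture may differ slightly from `qnorm(0.975)`.

```r
# Simulated stand-in for the sampled ages; values are arbitrary.
set.seed(76)
age <- rnorm(100, mean = 50, sd = 10)

n      <- length(age)
x_bar  <- mean(age)        # point estimate of mu
s      <- sd(age)          # sample standard deviation
SE     <- s / sqrt(n)      # standard error of the sample mean
z_star <- qnorm(0.975)     # about 1.96 for a 95% interval

CI <- c(lower = x_bar - z_star * SE,
        upper = x_bar + z_star * SE)
CI
```

To answer the first question, check whether the value 30 falls inside your interval.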

Your Answers:

  • My confidence interval is: ( , )
  • My answer to the first question above is:

c) 95% CI with n=1000

Repeat part b), but now with a random sample of size 1000.

set.seed(seed_value)
congress <- get_sample_of_congress(n = 1000)
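
Before computing the new interval, it may help to see how the margin of error depends on \(n\). The sketch below holds the sample standard deviation fixed at a hypothetical value (`s <- 10` is an assumption, not a result) and compares the two sample sizes.

```r
z_star <- qnorm(0.975)   # about 1.96 for a 95% interval
s      <- 10             # HYPOTHETICAL sample standard deviation

# Margin of error of a normal-approximation CI for a mean.
margin <- function(n) z_star * s / sqrt(n)

margin(100)
margin(1000)

# The ratio depends only on the sample sizes: sqrt(100 / 1000).
ratio <- margin(1000) / margin(100)
ratio
```

Under this assumption, a tenfold increase in sample size shrinks the interval width by a factor of \(\sqrt{10}\), not 10.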

Your Answers:

  • My confidence interval is: ( , )
  • My answer to the first question above is:

d) Google Form

After you’ve completed parts b) and c), fill out this Google Form with your two confidence intervals.

Question 2: CI for Difference of Population Means

Run these two lines to get your random sample of 40 members from the 80th session and 35 members from the 113th session.

set.seed(seed_value)
congress_80_and_113 <- get_sample_of_congress_80_and_113(n_80 = 40, n_113 = 35)

a) EDA

Create one plot of exploratory data analysis.
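
For two groups, side-by-side boxplots are one reasonable choice. The sketch below uses a simulated stand-in with the column names `age` and `congress` that appear in the `SE_diff_means()` call in part b); the data frame `sim_80_and_113` and its values are arbitrary illustrations.

```r
# Simulated stand-in: 40 ages labelled session 80, 35 labelled session 113.
set.seed(76)
sim_80_and_113 <- data.frame(
  age      = rnorm(75, mean = 50, sd = 10),
  congress = rep(c(80, 113), times = c(40, 35))
)

# One box per session; compare medians, spread, and outliers.
bp <- boxplot(age ~ congress, data = sim_80_and_113,
              xlab = "Session of Congress",
              ylab = "Age at start of term (years)")
```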

b) 95% CI

Create a 95% confidence interval for \(\mu_{80}-\mu_{113}\), the true difference between the population mean age in the 80th session and the population mean age in the 113th session. Hint: The formula to approximate the standard error for a difference of sample means \(\overline{x}_1 - \overline{x}_2\) is

\[ SE_{\overline{x}_1 - \overline{x}_2} = \sqrt{ \left( \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } \]

Nasty! Why don’t I just give you code to compute it:

SE <- SE_diff_means(congress_80_and_113, variable="age", group_by_variable = "congress")
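
For reference, the pooled formula above can be written out in a few lines. The helper `pooled_se()` below is a hypothetical illustration, not the course's `SE_diff_means()` (whose internals are not shown here), and the two age vectors are simulated with arbitrary values.

```r
# HYPOTHETICAL helper implementing the pooled standard error formula above.
pooled_se <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  # Pooled variance: weighted average of the two sample variances.
  pooled_var <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
  sqrt(pooled_var * (1 / n1 + 1 / n2))
}

# Simulated stand-ins for the two samples; values are arbitrary.
set.seed(76)
age_80  <- rnorm(40, mean = 50, sd = 10)
age_113 <- rnorm(35, mean = 50, sd = 10)

diff_hat <- mean(age_80) - mean(age_113)   # point estimate of mu_80 - mu_113
SE_sim   <- pooled_se(age_80, age_113)

# Normal-approximation 95% CI: point estimate +/- about 1.96 * SE.
CI_diff <- diff_hat + c(-1, 1) * qnorm(0.975) * SE_sim
CI_diff
```

If the interval contains 0, the sample does not give clear evidence that one session was younger on average than the other.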

Your Answers:

  • My confidence interval is: ( , )
  • My answer to the second question above is:

(BONUS) Question 3: Difference of Population Means

Answer the third question above based on a sample of size 544. Don’t forget to do an EDA.

set.seed(seed_value)
congress_113 <- get_sample_of_congress_113(n_113 = 544)
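
One way to cross-check a hand-computed interval is base R's `t.test()` with `var.equal = TRUE`, which uses the same pooled standard error but a t quantile instead of 1.96 (nearly identical at this sample size). The sketch below is built on assumptions: the column name `party`, the labels `"D"`, `"R"`, and `"I"`, and all simulated values are hypothetical and may not match the real `congress_113` sample.

```r
# Simulated stand-in; column names, party labels, and values are ASSUMPTIONS.
set.seed(76)
congress_113_sim <- data.frame(
  age   = rnorm(544, mean = 50, sd = 10),
  party = sample(c("D", "R", "I"), size = 544, replace = TRUE,
                 prob = c(0.49, 0.49, 0.02))
)

# Keep only the two parties being compared.
two_party <- subset(congress_113_sim, party %in% c("D", "R"))
two_party$party <- factor(two_party$party, levels = c("D", "R"))

# Sample mean age within each party.
group_means <- tapply(two_party$age, two_party$party, mean)
group_means

# Pooled two-sample interval for (mean age of D) - (mean age of R).
fit <- t.test(age ~ party, data = two_party, var.equal = TRUE)
fit$conf.int
```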

Your Answers: