While I encourage you to discuss problem sets with your peers, you must submit your own answers and not simple rewordings of another’s work. Furthermore, all collaborations must be explicitly acknowledged at the top of your submissions.

General Information

  • Discussions and solutions on each of the Problem Sets can be found
  • The R Markdown Debugging Sheet is here.




Problem Set 12

  • Info:
    • Assigned Mon 5/8
    • Due Fri 5/12 11:15am
  • Homework:
    1. Complete the work in PS-12.Rmd.
  • Learning Goals:
    1. Simple linear regression viewed through the lens of sampling:
      • Confidence intervals
      • Hypothesis testing
  • Discussion/Solutions: Can be found in both
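As a rough illustration of these learning goals, the sketch below fits a simple linear regression and reports a confidence interval and a hypothesis test for the slope. It assumes base R's built-in mtcars data as a stand-in; it is not the data or the code from PS-12.Rmd.

```r
# Stand-in data: mtcars ships with base R (an assumption, not the PS-12 data).
fit <- lm(mpg ~ wt, data = mtcars)

# 95% confidence intervals for the intercept and the slope.
ci <- confint(fit, level = 0.95)
print(ci)

# Coefficient table: estimate, standard error, t statistic, and p-value.
coef_table <- summary(fit)$coefficients
print(coef_table)
```

If the 95% interval for the slope excludes 0, that agrees with rejecting the null hypothesis of no linear relationship at the 5% level.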




Problem Set 11

  • Info:
    • Assigned Fri 4/28
    • Due Fri 5/5 11:15am
  • Homework:
    1. Read the following 538 article: Both Republicans And Democrats Have an Age Problem
    2. After loading the library(fivethirtyeight) package and loading the data(congress_age) data, scan over the help file ?congress_age.
    3. Complete the work in PS-11.Rmd.
  • Learning Goals:
    1. Study a confidence interval for something other than the population mean \(\mu\).
  • Discussion/Solutions: Can be found in both
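One way to build a confidence interval for something other than the mean is a percentile bootstrap, sketched below for a median. The ages here are simulated and purely hypothetical, and both the choice of the median and the bootstrap method are assumptions for illustration; PS-11.Rmd may use a different statistic or approach with the real congress_age data.

```r
set.seed(11)  # arbitrary seed so the run is reproducible

# Hypothetical sample of 200 ages (made up; not the congress_age data).
ages <- round(rnorm(200, mean = 55, sd = 10))

# Resample with replacement many times, recording the median each time.
boot_medians <- replicate(10000, median(sample(ages, replace = TRUE)))

# The middle 95% of the bootstrap medians gives a percentile interval.
ci <- quantile(boot_medians, probs = c(0.025, 0.975))
print(ci)
```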




Problem Set 10

  • Info:
    • Assigned Fri 4/21
    • Due Fri 4/28 11:15am
  • Homework:
    1. Reading for Question 2 on the Chi-Squared Test: Sections 6.3 through 6.3.4 from OpenIntro Statistics. Click “Free Download”, then read from the bottom of book page 286 through the bottom of book page 293. As you read, try to tie everything to the components of the hypothesis testing framework from the Lec25 chalk talk.
    2. Complete the work in PS-10.Rmd.
  • Learning Goals:
    1. Perform a hypothesis test from start to finish.
    2. Generalize the lady tasting tea to other hypothesis testing situations.
  • Discussion/Solutions: Can be found in both
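The lady tasting tea can be simulated as a start-to-finish hypothesis test, as in the sketch below. It assumes the classical design of 8 cups with 4 of each kind, which may differ from the version used in lecture, and the labels "milk" and "tea" are placeholder names.

```r
set.seed(10)  # arbitrary seed so the run is reproducible

# True pouring order: 4 cups milk-first, 4 cups tea-first (assumed design).
truth <- rep(c("milk", "tea"), each = 4)

# Null hypothesis: she is guessing, so her labels are a random shuffle.
# Test statistic: the number of cups labelled correctly.
n_correct <- replicate(10000, sum(sample(truth) == truth))

# p-value: how often pure guessing gets all 8 cups right.
p_value <- mean(n_correct == 8)
print(p_value)
```

The simulated p-value should land near the exact value of 1/70, since only one of the choose(8, 4) = 70 possible labellings is fully correct.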




Problem Set 09

  • Info:
    • Assigned Mon 4/17
    • Due Fri 4/21 11:15am
  • Homework:
    1. Complete the work in PS-09.Rmd.
  • Learning Goals:
    1. Reinforce the topics covered in Midterm II.
  • Discussion/Solutions: Can be found in both




Problem Set 08

  • Info:
    • Assigned Sun 4/9
    • Due Fri 4/14 11:15am
  • Homework:
    1. Complete the work in PS-08.Rmd.
  • Learning Goals:
    • Study probability using the mosaic package’s sampling and simulation capabilities, instead of using mathematical formulae (reserved for MATH 310 Probability).
  • Instructions/Hints
    1. Interpreting probability:
      • In general: one interpretation of “the probability of x occurring” is the proportion of the time “x” occurs across many, many, many attempts.
      • Example: “the probability of flipping a coin and getting heads is 0.5 = 50%” can be interpreted as the proportion of heads occurring over many, many, many coin flips being one half.
      • For the purposes of this problem set, let “many, many, many times” mean 10,000 times.
    2. As always, I recommend you separate out the what vs the how by first sketching out a plan of what you are going to do for each question on paper.
  • Discussion/Solutions: Can be found in both
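The coin flip example from the hints above can be sketched as a simulation with 10,000 flips. This version uses base R's sample() rather than the mosaic package functions the problem set relies on, so treat it as an illustration of the idea and not as the expected solution.

```r
set.seed(76)  # arbitrary seed so the run is reproducible

# "Many, many, many times" means 10,000 times for this problem set.
n_flips <- 10000

# Simulate fair coin flips by sampling with replacement.
flips <- sample(c("heads", "tails"), size = n_flips, replace = TRUE)

# The proportion of heads approximates the probability of heads.
prop_heads <- mean(flips == "heads")
print(prop_heads)
```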




Problem Set 07

  • Info:
    • Assigned Fri 3/24
    • Due Fri 4/7 11:15am
  • Homework:
    1. Read over the slides for Lec18.
    2. Read over the questions in PS-07.Rmd.
    3. Sketch out your plan of attack.
    4. Read over the solutions to Problem Set 06 to make sure you understand the coding part.
    5. Then start coding your answers in PS-07.Rmd.
  • Learning Goals:
    • More getting used to R Markdown.
    • Working on a really substantive data analysis, in the mold of the final project.
    • Providing actionable insight from data.
  • Discussion/Solutions: Can be found in both




Problem Set 06

  • Info:
    • Assigned Sat 3/18
    • Due Fri 3/24 11:15am
  • Homework:
    1. Complete this feedback survey on the “R Markdown” course on DataCamp from last week.
    2. Complete the work in PS-06.Rmd. If you’re having trouble with R Markdown, read this Google Doc first.
    3. Optional: If you want reinforcement on dplyr, from the “Effective Data Storytelling using the tidyverse” DataCamp course, complete Chapter 7 (Filtering, Grouping, & Summarizing) and Chapter 8 (dplyr Review). Notes:
      • Two topics, %in% and geom_col, have not been covered in our course.
      • If you do complete this course, please help me and Chester out by completing this feedback survey.
  • Learning Goals:
    1. Perform data wrangling
    2. Start answering substantive questions with data
    3. Get familiar with R Markdown
  • Discussion/Solutions: Can be found in both
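For anyone meeting %in% for the first time in the optional DataCamp chapters, here is a minimal sketch of what it does. It uses base R and the built-in mtcars data as a stand-in, not the problem set's data.

```r
# Values we want to keep (arbitrary choice for illustration).
keep <- c(4, 6)

# %in% checks each element on the left for membership in the right-hand set.
is_kept <- mtcars$cyl %in% keep

# Use the logical vector to filter rows.
small <- mtcars[is_kept, ]
print(table(small$cyl))

# The dplyr equivalent would be: dplyr::filter(mtcars, cyl %in% keep)
```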




Problem Set 05

  • Info:
    • Assigned Sun 3/12
    • Due Fri 3/17 11:15am
  • Homework:
    1. Complete the work in PS-05.R, saving your work in this file as you will be submitting it.
    2. From the DataCamp course “Reporting with R Markdown”, complete the first two Chapters (I anticipate this taking between 90 min and 2 hours):
      1. Authoring R Markdown Reports. In this Chapter, don’t worry if you don’t fully understand “Section 2: R code for your report.”
      2. Embedding Code
    3. Optional: If you want reinforcement on ggplot2, from the “Effective Data Storytelling using the tidyverse” DataCamp course, complete Chapters
      • 3: Scatter-plots & Line-graphs
      • 4: Histograms & Boxplots
      • 5: Barplots
  • Learning Goals:
    • Begin to master ggplot
    • Take our first steps with R Markdown
  • Notes:
    • For Question 1.b) I accidentally left the code in the problem set. There is nothing to do here.
    • For Question 4.b) I gave an example of how to show the data using a geom_boxplot(). Submit an answer that does it using another geom.




Problem Set 04

  • Info:
    • Assigned Mon 3/6
    • Due Fri 3/10 11:15am
  • Homework:
    1. Complete this feedback survey on “Chapter 2: Tidy Data” of the “Effective Data Storytelling using the tidyverse” DataCamp course.
    2. Ethics:
      • Listen to the EconTalk podcast interview (time 1h11m) of Cathy O’Neil, author of Weapons of Math Destruction.
      • Explain in two paragraphs Cathy O’Neil’s argument for how supposedly objective mathematical/algorithmic models reinforce inequality in two of the following three contexts:
        1. Crime recidivism
        2. The thought experiment of hiring in tech firms
        3. Teacher evaluations
      • Save this in a file named PS-04_Discussion_FirstName_LastName.doc or .txt or whatever.




Problem Set 03

  • Info:
    • Assigned Sat 2/25
    • Due Fri 3/3 11:15am
  • Homework:
    1. Complete this feedback survey on the “Intro to R” and “Intermediate R” courses on DataCamp.
    2. Complete the work in PS-03.R, saving your work in this file as you will be submitting it. Stand by for the submission format.
    3. Complete Chapter 2: Tidy Data of Effective Data Storytelling using the tidyverse
  • Learning Goals:
    • Weaning yourselves away from the DataCamp nest and doing your own work in RStudio.
    • Baby’s first data analysis!
  • Tips:
    • Learning to Code:
      1. Computers are stupid: In order for step C of your code to work, you need to make sure you ran steps A & B first.
      2. Learning strategy: Tweak existing code in the Learning Checks (go over them!) to suit your ends; don’t code from scratch.
    • Working with Data:
      1. Always look at and explore your data first. In our case, do this with the View() function and/or the glimpse() function from the dplyr package.
      2. Help files are your friend. Most R functions and datasets have help files. For example, you can access the help file for the movies data set by typing ?movies.
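The tips on looking at your data first can be sketched as follows. The built-in mtcars data set stands in for movies here (an assumption, so that the example runs without extra packages), and glimpse() is only called if dplyr happens to be installed.

```r
# Stand-in data set that ships with base R.
data(mtcars)

# How many rows and columns are there?
print(dim(mtcars))

# Peek at the first few rows and the type of each column.
print(head(mtcars))
str(mtcars)

# glimpse() gives a compact overview if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(mtcars)
}

# In an interactive RStudio session you would also try:
# View(mtcars)   # spreadsheet-style viewer
# ?mtcars        # help file for the data set
```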




Problem Set 02

  • Info:
    • Assigned Thu 2/16
    • Due Fri 2/24 11:15am
  • Homework: Complete the following three chapters (in this order) from the DataCamp course “Intermediate R”:
    1. Conditionals and Control Flow
    2. Loops
    3. Functions
  • Learning Goals:
    • Getting more experience with the R command line.
    • Expanding our toolbox!
  • Notes:
    • Again, don’t focus on memorizing anything; just get a feel for things.
    • If you are feeling lost/overwhelmed, speak to me sooner rather than later!
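To get a feel for how the three chapters fit together, the sketch below combines a conditional, a loop, and a function. The function name classify and the even/odd task are hypothetical and do not come from the DataCamp exercises.

```r
# A function wrapping a conditional: label a number as even or odd.
classify <- function(x) {
  if (x %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

# A loop that applies the function to the numbers 1 through 5.
labels <- character(0)
for (i in 1:5) {
  labels <- c(labels, classify(i))
}
print(labels)
```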




Problem Set 01

  • Info:
    • Assigned Mon 2/13
    • Due Thu 2/16 11am
  • Homework:
  • Learning Goals:
    • Getting familiar with working from the command line and the R workflow.
    • Learn R-specific terminology.
  • Notes:
    • Don’t focus on memorizing anything for now; just complete the assignment.
    • If you find yourself spinning your wheels, let me know.