# Problem Sets

While I encourage you to discuss problem sets with your peers, you must submit your
own answers and not simple rewordings of another’s work. Furthermore, **all
collaborations must be explicitly acknowledged at the top of your submissions**.

### General Information

- Discussions and solutions on each of the Problem Sets can be found
    - In the master HTML file
    - In `.R` code format
- The R Markdown Debugging Sheet is here.

### Problem Set 12

**Info:**

- Assigned Mon 5/8
- Due Fri 5/12 11:15am

**Homework:**

- Complete the work in `PS-12.Rmd`.

**Learning Goals:**

- Simple linear regression viewed through the lens of sampling:
    - Confidence intervals
    - Hypothesis testing

**Discussion/Solutions**: Can be found in both:

- The master HTML file.
- R Markdown file specific to PS12: `PS-12_discussion.Rmd`

### Problem Set 11

**Info:**

- Assigned Fri 4/28
- Due Fri 5/5 11:15am

**Homework:**

- Read the following 538 article: Both Republicans And Democrats Have an Age Problem
- After loading the `library(fivethirtyeight)` package and loading the `data(congress_age)` data, scan over the help file `?congress_age`.
- Complete the work in `PS-11.Rmd`.

**Learning Goals:**

- Study a confidence interval for something *other* than the population mean \(\mu\).

**Discussion/Solutions**: Can be found in both:

- The master HTML file.
- R Markdown file specific to PS11: `PS-11_discussion.Rmd`
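
To make the learning goal concrete, here is a minimal sketch of a confidence interval for a *median* rather than a mean, built by resampling. It is only an illustration and not necessarily the approach taken in `PS-11.Rmd`: the data are simulated (they are not the `congress_age` data), and the seed and the names `ages`, `boot_medians`, and `ci` are assumptions made for the example.

```r
set.seed(2017)  # arbitrary seed so the sketch is reproducible

# Hypothetical sample: 200 simulated ages, NOT the congress_age data
ages <- round(rnorm(200, mean = 55, sd = 10))

# Resample with replacement many times, recording the median each time
boot_medians <- replicate(10000, median(sample(ages, replace = TRUE)))

# The middle 95% of the resampled medians gives a percentile interval
ci <- quantile(boot_medians, probs = c(0.025, 0.975))
ci
```

The same recipe works for other statistics: swap `median()` for whatever summary you care about.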

### Problem Set 10

**Info:**

- Assigned Fri 4/21
- Due Fri 4/28 11:15am

**Homework:**

- Reading for Question 2 on the *Chi-Squared Test*: Chapters 6.3 through 6.3.4 from OpenIntro Statistics: click “Free Download” then read from the bottom of book page 286 through the bottom of book page 293. As you read, try to tie everything to the components of the hypothesis testing framework from the Lec25 chalk talk.
- Complete the work in `PS-10.Rmd`.

**Learning Goals:**

- Perform a hypothesis test from start to finish.
- Generalize the lady tasting tea to other hypothesis testing situations.

**Discussion/Solutions**: Can be found in both:

- The master HTML file.
- R Markdown file specific to PS10: `PS-10_discussion.Rmd`

### Problem Set 09

**Info:**

- Assigned Mon 4/17
- Due Fri 4/21 11:15am

**Homework:**

- Complete the work in `PS-09.Rmd`.

**Learning Goals:**

- Reinforce the topics covered in Midterm II.

**Discussion/Solutions**: Can be found in both:

- The master HTML file.
- R Markdown file specific to PS09: `PS-09_discussion.Rmd`

### Problem Set 08

**Info:**

- Assigned Sun 4/9
- Due Fri 4/14 11:15am

**Homework:**

- Complete the work in `PS-08.Rmd`.

**Learning Goals:**

- Study probability using the `mosaic` package's sampling and simulation capabilities, instead of using mathematical formulae (reserved for MATH 310 Probability).

**Instructions/Hints**

- Interpreting probability:
    - *In general*: one interpretation of “the probability of x occurring” is the proportion of the time “x” occurs across many, many, many attempts.
    - *Example*: “the probability of flipping a coin and getting heads is 0.5 = 50%” can be interpreted as the proportion of heads occurring over many, many, many coin flips being one half.
    - For the purposes of this problem set, let “many, many, many times” mean 10,000 times.
- As always, I recommend you separate out the **what** vs the **how** by first sketching out a plan of what you are going to do for each question on paper.

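
The long-run-proportion interpretation above can be sketched with a short simulation. This is a minimal illustration that uses base R's `sample()` rather than the `mosaic` functions the problem set asks for; the seed and the names `n_flips`, `flips`, and `prop_heads` are assumptions made for the example.

```r
set.seed(76)  # arbitrary seed so the sketch is reproducible

# "Many, many, many times" means 10,000 for this problem set
n_flips <- 10000

# Flip a fair coin n_flips times
flips <- sample(c("heads", "tails"), size = n_flips, replace = TRUE)

# The proportion of heads approximates the probability of heads
prop_heads <- mean(flips == "heads")
prop_heads
```

The result should land close to 0.5, and rerunning with a different seed shows how much it wobbles from one run of 10,000 flips to the next.
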
**Discussion/Solutions**: Can be found in both:

- The master HTML file.
- R Markdown file specific to PS08: `PS-08_discussion.Rmd`

### Problem Set 07

**Info:**

- Assigned Fri 3/24
- Due Fri 4/7 11:15am

**Homework:**

**Learning Goals:**

- More getting used to R Markdown.
- Working on a **really** substantive data analysis, in the mold of the final project.
- Providing **actionable insight** from data.

**Discussion/Solutions**: Can be found in both:

- The master HTML file.
- R Markdown file specific to PS07: `PS-07_discussion.Rmd`

### Problem Set 06

**Info:**

- Assigned Sat 3/18
- Due Fri 3/24 11:15am

**Homework:**

- Complete this feedback survey on the “R Markdown” course on DataCamp from last week.
- Complete the work in `PS-06.Rmd`. If you’re having trouble with R Markdown, read this Google Doc first.
- **Optional:** If you want reinforcement on `dplyr`, from the “Effective Data Storytelling using the tidyverse” DataCamp course, complete Chapter 7 (Filtering, Grouping, & Summarizing) and Chapter 8 (dplyr Review). Notes:
    - Two topics, `%in%` and `geom_col`, have not been covered in our course.
    - If you do complete this course, please help me and Chester out by completing this feedback survey.

**Learning Goals:**

- Perform data wrangling
- Start answering substantive questions with data
- Get familiar with R Markdown

**Discussion/Solutions**: Can be found in both:

- The master HTML file.
- R Markdown file specific to PS06: `PS-06_discussion.Rmd`

### Problem Set 05

**Info:**

- Assigned Sun 3/12
- Due Fri 3/17 11:15am

**Homework:**

- Complete the work in `PS-05.R`, saving your work in this file as you will be submitting it.
- From the DataCamp course “Reporting with R Markdown”, complete the first two Chapters (I anticipate this taking between 90 min and 2 hours):
    - Authoring R Markdown Reports. In this Chapter, don’t worry if you don’t fully understand “Section 2: R code for your report.”
    - Embedding Code
- **Optional:** If you want reinforcement on `ggplot2`, from the “Effective Data Storytelling using the tidyverse” DataCamp course, complete Chapters
    - 3: Scatter-plots & Line-graphs
    - 4: Histograms & Boxplots
    - 5: Barplots

**Learning Goals:**

- Begin to master `ggplot`
- Take our first steps with R Markdown

**Notes**:

- For Question 1.b) I accidentally left the code in the problem set. There is nothing to do here.
- For Question 4.b) I gave an example of how to show the data using a `geom_boxplot()`. Submit an answer that does it using another `geom`.

### Problem Set 04

**Info:**

- Assigned Mon 3/6
- Due Fri 3/10 11:15am

**Homework:**

- Complete this feedback survey on “Chapter 2: Tidy Data” of the “Effective Data Storytelling using the tidyverse” DataCamp course.
- *Ethics*:
    - Listen to the EconTalk podcast interview (time 1h11m) of Cathy O’Neil, author of Weapons of Math Destruction.
    - Explain in two paragraphs Cathy O’Neil’s argument of how supposedly objective mathematical/algorithmic models reinforce inequality in **two of the three** following contexts:
        - Crime recidivism
        - The thought experiment of hiring in tech firms
        - Teacher evaluations
    - Save this as a file `PS-04_Discussion_FirstName_LastName.doc` or `.txt` or whatever.

### Problem Set 03

**Info:**

- Assigned Sat 2/25
- Due Fri 3/3 11:15am

**Homework:**

- Complete this feedback survey on the “Intro to R” and “Intermediate R” courses on DataCamp.
- Complete the work in `PS-03.R`, saving your work in this file as you will be submitting it. Stand by for the submission format.
- Complete Chapter 2: Tidy Data of Effective Data Storytelling using the tidyverse

**Learning Goals:**

- Weaning yourselves away from the DataCamp nest and doing your own work in RStudio.
- Baby’s first data analysis!

**Tips**:

- *Learning to Code*:
    - Computers are stupid: In order for step C of your code to work, you need to make sure you ran steps A & B first.
    - Learning strategy: Tweak existing code in the Learning Checks (go over them!) to suit your ends; don’t code from scratch.
- *Working with Data*:
    - Always look at and explore your data first. In our case with the `View()` function and/or the `glimpse()` function from the `dplyr` package.
    - Help files are your friend. Most R functions and datasets have help files. For example, you can access the help file for the `movies` data set by typing `?movies`.
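
As a rough sketch of the "look first" habit from the tips above, the following uses the built-in `mtcars` data set as a stand-in, since the `movies` data set needs its own package. The choice of `mtcars` is an assumption for illustration only; substitute whatever data you are actually working with.

```r
# mtcars ships with R, so no extra package is needed for this sketch
dim(mtcars)    # number of rows and columns
head(mtcars)   # first six rows
str(mtcars)    # compact overview of every column (base R's cousin of glimpse())

# glimpse() lives in dplyr; only call it if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(mtcars)
}

# View() and help files open in RStudio, so run them only interactively
if (interactive()) {
  View(mtcars)
  help("mtcars")  # same as typing ?mtcars
}
```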

### Problem Set 02

**Info:**

- Assigned Thu 2/16
- Due Fri 2/24 11:15am

**Homework:** Complete the following three chapters (in this order) from the DataCamp course “Intermediate R”:

- Conditionals and Control Flow
- Loops
- Functions

**Learning Goals:**

- Getting more experience with the R command line.
- Expanding our toolbox!

**Notes**:

- Again, don’t focus on memorizing anything; just get a feel for things.
- If you are feeling lost/overwhelmed, speak to me sooner rather than later!

### Problem Set 01

**Info:**

- Assigned Mon 2/13
- Due Thu 2/16 11am

**Homework:**

- Accept the email invitation to the new assignment on DataCamp.
- Complete the DataCamp course “Introduction to R”.

**Learning Goals:**

- Getting familiar with working from the command line and the R workflow.
- Learn R-specific terminology.

**Notes**:

- Don’t focus on memorizing anything for now; just complete the assignment.
- If you find yourself spinning your wheels, let me know.