Problem Sets
While I encourage you to discuss problem sets with your peers, you must submit your own answers and not simple rewordings of another’s work. Furthermore, all collaborations must be explicitly acknowledged at the top of your submissions.
General Information
- Discussions and solutions for each of the Problem Sets can be found:
- In the master HTML file
- In .R code format
- The R Markdown Debugging Sheet is here.
Problem Set 12
- Info:
- Assigned Mon 5/8
- Due Fri 5/12 11:15am
- Homework:
- Complete the work in PS-12.Rmd.
- Simple linear regression viewed through the lens of sampling:
- Confidence intervals
- Hypothesis testing
- Discussion/Solutions: Can be found in both
- The master HTML file.
- R Markdown file specific to PS12: PS-12_discussion.Rmd
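To make "regression through the lens of sampling" concrete, here is a minimal sketch in base R. It uses the built-in mtcars data (mpg vs. wt) as a stand-in, not the data in PS-12.Rmd: a bootstrap builds a confidence interval for the slope, and a permutation (shuffling) test checks the null hypothesis that the slope is 0. The course's mosaic package offers do(), resample(), and shuffle() for the same steps; base R is used here so the sketch runs without extra packages.

```r
set.seed(76)
n_reps <- 1000

# Observed slope of the fitted line mpg ~ wt
obs_slope <- coef(lm(mpg ~ wt, data = mtcars))[["wt"]]

# Confidence interval: resample rows with replacement, refit, keep the slope
boot_slopes <- replicate(n_reps, {
  rows <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[rows, ]))[["wt"]]
})
ci_95 <- quantile(boot_slopes, probs = c(0.025, 0.975))

# Hypothesis test of H0: slope = 0. Shuffling mpg breaks any link with wt
null_slopes <- replicate(n_reps, {
  shuffled <- mtcars
  shuffled$mpg <- sample(shuffled$mpg)
  coef(lm(mpg ~ wt, data = shuffled))[["wt"]]
})

# Two-sided p-value: how often a "no relationship" slope is as extreme
p_value <- mean(abs(null_slopes) >= abs(obs_slope))

obs_slope
ci_95
p_value
```

The interval and the test answer related questions: if 0 falls outside the 95% interval, the test will typically reject H0 at the 5% level.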
Problem Set 11
- Info:
- Assigned Fri 4/28
- Due Fri 5/5 11:15am
- Homework:
- Read the following 538 article: Both Republicans And Democrats Have an Age Problem
- After loading the fivethirtyeight package with library(fivethirtyeight) and loading the data with data(congress_age), scan over the help file by running ?congress_age.
- Complete the work in PS-11.Rmd.
- Learning Goals:
- Study a confidence interval for something other than the population mean \(\mu\).
- Discussion/Solutions: Can be found in both
- The master HTML file.
- R Markdown file specific to PS11: PS-11_discussion.Rmd
Problem Set 10
- Info:
- Assigned Fri 4/21
- Due Fri 4/28 11:15am
- Homework:
- Reading for Question 2 on the Chi-Squared Test: Sections 6.3 through 6.3.4 of OpenIntro Statistics (click “Free Download”), from the bottom of book page 286 through the bottom of book page 293. As you read, try to tie everything to the components of the hypothesis testing framework from the chalk talk in Lec25.
- Complete the work in PS-10.Rmd.
- Learning Goals:
- Perform a hypothesis test from start to finish.
- Generalize the lady tasting tea example to other hypothesis testing situations.
- Discussion/Solutions: Can be found in both
- The master HTML file.
- R Markdown file specific to PS10: PS-10_discussion.Rmd
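As a reminder of the full hypothesis-testing workflow, here is a minimal simulation sketch of a lady-tasting-tea style test. The numbers are hypothetical (8 cups, all 8 identified correctly), and the null hypothesis assumes each cup is an independent 50/50 guess; Fisher's original design with 4 cups of each kind would instead call for a hypergeometric null. Swap in the setup from your own problem.

```r
set.seed(76)
n_cups <- 8             # hypothetical number of cups tasted
observed_correct <- 8   # hypothetical number identified correctly
n_sims <- 10000         # "many, many, many" simulated guessers

# Under H0 she is guessing: each cup is correct with probability 0.5
sim_correct <- rbinom(n_sims, size = n_cups, prob = 0.5)

# p-value: proportion of simulated guessers doing at least as well as observed
p_value <- mean(sim_correct >= observed_correct)
p_value
```

Under this null the exact probability of 8 out of 8 is 0.5^8 = 1/256 (about 0.004), so the simulated p-value should land close to that.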
Problem Set 09
- Info:
- Assigned Mon 4/17
- Due Fri 4/21 11:15am
- Homework:
- Complete the work in PS-09.Rmd.
- Learning Goals:
- Reinforce the topics covered in Midterm II.
- Discussion/Solutions: Can be found in both
- The master HTML file.
- R Markdown file specific to PS09: PS-09_discussion.Rmd
Problem Set 08
- Info:
- Assigned Sun 4/9
- Due Fri 4/14 11:15am
- Homework:
- Complete the work in PS-08.Rmd.
- Learning Goals:
- Study probability using the mosaic package’s sampling and simulation capabilities, instead of mathematical formulae (reserved for MATH 310 Probability).
- Instructions/Hints
- Interpreting probability:
- In general: one interpretation of “the probability of x occurring” is the proportion of the time “x” occurs across many, many, many attempts.
- Example: “the probability of flipping a coin and getting heads is 0.5 = 50%” can be interpreted as the proportion of heads over many, many, many coin flips being one half.
- For the purposes of this problem set, let “many, many, many times” mean 10,000 times.
- As always, I recommend you separate the “what” from the “how” by first sketching out on paper a plan of what you are going to do for each question.
- Discussion/Solutions: Can be found in both
- The master HTML file.
- R Markdown file specific to PS08: PS-08_discussion.Rmd
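The “proportion over many, many, many attempts” interpretation of probability can be sketched in a few lines. This uses base R’s sample() rather than the mosaic functions the problem set asks for, so treat it as an illustration of the idea, not a model answer.

```r
set.seed(76)
n_flips <- 10000  # "many, many, many times"

# Flip a fair coin 10,000 times
flips <- sample(c("H", "T"), size = n_flips, replace = TRUE)

# The proportion of heads approximates P(heads) = 0.5
prop_heads <- mean(flips == "H")
prop_heads
```

Rerunning with a different seed gives a slightly different proportion each time, but with 10,000 flips it stays very close to one half.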
Problem Set 07
- Info:
- Assigned Fri 3/24
- Due Fri 4/7 11:15am
- Homework:
- Learning Goals:
- Getting more comfortable with R Markdown.
- Working on a substantive data analysis, in the mold of the final project.
- Providing actionable insight from data.
- Discussion/Solutions: Can be found in both
- The master HTML file.
- R Markdown file specific to PS07: PS-07_discussion.Rmd
Problem Set 06
- Info:
- Assigned Sat 3/18
- Due Fri 3/24 11:15am
- Homework:
- Complete this feedback survey on the “R Markdown” course on DataCamp from last week.
- Complete the work in PS-06.Rmd. If you’re having trouble with R Markdown, read this Google Doc first.
- Optional: If you want reinforcement on dplyr, complete Chapter 7 (Filtering, Grouping, & Summarizing) and Chapter 8 (dplyr Review) of the “Effective Data Storytelling using the tidyverse” DataCamp course. Notes:
- Two topics, %in% and geom_col, have not been covered in our course.
- If you do complete this course, please help me and Chester out by completing this feedback survey.
- Learning Goals:
- Perform data wrangling
- Start answering substantive questions with data
- Get familiar with R Markdown
- Discussion/Solutions: Can be found in both
- The master HTML file.
- R Markdown file specific to PS06: PS-06_discussion.Rmd
Problem Set 05
- Info:
- Assigned Sun 3/12
- Due Fri 3/17 11:15am
- Homework:
- Complete the work in PS-05.R, saving your work in this file since you will be submitting it.
- From the DataCamp course “Reporting with R Markdown”, complete the first two chapters (I anticipate this taking between 90 minutes and 2 hours):
- Authoring R Markdown Reports. In this chapter, don’t worry if you don’t fully understand “Section 2: R code for your report.”
- Embedding Code
- Optional: If you want reinforcement on ggplot2, complete the following chapters of the “Effective Data Storytelling using the tidyverse” DataCamp course:
- 3: Scatter-plots & Line-graphs
- 4: Histograms & Boxplots
- 5: Barplots
- Learning Goals:
- Begin to master ggplot
- Take our first steps with R Markdown
- Notes:
- For Question 1.b) I accidentally left the code in the problem set. There is nothing to do here.
- For Question 4.b) I gave an example of how to show the data using geom_boxplot(). Submit an answer that does so using another geom.
Problem Set 04
- Info:
- Assigned Mon 3/6
- Due Fri 3/10 11:15am
- Homework:
- Complete this feedback survey on “Chapter 2: Tidy Data” of the “Effective Data Storytelling using the tidyverse” DataCamp course.
- Ethics:
- Listen to the EconTalk podcast interview (1h11m) with Cathy O’Neil, author of Weapons of Math Destruction.
- In two paragraphs, explain Cathy O’Neil’s argument for how supposedly objective mathematical/algorithmic models reinforce inequality in two of the following three contexts:
- Crime recidivism
- The thought experiment of hiring in tech firms
- Teacher evaluations
- Save this as a file PS-04_Discussion_FirstName_LastName.doc or .txt or whatever.
Problem Set 03
- Info:
- Assigned Sat 2/25
- Due Fri 3/3 11:15am
- Homework:
- Complete this feedback survey on the “Intro to R” and “Intermediate R” courses on DataCamp.
- Complete the work in PS-03.R, saving your work in this file since you will be submitting it. Stand by for the submission format.
- Complete Chapter 2: Tidy Data of the “Effective Data Storytelling using the tidyverse” DataCamp course.
- Learning Goals:
- Weaning yourselves away from the DataCamp nest and doing your own work in RStudio.
- Baby’s first data analysis!
- Tips:
- Learning to Code:
- Computers are stupid: In order for step C of your code to work, you need to make sure you ran steps A & B first.
- Learning strategy: Tweak existing code in the Learning Checks (go over them!) to suit your ends; don’t code from scratch.
- Working with Data:
- Always look at and explore your data first. In our case, use the View() function and/or the glimpse() function from the dplyr package.
- Help files are your friend. Most R functions and datasets have help files. For example, you can access the help file for the movies dataset by typing ?movies.
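The “look at your data first” and “run steps A & B before step C” tips can be sketched as follows. This uses the built-in mtcars data as a stand-in for movies, and only calls glimpse() if dplyr happens to be installed; View() and ? are left as comments since they are meant for an interactive RStudio session.

```r
# Step A: pick a data set to explore (mtcars ships with base R)
cars <- mtcars

# Step B: look at the first few rows and the structure of the data.
# These only work because step A already created `cars`.
head(cars)
str(cars)

# glimpse() gives a similar compact overview; it lives in the dplyr package
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(cars)
}

# In RStudio, View(cars) opens a spreadsheet-style viewer,
# and ?mtcars opens the help file describing each variable.
```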
Problem Set 02
- Info:
- Assigned Thu 2/16
- Due Fri 2/24 11:15am
- Homework: Complete the following three chapters (in this order) from the DataCamp course “Intermediate R”:
- Conditionals and Control Flow
- Loops
- Functions
- Learning Goals:
- Getting more experience with the R command line.
- Expanding our toolbox!
- Notes:
- Again, don’t focus on memorizing anything; just get a feel for things.
- If you are feeling lost/overwhelmed, speak to me sooner rather than later!
Problem Set 01
- Info:
- Assigned Mon 2/13
- Due Thu 2/16 11am
- Homework:
- Accept the email invitation to the new assignment on DataCamp.
- Complete the DataCamp course “Introduction to R”.
- Learning Goals:
- Getting familiar with working from the command line and the R workflow.
- Learn R-specific terminology.
- Notes:
- Don’t focus on memorizing anything for now, just complete the assignment.
- If you find yourself spinning your wheels, let me know.