Problem Sets
While I encourage you to discuss problem sets with your peers, you must submit your own answers and not simple rewordings of another’s work. Furthermore, all collaborations must be explicitly acknowledged at the top of your submissions.
Final Group Project
- Final group project instructions.
- Monday 11/14 updates:
  - Final group project template file Final_Project.Rmd posted. One group member to upload to group’s RStudio Shared Project Folder.
  - Please write your group’s info in this Google Sheet. Note that one group member will also be publishing the analysis on the web on rpubs.com for all to see:
    - Knit HTML your document as usual
    - On the top right click “Publish” -> Select RPubs -> Publish -> Then login to RPubs.
    - Give your file an appropriate title and URL name Final_Project.
    - Copy/paste the URL into the Google Sheet above.
    - Update your publication as needed by repeating this process.
Problem Set 11
- Assigned Sun 12/4
- Due Fri 12/9 11am
Learning Goals
- Perform a more realistic confidence interval calculation: one where you don’t know the true population parameter
- Solidify understanding of confidence intervals
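To make the first goal concrete, here is a minimal sketch of a confidence interval computed from a sample alone. It is not taken from PS-11: the data are simulated stand-ins, the variable names are hypothetical, and the 1.96 multiplier assumes the normal approximation for a 95% interval is reasonable.

```r
# Hypothetical sample: in a real analysis these values come from your data,
# and the population mean that generated them is unknown to you.
set.seed(11)
sample_values <- rnorm(50, mean = 10, sd = 2)

n <- length(sample_values)
x_bar <- mean(sample_values)          # point estimate of the population mean
se <- sd(sample_values) / sqrt(n)     # estimated standard error of the mean

# Approximate 95% confidence interval (assumes the normal approximation)
ci <- c(lower = x_bar - 1.96 * se, upper = x_bar + 1.96 * se)
ci
```

Note that everything in the interval is built from the sample itself: the true population mean never appears in the calculation.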
Homework
- Please read over PS-11 first.
- Download these files to your computer and upload them to your problem_sets folder on RStudio Server: PS-11.Rmd
- PS-11 Discussion
- PS-11 Discussion source code file PS-11_discussion.Rmd
Problem Set 10
- Assigned Mon 11/21
- Due Mon 11/28 5pm (note special date and time)
Learning Goals
- Perform a start-to-finish hypothesis test and state both a scientific and a statistical conclusion.
- More exploratory data analysis
Homework
- Download these files to your computer and upload them to your problem_sets folder on RStudio Server:
- PS-10 Discussion
- PS-10 Discussion source code file PS-10_discussion.Rmd
Problem Set 09
- Assigned Wed 11/16
- Due Fri 11/18 11am
Learning Goals
- Only one question: based on Lec25.R from Lecture 25
- Baby’s first hypothesis test!
- Further exploring the components: observed test statistics and null distributions
- Tying in the shuffle() (i.e. random simulation) idea from PS-08 Question 1.b) to hypothesis testing
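The pieces named above (observed test statistic, null distribution, shuffling) fit together roughly as in the sketch below. It is not the code from Lec25.R: the data are simulated, diff_in_means() is a hypothetical helper, and base R’s sample() stands in for the shuffle() function from the mosaic package so that the example runs without extra packages.

```r
# Hypothetical data: two groups of 10 with simulated outcomes
set.seed(76)
group <- rep(c("A", "B"), each = 10)
outcome <- c(rnorm(10, mean = 5), rnorm(10, mean = 6))

# Test statistic: difference in group means (B minus A)
diff_in_means <- function(y, g) mean(y[g == "B"]) - mean(y[g == "A"])

# Observed test statistic, computed on the data as collected
observed <- diff_in_means(outcome, group)

# Null distribution: shuffle the group labels many times and recompute.
# sample(group) returns a random permutation of the labels.
null_dist <- replicate(1000, diff_in_means(outcome, sample(group)))

# Two-sided p-value: share of shuffled statistics at least as extreme
p_value <- mean(abs(null_dist) >= abs(observed))
p_value
```

Shuffling breaks any link between group and outcome, which is exactly what the null hypothesis of “no difference” assumes.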
Homework
- Download these files to your computer and upload them to your problem_sets folder on RStudio Server:
- PS-09 Discussion
- PS-09 Discussion source code file PS-09_discussion.Rmd
Problem Set 08
- Assigned Fri 11/4
- Due Fri 11/11 11am
Learning Goals
- Understanding the two places where randomness plays a part in this class:
- Random sampling: used for taking a sample from a population
- Random assignment: used in experiments
- Learning to compute probabilities not using mathematical formulas, but rather via random simulation using
  - the data manipulation tools in the dplyr package
  - the sampling tools in the mosaic package
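As an illustration of computing a probability by simulation rather than by formula, the sketch below estimates the chance that two dice sum to 7. The dice example is my own and not a PS-08 question, and it uses base R’s sample() instead of the dplyr and mosaic tools so that it runs without extra packages.

```r
# Simulate many rolls of two fair dice
set.seed(2016)
n_sim <- 10000
die1 <- sample(1:6, size = n_sim, replace = TRUE)
die2 <- sample(1:6, size = n_sim, replace = TRUE)

# Estimated probability: proportion of simulated rolls that sum to 7
prob_seven <- mean(die1 + die2 == 7)
prob_seven

# For comparison, the formula-based answer is 6/36 = 1/6
1 / 6
```

The more simulations you run, the closer the estimated proportion tends to get to the formula-based value.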
Homework
- Download these files to your computer and upload them to your problem_sets folder on RStudio Server: PS-08.Rmd
- Download the OpenIntro Statistics 3rd Edition open-source statistics textbook (the textbook I used previously for MATH 116) and save this to your computer. Questions for the rest of the course will come from here.
- PS-08 Discussion
- PS-08 Discussion source code file PS-08_discussion.Rmd
Problem Set 07
- Assigned Fri 10/28
- Due Fri 11/4 11am
Learning Goals
- Practice turning pseudocode into code.
- More importantly, doing research: generating answers to scientific questions using data.
Homework
- Download these files to your computer and upload them to your problem_sets folder on RStudio Server:
- PS-07 Discussion
- PS-07 Discussion source code file PS-07_discussion.Rmd
Problem Set 06
- Assigned Fri 10/21
- Due Fri 10/28 11am
- PS-06 Discussion
- PS-06 Discussion source code file PS-06_discussion.Rmd
Learning Goals
- Tackling your first “real” analysis using your data toolbox.
- Introducing what are, in my opinion, effective approaches to tackling problems of this type, instead of taking approaches that could lead to this.
- Practice, practice, practice. Much like learning a language, the only way to get better is practice.
Homework
- Download this file to your computer and upload it to your problem_sets folder on RStudio Server: PS-06.Rmd
Problem Set 05
- Assigned Tue 10/18
- Due Fri 10/21 11am
- PS-05 Discussion
- PS-05 Discussion source code file PS-05_discussion.Rmd
Learning Goals
- Wrap up the Grammar of Graphics
- Start wrangling data!
- Start providing useful summaries.
Homework
- Download this file to your computer and upload it to your problem_sets folder on RStudio Server: PS-05.Rmd
Problem Set 04
- Assigned Sat 10/8
- Due Fri 10/14 11am
- PS-04 Discussion
- PS-04 Discussion source code file PS-04_discussion.Rmd
Learning Goals
- This problem set assumes you are now more comfortable with the R, RStudio, and R Markdown workflow, and thus the complexity of the questions asked is increased.
- Using more of the 5NG tools for data visualization to answer meaningful questions using real data.
- Slowly introducing notions of data manipulation/wrangling.
- Putting statistical and data sciences in a greater social context via Hans Rosling’s 20-minute TED Talk, The best stats you’ve ever seen (bold title, I know), on international development data.
Homework
- As described in Lec03, install the following packages:
  - The okcupiddata package containing the profiles data set: profile information for ~60K San Francisco OkCupid users in June 2012
  - The gapminder package containing the gapminder data set: international development data
- Download this file to your computer: PS-04.Rmd
- Upload it to RStudio Server into the problem_sets folder:
  - In the Files panel, navigate to your problem_sets folder
  - Then click “Upload”
Problem Set 03
- Assigned Fri 9/30
- Due Fri 10/7 11am
- PS-03 Discussion
- PS-03 Discussion source code file PS-03_discussion.Rmd
Learning Goals
- Ramping up the use of the ggplot2 package for data visualization
- Exploring real time series data
- Using Google as a research tool
Homework
- As described in Lec03, before tackling the problem set, install the following packages:
  - The Quandl package for making it amazingly easy to get financial and economic data from quandl.com
  - The lubridate package with consistent and memorable syntax that makes working with dates easier
- Download this file to your computer: PS-03.Rmd
- Upload it to RStudio Server into the problem_sets folder:
  - In the Files panel, navigate to your problem_sets folder
  - Then click “Upload”
Problem Set 02
- Assigned Fri 9/23
- Due Fri 9/30 11am
- Discussion
- PS-02 Discussion source code file PS-02_discussion.Rmd
Learning Goals
- Taking your first baby steps using the ggplot2 package for data visualization: an R-based implementation of the “Grammar of Graphics”
Homework
- In RStudio, on the top right of the screen, next to the cube with “R” on it, if it says
  - problem_sets: click on it and select “Close Project”
  - Project: (None): do nothing
- Download this file to your computer: PS-02.Rmd
- Upload it to RStudio Server into the problem_sets folder
- Open it and work on it from there
- Don’t forget to answer the questions in the “Please Indicate” section
Problem Set 01
- Assigned Fri 9/16
- Due Fri 9/23
Learning Goals
- This week’s problem set doesn’t involve much content, but rather is about familiarizing yourselves with the problem set workflow and submission format using R Markdown.
- In particular, we’ll go over how to share your analyses over the web with a couple of clicks of the mouse!
- You’ll start
  - seeing what I mean by “computers are stupid”
  - developing the skill of “debugging”: identifying and removing errors from code. In our case, if your R Markdown file won’t knit AKA load AKA compile AKA render, follow the steps in R Markdown debugging (also posted on the Resources page). This usually solves about 85% of problems; if you’re still stuck after going through the steps, speak to your peers or me.
- In my experience, there are always a few hiccups with R Markdown at the beginning, but by the third assignment everyone is on board.
Homework
- Download this file to your computer, then upload it to the RStudio Server as described in Lec03: PS-01.Rmd
Submit your homework using this submission form. See below.
Problem Set Submission Process
Submissions use RStudio Server project sharing (which you’ll also be using for your group projects). The grader and I will go over your problem set files and leave comments directly on them.
Only do this once:
You will create a project (i.e. an organizational folder) that you will share with me and the grader:
- In the top right of RStudio Server click on the cube with “R” in it -> New Project… -> Click “save” when prompted
- New Directory -> Empty Project -> Enter problem_sets as the Directory name and click “Create Project”
- On the top right it should say problem_sets next to the cube with “R” in it. Click on that -> Share Project…
- In the box with the blinking cursor add aykim and tsingh
- Copy the Project URL and press OK
- Paste your URL in the appropriate row in this Google Sheet
Only for problem set 1:
Move the file PS-01.Rmd to the problem_sets shared project folder so the grader and I can access it:
- In the Files panel -> Click on the house icon
- Click the checkboxes next to PS-01.Rmd and PS-01.html
- Click the gear icon “More” -> Move…
- Select problem_sets
For all future problem sets:
- When uploading the relevant PS-XX.Rmd file to RStudio Server, upload it directly to the problem_sets folder.