While I encourage you to discuss problem sets with your peers, you must submit your own answers and not simple rewordings of another’s work. Furthermore, all collaborations must be explicitly acknowledged at the top of your submissions.

## Final Group Project

• Final group project instructions.
• Final group project template file `Final_Project.Rmd` posted. One group member should upload it to the group’s RStudio Shared Project folder.
• Please write your group’s info in this Google Sheet. Note that one group member will also be publishing the analysis on the web on rpubs.com for all to see:
1. Knit HTML your document as usual
2. On the top right click “Publish” -> Select RPubs -> Publish -> Then login to RPubs.
3. Give your file an appropriate title and URL name `Final_Project`.
4. Copy/paste the URL into the Google Sheet above.
5. Update your publication as needed by repeating this process.

## Problem Set 11

• Assigned Sun 12/4
• Due Fri 12/9 11am

#### Learning Goals

• Perform a more realistic confidence interval calculation: one where you don’t know the true population parameter
• Solidify understanding of confidence intervals
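
As a rough illustration of the first goal, here is a minimal sketch of a 95% confidence interval for a mean computed only from a sample. The `heights` data are simulated stand-ins (hypothetical, not from the problem set), and the formula-based t interval shown here is just one approach; the problem set itself may use a different method, such as resampling.

```r
# Simulated sample standing in for real data (hypothetical values).
# In practice we would not know the population mean used to generate it.
set.seed(76)
heights <- rnorm(n = 50, mean = 170, sd = 10)

n <- length(heights)
x_bar <- mean(heights)        # point estimate of the unknown population mean
se <- sd(heights) / sqrt(n)   # estimated standard error of the sample mean

# 95% confidence interval using the t distribution with n - 1 degrees of freedom
t_star <- qt(0.975, df = n - 1)
ci <- c(lower = x_bar - t_star * se, upper = x_bar + t_star * se)
ci

# Cross-check against base R's built-in calculation
t.test(heights, conf.level = 0.95)$conf.int
```

The interval is built entirely from the sample: the point estimate, its estimated standard error, and a critical value.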

## Problem Set 10

• Assigned Mon 11/21
• Due Mon 11/28 5pm (note special date and time)

#### Learning Goals

• Perform a start-to-finish hypothesis test and state both a scientific and a statistical conclusion.
• More exploratory data analysis

## Problem Set 09

• Assigned Wed 11/16
• Due Fri 11/18 11am

#### Learning Goals

• Only one question: based on `Lec25.R` from Lecture 25
• Baby’s first hypothesis test!
• Further exploring the components: observed test statistics and null distributions
• Tying the `shuffle()` (i.e. random simulation) idea from PS-08 Question 1.b) into hypothesis testing
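
To make the components concrete, here is a minimal sketch of a shuffle-based hypothesis test. It is not the code from `Lec25.R`: the data are simulated, the helper `diff_in_means()` is a hypothetical name, and base R's `sample()` stands in for `shuffle()` so that the example runs without extra packages.

```r
set.seed(76)

# Hypothetical data: an outcome measured in two groups of 20
group <- rep(c("A", "B"), each = 20)
outcome <- c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 11, sd = 2))

# Test statistic: difference in group means (B minus A)
diff_in_means <- function(y, g) {
  mean(y[g == "B"]) - mean(y[g == "A"])
}

# Observed test statistic
observed <- diff_in_means(outcome, group)

# Null distribution: shuffle the group labels many times and recompute
# the statistic. sample() without replacement permutes the labels.
null_dist <- replicate(1000, diff_in_means(outcome, sample(group)))

# Two-sided p-value: share of shuffled statistics at least as extreme
# as the observed one
p_value <- mean(abs(null_dist) >= abs(observed))
p_value
```

Shuffling the labels simulates a world where group membership has no effect, which is exactly what the null distribution is meant to describe.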

## Problem Set 08

• Assigned Fri 11/4
• Due Fri 11/11 11am

#### Learning Goals

• Understanding the two places where randomness plays a part in this class:
  • Random sampling: used for taking a sample from a population
  • Random assignment: used in experiments
• Learning to compute probabilities not using mathematical formulas, but rather via random simulation using
  • the data manipulation tools in the `dplyr` package
  • the sampling tools in the `mosaic` package
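
As a small illustration of the simulation idea, here is a sketch that estimates the probability that two dice sum to 7. The dice example is my own (hypothetical, not taken from the problem set), and it uses only base R so it runs without extra packages; the problem set uses the `dplyr` and `mosaic` tools instead.

```r
set.seed(76)
n_sims <- 10000

# Simulate rolling two fair dice n_sims times
die_1 <- sample(1:6, size = n_sims, replace = TRUE)
die_2 <- sample(1:6, size = n_sims, replace = TRUE)

# Estimated probability: proportion of simulations where the sum is 7
prob_seven <- mean(die_1 + die_2 == 7)
prob_seven

# Exact answer from the mathematical formula, for comparison
1 / 6
```

With more simulations the estimate tends to get closer to the exact value.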

## Problem Set 07

• Assigned Fri 10/28
• Due Fri 11/4 11am

#### Learning Goals

• Practice turning pseudocode into code.
• More importantly, doing research: generating answers to scientific questions using data.

## Problem Set 06

#### Learning Goals

• Introducing what are, in my opinion, effective approaches to tackling problems of this type, instead of approaches that could lead to this.
• Practice, practice, practice. Much like learning a language, the only way to get better is practice.

#### Homework

• Download this file to your computer and upload it to your `problem_sets` folder on RStudio Server: `PS-06.Rmd`

## Problem Set 05

#### Learning Goals

• Wrap up the Grammar of Graphics
• Start wrangling data!
• Start providing useful summaries.

#### Homework

• Download this file to your computer and upload it to your `problem_sets` folder on RStudio Server: `PS-05.Rmd`

## Problem Set 04

#### Learning Goals

• This problem set assumes you are now more comfortable with the R, RStudio, and R Markdown workflow, so the questions are more complex.
• Using more of the 5NG tools for data visualization to answer meaningful questions using real data.
• Slowly introducing notions of data manipulation/wrangling.
• Putting statistical and data sciences in a greater social context via Hans Rosling’s 20-minute TED Talk, The best stats you’ve ever seen (bold title, I know), on international development data.

#### Homework

• As described in Lec03, install the following packages:
  • The `okcupiddata` package containing the `profiles` data set: profile information for ~60K San Francisco OkCupid users in June 2012
  • The `gapminder` package containing the `gapminder` data set: international development data
• Download this file to your computer: `PS-04.Rmd`
• Upload it to RStudio Server into the `problem_sets` folder:
  • In the Files panel, navigate to your `problem_sets` folder

## Problem Set 03

#### Learning Goals

• Ramping up the use of the `ggplot2` package for data visualization
• Exploring real time series data
• Using Google as a research tool

#### Homework

• As described in Lec03, before tackling the problem set, install the following packages:
  • The `Quandl` package for making it amazingly easy to get financial and economic data from quandl.com
  • The `lubridate` package with consistent and memorable syntax that makes working with dates easier
• Download this file to your computer: `PS-03.Rmd`
• Upload it to RStudio Server into the `problem_sets` folder:
  • In the Files panel, navigate to your `problem_sets` folder

## Problem Set 02

#### Learning Goals

• Taking your first baby steps using the `ggplot2` package for data visualization: an R-based implementation of the “Grammar of Graphics”
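
For a first taste, here is a minimal sketch of a scatterplot built with `ggplot2`. It assumes the package is already installed and uses R's built-in `mtcars` data set, which is my choice for illustration and not necessarily the data used in the problem set.

```r
library(ggplot2)

# Grammar of Graphics in three pieces:
#   data: the mtcars data frame
#   aesthetic mapping: weight on the x-axis, miles per gallon on the y-axis
#   geometric object: points
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Printing the object draws the plot
p
```

Swapping `geom_point()` for another geometric object changes the type of plot while the data and mapping stay the same.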

#### Homework

• In RStudio, on the top right of the screen, next to the cube with “R” on it, if it says
  • `problem_sets`: click on it and select “Close Project”
  • `Project: (None)`: do nothing
• Download this file to your computer: `PS-02.Rmd`
• Upload it to RStudio Server into the `problem_sets` folder
• Open it and work on it from there
• Don’t forget to answer the questions in the “Please Indicate” section

## Problem Set 01

• Assigned Fri 9/16
• Due Fri 9/23

#### Learning Goals

• This week’s problem set doesn’t involve much content, but rather is about familiarizing yourselves with the problem set workflow and submission format using R Markdown.
• In particular, we’ll go over how to share your analyses over the web with a couple of clicks of the mouse!
• You’ll start
  • seeing what I mean by “computers are stupid”
  • developing the skill of “debugging”: identifying and removing errors from code. In our case, if your R Markdown file won’t knit AKA load AKA compile AKA render, follow the steps in R Markdown debugging (also posted on the Resources page). This usually solves about 85% of problems; if you’re still stuck after going through the steps, speak to your peers or me.
• In my experience, there are always a few hiccups with R Markdown at the beginning, but by the third assignment everyone is on board.

## Problem Set Submission Process

You will submit your problem sets using RStudio Server project sharing (which you’ll also be using for your group projects)! The grader and I will go over your problem set files and leave comments directly on them.

Only do this once:

You will create a project (i.e. an organizational folder) that you will share with me and the grader:

• In the top right of RStudio Server click on the cube with “R” in it -> New Project… -> Click “save” when prompted
• New Directory -> Empty Project -> Enter `problem_sets` as the Directory name and click “Create Project”
• On the top right it should say `problem_sets` next to the cube with “R” in it. Click on that -> Share Project…
• In the box with the blinking cursor add `aykim` and `tsingh`
• Copy the Project URL and press OK
• Paste your URL in the appropriate row in this Google Sheet

Only for problem set 1:

Move the file `PS-01.Rmd` to the `problem_sets` shared project folder so the grader and I can access it:

• In the Files panel -> Click on the house icon
• Click the checkboxes next to `PS-01.Rmd` and `PS-01.html`
• Click the gear icon “More” -> Move…
• Select `problem_sets`

For all future problem sets:

• When uploading the relevant `PS-XX.Rmd` file to RStudio Server, upload it directly to the `problem_sets` folder.