While I encourage you to discuss problem sets with your peers, you must submit your own answers and not simple rewordings of another’s work. Furthermore, all collaborations must be explicitly acknowledged at the top of your submissions.




Final Group Project

  • Final group project instructions.
  • Monday 11/14 updates:
    • Final group project template file Final_Project.Rmd posted. One group member to upload to group’s RStudio Shared Project Folder.
    • Please write your group’s info in this Google Sheet. Note that one group member will also be publishing the analysis on the web on rpubs.com for all to see:
      1. Knit HTML your document as usual
      2. On the top right click “Publish” -> Select RPubs -> Publish -> Then login to RPubs.
      3. Give your file an appropriate title and set the URL name to Final_Project.
      4. Copy/paste the URL into the Google Sheet above.
      5. Update your publication as needed by repeating this process.




Problem Set 11

  • Assigned Sun 12/4
  • Due Fri 12/9 11am

Learning Goals

  • Perform a more realistic confidence interval calculation: one where you don’t know the true population parameter
  • Solidify understanding of confidence intervals
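
As a rough illustration of the first goal, here is a minimal sketch of a 95% confidence interval for a mean computed from the sample alone. It is not the problem set's solution: the heights vector is simulated, hypothetical data standing in for whatever sample you observe, and the t-based formula is one common choice assumed here for illustration.

```r
# Hypothetical sample: simulated here only so the example runs on its own.
# In a real analysis you would not know the mean and sd used to generate it.
set.seed(76)
heights <- rnorm(n = 50, mean = 170, sd = 10)

n <- length(heights)
point_estimate <- mean(heights)          # sample mean
standard_error <- sd(heights) / sqrt(n)  # estimated from the sample alone

# 95% interval using the t distribution with n - 1 degrees of freedom
critical_value <- qt(0.975, df = n - 1)
conf_int <- c(
  lower = point_estimate - critical_value * standard_error,
  upper = point_estimate + critical_value * standard_error
)
conf_int
```

Everything in the interval comes from the sample: the point estimate, the standard error, and the sample size. The built-in t.test(heights)$conf.int should report the same two endpoints.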

Homework




Problem Set 10

  • Assigned Mon 11/21
  • Due Mon 11/28 5pm (note special date and time)

Learning Goals

  • Perform a start-to-finish hypothesis test and state both a scientific and a statistical conclusion.
  • More exploratory data analysis

Homework




Problem Set 09

  • Assigned Wed 11/16
  • Due Fri 11/18 11am

Learning Goals

  • Only one question: based on Lec25.R from Lecture 25
  • Baby’s first hypothesis test!
  • Further exploring the components: observed test statistics and null distributions
  • Tying in the shuffle() (i.e. random simulation) idea from PS-08 Question 1.b) to hypothesis testing
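
To make the components above concrete, here is a minimal sketch of a shuffle-based hypothesis test. It is not the code from Lec25.R: the data are simulated and hypothetical, diff_in_means() is a helper defined only for this example, and base R's sample() is used as a stand-in for shuffle() so that the example runs without extra packages.

```r
set.seed(25)

# Hypothetical data: an outcome measured in two groups of 20
group <- rep(c("A", "B"), each = 20)
outcome <- c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 11, sd = 2))

# Helper defined for this example: difference in group means (B minus A)
diff_in_means <- function(y, g) {
  mean(y[g == "B"]) - mean(y[g == "A"])
}

# Observed test statistic
observed_stat <- diff_in_means(outcome, group)

# Null distribution: shuffling the group labels breaks any link between
# group and outcome, which is what the null hypothesis claims
null_stats <- replicate(1000, diff_in_means(outcome, sample(group)))

# Two-sided p-value: share of shuffled statistics at least as extreme
# as the observed one
p_value <- mean(abs(null_stats) >= abs(observed_stat))
p_value
```

A histogram of null_stats with a vertical line at observed_stat is a quick way to see where the observed value falls relative to the null distribution.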

Homework




Problem Set 08

  • Assigned Fri 11/4
  • Due Fri 11/11 11am

Learning Goals

  • Understanding the two places where randomness plays a part in this class:
    • Random sampling: used for taking a sample from a population
    • Random assignment: used in experiments
  • Learning to compute probabilities not using mathematical formulas, but rather via random simulation using
    • the data manipulation tools in the dplyr package
    • the sampling tools in the mosaic package
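
As a small sketch of the simulation idea (not a question from PS-08), the following estimates the probability of rolling at least one six in four rolls of a fair die and compares it to the formula-based answer. The dice example is made up for illustration, and base R's sample() and replicate() are used here in place of the mosaic and dplyr tools so that the code runs on its own.

```r
set.seed(8)
n_sims <- 10000

# Each simulation: roll a fair die four times, record whether any roll is a six
at_least_one_six <- replicate(n_sims, {
  rolls <- sample(1:6, size = 4, replace = TRUE)
  any(rolls == 6)
})

# Estimated probability: proportion of simulations where the event occurred
estimated_prob <- mean(at_least_one_six)

# Formula-based answer for comparison: 1 - P(no six in four rolls)
exact_prob <- 1 - (5 / 6)^4

c(simulated = estimated_prob, exact = exact_prob)
```

The two values should be close, and the simulated one gets closer on average as n_sims grows.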

Homework

  • Download these files to your computer and upload them to your problem_sets folder on RStudio Server: PS-08.Rmd
  • Download the OpenIntro Statistics 3rd Edition open-source statistics textbook (the textbook I used previously for MATH 116) and save this to your computer. Questions for the rest of the course will come from here.
  • PS-08 Discussion
  • PS-08 Discussion source code file PS-08_discussion.Rmd




Problem Set 07

  • Assigned Fri 10/28
  • Due Fri 11/4 11am

Learning Goals

  • Practice turning pseudocode into code.
  • More importantly, doing research: generating answers to scientific questions using data.

Homework




Problem Set 06

Learning Goals

  • Tackling your first “real” analysis using your data toolbox.
  • Introducing what are, in my opinion, effective approaches to tackling problems of this type, instead of taking approaches that could lead to this.
  • Practice, practice, practice. Much like learning a language, the only way to get better is practice.

Homework

  • Download this file to your computer and upload it to your problem_sets folder on RStudio Server: PS-06.Rmd




Problem Set 05

Learning Goals

  • Wrap up the Grammar of Graphics
  • Start wrangling data!
  • Start providing useful summaries.

Homework

  • Download this file to your computer and upload it to your problem_sets folder on RStudio Server: PS-05.Rmd




Problem Set 04

Learning Goals

  • This problem set assumes you are now more comfortable with the R, RStudio, and R Markdown workflow, so the questions are more complex.
  • Using more of the 5NG tools for data visualization to answer meaningful questions using real data.
  • Slowly introducing notions of data manipulation/wrangling.
  • Putting statistical and data sciences in a greater social context via Hans Rosling’s 20-minute TED Talk on international development data, The best stats you’ve ever seen (bold title, I know).

Homework

  • As described in Lec03, install the following packages
    • The okcupiddata package containing the profiles data set: profile information for ~60K San Francisco OkCupid users in June 2012
    • The gapminder package containing the gapminder data set: international development data
  • Download this file to your computer: PS-04.Rmd
  • Upload it to RStudio server into the problem_sets folder:
    • In the Files panel, navigate to your problem_sets folder
    • Then click “Upload”




Problem Set 03

Learning Goals

  • Ramping up the use of the ggplot2 package for data visualization
  • Exploring real time series data
  • Using Google as a research tool

Homework

  • As described in Lec03, before tackling the problem set, install the following packages
    • The Quandl package for making it amazingly easy to get financial and economic data from quandl.com
    • The lubridate package with consistent and memorable syntax that makes working with dates easier
  • Download this file to your computer: PS-03.Rmd
  • Upload it to RStudio server into the problem_sets folder:
    • In the Files panel, navigate to your problem_sets folder
    • Then click “Upload”




Problem Set 02

Learning Goals

  • Taking your first baby steps using the ggplot2 package for data visualization: an R-based implementation of the “Grammar of Graphics”

Homework

  • In RStudio, on the top right of the screen, next to the cube with “R” on it, if it says
    • problem_sets: click on it and select “Close Project”
    • Project: (None): do nothing
  • Download this file to your computer: PS-02.Rmd
  • Upload it to RStudio server into the problem_sets folder
  • Open it and work on it from there
  • Don’t forget to answer the questions in the “Please Indicate” section




Problem Set 01

  • Assigned Fri 9/16
  • Due Fri 9/23

Learning Goals

  • This week’s problem set doesn’t involve much content, but rather is about familiarizing yourselves with the problem set workflow and submission format using R Markdown.
  • In particular, we’ll go over how to share your analyses over the web with a couple of clicks of the mouse!
  • You’ll start
    • seeing what I mean by “computers are stupid”
    • developing the skill of “debugging”: identifying and removing errors from code. In our case, if your R Markdown file won’t knit (AKA load, compile, or render), follow the steps in R Markdown debugging (also posted on the Resources page). This usually solves about 85% of problems; if you’re still stuck after going through the steps, speak to your peers or me.
  • In my experience, there are always a few hiccups with R Markdown at the beginning, but by the third assignment everyone is on board.

Homework

  • Download this file to your computer, then upload it to the RStudio Server as described in Lec03: PS-01.Rmd
  • Submit your homework using this submission form. See below.




Problem Set Submission Process

You will submit problem sets using RStudio Server project sharing (which you’ll also be using for your group projects). The grader and I will go over your problem set files and leave comments directly on them.

Only do this once:

You will create a project (i.e. an organizational folder) that you will share with me and the grader:

  • In the top right of RStudio Server click on the cube with “R” in it -> New Project… -> Click “save” when prompted
  • New Directory -> Empty Project -> Enter problem_sets as the Directory name and click “Create Project”
  • On the top right it should say problem_sets next to the cube with “R” in it. Click on that -> Share Project…
  • In the box with the blinking cursor add aykim and tsingh
  • Copy the Project URL and press OK
  • Paste your URL in the appropriate row in this Google Sheet

Only for problem set 1:

Move the file PS-01.Rmd to the problem_sets shared project folder so the grader and I can access it:

  • In the Files panel -> Click on the house icon
  • Click the checkboxes next to PS-01.Rmd and PS-01.html
  • Click the gear icon “More” -> Move…
  • Select problem_sets

For all future problem sets:

  • When uploading the relevant PS-XX.Rmd file to RStudio Server, upload it directly to the problem_sets folder.