1. Syllabus

  • Click the syllabus link above.

2. What is Data Science?

3. Setting Up

Please complete the following steps before the next lecture to get set up for this class. This should take between 60 and 90 minutes. If you get stuck, please speak to me after class or during office hours.

Software

  1. Install the following software. If you already have R & RStudio installed, please re-install both.
    • R: a programming language and software environment for statistical computing and graphics
    • RStudio (preview version): an integrated development environment (IDE) for R
      • On Macs, when prompted to install command line developer tools, select “Install”
      • On Windows, you should get a similar prompt.
    • LaTeX:
    • Git: an open-source distributed version control system
  2. Ensure you can log in to RStudio Server from your browser at go/rstudio. If you are off campus, you must first log in to the Middlebury VPN.
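
Once everything is installed, a quick check from the RStudio Console can confirm that the tools are visible to R. This is only a minimal sketch: it assumes that Git is exposed as a `git` command and that your LaTeX distribution provides `pdflatex`, neither of which is stated above.

```r
# Report the R version in use.
cat("R version:", R.version.string, "\n")

# Look up external tools on the PATH; an empty string means "not found".
# "pdflatex" is an assumed command name for the LaTeX installation.
tools <- Sys.which(c("git", "pdflatex"))
for (tool in names(tools)) {
  status <- if (nzchar(tools[[tool]])) tools[[tool]] else "not found"
  cat(tool, ":", status, "\n")
}
```

If a tool shows as "not found" right after installation, restarting RStudio may be enough for it to pick up the new PATH.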

R Markdown

  1. Open RStudio and, in the menu bar, go to File -> New File -> R Markdown…
  2. If prompted to install any packages, say yes.
  3. Give it an arbitrary title and select the PDF output format.
  4. A document named Untitled1 should open. In that panel, click on Knit.
  5. Give the file a name and save it.

A PDF document should pop up. Then:

  1. Click on the downward-pointing black arrow next to the Knit button and select Knit to HTML.
  2. At the top right of the pop-up, click “Publish”.
  3. Select RPubs -> Publish.
  4. Your browser should open. Create an account on RPubs.
  5. Give your file an arbitrary title and a desired URL.

The same analysis as the PDF above should appear on a webpage.
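
The Knit button can also be reproduced from the Console, which may help when diagnosing errors. The sketch below assumes the `rmarkdown` package is installed. The file name `example.Rmd` and its contents are placeholders, not part of the course materials.

```r
# Write a minimal R Markdown file to a temporary folder.
# The file name "example.Rmd" and its contents are placeholders.
fence <- strrep("`", 3)  # three backticks, the delimiter for a code chunk
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",
  "title: \"My first document\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Knit only if rmarkdown and pandoc are available.
# Swap in "pdf_document" to produce a PDF (this needs a LaTeX installation).
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, output_format = "html_document",
                                quiet = TRUE)
  cat("Rendered:", out_file, "\n")
}
```

Publishing to RPubs is still easiest through the “Publish” button, so this sketch covers only the knitting step.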

Installing Packages

We now describe how to install R packages (extensions to R) from the CRAN package repository.

  • In one of the panels in RStudio, there is a tab Packages.
  • Click Install and in the Packages field type ggplot2 dplyr to install both those packages.
  • If prompted to restart R, say yes.
  • In another panel, there is a tab Console. Type library(ggplot2) and library(dplyr) and ensure the resulting messages do not contain any errors.
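
The same installation can be done by typing commands in the Console instead of using the Packages tab. The following is a minimal sketch. The helper `missing_packages()` is a hypothetical name introduced here for illustration, while `install.packages()` and `requireNamespace()` are standard R functions.

```r
# Packages named in the steps above.
pkgs <- c("ggplot2", "dplyr")

# Return the subset of `pkgs` that is not yet installed.
# (Hypothetical helper, defined only for this example.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)

# Only download in an interactive session, since this needs network access.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# Afterwards, load each package in the Console and watch for errors:
# library(ggplot2)
# library(dplyr)
```

Checking first avoids re-downloading packages that are already present. Calling `install.packages(c("ggplot2", "dplyr"))` directly works as well.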

GitHub

GitHub is a web-based Git repository hosting system.

  • Go to GitHub, create an account using your @middlebury.edu email address, and verify your email.
  • Edit your profile:
    • Change your profile picture to a cropped picture of you (this will help me learn your names faster)
    • Add your name
    • (Optional) Add your personal email
  • Go to GitHub Education
    • Request an “Individual Account” discount
    • For “How do you plan to use GitHub?” type in: For my Middlebury College MATH 216 Introduction to Data Science course https://github.com/2016-09-Middlebury-Data-Science

RStudio and GitHub

  • In the RStudio menu bar, go to File -> New Project… -> If prompted, don’t save current workspace -> New Directory -> Empty Project
  • Check the “Create a git repository” box.
  • Give the project an arbitrary directory name, save it anywhere you choose, and click Create Project.
  • Follow all the steps on this link that come before the section “Create New project AND git”. In the final step, however, replace
    • mail@ewenharrison.com with your @middlebury.edu email
    • ewenharrison with your GitHub login.
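
The linked guide is not reproduced here. Assuming its final step sets Git's global user name and email with `git config`, the substitution might look like the sketch below when run from the R Console. The helper `git_config_args()` and both placeholder values are hypothetical, so use your own login and address.

```r
# Build the arguments for `git config --global <key> <value>`.
# (Hypothetical helper, defined only for this example.)
git_config_args <- function(key, value) {
  c("config", "--global", key, shQuote(value))
}

# Placeholders: substitute your own GitHub login and @middlebury.edu address.
github_login <- "your-github-login"
college_email <- "you@middlebury.edu"

# Only touch the real Git configuration in an interactive session,
# and only if a `git` command can be found on the PATH.
if (interactive() && nzchar(Sys.which("git"))) {
  system2("git", git_config_args("user.name", github_login))
  system2("git", git_config_args("user.email", college_email))
}
```

If the guide's final step turns out to differ from this assumption, follow the guide and apply the same two substitutions there.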