Schedule

Topics:

  1. Data visualization (pink): Grammar of Graphics, Five Named Graphs (5NG), color theory.
  2. Working with data (blue): data wrangling, importing, and formatting.
  3. Maps and spatial data (green): Maps and geospatial data.
  4. Learning how to learn new data science tools (yellow): SQL, TBD.

Note that while topics and topic dates may change, the dates for all problem sets (PS), the project, and the midterm will not.


Lec 06: Fri 9/17

Announcements

Today’s topics/activities

1. Chalk talk

  • In-class demo of using RMarkdown features in a classnotes.Rmd file to save lecture code
  • Take screenshots of your screen!
  • Histograms for visualizing the distribution of a numerical variable
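
As a minimal sketch of the histogram item above (not the in-class demo itself): the code below uses faithful, a dataset built into R, as a stand-in, and the bin width of 5 is an arbitrary choice for illustration.

```r
library(ggplot2)

# faithful is built into R: waiting times (in minutes) between eruptions
# of the Old Faithful geyser. It is a stand-in, not the in-class dataset.
hist_plot <- ggplot(data = faithful, mapping = aes(x = waiting)) +
  # One bar per 5-minute bin; white outlines separate adjacent bars
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Waiting time (minutes)", y = "Count")

hist_plot
```

Changing binwidth is a quick way to see how the apparent shape of a distribution depends on the bins chosen.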

Section 1 (Stoddard G6) Demo

Section 2 (Sabin-Reed 220) Demo

2. In-class exercise

  • If you still haven’t been able to “Knit to PDF”, please ask for help
  • Go over ModernDive reading in schedule above.

Lec 05: Wed 9/15

Announcements

  • PS02 was posted after Monday’s lecture.

Today’s topics/activities

1. Chalk talk

  • Overplotting and two approaches for addressing it
  • Linegraphs
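
The two approaches are not named above; the sketch below assumes they are the two that ModernDive covers, namely making points transparent and jittering them. The datasets (mpg and economics, both shipped with ggplot2) and the specific alpha, width, and height values are illustrative choices, not the ones used in class.

```r
library(ggplot2)

# mpg ships with ggplot2. Many cars share the same city/highway mileage,
# so plain points stack on top of each other (overplotting).
base_plot <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy))

# Approach 1: transparency. Darker spots mean more overlapping points.
alpha_plot <- base_plot + geom_point(alpha = 0.2)

# Approach 2: jittering. Nudge each point by a small random amount.
jitter_plot <- base_plot + geom_jitter(width = 0.3, height = 0.3)

# Linegraph: economics also ships with ggplot2; the date variable gives
# the x-axis a natural order, which is what makes connecting points sensible.
line_plot <- ggplot(data = economics, mapping = aes(x = date, y = unemploy)) +
  geom_line()
```

Typing the name of any of these plot objects in the Console displays it.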

2. In-class exercise

  • Explore the different formatting tools in R Markdown: go to RStudio top menu bar -> Help -> Markdown quick reference.
  • Sec01 in Stoddard: There was a typo in Step 8 of last lecture’s in-class exercise. If you weren’t able to Knit directly to PDF, please re-attempt Steps 8-9. Knitting directly to PDF, instead of Knitting to Word and then saving to PDF, is the preferred submission format for all problem sets. It will be less hassle for you and provide consistency for the graders.
  • Go over ModernDive reading in schedule above.

Lec 04: Mon 9/13

Announcements

  • Problem Set 02 due next Monday 5pm, now posted under Problem Sets

Today’s topics/activities

1. Chalk talk

  • Recap of previous lecture
  • “Where can I save all the code I run in class?” In an R Markdown .Rmd file. R Markdown is a tool for reproducible research.
    • Input: an .Rmd file
    • Output: an .html, .docx, or .pdf file
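
To make the input/output idea concrete, here is a rough sketch that writes a tiny .Rmd file and knits it from code rather than with the “Knit” button. The file contents and the temporary location are made up for illustration, and the knitting step only runs if the rmarkdown package and pandoc (which RStudio bundles) are available.

```r
# Three backticks open and close a code chunk inside an .Rmd file
fence <- strrep("`", 3)

# Input: a minimal .Rmd file (hypothetical contents) in a temporary folder
rmd_path <- file.path(tempdir(), "testing.Rmd")
writeLines(c(
  "---",
  "title: \"My first R Markdown report\"",
  "output: html_document",
  "---",
  "",
  "Some text, followed by a code chunk:",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Output: render() runs the chunks and returns the path of the knitted file.
# Skipped when rmarkdown or pandoc is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
}
```

Swapping "html_document" for "word_document" or "pdf_document" requests the other output types; PDF additionally needs a LaTeX installation such as TinyTeX.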

2. In-class exercise

In-class battle-testing and practicing for PS02:

  1. At a couple of steps in this process, you will be asked to install packages. Say yes to all of them.
  2. If at any point your code won’t knit, go through these 6 R Markdown Fixes first, then seek assistance. These 6 fixes will resolve 85% of issues.
  3. Create new R Markdown .Rmd file:
    • Go to RStudio menu bar -> File -> New File -> R Markdown
    • Set “Title” to “My first R Markdown report” and “Author” as your name.
    • Save this file as “testing” somewhere on your computer. This will create a file called testing.Rmd.
  4. Method 1: “Knit” a report to HTML:
    • Click the arrow next to “Knit” -> “Knit to HTML”.
    • An HTML webpage should pop up. However, it may be blocked by your browser. If so, in your browser’s URL bar, click on “Always allow pop-ups”.
  5. Method 1: Publish HTML report on web:
    • Click on blue “Publish” button on top right of the resulting pop-up html.
    • Select RPubs.
    • If you haven’t previously, create an account on RPubs.com. If you have, log in.
    • Set “Title” to “My first R Markdown report” and “Slug” to “testing”
    • You should end up with a webpage that looks like this one. This is live on the web!
  6. Method 1: Update HTML report on web:
    • Make some trivial change to your testing.Rmd file.
    • “Re-knit” your report and make sure your trivial change is reflected.
    • The blue “Publish” button should now read “Republish”
    • Click “Update existing”
    • Your updates are now live on the web!
  7. Method 2: “Knit” a report to Word
    • Click the arrow next to “Knit” -> “Knit to Word”.
    • Save the resulting Word document as a pdf file.
  8. Only if you are a macOS user:
    • Next to “Console” go to “Terminal”
    • Run this line of code:
    sudo chown -R `whoami`:admin /usr/local/bin
    • Enter your password. Note: as you type your password in Terminal, the cursor will not move. Don’t worry, your password is registering.
  9. Method 3: “Knit” a report to PDF
    • Run the following code in your console just once:
    install.packages('tinytex')
    tinytex::install_tinytex()
    • Click the arrow next to “Knit” -> “Knit to PDF”.

Lec 03: Fri 9/10

Announcements

  • Spinelli Center SDS drop-in tutoring hours are now open in Sabin-Reed 301! Get individual attention from SDS majors!
    • Sunday through Thursday 7-9pm
    • Friday 2:35-3:30pm
  • By popular request:
    • Sec 01 in Stoddard G2 will now start 5 minutes later: 10:55 AM instead of 10:50 AM
    • Sec 02 in Sabin-Reed 220 will now end 5 minutes earlier: 10:35 AM instead of 10:40 AM
  • I added extra instructions for Problem Set 01 after lecture, posted under Problem Sets
    • Show, don’t tell: how to tag questions on Gradescope
  • ASA StatFest 2021 Sat 9/18 thru Sun 9/19 flyer and event webpage
    • Sunday 9/19 at 11:50AM: Opportunities in Statistics & Data Science in Academia, Government, & Non-Profit featuring SDS’s Prof. Randi Garcia!
    • Keynote address by Robert Santos, 116th President of the ASA, and President Biden’s nominee to serve as Director of the United States Census Bureau! If approved by the Senate, he would be the first Latinx Director of the Bureau!

Today’s topics/activities

1. Chalk talk

  • Recap of previous lecture
  • Grammar of Graphics
  • 5NG1: Scatterplots
  • Next time:
    • Question: Do I need to re-type my code in the Console every single time?
    • Answer: No! Save your work in an RMarkdown document
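
For the Grammar of Graphics and scatterplot items above, a minimal sketch of how the pieces fit together in ggplot2: a data frame, a mapping of variables to aesthetics, and a geometric object. The mpg dataset (shipped with ggplot2) and the chosen variables are stand-ins, not the in-class example.

```r
library(ggplot2)

scatter_plot <- ggplot(
  data = mpg,                          # 1. data: a data frame
  mapping = aes(x = displ, y = hwy)    # 2. aesthetics: variables mapped to x and y
) +
  geom_point() +                       # 3. geometric object: points, giving a scatterplot
  labs(x = "Engine displacement (liters)", y = "Highway miles per gallon")

scatter_plot
```

Replacing geom_point() with another geom changes the type of graph while the data and aesthetic mapping stay the same.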

2. In-class exercise

  • Go over ModernDive reading in schedule above.

Lec 02: Wed 9/8

Announcements

  • Problem Set 01 due this Monday 5pm, posted under Problem Sets.

Today’s topics/activities

1. Chalk talk

  • Intro to Slack
  • What is the difference between R and RStudio?
  • What are R packages?
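
A short sketch of the usual two-step package workflow: install once, then load in every new R session. ggplot2 is used here only as an example package, and the install line is left commented out so that running the sketch does not reinstall anything.

```r
# Step 1 (once per computer): download and install the package.
# install.packages("ggplot2")

# Step 2 (every new R session): load the package so its functions are available.
library(ggplot2)

# Confirm the package is attached and check which version is installed.
is_loaded <- "package:ggplot2" %in% search()
is_loaded
packageVersion("ggplot2")
```

If library() reports that there is no package with that name, the package has not been installed yet and Step 1 needs to be run first.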

2. In-class exercise

  • Go over ModernDive reading in schedule above.

About readings in this course:

  • You are responsible for completing a lecture’s readings before the next lecture. Ex: you are responsible for reading all of ModernDive Chapter 1 before Wednesday.
  • I teach lectures assuming you have not done the readings beforehand. However, if it suits your learning style better, please do read beforehand.
  • While you don’t need to turn in your learning check answers, I highly recommend you still do them. The solutions are in Appendix D of the book.
  • If you have your headphones, you may listen to music during in-class reading time.

Lec 01: Fri 9/3

Announcements

Welcome!

Today’s topics/activities

  • Course webpage: bit.ly/sds192kim
  • My story
  • “Knock on wood if you’re with me”
  • What this class is about: Answering questions with data
    1. Data viz
    2. Data wrangling
    3. Maps
    4. Websites
  • Break!
  • Executive summary of syllabus
  • This weekend: Complete intro survey

Code examples from class

# Data visualization
library(fivethirtyeight)
library(ggplot2)
library(dplyr)
year_bins <- c("'70-'74", "'75-'79", "'80-'84", "'85-'89", "'90-'94",
               "'95-'99", "'00-'04", "'05-'09", "'10-'13")

bechdel <- bechdel %>%
  mutate(five_year = cut(year, breaks = seq(1969, 2014, 5), labels = year_bins))

ggplot(bechdel, aes(x = five_year, fill = clean_test)) +
  geom_bar(position = "fill", color = "black") +
  labs(x = "Year", y = "Proportion", fill = "Bechdel Test") +
  scale_fill_brewer(palette = "YlGnBu")

# Data wrangling
library(fec16)
all_transactions <- read_all_transactions()
View(all_transactions)

# Maps
library(leaflet)
leaflet() %>%
  addTiles() %>%
  addMarkers(lng = -72.64022, lat = 42.31706, popup = "Smith College")