Schedule

Topics:

  1. Data visualization (pink): Grammar of Graphics, Five Named Graphs (5NG), color theory.
  2. Working with data (blue): data wrangling, importing, and formatting.
  3. Maps and spatial data (green): Maps and geospatial data.
  4. Learning how to learn new data science tools (yellow): SQL, TBD.

Note that while topics and topic dates may change, the dates for all problem sets (PS), the project, and the midterms will not.


Lec 07: Fri 9/20

Announcements

  • Slack:
    • Prof. Katie Kinnaird’s TRIPODS+X - Data Science Education Investigation
    • Post on #random by Ray
  • Problem sets:
  • Announcement from Smithies in SDS:


Today’s topics/activities

1. Chalk talk

  • Recap of histograms
  • Facets to split a visualization by the values of another variable
  • Boxplots! Powerful, but tricky!

Say we want to study the distribution of the following 12 values, which are already sorted:

1, 3, 5, 6, 7, 8, 9, 12, 13, 14, 15, 30

A summary statistic is a single numerical value that summarizes many values. Immediately obvious examples include the mean (AKA the average) and the median. Less obvious examples include:

  • Quartiles (1st, 2nd, and 3rd) that cut up the data into 4 parts, each containing roughly one quarter = 25% of the data
  • Minimum & maximum
  • Interquartile range (IQR): the distance between the 3rd and 1st quartiles

The 12 values have the following summary statistics:

Min.   1st Quartile   Median = 2nd Quartile   3rd Quartile   Max.   IQR
1      5.5            8.5                     13.5           30     8 = 13.5 - 5.5
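The summary statistics above can be checked by hand or with a few lines of code. The following is a minimal sketch in Python (the course itself works in R), assuming the "median of each half" rule for quartiles, which reproduces the values listed above. The function name five_number_summary is made up for this illustration. Other quartile conventions exist and can give slightly different 1st and 3rd quartiles.

```python
from statistics import median

# The 12 pre-sorted values from the example
values = [1, 3, 5, 6, 7, 8, 9, 12, 13, 14, 15, 30]


def five_number_summary(sorted_values):
    """Return (min, Q1, median, Q3, max) for an already-sorted list.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the data (the middle value is left out when the count is odd).
    """
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]       # lower half of the data
    upper = sorted_values[n - half:]   # upper half of the data
    return (
        sorted_values[0],
        median(lower),
        median(sorted_values),
        median(upper),
        sorted_values[-1],
    )


minimum, q1, med, q3, maximum = five_number_summary(values)
iqr = q3 - q1  # interquartile range: 3rd quartile minus 1st quartile

print(minimum, q1, med, q3, maximum, iqr)
```

Each of the four parts cut out by the quartiles holds 3 of the 12 values, which is the "roughly one quarter" described above.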

Let’s compare the points and the corresponding boxplot side-by-side with the values on the \(y\)-axis matching:
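To see which parts of a boxplot the 12 values map to, here is a minimal sketch in Python (the course itself works in R). It assumes the common convention that whiskers extend to the most extreme values within 1.5 × IQR of the box and that anything beyond is drawn as a separate outlier point; the notes do not state which convention the lecture's plot used. The quartiles are taken from the summary statistics above.

```python
# The 12 pre-sorted values from the example
values = [1, 3, 5, 6, 7, 8, 9, 12, 13, 14, 15, 30]

# Quartiles as listed in the summary statistics above
q1, q3 = 5.5, 13.5
iqr = q3 - q1  # height of the box

# Assumed convention: values beyond 1.5 * IQR from the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [v for v in values if lower_fence <= v <= upper_fence]
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Whiskers end at the most extreme values that are still inside the fences
whisker_low, whisker_high = min(inside), max(inside)

print("box:", q1, "to", q3)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```

Under this assumption the upper whisker stops at 15 rather than at the maximum, and 30 is drawn as its own point. This is one reason boxplots take practice to read.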

2. In-class exercise

  • Go over ModernDive 2.6 - 2.7
  • Start PS03

I don’t mind what you do with your class time, but it is very important that you complete the reading before the next lecture. Boxplots take practice.


Lec 06: Wed 9/18

Announcements

  • Prof. Katie Kinnaird’s TRIPODS+X - Data Science Education Investigation

Today’s topics/activities

1. Chalk talk

  • Recap of previous lecture
  • Live demo of creating classnotes.Rmd, an R Markdown file for all in-class exercise code. Write, copy/paste, and tweak code in classnotes.Rmd rather than in the console. That way you can save it!
  • Histograms for visualizing distribution of a numerical variable.

2. In-class exercise

  • Go over ModernDive 2.5

Lec 05: Mon 9/16

Announcements

  • Slack message: Abandoning RStudio Cloud in favor of RStudio Desktop.
  • The art of managing Slack notifications

Today’s topics/activities

1. Chalk talk

  • Recap of previous lecture
  • Overplotting and two approaches for addressing it
  • Linegraphs

2. In-class exercise

  • Go over ModernDive 2.3.2 - 2.4

Lec 04: Fri 9/13

Announcements

  • Screencast from last lecture posted
  • I’m currently investigating an issue with RStudio Cloud being slow
  • PS02 posted under Problem Sets

Today’s topics/activities

1. Chalk talk

  • Recap of previous lecture
  • R Markdown for reproducible research
    • Input: An .Rmd file
    • Output: An .html webpage

2. In-class exercise

  1. At a couple of steps in this process, you will be asked to install packages. Say yes to all of them!
  2. Fiddle with RStudio settings:
    • Go to RStudio menu bar -> Tools -> Global Options… -> R Markdown
    • Uncheck box next to “Show output inline for all R Markdown Documents”
  3. Create new R Markdown .Rmd file:
    • Go to RStudio menu bar -> File -> New File -> R Markdown
    • Set “Title” to “My first R Markdown report” and “Author” as your name.
  4. “Knit” a report:
    • Click on the disk icon and save this file as testing somewhere on your computer. This will create a file called testing.Rmd.
    • Click the arrow next to “Knit” -> “Knit to HTML”.
    • An HTML webpage should pop up. However, it may be blocked by your browser. If so, in your browser’s URL bar on the right, click on “Always allow pop-ups”.
  5. Publish this report on the web:
    • Click on the blue “Publish” button on the top right of the resulting pop-up HTML page.
    • Select RPubs.
    • If you haven’t already, create an account on RPubs.com. If you already have one, log in.
    • Set “Title” to “My first R Markdown report” and “Slug” to “testing”.
    • You should end up with a webpage that looks like this one. This is live on the web!
  6. Update your report on the web:
    • Make some trivial change to your testing.Rmd file.
    • “Re-knit” your report and make sure your trivial change is reflected.
    • The blue “Publish” button should now read “Republish”. Click it.
    • Click “Update existing”.
    • Your updates are now live on the web!
  7. Bonus: Play around with different formatting tools in R Markdown to customize your report! Go to RStudio menu bar -> Help -> Markdown quick reference.

Tips on R Markdown:

  1. Knit early, knit often! If you wait to knit until after you’ve added a ton of code and something doesn’t work, you’ll have a hard time figuring out where the error is. If you make incremental changes and knit after every step, you’ll be better able to isolate where errors are.
  2. If you get stuck, go through these 6 R Markdown Fixes first, then seek assistance. These 6 fixes resolve 85% of issues in my experience.

Lec 03: Wed 9/11

Announcements

  • Slack updates: custom emojis and vote in today’s poll!

Today’s topics/activities

1. Chalk talk

  • Recap of previous lecture
  • Grammar of Graphics
  • Screencast of “Doing ModernDive readings”. In particular the idea of “Running R code in RStudio”:


2. In-class exercise

  • Go over ModernDive 2 - 2.3.1.

Lec 02: Mon 9/9

Announcements

Today’s topics/activities

1. Chalk talk

  • Intro to Slack slides
  • What is the difference between R and RStudio?
  • What are R packages?

2. In-class exercise

  • Set up RStudio Cloud:
    • Click here to join the “SDS192” Workspace.
    • Click on “New Project”
    • Name it “Class Notes”
  • Go over the ModernDive reading in the schedule above.

About readings in this course:

  • You are responsible for completing a lecture’s readings before the next lecture. Ex: you are responsible for reading all of ModernDive Chapter 1 before Wednesday.
  • I teach lectures assuming you have not done the readings beforehand. However, if it suits your learning style better, please do read beforehand.
  • While you don’t need to turn in your learning check answers, I highly recommend you still do them. The solutions are in Appendix D of the book.
  • If you have your headphones, you may listen to music during in-class reading time.

Lec 01: Fri 9/6

Announcements

Welcome!

Today’s topics/activities

  • My story.
  • What this class is about: Answering questions with data.
  • Executive summary of syllabus; finalized syllabus will be published next week.
  • Coding: it’s normal to be 😱