May 3, 2016

New Class: Intro to Data Science



14 students: 6 seniors, 6 juniors, & 2 sophomores. Their majors:

  • Double majors:
    • Environmental Sciences-Econ
    • Econ-Linguistics
    • CS-Econ
  • Single majors:
    • Economics x 4
    • Molecular Bio & Biochem x 3
    • International Politics & Econ, Neuroscience, Bio, CS (one each)


  • Mixture of lab & lecture: students bring their own laptops to class.
  • Use real, messy, complex data.
  • Discussions in class.
  • This class uses R, but it is not a class on R. I try to teach things in a language-agnostic fashion.
  • "Minimizing prerequisites to research." (George Cobb)

George Box


Data Science is like Writing Papers

Doing data analysis is part art, part science. There is:

  • Substance: what you are trying to say
  • Style: how you say it, i.e. the writing/coding

There is no instruction manual or code book for good writing or good data analysis.

Motivation for this Talk

Two goals:

  • Have students learn to code the way people learn in real life.
  • Facilitate giving feedback.


  • Lots of collaboration and peer learning.
  • Homework as practice rather than evaluative.
  • GitHub.


GitHub is a web-based hosting service for code repositories, widely used by the open-source community. It provides tools for:

  • Version control
  • Collaboration


How to Give Feedback?

From a colleague in the humanities:

"Don't give feedback on everything. Focus on the top 5 things they need to work on, and drill down."

Both for their sake and for your own.


Example of a student's recent homework.


The thing about GitHub…

  • When it works, it works great.
  • When it doesn't, it's… (xkcd)