Spring 2016

  • Instructor: Albert Y. Kim
  • Email: aykim@middlebury.edu
  • Office: Warner 310
  • Times: MWF 11:15-12:05
    • Mostly Warner 506
    • Occasionally in Wilson Media Lab in the rear of the first floor of Davis Library (See Homework section below)
  • Office Hours: Held in the math lounge on the third floor of Warner. Feel free to come to the MATH 311 office hours.
    • M 3:00-4:30
    • Tu 2:00-3:30
    • W 2:30-4:00 (MATH 311)
    • F 1:00-2:30 (MATH 311)
    • or by appointment

Course Description

In this course students will gain exposure to the entire data science pipeline: forming a statistical question, collecting and cleaning data sets, performing exploratory data analyses, identifying appropriate statistical techniques, and communicating the results, all the while leaning heavily on open source computational tools, in particular the R statistical software language. We will focus on analyzing real, messy, and large data sets, requiring the use of advanced data manipulation/wrangling and data visualization packages. Students will be required to bring their own laptops as many lectures will involve in-class computational activities.

Course Schedule

All problem sets and solutions will be posted either on


  • Textbook: There is no required textbook for this class. All materials will either be provided or freely accessible on the web.
  • Software: See the Software section.


Item                                      Weight  Notes
Final Project                             45%     See below.
Final Presentation                        20%     In-class.
Participation & Engagement                15%
Evaluation of Other Final Presentations   10%     In-class.
5 Homeworks                               10%     HW assigned/due on Wednesdays.


Each homework counts for only a nominal portion of the final grade. As such:

  • They should be viewed as opportunities to
    • receive feedback on the progress of one’s learning, rather than as evaluations that require an explicit numerical grade.
    • develop one’s data science toolbox necessary for the final project.
  • Collaboration on the homeworks, especially the computing components, is highly encouraged; see the policy on homework collaboration below. However, collaboration taken too far is copying, and repeatedly doing so will impair the development of your data science toolbox.
  • The homework schedule is below. Note that lecture will take place in the Wilson Media Lab in Davis Library on dates in bold. I will remind you beforehand.

Final Project/Presentation

  • Proposal is due Wednesday April 5th.
  • You will give an in-class presentation sometime during the final 4 lectures and will receive (anonymized) feedback from your peers immediately afterward.
  • The final write-up is due the last day of exam week, Tuesday May 24th 2016.


  • Homeworks:
    • Due at the beginning of lecture.
    • All collaborations must be explicitly acknowledged at the beginning of your submissions.
  • Email:
    • I much prefer answering questions in person during office hours rather than by email, as I’ll have a much easier time diagnosing the question and can offer better answers. Furthermore, this facilitates the transfer of tacit knowledge.
  • Absences:
    • There is no need to inform me of absences. Please consult Moodle and your peers for what you missed.

Honor Code

  1. See policy on homework collaboration above.
  2. All exams will be closed book and closed notes with no consulting, unless otherwise specified.
  3. The Honor Code Statement must be written and signed on each exam.
  4. I expect you to take the Honor System and intellectual honesty very seriously.


  • Academic accommodations for disabilities: Students with documented disabilities who believe that they may need accommodations in this class are encouraged to contact me as early in the semester as possible to ensure that such accommodations are implemented in a timely fashion. Assistance is available to eligible students through Student Accessibility Services. Please contact Jodi Litchfield, the ADA coordinator, at litchfie@middlebury.edu or 802.443.5936 for more information. All discussions will remain confidential.