MATH 216: Fall 2016

  • Instructor: Albert Y. Kim - Assistant Professor of Statistics
  • Email:
    • I will respond to emails within 24h, but not during weekends.
    • Please email me only with administrative or brief questions, as I prefer to address more substantive questions in person.
  • Class Location/Time:
    • MWF 9:05-9:55 in Warner 506.
    • Davis Library 105A (downstairs on the right) on the following days: Wed 10/12, Wed 10/26, Wed 11/9, Mon 11/28.
    • You do not need to inform me of absences. Please consult your peers for what you missed.
  • Office Hours: Warner 310 or the math lounge just outside. Feel free to come to the MATH 116 office hours as well, but those students get priority then.
    • M 1:00-2:30 and W 2:30-4:00
    • MATH 116: M 2:30-4:00 and W 1:00-2:30
    • Or by appointment

Course Description and Objectives

In this course students will gain exposure to the entire data science pipeline: forming a statistical question, collecting and cleaning data sets, performing exploratory data analyses, identifying appropriate statistical techniques, and communicating the results, all the while leaning heavily on open source computational tools, in particular the R statistical software language. We will focus on analyzing real, messy, and large data sets, requiring the use of advanced data manipulation/wrangling and data visualization packages. Students will be required to bring their own laptops as many lectures will involve in-class computational activities.


Roughly speaking, we will cover the following topics (a more detailed outline can be found here):

  1. Data manipulation and visualization
  2. Regression
  3. Dates and times
  4. Maps and spatial data
  5. Text data


1) Textbook

There is no textbook to purchase; as much as possible, we will rely on open-source and freely available materials on the web.

2) Computing and Software

We will chiefly be using R via the RStudio integrated development environment (IDE). Please see Setting Up for all the software and accounts we will be using.


There are four components to your final grade: homeworks, quizzes, engagement, and the final project.

1) Homeworks 10%

The five homeworks in this class should be viewed as low-stakes opportunities to develop one’s data science toolbox and receive feedback on the progress of one’s learning, rather than as evaluative tools used by the instructor to assign grades. To reinforce this thinking, each homework is worth only a nominal portion of the final grade. However, not making an honest effort on the homeworks will ultimately hurt you on your (individual) final project.

Collaboration on the homeworks is highly encouraged, as in many situations learning is best done in groups, especially when it comes to coding. However, you must submit your own answers and not simple rewordings of another’s work. Furthermore, all collaborations must be explicitly acknowledged at the top of your submissions.

The typical homework workflow is:

  1. Homework assigned on a Wednesday.
  2. One week later: Pre-submission due.
  3. Two weeks later: Homework due.
  4. Three weeks later: Post-submission due. Mini-presentations and group discussion in Library 105A.
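The workflow above amounts to simple date arithmetic. As a rough sketch only: the function name `homework_timeline` and the assignment date Wed 9/21 are hypothetical, chosen so that the post-submission lands on Wed 10/12, one of the listed library days. The actual assignment dates are not stated here.

```python
from datetime import date, timedelta


def homework_timeline(assigned: date) -> dict[str, date]:
    """Return the four milestones of one homework cycle.

    Assumes the cycle described above: assigned on a Wednesday, with
    milestones exactly one, two, and three weeks later.
    """
    if assigned.weekday() != 2:  # Monday is 0, so Wednesday is 2
        raise ValueError("homeworks are assigned on a Wednesday")
    return {
        "assigned": assigned,
        "pre_submission_due": assigned + timedelta(weeks=1),
        "homework_due": assigned + timedelta(weeks=2),
        "post_submission_due": assigned + timedelta(weeks=3),
    }


# Hypothetical assignment date, for illustration only.
timeline = homework_timeline(date(2016, 9, 21))
for milestone, day in timeline.items():
    print(f"{milestone}: {day:%a %m/%d}")
```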

Click here for a visualization of this timetable. Since this is a college-writing course, instructor feedback on student work and writing is an essential component. Therefore:

  • Everyone must submit work on the homework due date.
  • You must complete at least one of the pre-submission and the post-submission. You are not required to complete both, but feel free to do so!
  • You must incorporate any feedback given for the earlier submission in the later submission.

2) Quizzes 20%

There will be a weekly quiz at the beginning of lecture on Mondays. The level of these quizzes will be such that they don’t require extensive studying; if you actively participate in class and are keeping up with the material, you will do fine. Notes:

  • It is your responsibility to be on time.
  • There will be no makeup quizzes.
  • The lowest two quiz scores will be dropped.
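To illustrate the drop rule above, here is a minimal sketch. It assumes every quiz is scored on the same scale and weighted equally, which is not stated here; the function name `quiz_average` and the example scores are made up for illustration.

```python
def quiz_average(scores: list[float], n_dropped: int = 2) -> float:
    """Average the quiz scores after dropping the lowest n_dropped.

    Assumes equally weighted quizzes on a common scale (an assumption,
    not something the syllabus specifies).
    """
    if len(scores) <= n_dropped:
        raise ValueError("need more quizzes than the number dropped")
    kept = sorted(scores)[n_dropped:]  # discard the lowest scores
    return sum(kept) / len(kept)


# Made-up scores out of 10; the 6 and the 7 are dropped.
example_scores = [8, 10, 6, 9, 7, 10]
print(quiz_average(example_scores))  # 9.25
```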

3) Engagement 10%

It is difficult to explicitly codify what constitutes “an engaged student”, so instead I present the rough principle I will follow: you’ll only get out of this class as much as you put in. Some examples of behavior counter to this principle:

  • Merely attending lectures and not participating in discussions.
  • Leveraging previous experience in other settings to coast through this course.
  • Not coming to office hours when the situation warrants it.
  • Submitting homework with code or content that is copied from, or only slightly modified from, your peers’ work. This goes against the philosophy that the homeworks are opportunities for practice and feedback rather than items to be graded.

4) Final Project 60%

Much of this course is a buildup to the final project, which is a capstone experience synthesizing everything you’ve learned over the course of the semester. The project has four parts; the three with listed weights sum to 60%:

  • Project proposal: Due Fri 10/28
  • Write-up 40%: Due on Fri 12/16 at 2pm
  • Oral presentation 15%: 10-minute presentations during the last 4 lectures, with the presentation order determined at random a few weeks before.
  • Evaluation of final presentations 5%: You will give (anonymized) feedback on your peers’ presentations and will be graded on the quality of that feedback.
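As a worked example of how the weights listed in this syllabus combine, the sketch below assumes each component is scored from 0 to 100 and that the final grade is a plain weighted average. That method is an assumption, as are the names `WEIGHTS` and `final_grade` and the example scores.

```python
# Weights in percent, taken from the syllabus; they sum to 100.
WEIGHTS = {
    "homeworks": 10,
    "quizzes": 20,
    "engagement": 10,
    "write_up": 40,
    "oral_presentation": 15,
    "presentation_evaluation": 5,
}


def final_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up component scores, for illustration only.
example = {
    "homeworks": 90,
    "quizzes": 85,
    "engagement": 100,
    "write_up": 88,
    "oral_presentation": 92,
    "presentation_evaluation": 95,
}
print(final_grade(example))  # 89.75
```

Note that the write-up alone carries as much weight as homeworks, quizzes, and engagement combined.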

Academic Accommodations for Disabilities

Students with documented disabilities who believe that they may need accommodations in this class are encouraged to contact me as early in the semester as possible to ensure that such accommodations are implemented in a timely fashion. Assistance is available to eligible students through Student Accessibility Services. Please contact Jodi Litchfield, the ADA coordinator, at or 802.443.5936 for more information. All discussions will remain confidential.