## MATH 116: 2017 Spring

• Instructor: Albert Y. Kim - Assistant Professor of Statistics
• Email: aykim@middlebury.edu. Slack team: midd-math116.slack.com
• I will respond to emails or Slack messages within 24h, but not during weekends.
• Please only email or Slack message me with administrative and briefer questions, as I prefer addressing more substantive questions in person.
• Class Location/Time:
• MWF 11:15–12:05 in Warner 506 and Th 11:00–12:15 in McCardell Bicentennial Hall 503 (changed from 530).
• You do not need to inform me of absences. Please consult your peers for what you missed.
• Office Hours: Warner 310 or the math lounge just outside.
• Drop-In Tutoring
• Specific to MATH116: CTLR conference room (rear of Davis Library) Thursdays 7pm-9pm.
• R Tutoring: Warner 203 Sundays 7:30pm-9:30pm.

## Course Description and Objectives

#### Description

A practical introduction to statistical methods and computational tools needed to make sense of data. This course is an evolution of many traditional introductory statistics courses in that computing plays a more central role than mathematics and a higher emphasis is placed on “thinking with data.” Topics include data visualization, data wrangling, confidence intervals, hypothesis testing, and regression. The course has no formal mathematics or computer science prerequisites, and is especially suited to students in the physical, social, environmental, and life sciences who seek an applied orientation to data analysis.

#### Objectives

1. Have students engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices.
2. Foster a conceptual understanding of statistical topics and methods using simulation/resampling and real data whenever possible, rather than mathematical formulae.
3. Blur the traditional lecture/lab dichotomy of introductory statistics courses by incorporating more computational and algorithmic thinking into the syllabus.
4. Introduce best practices for reproducible research and collaboration.
5. Develop statistical literacy by, among other ways, tying in the curriculum to current events, demonstrating the importance statistics plays in society.

## Topics

Roughly speaking we will cover the following topics (a more detailed outline can be found here):

1. Introduction and Tools (R, RStudio, and R Markdown)
2. Data:
• Data representation
• Data visualization
• Data manipulation/wrangling/munging
3. Statistical inference:
• Background and terminology
• Confidence intervals
• Hypothesis testing
4. Regression:
• Simple linear regression
• Multiple regression

## Materials

• Textbook: “ModernDive: An Introduction to Statistical and Data Sciences” by Ismay and Kim, available at http://www.moderndive.com.
• Software: Instead of using the desktop version of the RStudio interface to R, we will be using the cloud-based RStudio Server, which you can access in your browser via go/rstudio/. Note if you are off-campus you must first log into the Middlebury VPN.
• Online: DataCamp, a browser-based interactive tool for learning R and Python.

## Evaluation

There are four components to your final grade: problem sets, engagement, midterms, and the final project.

#### 1) Weekly Problem Sets 10%

The problem sets in this class should be viewed as low-stakes opportunities to develop one’s statistics and data science muscles and receive feedback on the progress of one’s learning, instead of viewing them as evaluative tools used by the instructor to assign grades. To reinforce this thinking, each problem set is worth only a nominal portion of the final grade.

While I encourage you to discuss problem sets with your peers, you must submit your own answers and not simple rewordings of another’s work. Furthermore, all collaborations must be explicitly acknowledged at the top of your submissions.

• Assigned/due on Fridays.
• Lowest two scores dropped.
• No extensions for any problem sets will be granted.

#### 2) Engagement 10%

It is difficult to explicitly codify what constitutes “an engaged student,” so instead I present the following rough principle I will follow: you’ll only get out of this class as much as you put in. Some examples of behavior counter to this principle:

• Not participating in in-class exercises.
• Engaging so little, either in class or office hours, that I don’t know what your voice sounds like.
• Submitting a problem set with code or content that is copied from (or is only a slightly modified version of) your peers’ work. This goes against the philosophy of the problem sets being opportunities for practice and feedback, rather than items to be graded.

#### 3) Three Midterms 45%

• Midterm dates: Wed 3/8 (in-class), Wed 4/12 (evening), and during finals week Fri 5/19 7pm-10pm in Warner 506.
• All midterms are cumulative and may require a scientific calculator, so please have access to one (no smartphones).
• There is no extra-credit work to improve midterm scores after the fact.
• There will be no make-up or rescheduled midterms, except in the following cases if documentation is provided:
• serious illness or death in the family.
• athletic commitments or religious obligations, if prior notice is given. In such cases, rescheduled exams must be taken before the rest of the class.

#### 4) Final Project 35%

Rather than a final exam, there will be a final capstone group project. This is an opportunity for you to flex the statistics and data science muscles you developed during the problem sets and perform your own start-to-finish data analysis project. The project will involve addressing a scientific question by choosing a data set, performing an analysis using the concepts and tools we have covered in this course, and writing a report.

• Due Tue 5/23 at noon.
• Groups of no more than three will be assigned by me.
• A system will be in place to hold your group peers accountable for their work.
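
To see how the four component weights above (10%, 10%, 45%, 35%) combine, here is a minimal Python sketch of a weighted final grade. It is an illustration only, not the official grading formula: it assumes every score is on a 0 to 100 scale, that problem sets are averaged equally after the lowest two are dropped, and that the three midterms count equally. The function names and the example scores are hypothetical.

```python
# Illustrative only: assumes 0-100 scores, equal weighting within each
# component, and a simple mean. None of this is stated in the syllabus.

# Component weights taken from the Evaluation section.
WEIGHTS = {
    "problem_sets": 0.10,
    "engagement": 0.10,
    "midterms": 0.45,
    "final_project": 0.35,
}


def drop_lowest(scores, n_drop=2):
    """Return the scores sorted ascending with the n_drop lowest removed."""
    if len(scores) <= n_drop:
        raise ValueError("need more scores than the number dropped")
    return sorted(scores)[n_drop:]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def final_grade(problem_sets, engagement, midterms, final_project):
    """Weighted average of the four components on a 0-100 scale."""
    components = {
        "problem_sets": mean(drop_lowest(problem_sets, n_drop=2)),
        "engagement": engagement,
        "midterms": mean(midterms),  # assumption: midterms weighted equally
        "final_project": final_project,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items())


# Hypothetical student: the two lowest problem sets (60 and 70) are dropped.
example = final_grade(
    problem_sets=[90, 80, 70, 100, 60, 100],
    engagement=90,
    midterms=[80, 85, 90],
    final_project=88,
)
print(round(example, 1))  # 87.3
```

In this example the problem sets contribute only 9.25 of the 87.3 points, which reflects their low-stakes role described above.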