Learning Goals

Only one, the first course objective in the syllabus: have students engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices.

Format

  • Your final project will center on writing a “data journalism” style newspaper article suitable for publication in The Middlebury Campus newspaper. In other words, your target audience is the Middlebury community. Examples of such types of journalism can be found on:
  • You must flex the “data science” muscles you’ve built this semester, in particular data visualization and wrangling.

Past Examples

2016-09 Fall

In the spirit of reproducible research, here is a link to all source code/data for these projects.

Group Members | Title
Sophia Konanc and Maddie Maloney | Where Should MiddKids Ski This J-Term?
Luisangel Osorio and Wengel Kifle | Drug Use Within Age Groups
Sarah Koenigsberg, Caroline Cating, and Rebecca Conover | Quantifying Middlebury’s Collective Consciousness
Zach Levitt, David Valentin, and Joe Moscatelli | Who Sits at the Top of NESCAC Stack? (interactive; takes 30s to load)
Sierra Moen, Tina Chen, and Jared Whitman | Sexism on the Silver Screen: Exploring Film’s Gender Divide
Sam O’Keefe and Ian Strohbehn | Marijuana Retail and Production Dispersal in Colorado
Teddy Henderson and Will Perry | Unemployment and Migration in the United States
Stefan Asamoah and Steven Lillis | Will Your Major Field Change Your Prospects for Employment?
Jack Kagan and Joccelyn Alvarado | Looking for Mental Health Care? West is Best
Annie Glassie and Julia Keith | The Systematic Gender Gap in STEM Fields: Why Should We Care?

2017-02 Spring

In the spirit of reproducible research, here is a link to all source code/data for these projects.

Group Members | Title
Ry Storey-Fisher, Elana Feldman, and Lisa Schroer | An Analysis of Crime in Chicago
Kelsie Hoppes and Parker Peltzer | Watch Out: Patterns in Drunk Driving Fatalities
Joe O’Brien, Conor Himstead, Rebecca Lightman | Foodborne Illness Trends in America: To Fear or not to Fear?
Maya Gomez, Caroline Colan, & Julian Joseph | Testing GREEN: Comparing the Environmental Impact of New York City and Vermont
Jacob Volz, Zeb Millslagle | Factors of Success at the Olympic Games
Claire White-Dzuro, Clare Robinson, and Dylan Mortimer | Where Should MiddKidds Live After Graduation?
Thea Bean, Griffin Hall, & Jay Silverstein | What makes an art museum?
Daniel Turpin and Naing Thant Phyo | Opioid Use in the US

Important Dates

  1. By Fri Apr 28 - Project Proposal: Submit your project proposal (see below) in both printed and electronic format, along with an electronic submission of:
    • All data files
    • An .R script that loads the data so you can View() them.
  2. By Fri May 5 - Feedback Session: To make sure your project idea is feasible, your group must schedule a time to speak with me during office hours after you’ve submitted your proposal, so that I can give feedback. While this must be done by Fri May 5, the earlier you complete it, the more breathing room you will have.
  3. By Tue May 23 at 12pm - Electronic-Only Final Project Submission.
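
As a rough illustration of the .R script requested with the proposal, here is a minimal sketch. The file contents, the column names, and the object name `ski` are all made up for the demo, and base R's read.csv() is only one of several ways to read a CSV; swap in the path to your own data file.

```r
# Hypothetical loading script for the proposal (all names and values are made up)

# For a self-contained demo, write a tiny CSV to a temporary file.
# In your own script, point this at your real data file instead.
demo_path <- tempfile(fileext = ".csv")
writeLines(
  c("resort,state,vertical_ft",
    "Resort A,VT,1000",
    "Resort B,NH,2000"),
  demo_path
)

# Read the CSV into a data frame, keeping text columns as character
ski <- read.csv(demo_path, stringsAsFactors = FALSE)

# View() opens the spreadsheet viewer; it only works in an interactive session
if (interactive()) View(ski)
```

Using a path relative to your project folder, rather than a path that exists only on your own computer, makes it more likely the script will run for someone else.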

1. Project Proposal

Data Sets

  • Your immediate goal should be to get some data loaded into RStudio. Finding the right data sets is very important and may take some help from me, as they need to balance:
    • Being complex enough to use the data science toolbox developed this semester.
    • Being rich enough to be able to answer meaningful scientific questions with.
    • Not being so complex and rich that you are overwhelmed, as you are only novices.
  • Having two or more different data sets to join is not an explicit requirement for this project; your scientific question will dictate this need.
  • Suggested Sources:
    • Whatever you’re interested in!
    • Make an appointment with the Middlebury Data Services Librarian Ryan Clement at go/ryan/
    • Kaggle.com is a machine learning/prediction website. See their data set list.
    • data.world is a new repository of data from various disciplines.
    • The fivethirtyeight R package. See data set list. Not all data sets in this package are allowable, so be sure to consult with me as soon as you can.
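
If your scientific question does call for combining two data sets, the join might look something like the sketch below. Both tables, their column names, and all values are invented for illustration; the only assumption about your data is that the tables share a key variable (here `state`).

```r
library(dplyr)

# Hypothetical table 1: one row per state (made-up values)
state_pop <- tibble(
  state = c("VT", "NH", "ME"),
  population = c(100, 200, 300)
)

# Hypothetical table 2: one row per state, but ME is missing
state_clinics <- tibble(
  state = c("VT", "NH"),
  clinics = c(10, 20)
)

# Keep every row of state_pop and attach matching columns from state_clinics;
# states with no match get NA in the clinics column
joined <- state_pop %>%
  left_join(state_clinics, by = "state")

joined
```

A left_join() keeps unmatched rows from the first table, which makes gaps in the second table visible as NA values instead of silently dropping them.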

Write-Up

Your group proposal (to be submitted in print and electronically on Slack) should contain the following:

  1. Title: The title of your project.
  2. Group Members: List of all group members.
  3. Purpose: Describe the general topic/phenomenon you want to explore:
    • Why should a Middlebury student be interested in your work?
    • What do you hope people will learn from your project?
  4. Scientific Question: Journalism, just like academic writing, has the goal of answering questions, but with a slightly more informal tone. Andrew Flowers of 538 referred to it as “social science on demand.” What is the scientific question you want to answer using data?
  5. Data Sources: Describe where you will find/access your data. Be as specific as you can, listing URLs and file formats if possible.
  6. Data Format: Describe what your data set looks like:
    • How many tables will you have? What are the observational units of each table?
    • How many rows does each table have?
    • How many columns does each table have, and what are their names (i.e., the variables)? What are their units?
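
The Data Format questions above can be answered directly in R once a table is loaded. The sketch below uses a made-up data frame named `enrollment` as a stand-in for one of your tables; the functions themselves are all base R.

```r
# Hypothetical table: one row per department (made-up values)
enrollment <- data.frame(
  department = c("Math", "Biology", "History"),
  majors = c(40, 55, 30),
  stringsAsFactors = FALSE
)

# Number of rows, i.e. the number of observational units
n_rows <- nrow(enrollment)

# Number of columns, i.e. the number of variables
n_cols <- ncol(enrollment)

# Variable names
var_names <- names(enrollment)

# Compact overview of each variable's type and first few values
str(enrollment)
```

Units are not stored in the data frame itself, so you will still need to look them up in the documentation of your data source.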

2. Feedback Session

All done!

3. Final Project Submission

Electronic-Only Final Project Submission:

  • Due Tuesday 5/23 at 12pm:
    1. One group member should Slack direct message me and all group members:
      1. A link to a Dropbox shared folder that
        • Is shared with me: albert.ys.kim@gmail.com. You can share this with me before the project is due.
        • Contains a Final_Project.Rmd file that completely reproduces your analysis, i.e. I should have to press Knit only once to recreate the entire HTML page.
        • Contains all necessary data files.
      2. An Rpubs.com link of your R Markdown HTML document published on the web. Instructions: After Knitting your HTML file, click the blue “Publish” button on the top right of the HTML page and follow all instructions.
    2. Individually:
      1. A Google Forms exit survey, posted here.
  • Important: Your project won’t be considered submitted until
    1. All the above components are submitted
    2. I give you confirmation on Slack that everything looks good.
  • Honor Code: This is the equivalent of an academic term paper; all honor code rules about plagiarism and citations apply.

Hint: The kable() function from the knitr package is useful for outputting tables in a clean format. For example:

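library(knitr)
library(dplyr)  # provides tibble() and the %>% pipe

# Small example table; tibble() replaces the deprecated data_frame()
example <- tibble(
  x = c("A", "B", "C"),
  y = c(1, 2, 3),
  z = c(2, 3, 4)
)

# Output as a cleanly formatted table, rounding numbers to 3 digits
example %>%
  kable(digits = 3)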
x y z
A 1 2
B 2 3
C 3 4

instead of the raw code output:

example
## # A tibble: 3 × 3
##       x     y     z
##   <chr> <dbl> <dbl>
## 1     A     1     2
## 2     B     2     3
## 3     C     3     4