Learning Goals

Only one: The first course objective in the syllabus: Have students engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices:

Drawing

Format

  • Your final project will center around writing a “data journalism” style newspaper article suitable for publication in The Middlebury Campus newspaper. In other words, your target audience is the Middlebury community. Examples of such types of journalism can be found on:
  • You must flex your “statistical and data sciences” muscles you’ve built this semester. In particular:
    • Data visualization and manipulation
    • Statistical Inference: hypothesis testing, confidence intervals, and regression (coming up)

Administrative Notes

  • Submission Format: Each group must create a single RStudio Server shared project that is shared amongst the group members and with me (please email the URL to me). All your work will be centrally located here; do not email me any files.
  • Project Proposal: Submitted latest by Monday Nov 21 11am
    • In class: Printed copy of project proposal
    • In the RStudio Server group shared folder:
      • Electronic copy of project proposal
      • All data files and if possible a .R script that loads them into R so that you can View() them.
  • Feedback Session: Held latest by Friday Dec 2nd
    • To make sure your project idea is feasible, after you’ve submitted your proposal and I’ve read it, your group must schedule to speak to me during office hours so that I can give feedback. While this must be done by Friday Dec 2nd, please note the earlier you complete this, the more breathing room this will give you.
  • Electronic-Only Final Project Submission: Due Sunday Dec 18 Noon
    • In the RStudio Server group shared folder:
      • A final_project.Rmd file that completely reproduces your analysis i.e. I should have to press Knit only once to recreate the entire HTML page.
      • All necessary data files.
    • Individually: A Google Forms survey, which I will email on Fri Dec 16 by 5pm.
    • Your project won’t be considered submitted until I give you email confirmation that everything looks good.
  • Office Hours: During exam week
    • Tuesday Dec 13: 9-12 and 1-4
    • Wednesday Dec 15: 9-12 and 1-4
    • Friday Dec 16: 9-12
    • I might be out of town that weekend, so email responses might be slow.
    • Saturday Dec 10: 10-4pm in Warner 310
    • Wednesday Dec 14: 10-4pm by request on Google Hangouts or FaceTime
    • Saturday Dec 17: 12-4pm by request in Warner 310
  • Honor Code: This is the equivalent of an academic term paper; all honor code rules about plagarism and citations apply.

Project Proposal

Write-Up

Your group proposal (to be submitted in print and electronically in a RStudio Server shared project) should contain the following:

  1. Title: The title of your project.
  2. Group Members: List of all group members.
  3. Purpose: Describe the general topic/phenomenon you want to explore:
    • Why should a Middlebury student be interested in your work?
    • What do you hope people will learn from your project?
  4. Scientific Question: Journalism, just like academic writing, has the goal of answering questions, but with a slightly more informal tone. What is the scientific question you want to answer using data?
  5. Data Sources: Describe where you will find/access your data. Be as specific as you can, listing URLs and file formats if possible.
  6. Data Format: As described in the Tips section of the Problem Set 06 discussion, describe what your tidy format data set looks like:
    • How many tables will you have? What are the observational units of each table?
    • How many rows does each table have?
    • How many columns does each table have and what are their names i.e. the variables? What are their units?

Data Sets

  • Your immediate goal should be to get some data loaded into RStudio. This may take a few rounds of back-and-forth discussion with me as finding the right data sets will be among the bigger challenges of this project, as it needs to balance:
    • Being complex enough to use the statistical and data sciences toolbox developed this semester.
    • Being rich enough to be able to answer meaningful scientific questions with them.
    • Not being so complex and rich that you are overwhelmed, as you are only novices.
  • Having two or more different data sets to join is not an explicit requirement for this project; your scientific question will dictate this need.
  • If you’re having trouble finding specific data that you have in mind, make an appointment with the Middlebury Data Services Librarian Ryan Clement.

Electronic-Only Final Project Submission

Roughly speaking, your final project submission will have two major components:

  • The actual journalism article. As to knowing the guidelines (length, format, tone), just ask yourself: “What does a newspaper/online news story sound like?”
  • A supplementary materials section that goes more in depth about the nitty-gritty details, that one might want to spare casual readers from, but include for people wanting to learn more.
  • I will post the template final_project.Rmd file in the Problem Sets section of the course webpage by Monday Nov 14th.