Everything in this course builds up to the final group project, which has only one learning goal: engage in the data/science research pipeline as faithfully as possible while keeping the level suitable for novices. To get practice, however, you will first complete three mini-projects.


Mini-Project 1

Partially assigned on Friday 9/27, completely assigned on Monday 9/30, due Friday 10/11 at 5pm.

Basic outline:

  1. Identify a question about how you use your time that you feel comfortable sharing with your partner and me.
  2. Start the data collection process: start logging time in Google Calendar, macOS Calendar, or Outlook.
  3. Import the calendar data into R. Do this early, do this often!
  4. Exchange data! You will pass your question and data to your partner, and they will do an analysis by creating two data visualizations.
  5. Write a joint reflection piece on this experience, keeping the podcast in mind. In particular:
    • As someone who provides data: What expectations do you have when you give your data?
    • As someone who analyzes others’ data: What legal and ethical responsibilities do you have?
    • The joint reflection piece should be no more than 500 words. I suggest you write it in Google Docs and then export to PDF.
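Step 3 asks you to import calendar data into R. To see what that involves, below is a base R sketch that turns the lines of an exported .ics file into a data frame with one row per event and a duration column. It is an illustration under assumptions, not necessarily the approach shown in the screencast: parse_ics() is a hypothetical helper name, folded lines and recurring events (RRULE) are not handled, every time is read as if it were UTC (so durations are right but clock times may be shifted), and all-day events come out as NA.

```r
# Hypothetical helper: parse the VEVENT blocks of an .ics export into a data frame.
# Assumptions: no folded (wrapped) lines, no recurring events (RRULE), and all
# times read as UTC, so durations are correct but clock times may be shifted.
# All-day events (date-only DTSTART/DTEND) get NA times.
parse_ics <- function(lines) {
  lines  <- sub("\r$", "", lines)                 # drop Windows line endings
  starts <- which(lines == "BEGIN:VEVENT")
  ends   <- which(lines == "END:VEVENT")

  # Value of the first "NAME:value" or "NAME;PARAMS:value" line in a block
  get_field <- function(block, name) {
    hit <- grep(paste0("^", name, "[;:]"), block, value = TRUE)
    if (length(hit) == 0) NA_character_ else sub("^[^:]*:", "", hit[1])
  }

  events <- lapply(seq_along(starts), function(i) {
    block <- lines[starts[i]:ends[i]]
    data.frame(
      summary = get_field(block, "SUMMARY"),
      start   = get_field(block, "DTSTART"),
      end     = get_field(block, "DTEND"),
      stringsAsFactors = FALSE
    )
  })
  calendar <- do.call(rbind, events)

  # "20191001T090000" or "20191001T090000Z" -> POSIXct (treated as UTC)
  to_time <- function(x) {
    as.POSIXct(sub("Z$", "", x), format = "%Y%m%dT%H%M%S", tz = "UTC")
  }
  calendar$start <- to_time(calendar$start)
  calendar$end   <- to_time(calendar$end)
  calendar$duration_min <- as.numeric(
    difftime(calendar$end, calendar$start, units = "mins")
  )
  calendar
}

# Two made-up events; with a real export you would instead call
# parse_ics(readLines("my_calendar.ics")), where the file name is hypothetical.
ics_lines <- c(
  "BEGIN:VCALENDAR",
  "BEGIN:VEVENT",
  "SUMMARY:Sleep",
  "DTSTART:20191001T000000Z",
  "DTEND:20191001T073000Z",
  "END:VEVENT",
  "BEGIN:VEVENT",
  "SUMMARY:SDS 192 lecture",
  "DTSTART;TZID=America/New_York:20191001T090000",
  "DTEND;TZID=America/New_York:20191001T101500",
  "END:VEVENT",
  "END:VCALENDAR"
)
calendar <- parse_ics(ics_lines)
calendar$duration_min   # 450 75

# Total minutes per activity
aggregate(duration_min ~ summary, data = calendar, FUN = sum)
```

Summing durations per activity, as in the last line, is often a natural starting point for answering a question about how someone uses their time.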

The following screencast illustrates Steps 2 and 3 above:

Step 0: Identify groups

  • Look in the #mp1 Slack channel to see your group, your group number, and who your group leader is.

Step 1: Download starter code

  • Download MP1.zip and double-click it to “unzip” it
  • Rename the MP1_lastname_firstname.Rmd file to have your last and first name
  • Knit the .Rmd file once and read it over
  • Create two data visualizations and provide an analysis to answer your partner’s question

Step 2: Submit

  1. Everyone: Submit the following three files on Moodle:
    1. The .ics calendar file with your partner’s calendar data. Added 9/30: You should both be logging calendar data until Wed 10/9.
    2. The .Rmd R Markdown file analyzing your partner’s calendar data.
    3. The .html output file
  2. Everyone: Fill out the peer evaluation Google Form
  3. Group leader only: Submit a PDF of the 500-word joint reflection piece.

Tip: Minimally viable product

When building a product, IMO:

Don’t: Try to do everything completely and perfectly from the beginning. This leads to perfectionist thinking, which leads to procrastination and “analysis paralysis”.

Do: Start by finishing a minimally viable product (MVP) FAST.

Only once you’re done with your MVP should you iterate and improve by adding complexity.



Mini-Project 2

Assigned on Friday 10/18, due Friday 11/1 at 5pm.

Data

You will work with a partner to analyze Federal Election Commission data from the 2011–2012 federal election cycle. Be sure to read the supporting documentation for these data.

The theme of your analysis will be “follow the money,” as Lester Freamon puts it in the HBO show The Wire. For your project, however, the context will not be the drug war in Baltimore, MD, but rather political contributions in elections.


Examples

Here are two randomly chosen examples to give a qualitative sense of the outcome. Note that both can be improved on.

Doing the project

  • Step 0: Look in the #mp2 Slack channel to see your group, your group number, and who your group leader is.
  • Step 1: Set up the MP2 RStudio project
    1. Download MP2.zip.
    2. Move MP2.zip to wherever you keep your 192 files.
      • macOS users: Double-click MP2.zip to extract the MP2 folder.
      • Windows users: Extract the contents of MP2.zip to a folder.
    3. Double-click the MP2.Rproj file to open RStudio in “Project mode.”
    4. Verify that you are in RStudio Project mode by looking at the top right corner of RStudio, where the project name MP2 should be displayed.
    5. Knit MP2.Rmd and go over the example R Markdown presentation tricks.
    6. View() all four data frames that are loaded into R: candidates, committees, contributions, house_elections.
  • Tips:
    1. Always work in RStudio Project mode. This will help R locate your .csv files easily.
    2. Knit early and knit often. For example:
      • If you add 5 lines of code and the R Markdown doesn’t knit, you can easily locate the error.
      • If you add 500 lines of code and the R Markdown doesn’t knit, you’ll have a harder time locating the error.
    3. If you’re confused about the meaning of variables in the four datasets, be sure to read the supporting documentation for these data.
    4. Ask questions in the #mp2 Slack channel.

Grading rubric

Note: This rubric is likely not perfect. Please don’t be shy to ask for clarifications.

  1. Baseline: Projects that do not satisfy all “baseline” criteria can expect to get a grade of less than 8/10.
    • The MP2.Rmd R Markdown file must “knit” correctly when someone else knits it. This is known as producing reproducible research.
    • All visualizations have appropriately labeled axes, legends, titles, etc. Such information gives the data’s context.
    • All visualizations are mindful of the ink/information ratio.
    • From the report reader’s perspective, there should be no “superfluous” or non-informative output (for example, instruction text). You must be empathetic to your reader, or your reader will just throw your report in the trash.
    • You submit your peer evaluation.
  2. Minimally viable product: Grade: 8/10.
    • Satisfy all “baseline” criteria.
    • Create one data visualization that “follows the money,” i.e., involves campaign contributions, and include a written analysis.
  3. Due diligence: Grade: 9/10.
    • Satisfy all “baseline” and “minimally viable product” criteria.
    • The data visualization from the “minimally viable product” should be based on at least two joined data frames.
  4. Point of diminishing returns: Grade: 9.5/10.
    • Satisfy all “baseline”, “minimally viable product”, and “due diligence” criteria.
    • Create a second data visualization that builds on the first data visualization, but is also non-redundant to the first. Include a written analysis.
  5. Polishing the cannonball: Grade: 10/10
    • Satisfy all “baseline”, “minimally viable product”, “due diligence”, and “point of diminishing returns” criteria.
    • Written text does an exceptional job of addressing not only “What is happening?” questions but also “Why is this happening?” questions (this criterion is subjective).
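The “due diligence” criterion asks for a visualization based on at least two joined data frames. Below is a minimal base R sketch of such a join. The column names (cand_id, cand_name, cand_party_affiliation, transaction_amt) are assumed to follow FEC naming and should be checked against the supporting documentation, and the rows are made-up toy values, not real FEC data. merge() with its default all = FALSE keeps only rows whose key appears in both data frames, like dplyr’s inner_join().

```r
# Hypothetical toy stand-ins for two of the MP2 data frames. Column names are
# assumptions based on FEC conventions; the values are invented.
candidates <- data.frame(
  cand_id = c("H001", "H002", "H003"),
  cand_name = c("SMITH, ANN", "JONES, BOB", "LEE, CARA"),
  cand_party_affiliation = c("DEM", "REP", "DEM"),
  stringsAsFactors = FALSE
)
contributions <- data.frame(
  cand_id = c("H001", "H001", "H002", "H003", "H002"),
  transaction_amt = c(1000, 2500, 500, 750, 4000),
  stringsAsFactors = FALSE
)

# Inner join on the shared key: each contribution gains its candidate's name
# and party. dplyr equivalent: inner_join(contributions, candidates, by = "cand_id")
joined <- merge(contributions, candidates, by = "cand_id")

# Follow the money: total contributions received per party
party_totals <- aggregate(transaction_amt ~ cand_party_affiliation,
                          data = joined, FUN = sum)
party_totals   # DEM 4250, REP 4500
```

The joined data frame, rather than contributions alone, is what lets a visualization break the money down by a candidate attribute such as party.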

To submit

  1. Each group will make only one submission: the group leader will submit a .zip file of all the contents of the MP2 folder on Moodle.
  2. Individually: Fill out the peer evaluation Google Form

Mini-Project 3

Assigned on Monday 11/11, due Friday 11/22 at 9pm (extended from 5pm).

Data

  • Any data of your choosing, as long as you can make one interactive map using leaflet (not shiny) and one static map using the sf package.

Examples

Here are two arbitrarily chosen examples to give a qualitative sense of the outcome. Note that both can be improved on, and the criteria for this MP have shifted slightly.

Doing the project

  • Step 0: Determine a group leader
  • Step 1: Set up the MP3 RStudio project
    1. Download MP3.zip.
    2. Move MP3.zip to wherever you keep your 192 files.
      • macOS users: Double-click MP3.zip to extract the MP3 folder.
      • Windows users: Extract the contents of MP3.zip to a folder.
    3. Double click the MP3.Rproj file to open RStudio in “Project mode”
  • Tips:
    1. Always work in RStudio Project mode. This will help R locate your data files easily.
    2. Knit early and knit often. For example:
      • If you add 5 lines of code and the R Markdown doesn’t knit, you can easily locate the error.
      • If you add 500 lines of code and the R Markdown doesn’t knit, you’ll have a harder time locating the error.
    3. Ask questions in the #mp3 Slack channel.

Grading rubric

Note: This rubric is likely not perfect. Please don’t be shy to ask for clarifications.

  1. Baseline: Projects that do not satisfy all “baseline” criteria can expect to get a grade of less than 8/10.
    • This will be strictly evaluated: The MP3.Rmd R Markdown file must “knit” correctly when someone else knits it. This is known as producing reproducible research.
    • Wherever possible, all maps have appropriately labeled axes, legends, titles, etc. Such information gives the data’s context.
    • All maps are mindful of the ink/information ratio.
    • All accompanying write-ups are coherent and respect the word count limit.
    • From the report reader’s perspective, there should be no “superfluous” or non-informative output (for example, instruction text). You must be empathetic to your reader, or your reader will just throw your report in the trash.
    • You submit your peer evaluation.
  2. Minimally viable product: Grade: 8/10.
    • Satisfy all “baseline” criteria.
    • Include either an interactive map using leaflet (not shiny) or a static map using the sf package.
  3. Due diligence: Grade: 9/10.
    • Satisfy all “baseline” and “minimally viable product” criteria.
    • Include both an interactive map and a static map. These two maps may be “redundant” in that they provide the same information.
  4. Point of diminishing returns: Grade: 9.5/10.
    • Satisfy all “baseline”, “minimally viable product”, and “due diligence” criteria.
    • Your interactive map and static map must no longer be “redundant.”
  5. Polishing the cannonball: Grade: 10/10
    • Satisfy all “baseline”, “minimally viable product”, “due diligence”, and “point of diminishing returns” criteria.
    • Changed on 11/13 at 2pm: Include a third map of your choice that is not redundant to the first two. It can be either interactive or static.
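The two kinds of maps required above can be sketched as follows, assuming the leaflet, sf, and ggplot2 packages are installed. This is an illustration, not MP3 data: the marker coordinates are approximate values for Northampton, MA, and the static map uses the North Carolina counties shapefile that ships with sf as example data.

```r
library(leaflet)
library(sf)
library(ggplot2)

# Interactive map with leaflet: OpenStreetMap tiles plus one marker.
# Coordinates are approximate and purely illustrative.
interactive_map <- leaflet() %>%
  addTiles() %>%
  addMarkers(lng = -72.64, lat = 42.32, popup = "Northampton, MA")

# Static map with sf: read the example shapefile bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# geom_sf() draws sf geometries; labels give the data its context
static_map <- ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  labs(
    title = "Births by North Carolina county (BIR74)",
    fill = "Births (BIR74)"
  )

# In an R Markdown chunk, ending the chunk with interactive_map or
# static_map on its own line displays that map in the knitted output.
```

Keeping each map in a named object makes it easy to check that the interactive and static maps are not redundant before deciding which one to display.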

To submit

  1. Each group will make only one submission: the group leader will submit a .zip file of all the contents of the MP3 folder on Moodle.
  2. Individually: Fill out the peer evaluation Google Form

Final Project

Assigned on Monday 11/24, due Friday 12/20 at 12pm.

In groups of 2-3 students of your choosing, you will use data to answer any scientific question of your choosing. Your submission will be an RStudio Project containing an RMarkdown Website.

  • Use any data of your choosing that answers your scientific question.
  • Your website must at least have the following two pages:
    • An intro index.html page in blog-post style that summarizes your work. It should have:
      1. One visualization that best summarizes your work.
      2. No more than 200 words of text
    • A second more_info.html page that goes more in depth for people who want more details. It should have:
      1. 2-3 more visualizations.
      2. No more than 500 words of text
  • Publishing to the web:
    • If your data/analysis is not sensitive or private in nature, publish your R Markdown Website on the web and post the URL in the Groups Google Sheet in the #final_project Slack channel.
    • If your data/analysis is sensitive or private in nature, please DM me on Slack and we’ll work something out.

Here are some arbitrarily chosen examples:

  1. America’s Leaky Pipes: The Environmental Impact of Oil Pipeline Spills
  2. 💵The Gigantic Gender Pay Gap💵
  3. Putting Food on the Table in 50 States
  4. Who Are The Noisemakers? | Analyzing the Loudest Parties in NYC
  5. Bigfoot: True or Real?

Doing the project

  1. Form a group of 2-3 members, then fill in your group information in the Google Sheet in the #final_project Slack channel. If you need a group, DM me on Slack with your section number.
  2. Do this as soon as possible: Finding data is often the hardest part of such projects
    • Ideally, determine a research question you are interested in, then find data to answer it.
    • Often, however, you can’t find data that answers your desired question, only some modified form of it. This is fine: done is better than perfect.
    • If you are still stuck, find data first and then reverse-engineer a question. Not ideal, but again, done is better than perfect.
    • Use Smith College data resources.
  3. Set up the Final Project RStudio Project
    1. Download final_project.zip.
    2. Move final_project.zip to wherever you keep your 192 files.
      • macOS users: Double-click final_project.zip to extract the final_project folder.
      • Windows users: Extract the contents of final_project.zip to a folder.
    3. Double click the final_project.Rproj file to open RStudio in “Project mode”

Submitting the project

  • Group leader only:
    1. On Moodle: Submit a .zip archive file of the RStudio Project folder necessary to build your RMarkdown Website. There will be only one submission per group.
    2. In the #final_project Slack channel, go to the Groups Google Sheet, then post the URL to your RMarkdown Website published online (as long as your data/work is not private or sensitive in nature).
      You have two options for publishing your website, both being equally acceptable. Either:
      • The beginner-friendly way: Using Netlify drag-and-drop as we saw in Lec30
      • The more advanced, but more efficient way: Committing and pushing your RStudio Project to GitHub and publishing the contents of your docs/ folder with GitHub Pages: go to your repo’s Settings page and set the GitHub Pages source to the docs/ folder. For example, this is how I publish this course webpage.
  • Individually:
    1. Fill out the peer evaluation Google Form.
    2. Fill out the exit survey Google Form.
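Both the required two-page structure and the docs/ publishing option above are controlled by an R Markdown Website’s configuration file, _site.yml. Below is a hypothetical sketch that writes a minimal _site.yml into a temporary folder: the site name and navbar title are placeholders, and the line output_dir: "docs" is what makes rmarkdown::render_site() put the built HTML in the docs/ folder. If the final_project starter folder already contains a _site.yml, edit that file instead of creating a new one.

```r
# Hypothetical minimal _site.yml for a two-page RMarkdown Website built from
# index.Rmd and more_info.Rmd. The name and titles are placeholders.
site_yml <- c(
  'name: "final-project"',
  'output_dir: "docs"',          # render_site() writes the built HTML here
  'navbar:',
  '  title: "Our Final Project"',
  '  left:',
  '    - text: "Home"',
  '      href: index.html',
  '    - text: "More info"',
  '      href: more_info.html'
)

# Write it into a scratch folder so this sketch does not touch your project
project_dir <- tempfile("final_project_")
dir.create(project_dir)
writeLines(site_yml, file.path(project_dir, "_site.yml"))

# In your real project folder (with index.Rmd and more_info.Rmd present),
# building the site would then be:
# rmarkdown::render_site()
```

With this setting, the docs/ folder is both what you would drag onto Netlify and what GitHub Pages would serve.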

Grading rubric

Note: This rubric is likely not perfect. Please don’t be shy to ask for clarifications.

  1. Baseline: Projects that do not satisfy all “baseline” criteria can expect to get a grade of less than 8/10.
    • The RMarkdown Website must “build” correctly when someone else builds it. This is known as producing reproducible research.
    • Wherever possible, all visualizations have appropriately labeled axes, legends, titles, etc., and are mindful of the ink/information ratio.
    • Your code is clean, commented, and well-indented. See both your MP2 and MP3 video feedback for individualized feedback and the tidyverse style guide. This is being mindful of the work you create for collaborators, including future you.
    • From the perspective of both the report reader and potential collaborators who might reproduce your work, no “superfluous” and non-informative inputs or outputs are included. This is being mindful of the work you create for readers and collaborators, including future you. For example, do not:
      1. Include my original instructions in the report.
      2. Include “data dumps,” such as showing all the contents of a 1000-row data frame.
      3. Submit files in your RStudio Project folder that aren’t relevant to your analysis.
    • All accompanying write-ups are coherent and respect the word count limit.
    • You submit both your peer evaluations and your exit survey responses.
  2. Minimally viable product: Grade: 8/10.
    • Satisfy all “baseline” criteria.
    • Complete the intro index.html blog-post page.
  3. Due diligence: Grade: 9/10.
    • Satisfy all “baseline” and “minimally viable product” criteria.
    • Complete the second more_info.html page with 1 visualization.
  4. Point of diminishing returns: Grade: 9.5/10.
    • Satisfy all “baseline”, “minimally viable product”, and “due diligence” criteria.
    • Complete the second more_info.html page with 2-3 visualizations.
  5. Polishing the cannonball: Grade: 10/10
    • Satisfy all “baseline”, “minimally viable product”, “due diligence”, and “point of diminishing returns” criteria.
    • Impress me. Yes, this criterion is deliberately vague. However, you’ve had the semester to get to know me, and I’ve given you all individualized feedback on three mini-projects. Think of this as an exercise in “knowing your audience,” a skill that is part art and part science.