Extension Requests

  • You have an extension “budget” of at most 5 days for the rest of the semester; it’s up to you to keep track of your budget.
  • Requests must be made before the due date/time.
  • Google Form

Mini-Project 1

Due Wednesday 10/13 at 9pm.

Basic outline

  1. Identify a question about how you use your time that you feel comfortable sharing with your partner and me.
  2. Start the data collection process: start logging time in Google Calendar, macOS Calendar, or Outlook.
  3. Export the calendar to .ics file format and then import into R.
    • The screencast demonstrates how to do this with Google Calendar.
    • Test this early and test this often!
  4. Exchange data! You will pass your question and data to your partner, and they will do an analysis with two non-redundant data visualizations.
  5. Write a joint reflection piece on this experience. In particular:
    • As someone who provides data: What expectations do you have when you give your data?
    • As someone who analyzes other people’s data: What legal and ethical responsibilities do you have?
    • The joint reflection piece should be no more than 500 words.
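
To make step 3 a little more concrete, here is a minimal base-R sketch of what importing an .ics export amounts to. It is not the import code from MP1.Rmd or the screencast (use that for your actual project). It assumes every timestamp is in UTC in the form 20211004T130000Z and that no lines are folded, which a real export may not satisfy. The helper names get_field() and parse_utc() and the calendar entries are made up for illustration.

```r
# Toy .ics content; in practice you would read your own export, e.g.
# ics_text <- readLines("my_calendar.ics")   # hypothetical file name
ics_text <- c(
  "BEGIN:VCALENDAR",
  "BEGIN:VEVENT",
  "DTSTART:20211004T130000Z",
  "DTEND:20211004T143000Z",
  "SUMMARY:Study",
  "END:VEVENT",
  "BEGIN:VEVENT",
  "DTSTART:20211004T230000Z",
  "DTEND:20211005T070000Z",
  "SUMMARY:Sleep",
  "END:VEVENT",
  "END:VCALENDAR"
)

# Hypothetical helper: return the values of all lines that start with "KEY:"
get_field <- function(lines, key) {
  pattern <- paste0("^", key, ":")
  sub(pattern, "", grep(pattern, lines, value = TRUE))
}

# Hypothetical helper: convert "20211004T130000Z" into a date-time in UTC
parse_utc <- function(x) {
  as.POSIXct(x, format = "%Y%m%dT%H%M%SZ", tz = "UTC")
}

# One row per calendar entry: activity type, start time, end time
events <- data.frame(
  summary = get_field(ics_text, "SUMMARY"),
  start = parse_utc(get_field(ics_text, "DTSTART")),
  end = parse_utc(get_field(ics_text, "DTEND"))
)
events
```

Running something like this on a day or two of real entries is one way to “test early and test often”: if the row count or the times look wrong, you find out before you have collected all your data.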

Steps

1. Getting started

  • Find your group in the #mp1 channel. Please identify:
    • Your group number
    • Who your group leader is: whoever in your group is listed as member_1
  • Download MP1.zip and double-click it to “unzip” it
  • Knit the MP1.Rmd file once and read it over
  • Group leader only:
    • Open the template joint reflection piece Google Doc
    • Go to File (next to blue Google Doc icon) -> Make a copy
    • Share it with your partner so that you can both edit it

2. Suggestion: Complete your Minimally Viable Product

IMO when working on any project:

  • Don’t try to do everything completely and perfectly from the beginning. This leads to perfectionist thinking, which leads to procrastination and “analysis paralysis.”
  • Do start by finishing a minimally viable product (image 1 in the bottom row of the image above).
  • Once you’ve finished your MVP, gather feedback on how your project works. Based on this feedback, then iterate and improve.

3. Submit three things

There are three components to your grade:

  1. Both group members: Submit the MP1.pdf of your analysis of your partner’s question on gradescope.
  2. Both group members: Fill out the peer evaluation Google Form
  3. Group leader only: Submit a PDF of your joint reflection piece on gradescope

4. Grades

Grades are posted on Moodle, weighted as follows:

  1. Analysis (gradescope):
    • Viz 1: 35%
    • Viz 2: 35%
  2. Joint reflection piece (gradescope): 25%
  3. Peer evaluation (Google Form): 5%

Details

  • Each entry has to be an interval of time: a start time and an end time. For example, not “I went to sleep at...” but “I slept during these times.”
  • Enter at least two types of activities in your calendar. The activity type becomes the categorical variable summary, which will then have at least two levels.
  • How long does the analysis need to be? No firm rule, but think: if there are two equally insightful reports, one 20 pages long and the other 2 pages long, which will you read? Or think of when you visit a webpage: how long does it take you to decide whether you’re going to read it?
  • Graphs can’t be redundant: Think in terms of ink/information ratio: if the graphs are very redundant, then why not show just one?
  • How much analysis? 5 sentences or fewer for the analysis of each graph.
  • You need to collect at least 10 days’ worth of data.
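
As a rough illustration of why each entry needs both a start time and an end time, the base-R sketch below turns intervals into durations and then into hours per day for each activity type. The data are made up, the column names (summary, start, end) are assumptions about how your imported calendar might look, and an overnight entry is simply attributed to the date on which it starts.

```r
# Made-up calendar entries with two activity types
events <- data.frame(
  summary = c("Sleep", "Study", "Sleep", "Study"),
  start = as.POSIXct(
    c("2021-10-04 23:00", "2021-10-05 14:00", "2021-10-05 23:30", "2021-10-06 10:00"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  ),
  end = as.POSIXct(
    c("2021-10-05 07:00", "2021-10-05 16:30", "2021-10-06 06:30", "2021-10-06 11:00"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  )
)

# Duration of each interval in hours
events$hours <- as.numeric(difftime(events$end, events$start, units = "hours"))

# Simplification: assign each interval to the date on which it starts
events$date <- as.Date(events$start, tz = "UTC")

# Total hours per day for each activity type
daily <- aggregate(hours ~ date + summary, data = events, FUN = sum)
daily
```

A table like daily is a natural starting point for a first visualization, for example hours per day with one line or bar color per activity.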

Mini-Project 2

Due Monday 11/1 at 9pm.

Words of wisdom from Detective Lester Freamon from The Wire:

Basic outline

The theme of your analysis will be “follow the money,” as quoted in the video above. You will work with a partner to analyze data on the 2016 federal election cycle, as provided by the Federal Election Commission. We’ll be accessing this data in the fec16 package, which was developed by Prof Ben Baumer, Rana Gahwagy, Irene Ryan, and Marium A. Tapal from Smith College. Check out Marium’s poster from the Women in Statistics and Data Science 2020 conference:

Here are two randomly chosen examples to give a qualitative sense of the outcome. There are many ways to do this project.

Steps

1. Getting started

  • Find your group in the #mp2 channel. Please identify:
    • Your group number
    • Who your group leader is: whoever in your group is listed as member_1
  • Download MP2.Rmd
  • Knit the MP2.Rmd file once and read it over
  • Read the fec16 poster Marium Tapal presented to get an overview of the package, especially the datasets. Some include the full data by default; others include only a sample, and if you want the full data you need to download it from the web (this is because R packages have file size restrictions). See code below.
  • This is A LOT of data; it will be easy to get lost
    • View() is your best friend: simply looking at your data is so powerful, but also a step that’s so easily neglected
    • Read the help files for documentation: Know what all the variables mean, and what the values mean.
    • Start simple: Get that minimally viable first visualization done, then iteratively add complexity.
library(fec16)

# Look at help file
?contributions

# By default the contributions data frame only contains the first 1000 rows:
contributions

# Download the full contributions data from the web and overwrite contributions
# using the matching read_all_*() function.
# You will need to install the usethis package for this line of code to work
contributions <- read_all_contributions()

# Now contributions has full 516,394 rows
contributions

2. Grading

  1. Baseline: Projects that do not satisfy all “baseline” criteria can expect to get a grade of less than 7/10.
    • All visualizations have appropriately labeled axes, legends, titles, etc. Such information gives the data’s context.
    • All visualizations are mindful of the ink/information ratio.
    • All code must be visible in your PDF.
    • Citations must be included as footnotes.
  2. Minimally viable product: Grade 7/10
    • Satisfy all “baseline” criteria.
    • Create one data visualization that “follows the money” i.e. involves campaign contributions and include a written analysis.
  3. Due diligence: Grade 8/10
    • Satisfy all “baseline” and “minimally viable product” criteria.
    • The data visualization from the “minimally viable product” should be based on at least two joined data frames.
  4. Point of diminishing returns: Grade 9/10
    • Satisfy all “baseline”, “minimally viable product”, and “due diligence” criteria.
    • Create a second data visualization that builds on the first data visualization, but is also non-redundant to the first. Include a written analysis.
  5. Polishing the cannonball: Grade >9/10
    • Satisfy all “baseline”, “minimally viable product”, “due diligence”, and “point of diminishing returns” criteria.
    • Written text does an exceptional job of not only addressing what money-driven effect on politics is being observed, but also attempts to address why this money-driven effect on politics is happening.
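
For the “due diligence” criterion, here is a small sketch of what “based on at least two joined data frames” can look like. It uses toy stand-ins rather than the real fec16 data: the values are made up, and the column names cand_id, transaction_amt, and cand_pty_affiliation are assumed to mirror the fec16 data frames, so check ?contributions and ?candidates before relying on them. Base R’s merge() is used to keep the example self-contained; a dplyr join would play the same role.

```r
# Toy stand-in for a candidates table (made-up values)
candidates_toy <- data.frame(
  cand_id = c("C1", "C2", "C3"),
  cand_pty_affiliation = c("DEM", "REP", "DEM")
)

# Toy stand-in for a contributions table (made-up values)
contributions_toy <- data.frame(
  cand_id = c("C1", "C1", "C2", "C3", "C3"),
  transaction_amt = c(500, 250, 1000, 100, 400)
)

# Join the two data frames on their shared key
joined <- merge(contributions_toy, candidates_toy, by = "cand_id")

# "Follow the money": total contributions by party affiliation
party_totals <- aggregate(
  transaction_amt ~ cand_pty_affiliation,
  data = joined,
  FUN = sum
)
party_totals
```

The join is what lets a variable from one table (party) organize a variable from the other (contribution amount), which neither data frame could show on its own.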

3. What to submit

There are two components to your grade:

  1. Group leader only: Submit the MP2.pdf of your analysis on gradescope
  2. Both group members: Fill out the peer evaluation Google Form

Mini-Project 3

Due Tuesday 11/30 (after break) at 9pm.

Example map courtesy Brianna Mateo and Michel Ruiz Fuentes

Basic outline

Create maps using the sf package. Here are two randomly chosen examples to give a qualitative sense of the outcome.

Note you do not need to submit an interactive map, just a static (non-interactive) map generated with the sf package. If you would like to learn how to make interactive maps on your own, check out the leaflet package.

Steps

1. Getting started

  • Find your group in the #mp3 channel. Please identify:
    • Your group number
    • Designate a group leader.
  • Add the following to the folder that has your MP3.Rproj RStudio Project you created in Lec26:
    • Download MP3.Rmd and add it to the above folder
    • In the above folder create a new directory called data/
    • Download ma_cities.csv and add it to the data/ folder
  • Knit the MP3.Rmd file once and read it over
  • Map making can get very complicated, so start simple. Get that minimally viable first visualization done, then iteratively add complexity.

2. Grading

  1. Baseline: Projects that do not satisfy all “baseline” criteria can expect to get a grade of less than 8/10.
    • This will be strictly evaluated: The MP3.Rmd R Markdown file must “knit” correctly when someone else knits it. This is known as producing reproducible research.
    • Wherever possible, all maps have appropriately labeled axes, legends, and titles and all maps are mindful of the ink/information ratio.
    • From the report reader’s perspective, there should be no “superfluous” and non-informative output (e.g. instructions text). You must be empathetic to your reader, or your reader will just throw your report in the trash.
    • You submit your peer evaluation.
  2. Minimally viable product: Grade: 8/10.
    • Satisfy all “baseline” criteria.
    • Include any map generated by the sf package.
  3. Due diligence: Grade: 9/10.
    • Satisfy all “baseline” and “minimally viable product” criteria.
    • Add to the map by layering points on top of it. The points:
      • Need to be imported from an external .csv file that you will find (cite source).
      • Need to overlap the map.
  4. Point of diminishing returns: Grade: 10/10.
    • Satisfy all “baseline”, “minimally viable product”, and “due diligence” criteria.
    • Add to the map by also layering data from a shapefile on it. This shapefile data:
      • Needs to be imported from a shapefile that you will find (cite source).
      • Can be points, lines, or polygons.
      • Needs to overlap the map.
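
The layering described in the “due diligence” criterion can be sketched roughly as follows. This is a toy example, not the MP3.Rmd template: the base map is just a hand-drawn bounding box rather than a real boundary, the city coordinates are approximate, and the file names in the comments are hypothetical. It assumes the sf and ggplot2 packages are installed.

```r
library(sf)
library(ggplot2)

# Toy base map: a rough lon/lat bounding box, NOT a real state boundary
box <- rbind(
  c(-73.6, 41.2), c(-69.8, 41.2), c(-69.8, 42.9), c(-73.6, 42.9), c(-73.6, 41.2)
)
base_map <- st_sf(
  name = "toy region",
  geometry = st_sfc(st_polygon(list(box)), crs = 4326)
)

# Stand-in for points read from an external file, e.g.
# cities <- read.csv("data/your_points.csv")   # hypothetical file name
cities <- data.frame(
  city = c("Northampton", "Boston"),
  lon = c(-72.64, -71.06),   # approximate
  lat = c(42.33, 42.36)      # approximate
)

# Convert the plain data frame to an sf object with the same CRS as the map
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# Check that the points actually overlap the map
inside <- st_intersects(cities_sf, base_map, sparse = FALSE)[, 1]

# Layer the points on top of the polygon
mp3_map <- ggplot() +
  geom_sf(data = base_map, fill = "grey95") +
  geom_sf(data = cities_sf, color = "red") +
  labs(title = "Toy example: points layered on an sf polygon")

# A shapefile layer would be added the same way, e.g.
# extra_layer <- st_read("data/your_layer.shp")   # hypothetical path
# mp3_map + geom_sf(data = extra_layer)
```

Checking inside before plotting is a cheap way to catch the most common problem, points and map in different coordinate systems, which otherwise shows up as an empty-looking map.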

3. What to submit

There are two components to your grade:

  1. Group leader only: Submit the MP3.zip file of your entire RStudio Project folder on moodle.
  2. Both group members: Fill out the peer evaluation Google Form

Final Project

Assigned on Wed Dec 1, due Fri 12/17 at 9pm.

Basic outline

Using an R Markdown website you will answer any scientific question of your choosing using data:

  • Your website must have at least the following two pages:
    • An intro index.html page in blog-post style that summarizes your work. It should have
      1. One visualization that best summarizes your work.
      2. No more than 200 words of text
    • A second more_info.html page going more in-depth for people who want more details. It should have
      1. 2-3 more visualizations.
      2. No more than 500 words of text
  • Publishing to the web:
    • If your data/analysis is not sensitive or private in nature, publish your R Markdown Website using Netlify Drop
    • If your data/analysis is sensitive or private in nature, please DM me on Slack and we’ll work something out.
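
For orientation, here is a rough sketch of the minimal structure behind a two-page R Markdown website: a _site.yml file plus one .Rmd file per page. The final_project.zip template presumably ships with its own versions of these files, so treat the contents below (site name, navbar labels, placeholder text) as assumptions for illustration only. The sketch writes to a temporary folder and only builds the site if rmarkdown and pandoc are available.

```r
# Work in a throwaway folder so nothing in your real project is touched
site_dir <- file.path(tempdir(), "final_project_sketch")
dir.create(site_dir, showWarnings = FALSE)

# Site configuration: a name and a navbar linking the two required pages
writeLines(c(
  'name: "final_project"',
  'navbar:',
  '  title: "Final Project"',
  '  left:',
  '    - text: "Home"',
  '      href: index.html',
  '    - text: "More info"',
  '      href: more_info.html'
), file.path(site_dir, "_site.yml"))

# index.Rmd becomes index.html: the blog-post style summary page
writeLines(
  c("---", 'title: "Summary"', "---", "", "Blog-post style summary goes here."),
  file.path(site_dir, "index.Rmd")
)

# more_info.Rmd becomes more_info.html: the in-depth page
writeLines(
  c("---", 'title: "More info"', "---", "", "In-depth details go here."),
  file.path(site_dir, "more_info.Rmd")
)

# Build the site only if the tools are present
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render_site(site_dir, quiet = TRUE)
}
```

By default the built pages land in a _site subfolder of the site directory; that kind of built output is what a static host such as Netlify Drop serves.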

Here are some arbitrarily chosen examples. Note the topics covered this semester have shifted.

  1. America’s Leaky Pipes: The Environmental Impact of Oil Pipeline Spills
  2. 💵The Gigantic Gender Pay Gap💵
  3. Putting Food on the Table in 50 States
  4. Who Are The Noisemakers? | Analyzing the Loudest Parties in NYC
  5. Bigfoot: True or Real?

Getting started

  1. Join #final_project Slack channel
  2. Create groups of 2-3 students from any section and post in Google Sheet in #final_project channel in Slack
  3. Finding data. This is often the hardest part of such projects and I suggest you do this as soon as possible
    • Ideally, determine a research question you are interested in. Then find this data.
    • Often, however, you can’t find data that answers your desired question, but rather only some modified form of your question. This is fine: done is better than perfect.
    • If you are still stuck, find data first and then reverse-engineer a question. Not ideal, but again, done is better than perfect.
    • Use Smith College data resources. In particular:
  4. Set up the Final Project RStudio Project
    1. Download the following zip file: final_project.zip
    2. Move final_project.zip to your SDS192 folder on your computer
    3. Unzip final_project.zip. Windows users: be sure to “Extract all”
    4. In the resulting final_project folder, double-click the RStudio Project final_project.Rproj icon to open in RStudio Project mode

Submitting the project

  1. Group leader only: Submit a final_project.zip file of your entire RStudio Project folder on moodle.
  2. All group members:
    • Fill out the peer evaluation Google Form. Note you will need the Netlify Drop URL of the website you deployed on the web (unless your project involves sensitive non-public data).
    • Fill out exit survey Google Form

Grading rubric

  1. Baseline: Projects that do not satisfy all “baseline” criteria can expect to get a grade of less than 8/10.
    • The RMarkdown Website must “build” correctly when someone else builds it. This is known as producing reproducible research.
    • Wherever possible, all visualizations have appropriately labeled axes, legends, titles, etc and are mindful of the ink/information ratio.
    • Your code is clean, commented, and well-indented.
    • From the perspective of both the report reader and potential collaborators who might reproduce your work, no “superfluous” and non-informative inputs or outputs are included. This is being mindful of the work you create for readers and collaborators, including future you. Ex: Do not
      1. Include my original instructions in the report.
      2. Include “Data dumps” like showing all the contents of a 1000 row data frame.
      3. Submit files in your RStudio Project folder that aren’t relevant to your analysis.
    • All accompanying write-ups are coherent and respect the word count limit.
    • You submit both your peer evaluations and your exit survey responses.
  2. Minimally viable product: Grade: 8/10.
    • Satisfy all “baseline” criteria.
    • Complete the intro index.html blog-post page.
  3. Due diligence: Grade: 9/10.
    • Satisfy all “baseline” and “minimally viable product” criteria.
    • Complete the second more_info.html page with 1 visualization.
  4. Point of diminishing returns: Grade: 9.5/10.
    • Satisfy all “baseline”, “minimally viable product”, and “due diligence” criteria.
    • Complete the second more_info.html page with 2-3 visualizations.
  5. Polishing the cannonball: Grade: 10/10
    • Satisfy all “baseline”, “minimally viable product”, “due diligence”, and “point of diminishing returns” criteria.
    • Impress me by using the tools you’ve learned this semester. Yes, this criterion is deliberately vague.