Everything in this course builds up to the term group project, where there is only one learning goal: Engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices.

# 1. Groups due: Fri 2/8 5pm

## Action items

1. Form groups of 2-3 students where all members can attend the same lab section for the labs where you’ll be working on your projects (see schedule). You do not need to all be registered for the same lab section, merely able to all attend the same lab.
2. Pick a group name. A prize will be given to the team with my favorite team name!
3. Designate a group leader who will do a little administrative work.

## Items to submit

1. Create two Slack direct messages (DM):
• One with all your group members + myself + Jenny: Say 👋 to Jenny and me. Going forward please ask all project related questions here.
• One with only your group members: Send all private messages amongst yourselves here.
2. Nothing to submit on Moodle this round.
3. Complete the following Google Form.

Everybody individually:

Nothing this time.

## Notes

• If you need a group to join, please Slack me.
• All groups members are expected to contribute and you will all be held accountable for your contributions in peer evaluations.

# 2. Data proposal due: Fri 2/22 5pm

## Action items

• Read moderndive for background tools/concepts
• Chapter 2.4.4 (new) on identification vs measurement variables.
• Chapter 5.1 on importing spreadsheet data into R.
• Read the example data proposal written by Jenny and Albert.
• Start working on your data proposal
• Only if you are not working in RStudio Server on the browser, in other words you are working on RStudio Desktop, ensure you have the janitor and readr packages installed; they are already installed on RStudio Server.
• Download the following data_proposal.Rmd template R Markdown file.
• Move it to your introstatsR folder where your work is saved.
• Knit it!
• This is likely the most time-consuming portion of the entire project, so start early: Find a dataset that matches the requirements spelled out in the above data_proposal.Rmd template R Markdown file. Possible sources:
2. Consult Spinelli Center Data Counselor Vahab Khademi (see syllabus for contact info).
4. data.world
5. Kaggle

## Items to submit

Group leader only: On Moodle submit:

1. The data_proposal.Rmd R Markdown file
2. The data_proposal.html HTML report file
3. Only if you didn’t publish your data as a .csv on Google Sheets because it has confidential/private information, then include the .csv or .xlsx spreadsheet file.

Everybody individually:

Nothing this time.

## Notes

• When you have questions:
• Please ask questions that you think the entire class would like to know the answer to in the term_project channel in Slack.
• Ask more individual questions in the group DM that includes all group members + Jenny + Albert.
• Only minimal data wrangling using the dplyr package is expected at this time; you will be doing more for the “project proposal” phase after you’re experienced. That being said, feel free to experiment!
• Don’t forget to use the tutoring hours: Sun-Thurs 7-9pm in the lecture room.

# 3. Project proposal due: Fri 3/8 5pm

## Action items

• Read Jenny and Albert’s example project proposal.
• Start working on your project proposal:
• Download the following project_proposal.Rmd template R Markdown file.
• Move it to your introstatsR folder where your project work is saved.
• Knit it!
• Attend lab on Tuesday March 5th.
• Be sure to attend the lab section where all your groupmates will be present.
• The entire lab period is devoted to working on this phase.
• There will be no problem set assigned that week.
• For data wrangling tips, check out Jenny and Albert’s Become an R Data Ninja page! You don’t need to read all the examples, just read the section headers to see if it addresses your data wrangling needs! You can also download the project_tips.Rmd file.

## Items to submit

Group leader only: On Moodle submit:

1. The project_proposal.Rmd R Markdown file
2. The project_proposal.html HTML report file
3. Only if you didn’t publish your data as a .csv on Google Sheets because it has confidential/private information, then include the .csv or .xlsx spreadsheet file.

Everybody individually:

Nothing this time.

# 4. Submission due: Fri 4/5 5pm

## Action items

• Read Jenny and Albert’s example project submission again using Massachusetts public high school data.
• Download the corresponding source project_submission.Rmd R Markdown file.
• Modify the example project_submission.Rmd to match your project. Note your contents will be different based on the nature of your data and any results.
• Move it to your introstatsR folder where your project work is saved.
• Knit it!
• Complete the following sections:
• Section 1: Introduction
• Section 2: Exploratory data analysis
• Section 3 subsections 3.1, 3.2, and 3.3: Multiple linear regression: Methods, Model Results, Interpreting the regression table.
• Do not complete the following sections: You’ll be completing these for the final stage of this project due on the last day of class after we’ve covered inference for regression:
• Section 3 subsections 3.4, 3.5: Inference for multiple regression
• Section 4: Discussion. You will write this conclusion based on the results of sections 3.4 and 3.5.
• Attend lab on Tuesday April 2nd.
• Be sure to attend the lab section where all your groupmates will be present.
• The entire lab period is devoted to working on this phase.
• There will be no problem set assigned that week.

## Items to submit

Group leader only: On Moodle submit:

1. The project_submission.Rmd R Markdown file with sections 1, 2, 3.1-3.3 completed
2. The project_submission.html HTML report file.
3. Only if you didn’t publish your data as a .csv on Google Sheets because it has confidential/private information, then include the .csv or .xlsx spreadsheet file.

Everybody individually:

Nothing this time.

# 5. Resubmission due: Thu 5/2 5pm

## Action items

• Read Jenny and Albert’s example project resubmission again using Massachusetts public high school data. Note that it is identical to the “project submission” example from above but now with the following sections now completed: Sections 3.4 and 3.5 on inference for multiple regression and section 4 on conclusions, limitations, and future work.
• Download the corresponding source project_resubmission.Rmd R Markdown file.
• Modify the example project_submission.Rmd to match your project. Note your contents will be different based on the nature of your data and any results.
• Move it to your introstatsR folder where your project work is saved.
• Knit it!
• Do the following:
• Incorporate any feedback given to you during your screencast session.
• Complete Sections 3.4 and 3.5: Inference for multiple regression
• Complete Section 4: Discussion. You will write this conclusion based on the results of sections 3.4 and 3.5.
• Attend lab on Tuesday April 30th.
• Be sure to attend the lab section where all your groupmates will be present.
• You will cover the template/example project_resubmission.Rmd R Markdown file during the lab first, then the rest of lab period is devoted to working on this phase.
• There will be no problem set assigned that week.

## Items to submit

In order to receive full credit, all the following elements must be completed. For example, many of you didn’t submit your .Rmd files for the previous submission phase.

• Group leader only: On Moodle submit:
1. The project_resubmission.Rmd R Markdown file with sections 1, 2, 3.1-3.3 completed
2. The project_resubmission.html HTML report file.
3. Only if you didn’t publish your data as a .csv on Google Sheets because it has confidential/private information, then include the .csv or .xlsx spreadsheet file.
• All group members individually must:
• Complete the “SDS/MTH 220 Exit Survey” Google Form. There are four sections: