Schedule


Lec 39: Wed 5/1

Announcements

Today’s Topics/Activities

1. Chalk talk

  • Midterm III review
  • Practicing statistical inference (confidence intervals & hypothesis tests) via mathematical formulas using pen & paper instead of simulations using a computer.
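
As a rough illustration of the formula-based approach, here is a minimal R sketch for a single proportion. The sample size and sample proportion are made-up numbers, and the normal approximation is assumed to be appropriate.

```r
# Theory-based inference for one proportion (all numbers are hypothetical).
n     <- 50     # sample size (assumed)
p_hat <- 0.42   # sample proportion, e.g. 21 successes out of 50 (assumed)

# 95% confidence interval: p_hat +/- z* x SE, with SE = sqrt(p_hat * (1 - p_hat) / n)
se_ci <- sqrt(p_hat * (1 - p_hat) / n)
ci_95 <- p_hat + c(-1, 1) * qnorm(0.975) * se_ci

# Hypothesis test of H0: p = 0.5 versus HA: p != 0.5.
# Under H0 the standard error uses the null value p0 rather than p_hat.
p0      <- 0.5
se_null <- sqrt(p0 * (1 - p0) / n)
z_stat  <- (p_hat - p0) / se_null
p_value <- 2 * pnorm(-abs(z_stat))   # two-sided p-value

ci_95
z_stat
p_value
```

With these numbers the interval contains 0.5 and the p-value is well above 0.05, so the two procedures agree.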

Lec 38: Mon 4/29

Announcements

  • Office hours during reading/exam week are by appointment. If someone has already booked a time you want to attend, please come anyway; the more the merrier! Book here
  • Practice Midterm III to be posted under midterms by Wednesday.

Today’s Topics/Activities

1. Chalk talk

  • Conditions for inference for regression: last element needed for term project.
  • Simple regression example:

2. In-class exercise

moderndive readings in schedule above.


Lec 37: Fri 4/26

Announcements

Today’s Topics/Activities

1. Chalk talk

Inference for regression. In other words, sampling scenarios 5 & 6 from moderndive Table 8.6:

Scenario | Population parameter | Notation | Point estimate | Notation
1 | Population proportion | \(p\) | Sample proportion | \(\widehat{p}\)
2 | Population mean | \(\mu\) | Sample mean | \(\widehat{\mu}\) or \(\overline{x}\)
3 | Difference in population proportions | \(p_1 - p_2\) | Difference in sample proportions | \(\widehat{p}_1 - \widehat{p}_2\)
4 | Difference in population means | \(\mu_1 - \mu_2\) | Difference in sample means | \(\overline{x}_1 - \overline{x}_2\)
5 | Population regression slope | \(\beta_1\) | Fitted regression slope | \(\widehat{\beta}_1\) or \(b_1\)
6 | Population regression intercept | \(\beta_0\) | Fitted regression intercept | \(\widehat{\beta}_0\) or \(b_0\)

Recall from Chapter 6 our study of the relationship between the following two variables for instructors of \(n\) = 463 courses at UT Austin:

  • \(y\): instructor teaching score as given by students
  • \(x\): instructor “beauty” score as “rated” by a panel of 6 students

Recall our exploratory data visualization of the relationship in moderndive Figure 6.4:

and the corresponding regression table in moderndive Table 6.2:

term estimate std_error statistic p_value lower_ci upper_ci
intercept 3.880 0.076 50.961 0 3.731 4.030
bty_avg 0.067 0.016 4.090 0 0.035 0.099
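
To connect the columns of this table, here is a small R sketch that rebuilds the test statistic, p-value, and 95% confidence interval for the bty_avg slope from its estimate and standard error. It assumes the usual t-based formulas with n - 2 degrees of freedom. Because the inputs are the rounded values printed in the table, the results only approximately match the table (for example, the statistic comes out as 4.1875 rather than 4.090).

```r
# Values copied from the bty_avg row of the regression table above.
estimate  <- 0.067
std_error <- 0.016
n         <- 463      # number of courses
df        <- n - 2    # simple regression estimates two coefficients

# Test statistic for H0: beta_1 = 0
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df)

# 95% confidence interval: estimate +/- t* x SE
t_star <- qt(0.975, df = df)
ci_95  <- estimate + c(-1, 1) * t_star * std_error

t_stat
p_value
ci_95
```

The interval lies entirely above 0, which is consistent with the tiny p-value: both suggest a positive relationship between beauty score and teaching score in the population.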

Lec 36: Wed 4/24

Announcements

  • I will be absent on Friday. Guest lecturer: Prof. Ben Capistrant from the Smith School of Social Work and current SDS/MTH 291 Multiple Regression instructor.

Today’s Topics/Activities

1. Chalk talk

  • Went over midterm II solutions.

2. In-class exercise

  • moderndive readings in schedule above.

Lec 35: Mon 4/22

Announcements

  • No office hours on Wednesday 4/24
  • Term project resubmission instructions posted under Term Project.

Today’s Topics/Activities

1. Chalk talk

None

2. In-class exercise

moderndive readings in schedule above.


Lec 34: Fri 4/19

Announcements

Project:

  • All feedback sessions have been held.
  • Final instructions/template will be posted on Monday.
  • During final lab on Tue 4/30 you’ll be working on project.
  • Note for those of you who did a log10-transformation of your outcome variable.

Today’s Topics/Activities

1. Chalk talk

  • The 😕🤕😵🤯😱 statistical definitions, terminology, and notation for hypothesis testing.
  • Remember that at the root of all hypothesis testing, just like with confidence intervals, is sampling!
  • The question is: in real-life where we take only one sample, how can we study the effects of sampling variation? Using resampling!
    • Confidence intervals: bootstrap resampling with replacement
    • Two-group hypothesis testing: permutation resampling without replacement. i.e. shuffle it!
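
The difference between the two kinds of resampling comes down to one argument of base R's sample(). A tiny sketch on a made-up vector:

```r
set.seed(1)
x <- 1:10   # stand-in for an observed sample (hypothetical values)

# Bootstrap resample: same size as x, drawn WITH replacement,
# so some values can appear more than once and others not at all.
bootstrap_resample <- sample(x, size = length(x), replace = TRUE)

# Permutation: drawn WITHOUT replacement (the default),
# so every value appears exactly once, just in a new order.
shuffled <- sample(x)

bootstrap_resample
shuffled
```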

2. In-class exercise

moderndive readings in schedule above.

3. Tweet of the Day

For those of you who are Game of Thrones fans and those of you who are potentially interested in Machine Learning, check out this tweet. Season 8, Episode 1 spoiler alert!


Lec 33: Wed 4/17

Announcements

Today’s Topics/Activities

1. Chalk talk

  • The intuition behind hypothesis testing.
  • (Time permitting) The statistical definitions/terminology behind hypothesis testing; in particular how hypothesis testing relates to sampling.

2. Tweet of the Day

If you care to share feedback relating to the use of this example, please fill out this Google Form. You have the option to remain anonymous and all responses will remain confidential.


Lec 32: Mon 4/15

Announcements

  • Lab tomorrow (Tue 4/16) is optional office hours; Jenny will be in Sabin-Reed 301.
  • No problem set this week!
  • Don’t forget your project feedback appointments this week.
  • Talk on Thursday April 18th 6pm in Seelye 106

Today’s Topics/Activities

1. Tactile simulation

  • RStudio Desktop users only: reinstall/update the moderndive package
  • Form groups of two students
  • Watch slideshow
  • Imagine… a hypothetical world with no gender discrimination in hiring.
  • In this hypothetical world, we can switch i.e. shuffle i.e. permute the (binary) gender variable in . Do this using this code to create a new variable hypothetical
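
A minimal sketch of the shuffling idea in base R. This is not the course's code: the vectors gender and hired, and the counts in them, are made up for illustration.

```r
set.seed(76)

# Hypothetical data: 24 resumes per gender; hired is 1 (yes) or 0 (no).
gender <- rep(c("male", "female"), each = 24)
hired  <- c(rep(c(1, 0), times = c(21, 3)),    # 21 of 24 "male" resumes hired
            rep(c(1, 0), times = c(14, 10)))   # 14 of 24 "female" resumes hired

# Difference in hiring proportions: male minus female
diff_in_props <- function(hired, gender) {
  mean(hired[gender == "male"]) - mean(hired[gender == "female"])
}

# Observed difference in the actual data
obs_diff <- diff_in_props(hired, gender)

# In a world with no discrimination the gender label is irrelevant,
# so we shuffle it (sample() without replacement) and recompute, 1000 times.
null_diffs <- replicate(1000, diff_in_props(hired, sample(gender)))

# One-sided p-value: how often does a shuffled difference reach the observed one?
p_value <- mean(null_diffs >= obs_diff)

obs_diff
p_value
```

A histogram of null_diffs shows what differences look like under the "no discrimination" hypothesis; the observed difference is then compared against it.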

Lec 31: Fri 4/12

Announcements

  • Go over practice midterm II.

Lec 29: Mon 4/8

Announcements

  • No lecture on Wednesday, extra office hours on Friday 1-3pm.
  • Midterm II review. In particular practice midterm posted on Midterms page.
  • Project feedback: Sign up for a feedback session next week where all group members and I will record a screencast in my office. Please be mindful of how much coordination it takes for me to schedule 17 feedback sessions and read all these instructions carefully first:
    1. Jump to the week of April 14-20 (next week) on my Google Appointments Calendar and identify which “220 Project Feedback Only” 15 minute time slots work for you.
    2. Then as a group, coordinate on a “220 Project Feedback Only” 15 minute time slot when all group members can attend.
    3. Then only the group leader will book the 15 minute appointment. Under “Description” include both 1) your group name and 2) the names of all group members.
    4. If none of the listed times work for you all, contact me in the Slack Direct Message that includes me and Jenny and all group members.

Today’s Topics/Activities

1. Chalk Talk

2. In-class exercise


Lec 28: Fri 4/5

Announcements

  • Project submission due at 5pm today!
  • Midterm II next week. Review on Monday

Today’s Topics/Activities

1. Chalk Talk

  • Recap of Lec 27: The “frequentist” interpretation of confidence intervals
  • Two ways to compute SE.

2. In-class exercise

  • Note: As seen in moderndive 9.5, I still need to add tactile_shovel_1 data frame to moderndive package.
  • moderndive readings in above schedule.

Lec 27: Wed 4/3

Announcements

Today’s Topics/Activities

1. Chalk Talk

  • The infer bootstrapping framework (see handout below). Both the left-hand side, which uses dplyr verbs, and the right-hand side, which uses infer verbs, do the same thing. However, we’ll see that the infer code on the right can be used in more situations.
  • Interpreting confidence intervals
  • What determines the width of the net?
    1. The confidence level: A 95% CI will be wider than an 80% CI
    2. The original sample size n. As n goes up, the CI gets narrower, i.e. you have more precise results.
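
Both effects can be checked numerically. The sketch below uses the theory-based width formula for a proportion rather than the bootstrap, which is a simplifying assumption; ci_width() is a hypothetical helper and the inputs are made up.

```r
# Width of a normal-approximation confidence interval for a proportion:
# width = 2 * z* * sqrt(p_hat * (1 - p_hat) / n)
ci_width <- function(p_hat, n, level) {
  z_star <- qnorm(1 - (1 - level) / 2)   # critical value for the given level
  2 * z_star * sqrt(p_hat * (1 - p_hat) / n)
}

w_80_n50  <- ci_width(p_hat = 0.4, n = 50,  level = 0.80)
w_95_n50  <- ci_width(p_hat = 0.4, n = 50,  level = 0.95)
w_95_n500 <- ci_width(p_hat = 0.4, n = 500, level = 0.95)

# Higher confidence level -> wider interval; larger n -> narrower interval
c(w_80_n50 = w_80_n50, w_95_n50 = w_95_n50, w_95_n500 = w_95_n500)
```

Note that multiplying n by 10 shrinks the width by a factor of sqrt(10), not 10.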

2. In-class exercise

Read the following in moderndive:

  • Re-read moderndive 9.4 in light of today’s discussion
  • moderndive 9.5-9.6

Lec 26: Mon 4/1

Announcements

  • No lecture today

Lec 25: Fri 3/29

Announcements

Today’s Topics/Activities

2. In-class exercise

Read the following in moderndive:

  • Appendix A.2 on the Normal distribution
  • moderndive 9.3-9.4

Lec 24: Wed 3/27

Announcements

  • Please open Slack.

Today’s Topics/Activities

1. Chalk Talk

  • Recap of Lec23:
    • Recall your “tactile” resampling results in this Google Sheet
    • Let’s load each of your resampled sample means and plot a histogram.
  • Today:
    • New inference scenario Number 2 in moderndive Table 8.6. Unknown value is no longer population proportion \(p\) but population mean \(\mu\)
    • Just like in Ch8 with sampling, we are going from “tactile resampling” by hand to “virtual resampling” using a computer!

2. In-class exercise

Read the following in moderndive:

  • Intro to Ch9
  • 9.1
    • Read 9.1.1
    • Skip 9.1.2 and 9.1.3. These sections will be re-written before book launch
    • Read 9.1.4
  • 9.2

Lec 23: Mon 3/25

Announcements

  • Tomorrow 12:15-1:05pm in Ford Hall Atrium: Presentation of SDS major

  • What is the tidyverse package?

Today’s Topics/Activities

1. Chalk Talk

  • Recap of Chapter 8: Sampling. In particular two goals:
    • Study the effect of sampling variation on our estimates.
    • Study the effect of sample size on sampling variation.
  • In real-life, when we have a single sample of size \(n\) (and not 1000 like in our simulations), what do we do? Bootstrap re-sampling from the original sample!
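
A minimal base R sketch of bootstrap resampling from a single sample. The "original sample" here is simulated purely as a stand-in, and the percentile method is used for the confidence interval.

```r
set.seed(2019)

# Hypothetical original sample of size n = 50 (simulated stand-in for real data)
original_sample <- round(rnorm(50, mean = 1995, sd = 15))

# One bootstrap resample = n draws from the original sample WITH replacement.
# Repeat 1000 times, recording the mean of each resample.
boot_means <- replicate(
  1000,
  mean(sample(original_sample, size = length(original_sample), replace = TRUE))
)

# The spread of the resampled means estimates the standard error of the sample mean
se_boot <- sd(boot_means)

# Percentile-method 95% confidence interval: middle 95% of the resampled means
ci_95 <- quantile(boot_means, probs = c(0.025, 0.975))

se_boot
ci_95
```

hist(boot_means) gives the virtual version of the class histogram of resampled means.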

2. In-class exercise

Resampling tactile exercise. Open this Google Sheet. Resulting 35 sample means based on a resample of size 50 are here:


Lec 22: Fri 3/22

Announcements

  • Discuss next phase of project: “Project (initial) submission” due Fri 4/5 5pm.
  • Talk today from 12:15-1:00pm in McConnell B15: Gina DelCorazon ’04, Director of Data & Analytics at The National Math and Science Initiative


Today’s Topics/Activities

1. Chalk Talk

2. In-class exercise

  • moderndive readings in above schedule.

Lec 21: Wed 3/20

Announcements

  • No office hours today. I do however have appointments to book on Friday (see syllabus for link).
  • Project proposal feedback given today. On Friday I will post information about the next phase “Project (initial) submission” due Fri 4/5 5pm.

Today’s Topics/Activities

1. Chalk Talk

  • Overall comments about project proposal.
  • Recap of Lec20: Simulation. Goal is to study the effect of sampling variation.
  • Sampling terminology, notation, and statistical definitions. Mastering these will take practice, practice, practice.

2. In-class exercise

  • moderndive readings in above schedule.

3. Tweet of the day

Relating to the formatting of your reports and in particular their length. Do not include “superfluous” output as it only increases the “ink to information ratio.”


Lec 20: Mon 3/18

Announcements

  • Over spring break moderndive Chapters 6 & 7 on basic and multiple regression went through a thorough renovation, so the presentation might be a little different. Please let me know if you have comments, questions, or feedback.
  • Two events coming up:
    • Fri 3/22 12:15-1:00pm in McConnell B15: Gina DelCorazon ’04, Director of Data & Analytics at The National Math and Science Initiative
    • Tue 3/26 12:15-1:05pm in Ford Hall Atrium: Presentation of SDS major

Today’s Topics/Activities

1. Chalk Talk

Sampling exercise:

  • Ask yourself “What proportion of this bowl’s balls are red?”
  • Come up to the front of the class and take a photo of the sample. Do not delete this photo as you’ll be submitting it later
  • Compute the proportion of the 50 balls that are red.
  • Post a post-it on the histogram on the blackboard where the bins are left-inclusive. In other words, if you obtain a proportion of 0.2, put a post-it in the 0.2-0.25 bin.
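
The same exercise can be sketched virtually in base R. The bowl composition and the number of students below are assumptions for illustration. The left-inclusive bins correspond to right = FALSE in cut().

```r
set.seed(8)

# Hypothetical bowl: 2400 balls, 900 of them red (assumed numbers)
bowl <- rep(c("red", "white"), times = c(900, 1500))

# 33 students (assumed) each draw 50 balls without replacement
# and record the proportion that are red
prop_red <- replicate(33, sum(sample(bowl, size = 50) == "red") / 50)

# Left-inclusive bins of width 0.05: [0, 0.05), [0.05, 0.10), ...
# Breaks are built as k/20 so that a proportion such as 15/50 compares
# exactly equal to the break 6/20 in floating point.
breaks <- (0:20) / 20
bins   <- cut(prop_red, breaks = breaks, right = FALSE)

# Counts per bin = heights of the post-it histogram
table(bins)

# A proportion of exactly 0.2 goes into the 5th bin, i.e. 0.2 up to (not including) 0.25
bin_of_0.2 <- as.integer(cut(0.2, breaks = breaks, right = FALSE))
bin_of_0.2
```

The histogram differs from run to run of the physical exercise; that run-to-run difference is the sampling variation being studied.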

Why are we doing this?

  • To study the effects of sampling variation
  • Also, to update the contents of moderndive section 8.1.

2. In-class exercise

  • moderndive readings in above schedule.

Lec 19: Fri 3/8

Announcements

  • Went over Midterm I

Lec 18: Wed 3/6

Announcements

Yet far too much handcrafted work, what data scientists call “data wrangling,” “data munging” and “data janitor work”, is still required. Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec17
  • More on designed experiments from OpenIntro Section 1.5 page 17 on Experiments; to access PDF of OpenIntro, click on “free online” link here. Blocking in a randomized experiment:

2. Tweet of the day

Help Chester and me make this book as good as it can be before it goes to press! If something you read doesn’t make sense, let me know!


Lec 17: Mon 3/4

Announcements

  • Guest lecturer on Wed 3/6 at 11:45am: Prof Randi Garcia.
  • Reminder tomorrow is:


Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec11: model selection
  • Random assignment, causal inference, observational studies vs experiments.
  • Example:
  • Discussion questions:
    1. Why did I ask the question “Have you been to Africa before?”
    2. Why did I have the people with even-numbered birthdays take the “Africa Quiz” and those with odd-numbered birthdays take “Africa Experiment”?
    3. Comment on what you think the difference between the two histograms for heights will be for those who took the “Africa Quiz” vs “Africa Experiment”.

Lec 16: Fri 3/1

Announcements

  • Extra office hours on Friday 2:45-4pm.

Today’s Topics/Activities

1. Chalk talk

2. In-class exercise

  • moderndive readings in above schedule.

Lec 15: Wed 2/27

Announcements

  • Clarifications of upcoming deadlines.
  • Project proposal phase posted on Term Projects page.
  • Extra office hours on Friday 2:45-4pm.

Today’s Topics/Activities

1. Chalk talk

2. In-class exercise


Lec 14: Mon 2/25

Announcements

  • Midterm I details posted.
  • Term project: I will
    • Give feedback on your “data proposals” on Slack by later today.
    • Post instructions for the next “Project proposal” phase by Wednesday; this phase is due Fri 3/8 5pm and involves data wrangling and exploratory data analysis.

Today’s Topics/Activities

1. Chalk talk

  • Multiple regression

2. In-class exercise

  • moderndive readings in above schedule.

3. Install development version of moderndive package

  • If you are working with RStudio Desktop, please follow these steps before tomorrow’s lab. If you get stuck, please ask Jenny for help.
  • If you are working on RStudio Server, you can ignore these steps.

We’re going to install the development AKA beta-version of the moderndive package, which includes a new function gg_parallel_slopes() allowing you to create a ggplot of the parallel slopes model.

  1. Install the devtools package as you normally would install a package. Say yes to any prompts.
  2. Run the following line in your console to install the development version of the moderndive package off of GitHub.com:
    devtools::install_github("moderndive/moderndive", ref = "geom_parallel_slopes")
  3. Run library(moderndive) to reload the package.
  4. Run ?gg_parallel_slopes and see if the help file pops up. If it does, your installation worked!
  5. Run example code at the bottom of the help file to see it in action! You should get the following plot:

Lec 13: Fri 2/22

Announcements

  • Seelye self-scheduled Midterm I next Fri 3/1 thru Sun 3/3; Midterm I review on Monday.

Today’s Topics/Activities

1. Chalk talk

  • Boxplots for EDA when explanatory variable \(x\) is categorical.
  • Indicator function
  • What are fitted values and residuals when \(x\) is categorical?

2. In-class exercise

  • moderndive readings in above schedule.

3. Tweet of the day

Dr. Benn is excited for her talk at Smith College SDS! Are you?


Lec 12: Wed 2/20

Announcements

  • None

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec11: What do we mean by “best” fitting line? Note in the plot below there are 3 points marked with black dots along with:
    • The “best” fitting regression line in blue
    • An arbitrarily chosen line in dashed red
    • Another arbitrarily chosen line in dashed green
  • Regression using a categorical explanatory variable
term estimate std_error statistic p_value lower_ci upper_ci
intercept 10 0.577 17.321 0.000 8.587 11.413
nameJenny 2 0.816 2.449 0.050 0.002 3.998
nameMiles -1 0.816 -1.225 0.267 -2.998 0.998
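
To see how to read such a table, here is a base R sketch with made-up data chosen so that lm() gives the same estimates as above. The baseline name "Albert" and the y values are assumptions, not the lecture's data.

```r
# Hypothetical data: three observations per person, group means 10, 12 and 9
scores <- data.frame(
  name = rep(c("Albert", "Jenny", "Miles"), each = 3),
  y    = c(9, 10, 11,   11, 12, 13,   8, 9, 10)
)

# Regression with a categorical explanatory variable
model <- lm(y ~ name, data = scores)
coefs <- coef(model)
coefs

# Group means for comparison
group_means <- tapply(scores$y, scores$name, mean)
group_means

# The intercept is the mean of the baseline group (alphabetically first).
# Each other coefficient is an offset: that group's mean minus the baseline mean.
offsets <- group_means[c("Jenny", "Miles")] - group_means["Albert"]
offsets

# Fitted values are simply each observation's group mean
fitted_values <- unname(fitted(model))
fitted_values
```

So nameJenny = 2 does not mean Jenny's mean is 2; it means her mean is 2 above the baseline's 10, i.e. 12.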

2. In-class exercise

  • Work on projects.
  • moderndive readings in above schedule.

Lec 11: Mon 2/18

Announcements

  • Part of next lecture (Lec12 on Wed 2/20) will be devoted to work on project.
  • Note in the above schedule that the topics for Lab06 and Lab07 have switched places, as originally it erroneously had you working on your project proposals after it was due.

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec10
  • What is a confounding variable?
  • Fitted values & residuals via get_regression_points()
  • What do we mean by “best” when we say that the regression line is the “best fitting” line?
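
One common answer is that "best" means the smallest sum of squared residuals. A short base R sketch with three made-up points (not the ones plotted in lecture) compares the lm() line with two arbitrary lines:

```r
# Three hypothetical points
x <- c(1, 2, 3)
y <- c(2, 1, 3)

# Sum of squared residuals for the line: y_hat = intercept + slope * x
ssr <- function(intercept, slope) {
  residuals <- y - (intercept + slope * x)
  sum(residuals^2)
}

# The least-squares regression line
best <- unname(coef(lm(y ~ x)))   # intercept 1, slope 0.5 for these points
ssr_best <- ssr(best[1], best[2])

# Two arbitrarily chosen competing lines
ssr_red   <- ssr(intercept = 2.5, slope = -0.5)
ssr_green <- ssr(intercept = 0,   slope = 1)

c(best = ssr_best, red = ssr_red, green = ssr_green)
```

No other choice of intercept and slope gives a smaller value than the regression line's sum of squared residuals.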

2. In-class exercise

  • moderndive readings in above schedule.

Lec 10: Fri 2/15

Announcements

  • Project data is due in a week!
  • We’ll devote part of Lec12 on Wed 2/20 to work on project data phase.

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec09
  • Regression table via get_regression_table() and interpreting the regression line

2. In-class exercise

  • moderndive readings in above schedule.

Lec 9: Wed 2/13

Announcements

  • Winner for best group name.

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec08
  • Correlation coefficient

2. In-class exercise

  • moderndive readings in above schedule.

3. Tweet of the day

Why are Jenny and Albert always on your cases about running glimpse() and View() on your data frames? Looking at your data is so deceptively simple that many people forget or ignore this step, even analysts/engineers with PhDs at Google! Before performing any kind of analysis, you must get a sense of:

  1. What types of variables you have in your columns? Numerical, categorical, text, dates?
  2. What values you have in your cells? Units of any measurements?
  3. What is the quality of your data? Do you have missing data? Are there crazy outliers?

These are the most fundamental steps to take before any data analysis! That’s why moderndive starts in Chapter 2 with “Data exploration” with glimpse() and View().


Lec 8: Mon 2/11

Announcements

  • Everybody join the term_project channel in Slack.
  • Discuss Data due phase of term-project.

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec07:
    • What does group_by() by itself do?
    • Difference between filter() and group_by() %>% summarize()
  • Last three verbs:
    • mutate() existing variables to create new ones
    • arrange() rows in ascending or desc()ending alphanumeric order of another variable
    • select() or drop variables
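
A small sketch of the three verbs chained with the pipe. It assumes the dplyr package is installed; the data frame flights_small and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical mini data frame (values are made up)
flights_small <- data.frame(
  carrier   = c("AS", "AS", "B6", "DL"),
  dep_delay = c(5, -3, 20, 0),
  arr_delay = c(-2, -10, 35, 4)
)

result <- flights_small %>%
  mutate(gain = dep_delay - arr_delay) %>%  # new variable built from existing ones
  arrange(desc(gain)) %>%                   # sort rows, largest gain first
  select(carrier, gain)                     # keep only these two variables

result

# select() with a minus sign drops a variable instead of keeping it
without_arr <- flights_small %>%
  select(-arr_delay)

names(without_arr)
```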

2. In-class exercise

moderndive readings in above schedule.


Lec 7: Fri 2/8

Announcements

  • Term project groups are due today at 5pm. Make sure your group leader has completed all three steps, in particular the Google Form.
  • I will introduce next phase of term project on Monday: Data due: Fri 2/22 5pm.

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec06
  • Computing summary statistics using summarize()
  • Adding Groups meta-data using group_by(). See example code below.
  • Computing summary statistics split by group using group_by() %>% summarize()
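
A minimal sketch of these three steps, assuming the dplyr package is installed. The data frame temps is made up and is not the course's example.

```r
library(dplyr)

# Hypothetical temperature readings (values are made up)
temps <- data.frame(
  month = c(1, 1, 2, 2, 3, 3),
  temp  = c(30, 34, 28, 40, 45, 51)
)

# summarize() collapses all rows into a single row of summary statistics
overall <- temps %>%
  summarize(mean_temp = mean(temp))

# group_by() by itself changes no values and drops no rows;
# it only attaches grouping meta-data to the data frame
temps_by_month <- temps %>%
  group_by(month)

# group_by() %>% summarize() returns one row per group
monthly <- temps_by_month %>%
  summarize(mean_temp = mean(temp), count = n())

overall
monthly
```

Printing temps_by_month shows the same six rows as temps, plus a "Groups: month" line in the header.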

2. In-class exercise

moderndive readings in above schedule.


Lec 6: Wed 2/6

Announcements

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec05.
  • Say I want to visualize the distribution of temperature split by month. Two options
  • Starting Data Wrangling: the pipe operator %>% and filter() rows of a data frame.

2. In-class exercise

moderndive readings in above schedule.


Lec 5: Mon 2/4

Announcements

  • Added note to syllabus on office hours: If you’re having R or RStudio issues, please have your computer and RStudio loaded and ready to go.
  • ModernDive Chapters 4 & 5 are now reordered and renamed:
    • Chapter 4: Data Wrangling (previously “Tidy Data via tidyr”)
    • Chapter 5: Data Importing and “Tidy” Data (previously “Data Wrangling via dplyr”)
  • Term project groups are due this Friday 5pm; see Term Project. If you need a group, Slack me.
  • Lab tomorrow: Jenny will talk about DataCamp & cover data visualization.

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec04: Histogram binning structure & facets
  • Boxplot to show the distribution of a numerical variable split by a categorical variable. Say we want to plot a boxplot of the following 12 values which are pre-sorted:

1, 3, 5, 6, 7, 8, 9, 12, 13, 14, 15, 30

They have the following summary statistics:

Min. 1st Qu. Median 3rd Qu. Max.
1 5.5 8.5 13.5 30
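
These summary statistics, and the pieces of the boxplot built from them, can be reproduced in base R. The sketch uses fivenum(), whose quartile convention (Tukey's hinges) matches the values above; the default quantile() uses a different interpolation and gives 5.75 and 13.25 for the quartiles instead.

```r
x <- c(1, 3, 5, 6, 7, 8, 9, 12, 13, 14, 15, 30)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
five

# Interquartile range = width of the box
iqr <- five[4] - five[2]

# Points more than 1.5 x IQR beyond the box are drawn as outliers
lower_fence <- five[2] - 1.5 * iqr
upper_fence <- five[4] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]

# Whiskers extend to the most extreme values that are not outliers
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

iqr
c(lower_fence, upper_fence)
outliers
whiskers
```

So the box runs from 5.5 to 13.5 with the median line at 8.5, the whiskers reach 1 and 15, and 30 is plotted as a separate point.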

2. In-class exercise

  • R Markdown:
    • Don’t be afraid of error messages! In particular the line number where the error occurs!
    • Heads up! Leaving View() or ? in your code will prevent your .Rmd files from knitting (i.e. the HTML report won’t get created)!
  • moderndive readings in above schedule.

Lec 4: Fri 2/1

Announcements

  • Slack message and #moderndive_typoes
  • Remember the Warning: Removed 5 rows containing missing values (geom_point). warning message you got when creating a scatterplot of alaska_flights arrival and departure delays? Check out this talk by Prof. Brittney Bailey from Amherst College next Thursday 2/7 12:10pm in McConnell B15:

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec03.
  • Histograms to show distribution of a numerical variable.

2. In-class exercise

  • Live demo: Some tips on workflow for in-class exercises.
    • “Typing/running code directly in console” vs “typing code in your class_notes.Rmd R Markdown file and sending it to the console to run.”
    • Quickly switching applications on your computer with “command + tab” (macOS) or “alt + tab” (Windows)
    • Making code human-friendly to read! Be empathetic to your collaborators by writing nice code, in particular your most important collaborator!
    • For example: hard returns between code chunks.
    • See screencast of live demo below!
  • Please close your RStudio Server window when not working! It uses up Smith server resources if you don’t!
  • ModernDive readings for Lec04 in above schedule.

3. Tweet of the day

The BBC uses ggplot2 for data journalism!


Lec 3: Wed 1/30

Announcements

  • Slack message and #moderndive typos
  • Updated Term Project page with:
    • Information on first phase: Form groups
    • Example of final “resubmission” due the last day of class. Note this example is subject to change throughout the semester.
  • Added all Term Project items to Moodle.

Today’s Topics/Activities

1. Chalk talk

  • Recap of Lec02: nycflights13 package and glimpse()/View() functions.
  • What is a function? What are arguments?
  • Grammar of graphics
  • Scatterplots

2. In-class exercise

ModernDive readings for Lec03 in above schedule. Now that you’ve seen R Markdown in Lab 1:

  • Create a .Rmd file and save it as class_notes.Rmd. That way you can save all your code for re-use later, like a Word document.
  • Copy and paste any code from ModernDive into “code chunks” in class_notes.Rmd. That way you can easily tweak/modify code.
  • “Run” code in the console from the “code chunks” in class_notes.Rmd as you learned in Lab 1.
  • Again, you do not need to submit any answers for learning checks; however, you are responsible for completing the readings, running all code, and doing all learning checks before the next lecture.

Lec 2: Mon 1/28

Announcements

  • On Moodle
    • If you haven’t already, please complete all the steps in “start here”
    • If you are trying to register for this course, see the posted registration priority list.
  • Outside help:
    • Spinelli Center for Quantitative Learning tutoring hours (Sunday-Thursday 7-9pm in Sabin-Reed 301) start tonight.
    • My office hours are now posted on syllabus.
  • Slack
    • Did you get a Slack notification in some form for my message on Saturday at 9AM: mobile/desktop or email notification? You are responsible for staying on top of in-between lecture notifications.
    • First student question posted on #questions Slack channel🎉 !!! The 🏆 for first student answer to a student question is still up for grabs!
  • First lab with Dr. Jenny Smetzer is tomorrow. Problem set 01 (PS01) will be posted on the Problem Sets page.

Today’s Topics/Activities

1. Chalk Talk

Why chalk talks? Read the Field Notes slogan.

2. In-class exercise

Why undirected in-class exercise time? People all learn at their own pace. What to do:

  1. Open RStudio (R in the menu bar above)
  2. Open ModernDive (ModernDive in menu bar above)
  3. As indicated in the above schedule, read ModernDive Chapter 2 while running all code in the console.
  4. You can skip all videos and the DataCamp links as we’ll be talking about those in class.
  5. You do not need to turn in Learning Checks, those are for your practice. The solutions are in Appendix D.
  6. If you have questions, ask a peer. If you’re still stuck, ask me!

Recall from the “How can I succeed in this class?” discussion in the syllabus:

  • Lectures, labs, and readings:
    • “Am I actually running the code and studying the outputs in R during in-class exercises, or am I just skimming the text?”
    • “Am I completing all the ModernDive readings/in-class activities for a given lecture before the start of the next lecture?”
    • “During in-class exercises and lab time, am I taking full advantage that I’m in the same place at the same time with the instructor, the lab assistants, and most importantly your peers, or am I browsing the web/texting the whole time?”
  • Problem sets, DataCamp, and coding:
    • “When learning to code, much like learning a language, have I been really pushing myself to practice, practice, practice?”

Lec 1: Fri 1/24

Announcements

  • Please ensure you have followed all the “start here” instructions posted on Moodle. Please remember that just because you can access the Moodle page does not guarantee you are registered for the course. I will post a “priority list” of waitlisted students on Moodle by tomorrow.
  • What is the difference between SDS/MTH 220 vs SDS 201?
  • Who is rudeboybert?
  • Website features
  • Slack demo
  • Final project discussion
  • Food for thought on coding: