Last updated on 2017-05-15

Lec41 - Mon 5/15: Midterm III Review


  • Fri 5/19 7pm-10pm in Warner 506
  • Not a final, but a 3rd midterm. Timed at ~1h15m to 1h30m
  • Bring your cheatsheets
  • Bring a calculator or a smartphone with a calculator app


  • More conceptual in nature
  • Code:
    • Reading/understanding: Fair game
    • Writing: No actual code to write, but you may be asked for pseudocode
  • Question difficulty is roughly normally distributed: a few easy, a few hard, most in between


  • Lectures 01 through 38, inclusive and cumulative
    • Slides from each lecture
    • Learning Checks
    • Problem set solutions!

Major Topics: Midterm I

  • Tidy data. What are the components?
  • What is the Grammar of Graphics? How does it tie in with ggplot2?
  • What are the first four of the 5NG? What are their distinguishing features?

Major Topics: Midterm II

Major Topics: Midterm III

  • Hypothesis testing
    • Lady tasting tea.
    • There is only one test; it has 5 components.
  • Confidence intervals
    • Theory: Sampling distribution and standard errors
    • Interpretation of CI
    • If sampling distribution is normal, the general formula for creating a 95% C.I.
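
For reference, a sketch of the usual normal-based form of that formula. The multiplier 1.96 is the standard normal value; it is an assumption that the course uses it rather than the rounded value 2.

\[
\text{point estimate} \pm 1.96 \times \text{SE}
\]

Here SE is the standard error, i.e. the standard deviation of the sampling distribution of the point estimate.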

  • Regression
    • Regression line is best fitting line in what sense?
    • Interpret ALL regression table outputs
    • Study residuals
    • Categorical variables
    • Multiple Regression

Lec39 - Thu 5/11: Multiple Regression


So far we've seen simple linear regression

  • Simple means only one predictor/independent variable \(x\)
  • Outcome/dependent variable \(y\)
  • \(x\) can be either numerical or categorical


In the Lec36 Learning Check we saw the relationship between \(x =\) departure delay and \(y =\) arrival delay for Alaska Airlines flights.


  • Since we only have Alaska flights, the variable carrier doesn't vary.
  • But now let's also consider Frontier Airlines (carrier == "F9")

So we have:

  • \(y =\) arrival delay
  • \(x_1 =\) departure delay (numerical variable)
  • \(x_2 =\) carrier (categorical variable with \(k=2\) levels; in other words, carrier now varies)
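
One way to write the resulting model, as a sketch only. It assumes Alaska is the baseline level and that both carriers share the same slope; the lecture may have coded the carriers the other way or included an interaction term. The coefficient names \(b_0, b_1, b_2\) are introduced here for illustration.

\[
\hat{y} = b_0 + b_1 x_1 + b_2 \cdot \mathbb{1}[\text{carrier} = \text{F9}]
\]

Under these assumptions, \(b_2\) is the difference in expected arrival delay between Frontier and Alaska flights with the same departure delay.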


Is there a difference in delays between Alaska and Frontier?


Lec38 - Wed 5/10: Interpretation + Categorical Predictors

Chalk Talk for Today

  • Continuing Regression Outputs: Lec36 Learning Check
  • Categorical Predictors

Lec37 - Mon 5/8: Least-Squares Line + Regression Output

Best Fitting Line

What does "best fitting line" mean?
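
Given the lecture title, the intended sense is presumably least squares: among all lines, the best fitting one has the smallest sum of squared residuals. A minimal Python sketch of that idea follows. The data values and function names are hypothetical, made up for illustration, and are not taken from the Learning Check.

```python
# Minimal sketch: fit y = b0 + b1 * x by least squares on made-up data.

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical delays in minutes (not real flight data)
dep_delay = [0.0, 10.0, 20.0, 30.0, 40.0]
arr_delay = [-5.0, 8.0, 12.0, 31.0, 34.0]

b0, b1 = least_squares_line(dep_delay, arr_delay)
best = sum_squared_residuals(dep_delay, arr_delay, b0, b1)

# Any other line does worse: nudge the intercept or the slope and compare.
nudged_intercept = sum_squared_residuals(dep_delay, arr_delay, b0 + 1.0, b1)
nudged_slope = sum_squared_residuals(dep_delay, arr_delay, b0, b1 + 0.1)

print(b0, b1, best, nudged_intercept, nudged_slope)
```

Moving either the intercept or the slope away from the fitted values increases the sum of squared residuals, which is what "best" means in this sense.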