## Philosophy

• More conceptual in nature
• Reading/understanding code: fair game
• Writing code: none directly, but you may need to write pseudocode
• Question difficulty follows a roughly normal (bell-shaped) distribution

## Sources

• Lectures 01 through 38, inclusive (the exam is cumulative)
• Slides from each lecture
• Learning Checks
• Problem set solutions!

## Major Topics: Midterm I

• Tidy data: what are its components?
• What is the Grammar of Graphics? How does it tie in with `ggplot2`?
• What are the first four of the 5NG (five named graphs)? What are their distinguishing features?

## Major Topics: Midterm II

• All five of the 5NG
• Data manipulation/wrangling
• Sampling, probability, confounding variables, and designed experiments.

## Major Topics: Midterm III

• Hypothesis testing
• Lady tasting tea.
• There is only one test; it has 5 components.
• Confidence intervals
• Theory: Sampling distribution and standard errors
• Interpretation of CI
• If the sampling distribution is normal, the general formula for constructing a 95% CI
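As a minimal sketch of that general formula, point estimate ± 1.96 × SE, here it is applied to a sample mean in Python, assuming the sampling distribution of the mean is approximately normal and estimating the SE as \(s / \sqrt{n}\). The function name `normal_ci_95` and the data are hypothetical.

```python
import math
import statistics


def normal_ci_95(sample):
    """95% CI for a population mean, assuming a normal sampling distribution."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    # Standard error of the sample mean: sample SD divided by sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(n)
    # 1.96 leaves 2.5% in each tail of the standard normal curve
    margin = 1.96 * se
    return point_estimate - margin, point_estimate + margin


sample = [12, 15, 9, 14, 10, 13, 11, 16]  # made-up data
lower, upper = normal_ci_95(sample)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The interval is centered at the point estimate (here 12.5), and its width is driven entirely by the SE.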

• Regression
• The regression line is the best-fitting line in what sense?
• Interpret ALL regression table outputs
• Study residuals
• Categorical variables
• Multiple Regression
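A minimal Python sketch of what "best fitting" means: the least squares line minimizes the sum of squared residuals, where each residual is observed \(y\) minus fitted \(\hat{y}\). The function names and data below are hypothetical.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    # Residual = observed y minus fitted y-hat
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_sq_residuals(x, y, intercept, slope):
    return sum(r ** 2 for r in residuals(x, y, intercept, slope))


x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
print("SSR at the fitted line:", round(sum_sq_residuals(x, y, b0, b1), 4))
print("SSR with slope nudged by 0.1:", round(sum_sq_residuals(x, y, b0, b1 + 0.1), 4))
```

Any other intercept/slope pair gives a larger sum of squared residuals. The residuals of the least squares line (with an intercept) also sum to zero.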

## Recall

So far we've seen simple linear regression:

• Simple means only one predictor/independent variable \(x\)
• Outcome/dependent variable \(y\)
• \(x\) can be either numerical or categorical
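When \(x\) is categorical with two levels, it enters the regression as a 0/1 indicator for the non-baseline level. A minimal Python sketch on made-up data (the function name and level labels are hypothetical): the fitted intercept equals the baseline group's mean, and the slope equals the difference in group means.

```python
def fit_with_indicator(groups, y, baseline):
    """Least squares fit of y on a two-level categorical x.

    x is encoded as 0 for the baseline level and 1 for the other level.
    """
    x = [0.0 if g == baseline else 1.0 for g in groups]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


groups = ["A", "A", "A", "B", "B"]  # hypothetical levels
y = [10, 12, 14, 20, 22]            # made-up outcome values
b0, b1 = fit_with_indicator(groups, y, baseline="A")
print(f"intercept = {b0:.2f} (mean of A), slope = {b1:.2f} (mean of B minus mean of A)")
```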

In the Lec 36 Learning Check (LC), we saw the relationship between \(x =\) departure delay and \(y =\) arrival delay for Alaska Airlines flights.

• Since we only have Alaska flights, the variable `carrier` doesn't vary.
• But now let's also consider Frontier Airlines (`carrier == "F9"`)

So we have:

• \(y =\) arrival delay
• \(x_1 =\) departure delay (numerical variable)
• \(x_2 =\) carrier (categorical variable with \(k = 2\) levels; in other words, carrier now varies)
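With \(k = 2\) levels, \(x_2\) is encoded as \(k - 1 = 1\) indicator variable. The Python sketch below fits \(\hat{y} = b_0 + b_1 x_1 + b_2 \cdot \mathbb{1}[x_2 = \text{F9}]\) by solving the least squares normal equations. It assumes a model without an interaction term, uses `AS` (Alaska) as the baseline, and runs on made-up numbers rather than the actual flights data. All function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple_regression(x1, x2, y, baseline):
    # Design matrix columns: intercept, numerical x1, indicator for the non-baseline level of x2
    rows = [[1.0, float(a), 0.0 if b == baseline else 1.0] for a, b in zip(x1, x2)]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


x1 = [0, 10, 20, 30, 5, 15, 25]                  # departure delay (made up)
x2 = ["AS", "AS", "AS", "AS", "F9", "F9", "F9"]  # carrier
y = [-5, 5, 15, 25, 8, 18, 28]                   # arrival delay (made up)
b0, b1, b2 = fit_multiple_regression(x1, x2, y, baseline="AS")
print(f"intercept = {b0:.2f}, dep delay slope = {b1:.2f}, F9 offset = {b2:.2f}")
```

These toy values were generated exactly from \(y = -5 + x_1 + 8 \cdot \mathbb{1}[x_2 = \text{F9}]\), so the fit recovers those coefficients. \(b_2\) is the shift in the line for F9 relative to the AS baseline, holding departure delay fixed.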

## Today

• Continuing Regression Outputs: Lec 36 Learning Check
• Categorical Predictors

