Last updated on 2017-05-15

- Fri 5/19 7pm-10pm in Warner 506
- Not a final, but 3rd midterm. Timed at ~1h15m to 1h30m
- Bring your cheatsheets
- Bring a calculator or your smartphone with a calculator app

- More conceptual in nature
- Code:
- Reading/understanding: Fair game
- Writing: No direct code to write, but **pseudocode**

- Question difficulty is distributed roughly along a normal curve

- Lectures 01 through 38 inclusive; the exam is cumulative
- Slides from each lecture
- Learning Checks
- Problem set solutions!

- Tidy data. What are the components?
- What is the Grammar of Graphics? How does it tie in with `ggplot2`?
- What are the first four of the 5NG? What are their distinguishing features?

- All five of the 5NG
- Data manipulation/wrangling
- 5MV + `join`

- Putting it all together: Lec18 - Fri 3/24: The tao of data analysis.

- 5MV +
- Sampling, probability, confounding variables, and designed experiments.

- Hypothesis testing
- Lady tasting tea.
- There is only one test; it has 5 components.

- Confidence intervals
- Theory: Sampling distribution and standard errors
- Interpretation of CI
- If sampling distribution is normal, the general formula for creating a 95% C.I.
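
For the last item, the usual rule of thumb can be written as follows. This is a sketch that assumes the sampling distribution of the point estimate is approximately normal; \(\text{SE}\) denotes its standard error, and 1.96 is often rounded to 2.

\[
\text{point estimate} \pm 1.96 \times \text{SE}
\]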

- Regression
- Regression line is best fitting line in what sense?
- Interpret ALL regression table outputs
- Study residuals
- Categorical variables
- ~~Multiple Regression~~

So far we've seen simple linear regression

- Simple means only one predictor/independent variable \(x\)
- Outcome/dependent variable \(y\)
- \(x\) can be either numerical or categorical

In the Lec 36 Learning Check we saw the relationship between \(x =\) departure delay and \(y =\) arrival delay for Alaska Airlines flights.

- Since we only have Alaska flights, the variable `carrier` doesn't vary.
- But now let's also consider Frontier Airlines (`carrier == "F9"`).

So we have:

- \(y =\) arrival delay
- \(x_1 =\) departure delay (numerical variable)
- \(x_2 =\) carrier (categorical variable with \(k=2\) levels. In other words, carrier now varies.)
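
A minimal sketch of how such a model could be fit in R with `lm()`. The data frame `flights_toy` and its values are made up purely for illustration (they are not real flight records), and the additive formula without an interaction term is an assumption, not necessarily the model used in lecture.

```r
# Toy data, invented for illustration only (not real flight records).
flights_toy <- data.frame(
  dep_delay = c(-5, 0, 10, 25, 40, -3, 5, 15, 30, 50),
  arr_delay = c(-12, -6, 2, 18, 31, 1, 8, 20, 33, 55),
  carrier   = factor(rep(c("AS", "F9"), each = 5))
)

# y = arr_delay, x1 = dep_delay (numerical), x2 = carrier (categorical, k = 2).
# R treats the first factor level ("AS") as the baseline and adds an
# indicator variable for the other level ("F9").
model <- lm(arr_delay ~ dep_delay + carrier, data = flights_toy)

# Regression table: intercept, slope for dep_delay, and offset for F9.
summary(model)
```

In the resulting table, the `carrierF9` row estimates how much Frontier's arrival delay differs from Alaska's for the same departure delay, which is one way to approach the question that follows.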

Is there a **difference in delays** between Alaska and Frontier?

- Continuing Regression Outputs: Lec36 Learning Check
- Categorical Predictors

What does "best fitting line" mean?
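
One common answer, sketched under the assumption that the line is fit by ordinary least squares (which is what R's `lm()` does): among all candidate lines, the fitted intercept \(b_0\) and slope \(b_1\) are the ones that minimize the sum of squared residuals. The symbols \(b_0\), \(b_1\), \(a_0\), \(a_1\), and \(n\) are notation introduced here, not taken from the slides.

\[
(b_0, b_1) = \arg\min_{a_0,\, a_1} \sum_{i=1}^{n} \bigl(y_i - (a_0 + a_1 x_i)\bigr)^2
\]

Each term \(y_i - (a_0 + a_1 x_i)\) is a residual: the vertical distance between an observed point and the candidate line.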