Mon Dec 05, 2016

Final Note about Confidence Intervals

  • ALL (approximate 95%) confidence intervals have the form: \(\mbox{Point Estimate}\pm 2 \times SE\)
  • What the point estimate and SE are changes from problem to problem
  • CI for \(\mu\):
    • Point estimate: \(\overline{x}\)
    • SE = \(\frac{s}{\sqrt{n}}\)
    • 95% CI: \(\overline{x} \pm 2 \frac{s}{\sqrt{n}} = \left(\overline{x} - 2 \frac{s}{\sqrt{n}}, \overline{x} + 2 \frac{s}{\sqrt{n}}\right)\)
  • PS-11: CI for \(\mu_1 - \mu_2\), i.e. the difference in means of two groups
    • Point estimate: \(\overline{x}_1 - \overline{x}_2\)
    • SE = NASTY!
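
As a small sketch of the recipe above for \(\mu\), the following R code builds the approximate 95% interval from a sample. The vector `x` holds made-up illustrative values, not course data.

```r
# Hypothetical sample (illustrative values only, not from the course)
x <- c(12, 15, 9, 20, 14, 11, 17, 13, 16, 10)

n     <- length(x)
x_bar <- mean(x)          # point estimate of mu
se    <- sd(x) / sqrt(n)  # standard error: s / sqrt(n)

# Approximate 95% CI: point estimate +/- 2 * SE
ci <- c(lower = x_bar - 2 * se, upper = x_bar + 2 * se)
ci
```

Swapping in a different point estimate and SE gives the interval for a different parameter; the `+/- 2 * SE` step stays the same.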

Correlation Coefficient

  • Measures the strength of linear association between two variables
  • Always between -1 and 1 (inclusive), where
    • -1 indicates a perfect negative linear relationship
    • 0 indicates no linear relationship
    • +1 indicates a perfect positive linear relationship
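
The three reference values can be checked with a quick sketch in R. The data are made up for illustration. The last case shows that a value of 0 rules out only a linear relationship, not every relationship.

```r
# Hypothetical x values, symmetric around 0 (illustrative only)
x <- -5:5

r_pos   <- cor(x,  2 * x + 3)  # points exactly on an upward-sloping line
r_neg   <- cor(x, -2 * x + 3)  # points exactly on a downward-sloping line
r_curve <- cor(x, x^2)         # symmetric curve: y depends on x, but not linearly

# Expect values of 1, -1, and 0 (up to floating-point rounding)
c(r_pos, r_neg, r_curve)
```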

Correlation Coefficient

Two versions, just like with \(\mu\) and \(\overline{x}\):

  • Population correlation coefficient \(\rho\)
  • Sample correlation coefficient \(r\), based on a sample of \(n\) pairs of observations
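
For reference, the usual formula for the sample version (not stated on the slides) is \(r = \frac{1}{n-1}\sum_{i=1}^{n} \frac{x_i - \overline{x}}{s_x} \cdot \frac{y_i - \overline{y}}{s_y}\). A minimal sketch, using made-up paired data, checks it against R's built-in `cor()`:

```r
# Hypothetical paired observations (illustrative values only)
x <- c(1, 3, 4, 6, 8, 9)
y <- c(2, 3, 6, 7, 8, 12)
n <- length(x)

# Sum of products of standardized values, divided by n - 1
# (sd() also uses the n - 1 denominator, so the pieces are consistent)
r_manual <- sum((x - mean(x)) / sd(x) * (y - mean(y)) / sd(y)) / (n - 1)

r_manual
cor(x, y)  # built-in sample correlation; should agree with r_manual
```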

Example

Recall the nycflights data set. For Alaska Air flights, let's explore the relationship between

  • Departure delay
  • Arrival delay

library(nycflights13)
library(dplyr)    # provides %>% and filter()
library(ggplot2)  # provides ggplot() and geom_point()
data(flights)

# Load Alaska data, deleting rows that have missing dep or arr data
alaska_flights <- flights %>% 
  filter(carrier == "AS") %>% 
  filter(!is.na(dep_delay) & !is.na(arr_delay))

ggplot(data=alaska_flights, aes(x = dep_delay, y = arr_delay)) + 
   geom_point()

Example

The correlation coefficient is computed as follows:

cor(alaska_flights$dep_delay, alaska_flights$arr_delay)
## [1] 0.8373792

This indicates a fairly strong positive linear association!

Important Note

Correlation coefficient \(\neq\) slope of the regression line. Example: say we have 3 groups of points.

Important Note

Their regression lines have different slopes, but \(r = 1\) for all 3, i.e. each group shows a perfect (positive) linear relationship.
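
A minimal sketch of this point in R, using three made-up groups of points that each lie exactly on a line (the slopes and intercepts are arbitrary choices for illustration):

```r
x <- 1:10

# Three hypothetical groups, each exactly on a line with a different positive slope
y_shallow <- 0.5 * x
y_medium  <- 1 * x + 2
y_steep   <- 3 * x - 1

# Fitted regression slopes differ across the groups ...
slopes <- c(
  coef(lm(y_shallow ~ x))[["x"]],
  coef(lm(y_medium  ~ x))[["x"]],
  coef(lm(y_steep   ~ x))[["x"]]
)

# ... but the correlation coefficient is 1 in every case
rs <- c(cor(x, y_shallow), cor(x, y_medium), cor(x, y_steep))

slopes
rs
```

The slope says how steep the line is. The correlation says only how tightly the points cluster around some line.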

Learning Check