Load Packages and Data

# Load necessary packages
library(ggplot2)
library(dplyr)
library(nycflights13)

# Load the flights data set from nycflights13
data(flights)

LC 5.12

Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.

Solution

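# For each destination, find the largest arrival delay (in minutes),
# then keep the 5 destinations with the largest such delays
flights %>%
  group_by(dest) %>%
  summarize(largest_arrival_delay = max(arr_delay, na.rm = TRUE)) %>%
  top_n(n = 5, wt = largest_arrival_delay) %>%
  arrange(desc(largest_arrival_delay))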
## # A tibble: 5 × 2
##    dest largest_arrival_delay
##   <chr>                 <dbl>
## 1   HNL                  1272
## 2   CMH                  1127
## 3   ORD                  1109
## 4   SFO                  1007
## 5   CVG                   989
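`top_n()` still works, but recent versions of dplyr (1.0.0 and later) mark it as superseded by `slice_max()`. Here is a minimal sketch of the same computation with `slice_max()`, assuming dplyr 1.0.0 or later. The name `top5_delays` is just for illustration. Dropping missing arrival delays first means `max()` never has to handle a group with only `NA` values.

```r
library(dplyr)
library(nycflights13)

# Same question as above, written with slice_max() instead of top_n()
top5_delays <- flights %>%
  filter(!is.na(arr_delay)) %>%               # drop flights with no recorded arrival delay
  group_by(dest) %>%
  summarize(largest_arrival_delay = max(arr_delay)) %>%
  slice_max(largest_arrival_delay, n = 5)     # keep the 5 largest, sorted largest first

top5_delays
```

Because `slice_max()` returns its rows already ordered from largest to smallest, the separate `arrange(desc(...))` step is no longer needed.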

1272 minutes is a 21.2-hour delay for a flight to Honolulu (HNL)! So on top of the long, long flight, you arrive nearly a day late!
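The conversion is just a division by 60 minutes per hour. The sketch below does that division inside the pipeline for Honolulu. The names `hnl_delay` and `delay_hours` are illustrative choices and do not come from the original solution.

```r
library(dplyr)
library(nycflights13)

# Largest arrival delay for flights to Honolulu, in minutes and in hours
hnl_delay <- flights %>%
  filter(dest == "HNL", !is.na(arr_delay)) %>%
  summarize(largest_arrival_delay = max(arr_delay)) %>%
  mutate(delay_hours = largest_arrival_delay / 60)   # 60 minutes per hour

hnl_delay
```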

LC 5.16

What happens when you try to left_join the ten_freq_dests data frame with airports instead of airports_small? How might one use this result to answer further questions about the top 10 destinations?

Solution

We first define the necessary data frames:

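# Keep only each airport's FAA code and full name
airports_small <- airports %>%
  select(faa, name)

# Count the flights to each destination and keep the 10 most frequent
ten_freq_dests <- flights %>%
  group_by(dest) %>%
  summarize(num_flights = n()) %>%
  top_n(n = 10, wt = num_flights) %>%
  arrange(desc(num_flights))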
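The `group_by()` plus `summarize(n())` pattern is common enough that dplyr provides `count()` as a shortcut. The sketch below is an equivalent way to get the 10 most frequent destinations. It assumes no two destinations tie for 10th place: `top_n()` would keep the tied rows, but `slice_head()` always stops at exactly 10. The name `ten_freq_dests_alt` is only for illustration.

```r
library(dplyr)
library(nycflights13)

# count(dest, sort = TRUE) groups by dest, counts rows, and sorts by the count
ten_freq_dests_alt <- flights %>%
  count(dest, sort = TRUE, name = "num_flights") %>%   # name the count column like the original
  slice_head(n = 10)                                    # first 10 rows = 10 most frequent

ten_freq_dests_alt
```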

We compare the two possible joins:

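# Bring in only the airport name
orig_join <- ten_freq_dests %>%
  left_join(airports_small, by = c("dest" = "faa"))
# Bring in every column of airports (name, location, time zone, ...)
new_join <- ten_freq_dests %>%
  left_join(airports, by = c("dest" = "faa"))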

We then inspect both with View():

View(orig_join)
View(new_join)
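`View()` opens a spreadsheet-style viewer and is meant for interactive use, for example in RStudio. In a script or R Markdown document, you can compare the two joins by their column names instead. This sketch re-creates the data frames from above so it runs on its own.

```r
library(dplyr)
library(nycflights13)

airports_small <- airports %>% select(faa, name)
ten_freq_dests <- flights %>%
  group_by(dest) %>%
  summarize(num_flights = n()) %>%
  top_n(n = 10, wt = num_flights) %>%
  arrange(desc(num_flights))

orig_join <- ten_freq_dests %>% left_join(airports_small, by = c("dest" = "faa"))
new_join  <- ten_freq_dests %>% left_join(airports, by = c("dest" = "faa"))

# The join key keeps the left table's name (dest); the other airports columns are appended
names(orig_join)
names(new_join)
glimpse(new_join)
```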

The latter provides more information. For example, most of the top 10 destinations have tz = -5. Looking at ?airports, we see that tz is the time zone offset from GMT (in hours). Seven of the top 10 destinations are in the Eastern time zone (tz = -5), two more (LAX and SFO) are in the Pacific time zone, and the remaining one (ORD, Chicago) is in the Central time zone.
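Rather than reading the time zones off the viewer, you can tally them directly. The sketch below counts the top 10 destinations by `tz`. The names `tz_counts` and `num_dests` are illustrative.

```r
library(dplyr)
library(nycflights13)

ten_freq_dests <- flights %>%
  group_by(dest) %>%
  summarize(num_flights = n()) %>%
  top_n(n = 10, wt = num_flights) %>%
  arrange(desc(num_flights))

# Number of top 10 destinations in each time zone (offset from GMT, in hours)
tz_counts <- ten_freq_dests %>%
  left_join(airports, by = c("dest" = "faa")) %>%
  count(tz, name = "num_dests")

tz_counts
```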

LC 5.17

What surprises you about the top 10 destinations from NYC in 2013?

Solution

ten_freq_dests %>%
  left_join(airports_small, by = c("dest" = "faa"))
## # A tibble: 10 × 3
##     dest num_flights                               name
##    <chr>       <int>                              <chr>
## 1    ORD       17283                 Chicago Ohare Intl
## 2    ATL       17215    Hartsfield Jackson Atlanta Intl
## 3    LAX       16174                   Los Angeles Intl
## 4    BOS       15508 General Edward Lawrence Logan Intl
## 5    MCO       14082                       Orlando Intl
## 6    CLT       14064             Charlotte Douglas Intl
## 7    SFO       13331                 San Francisco Intl
## 8    FLL       12055     Fort Lauderdale Hollywood Intl
## 9    MIA       11728                         Miami Intl
## 10   DCA        9705      Ronald Reagan Washington Natl

Different people will have different answers, but I’m wondering: are that many people flying to Boston from NYC?
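To put the Boston count in context, you can compute each destination's share of all flights that left NYC in 2013. This is a minimal sketch, and the names `dest_shares` and `prop` are our own choices.

```r
library(dplyr)
library(nycflights13)

# Each destination's fraction of all 2013 departures from NYC
dest_shares <- flights %>%
  count(dest, sort = TRUE, name = "num_flights") %>%
  mutate(prop = num_flights / sum(num_flights))

dest_shares %>% filter(dest == "BOS")
```

Boston's 15,508 flights come to a little under 5% of all departures in the data.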