Load Packages and Data

# Load necessary packages
library(ggplot2)
library(dplyr)
library(nycflights13)

# Load necessary data sets from nycflights
data(flights)

LC 5.1, 5.3

  • How many different ways are there to select all three of dest, air_time, and distance variables from flights? Give the code showing how to do all of them you can think of.
  • Why might we want to use the select() function on a data frame?

Solution

We recall the variables we have by using the names() function:

names(flights)
##  [1] "year"           "month"          "day"            "dep_time"      
##  [5] "sched_dep_time" "dep_delay"      "arr_time"       "sched_arr_time"
##  [9] "arr_delay"      "carrier"        "flight"         "tailnum"       
## [13] "origin"         "dest"           "air_time"       "distance"      
## [17] "hour"           "minute"         "time_hour"
  • We could use any of the following:
    • select(flights, dest, air_time, distance) i.e. select them directly
    • select(flights, dest:distance) i.e. select a range of them, since they are sequential columns
    • select(flights, -year, -month, -day, ETC) i.e. deselect all the other ones. Admittedly this is rather awkward.
    • many more…
  • I would probably do the first one, because it doesn’t assume that dest, air_time, and distance are sequential columns.
  • select() would help pare down the number of columns so that we can easily View() them.
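
As a rough sketch of the "many more" options mentioned above, here are a few other ways to pick out the same three columns. The object names (wanted, by_name, and so on) are made up for illustration, and the positions 14:16 are read off the names(flights) output shown earlier, so they would break if the column order ever changed.

```r
library(dplyr)
library(nycflights13)

# Select the three columns directly by name (the approach preferred above)
by_name <- select(flights, dest, air_time, distance)

# Select using a character vector of column names
wanted <- c("dest", "air_time", "distance")
by_vector <- select(flights, all_of(wanted))

# Select by column position: dest, air_time, distance are columns 14 to 16
by_position <- select(flights, 14:16)

# Base R indexing with the same character vector
by_base <- flights[, wanted]

# Check that every approach returns the same column names
identical(names(by_name), names(by_vector))
identical(names(by_name), names(by_position))
identical(names(by_name), names(by_base))
```

Selecting by name or by a character vector is usually the safer choice, since it does not depend on where the columns happen to sit in the data frame.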

LC 5.4

not_summer_flights <- filter(flights, !between(month, 6, 8))

Instead of using the ! operator, what is another way we could keep only the rows that are not in the summer months (June, July, or August) in the flights data frame? Test it out.

Solution

There are lots of different ways! Here is one to try out:

filter(flights,
       month == 1 | month == 2 | month == 3 | month == 4 | month == 5 |
       month == 9 | month == 10 | month == 11 | month == 12)

This is definitely not as concise as using the ! operator.
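
The long chain of == comparisons can be shortened without bringing back the ! operator. The sketch below shows two possibilities and compares their row counts against the original not_summer_flights result; the names by_comparison and by_membership are illustrative only.

```r
library(dplyr)
library(nycflights13)

# Reference result from the learning check, using ! with between()
not_summer_flights <- filter(flights, !between(month, 6, 8))

# Alternative 1: keep months before June or after August
by_comparison <- filter(flights, month < 6 | month > 8)

# Alternative 2: keep rows whose month is in the list of non-summer months
by_membership <- filter(flights, month %in% c(1:5, 9:12))

# Each alternative should keep the same number of rows as the reference
nrow(by_comparison) == nrow(not_summer_flights)
nrow(by_membership) == nrow(not_summer_flights)
```

The %in% version tends to read most naturally when the months of interest are not a contiguous range.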