select() and filter() Learning Checks
# Load necessary packages
library(ggplot2)
library(dplyr)
library(nycflights13)
# Load necessary data sets from nycflights
data(flights)
What are the ways to select the dest, air_time, and distance variables from flights? Give the code showing how to do all of them you can think of.
Why might we want to use the select() function on a data frame?
We recall the variables we have by using the names() function:
names(flights)
## [1] "year" "month" "day" "dep_time"
## [5] "sched_dep_time" "dep_delay" "arr_time" "sched_arr_time"
## [9] "arr_delay" "carrier" "flight" "tailnum"
## [13] "origin" "dest" "air_time" "distance"
## [17] "hour" "minute" "time_hour"
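The output above shows that dest, air_time, and distance sit next to each other in the data frame. The following is a minimal sketch, assuming dplyr and nycflights13 are installed, that looks up their positions and compares a few ways of picking the same three columns. The object names (wanted, positions, by_name, and so on) are made up for illustration.

```r
library(dplyr)
library(nycflights13)

# Look up where the three target columns sit within flights
wanted <- c("dest", "air_time", "distance")
positions <- match(wanted, names(flights))
positions

# Select by bare names, by a character vector, and by column position
by_name     <- select(flights, dest, air_time, distance)
by_vector   <- select(flights, all_of(wanted))
by_position <- select(flights, 14:16)

# Drop every other column, using ranges rather than listing each name
by_dropping <- select(flights, -(year:origin), -(hour:time_hour))

# Each comparison checks that the result matches the name-based selection
identical(by_name, by_vector)
identical(by_name, by_position)
identical(by_name, by_dropping)
```

Selecting by position is the most fragile of these, since it silently picks different columns if the column order ever changes.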
select(flights, dest, air_time, distance)
i.e. select them directly.
select(flights, dest:distance)
i.e. select a range of them, since they are sequential columns.
select(flights, -year, -month, -day, ETC)
i.e. deselect all the other ones, where ETC stands for the remaining unwanted columns. Admittedly, this is rather awkward.
Note that the range form works because dest, air_time, and distance are sequential columns.
select() would help pare down the number of columns so that we can easily View() them.
not_summer_flights <- filter(flights, !between(month, 6, 8))
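To make the filter() call above easier to read, here is a small sketch, assuming dplyr and nycflights13 are installed, of what between() and ! do on a short vector of month numbers before they are applied to the whole data frame.

```r
library(dplyr)
library(nycflights13)

# between(x, left, right) is TRUE when left <= x <= right (both ends inclusive)
between(c(5, 6, 7, 8, 9), 6, 8)

# ! flips each TRUE to FALSE and each FALSE to TRUE
!between(c(5, 6, 7, 8, 9), 6, 8)

# Applied to flights, only rows outside June to August are kept
not_summer_flights <- filter(flights, !between(month, 6, 8))
sort(unique(not_summer_flights$month))
```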
Instead of using the ! operator, what is another way we could filter only the rows that are not summer months (June, July, or August) in the flights data frame? Test it out.
Lots of different ways! Try these out!
filter(flights,
       month == 1 | month == 2 | month == 3 | month == 4 | month == 5 |
         month == 9 | month == 10 | month == 11 | month == 12)
This is definitely not as efficient to write as using the ! operator.
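Two shorter alternatives that also avoid ! are sketched below, assuming dplyr and nycflights13 are installed. The object names reference, with_or, and with_in are illustrative only. The row counts are compared against the !between() version to check that each approach keeps the same flights.

```r
library(dplyr)
library(nycflights13)

# Reference result, using the ! operator
reference <- filter(flights, !between(month, 6, 8))

# Keep months before June or after August
with_or <- filter(flights, month < 6 | month > 8)

# Keep months that appear in an explicit list of non-summer months
with_in <- filter(flights, month %in% c(1:5, 9:12))

# Compare row counts with the reference result
nrow(with_or) == nrow(reference)
nrow(with_in) == nrow(reference)
```

The %in% form tends to scale best when the set of values to keep is irregular, since the list can be edited without rewriting the logic.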