In this lab, we will learn how to use ifelse() for vectorized control flow, and how to avoid writing for loops.
Goal: by the end of this lab, you will be able to assign values conditionally and rewrite a for loop using map().
ifelse()
The if () ... else syntax is for control flow. In contrast, ifelse() is a function that returns a vector of the same length as the vector you put in, with each element chosen according to a logical condition. It is often useful inside mutate().
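The difference between the two can be seen in a small sketch. The vector `heights` and its values are made up for illustration; only base R is used.

```r
# Illustrative values only; `heights` is a hypothetical vector.
heights <- c(150, 180, NA, 172)

# ifelse() tests every element and returns a vector of the same length.
# A missing value in the test produces a missing value in the result.
size <- ifelse(heights > 170, "tall", "short")
size
length(size) == length(heights)

# if () ... else expects a single TRUE or FALSE, so it branches only once.
if (length(heights) > 3) {
  message("more than three heights")
} else {
  message("three or fewer heights")
}
```

Note that ifelse() hands back one answer per element, while if () ... else picks one branch of code to run.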
In the starwars data set, most characters have a species. However, there are many different species.
starwars %>%
  group_by(species) %>%
  count() %>%
  arrange(desc(n))
Suppose that we wanted to lump all of the non-human and non-droid species together. We can use ifelse() to create a new variable.
sw2 <- starwars %>%
  mutate(
    species_update = ifelse(
      !species %in% c("Human", "Droid"),
      "Other",
      species
    )
  ) %>%
  select(name, species, species_update)
Note the behavior around NA. Some characters have unknown species.
starwars %>%
  filter(is.na(species))
Our previous construction classified every character who is neither human nor droid as Other, including those whose species is unknown and who should perhaps be left as NA.
sw2 %>%
  group_by(species_update) %>%
  count() %>%
  arrange(desc(n))
By capturing NAs in our condition, we can leave them as NAs.
starwars %>%
  mutate(
    species_update = ifelse(
      !species %in% c("Human", "Droid", NA),
      "Other", species
    )
  ) %>%
  filter(is.na(species)) %>%
  select(name, species, species_update)
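To see why adding NA to the lookup set changes the result, here is a minimal base R sketch on a made-up four-element vector (the values are illustrative, not taken from starwars). It relies on the fact that %in% never returns NA: a missing value either matches an NA in the set or fails to match.

```r
# Hypothetical toy vector standing in for the species column.
species <- c("Human", "Droid", "Wookiee", NA)

# Without NA in the set, the missing value fails to match,
# so the negated test is TRUE and it is relabelled "Other".
species %in% c("Human", "Droid")
without_na <- ifelse(!species %in% c("Human", "Droid"), "Other", species)
without_na

# With NA in the set, the missing value counts as a match,
# so the negated test is FALSE and the original NA is kept.
with_na <- ifelse(!species %in% c("Human", "Droid", NA), "Other", species)
with_na
```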
Create a new variable called is_bald and set it to FALSE if the character has hair of any color, TRUE if the character has no hair, and NA if the character is a droid.
# SAMPLE SOLUTION
starwars <- starwars %>%
  mutate(
    is_bald = ifelse(species == "Droid", NA, TRUE),
    is_bald = ifelse(is_bald & hair_color != "none",
                     FALSE, is_bald)
  )
Check the NAs. Do you have them in all the right places?
starwars %>%
  select(hair_color, is_bald) %>%
  table(useNA = "always")
for loops
As noted in the book, there are many reasons to avoid writing loops in R. I have never written a repeat loop. There are only rare occasions when a while loop is necessary. Unless you need to explicitly access indices, you can and should rewrite a for loop as a map() statement. I will strongly encourage you to do this!!
Many operations in R are vectorized already, so you often don’t need a loop at all.
Consider generating the first 10 numbers in some integer sequences. For the perfect squares, you don’t need a loop at all, because the square operator is vectorized. Recall that vectors are built into the fundamental design of R, so things are supposed to work this way!
x <- 1:10
x^2
## [1] 1 4 9 16 25 36 49 64 81 100
However, consider generating the Fibonacci sequence. This can’t be vectorized, because each entry depends on the previous two entries. You could write a for loop.
fib <- c(1, 1)
for (i in 3:length(x)) {
  fib[i] <- fib[i-1] + fib[i-2]
}
fib
## [1] 1 1 2 3 5 8 13 21 34 55
If we had the Fibonacci sequence already, we could use R’s vector-based operation lag() to decompose the sequence.
fib_df <- tibble(
  fib,
  prev_x = lag(fib),
  prev_prev_x = lag(fib, 2)
)
fib_df
But this won’t help us generate new values in the sequence.
map()
Instead, we can write a recursive function to generate the \(n\)th value in the sequence, and then map() over that function.
fibonacci <- function(x) {
  if (x == 1 || x == 2) {
    return(1L)
  } else {
    return(fibonacci(x - 1) + fibonacci(x - 2))
  }
}
map_int(x, fibonacci)
## [1] 1 1 2 3 5 8 13 21 34 55
Generally, when you have a vector x as input, and you want to produce a vector y of the same length as output, you can use one of two paradigms:
1. Write a function that takes the whole vector x and computes the whole y vector at once. I suspect that this will be the most efficient method in nearly every case.
2. Write a function that computes y for a single value of x, and then map() that function over x.
Only if neither of these is possible should you write a for loop.
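As a rough sketch of the two paradigms side by side, the example below computes the first 10 triangular numbers both ways. The function names `triangular_vec` and `triangular_one` are hypothetical, chosen only for this illustration, and the purrr package is assumed to be installed.

```r
library(purrr)

x <- 1:10

# Paradigm 1: a function that takes the whole vector at once.
# It uses only vectorized arithmetic, so no loop or map() is needed.
triangular_vec <- function(x) {
  x * (x + 1) / 2
}

# Paradigm 2: a function written for a single value n,
# which is then mapped over every element of x.
triangular_one <- function(n) {
  sum(seq_len(n))
}

y_vec <- triangular_vec(x)
y_map <- map_int(x, triangular_one)

# Both approaches should agree element by element.
all(y_vec == y_map)
```

When a vectorized formula like the first one is available it is usually the simpler choice; the mapped version is the fallback when the computation is only easy to express one value at a time.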
Recall that we saw map() previously in the context of list-columns.
Use the nchar() function to compute the number of characters in each character’s name, without writing any kind of loop.
# SAMPLE SOLUTION
nchar(starwars$name)
## [1] 14 5 5 11 11 9 18 5 17 14 16 14 9 8 6 21 14 16 4 9 9 5 5 16 5
## [26] 6 10 12 21 9 12 11 13 13 12 10 8 5 7 13 14 10 11 11 8 7 14 10 12 9
## [51] 9 10 11 11 8 10 12 5 11 17 15 13 5 5 19 10 10 15 7 7 10 13 6 10 8
## [76] 8 8 7 15 9 10 4 3 11 3 14 13
Compute the same result using map_int() and nchar(). Make sure you understand the difference between these two approaches.
# SAMPLE SOLUTION
map_int(starwars$name, nchar)
## [1] 14 5 5 11 11 9 18 5 17 14 16 14 9 8 6 21 14 16 4 9 9 5 5 16 5
## [26] 6 10 12 21 9 12 11 13 13 12 10 8 5 7 13 14 10 11 11 8 7 14 10 12 9
## [51] 9 10 11 11 8 10 12 5 11 17 15 13 5 5 19 10 10 15 7 7 10 13 6 10 8
## [76] 8 8 7 15 9 10 4 3 11 3 14 13
Use map_int() and length() to compute a numeric vector of the number of vehicles associated with each character.
# SAMPLE SOLUTION
map_int(starwars$vehicles, length)
## [1] 2 0 0 0 1 0 0 0 0 1 2 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [39] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0
## [77] 1 0 0 0 0 0 0 0 0 0 0
Use map() and nchar() to compute the total number of characters in the names of the starships associated with each character. For example, Luke Skywalker primarily flew an X-wing fighter, but also briefly piloted an Imperial shuttle in Return of the Jedi. So the number of characters in his starships list is 6 + 16 = 22.
# SAMPLE SOLUTION
map_int(starwars$starships, ~sum(nchar(.x)))
## [1] 22 0 0 15 0 0 0 0 6 96 53 0 33 33 0 0 6 6 0 0 7 0 0 17 0
## [26] 0 0 6 0 17 0 0 0 0 0 0 20 0 0 0 0 8 0 0 0 0 0 0 0 0
## [51] 0 0 0 0 16 0 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [76] 0 24 0 0 0 0 0 0 19 0 0 48
# SAMPLE SOLUTION
starwars %>%
  pull(starships) %>%
  map(nchar) %>%
  map_int(sum)
## [1] 22 0 0 15 0 0 0 0 6 96 53 0 33 33 0 0 6 6 0 0 7 0 0 17 0
## [26] 0 0 6 0 17 0 0 0 0 0 0 20 0 0 0 0 8 0 0 0 0 0 0 0 0
## [51] 0 0 0 0 16 0 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [76] 0 24 0 0 0 0 0 0 19 0 0 48
Rewrite the following for loop as a call to map(). The output should be a list of length 2.
mpg_by_year <- group_split(mpg, year)

mods <- list()

for (i in seq_along(mpg_by_year)) {
  mods[[i]] <- lm(hwy ~ displ + cyl, data = mpg_by_year[[i]])
}
# SAMPLE SOLUTION
map(mpg_by_year, ~lm(hwy ~ displ + cyl, data = .x))
## [[1]]
##
## Call:
## lm(formula = hwy ~ displ + cyl, data = .x)
##
## Coefficients:
## (Intercept) displ cyl
## 35.95548 -3.67442 -0.08285
##
##
## [[2]]
##
## Call:
## lm(formula = hwy ~ displ + cyl, data = .x)
##
## Coefficients:
## (Intercept) displ cyl
## 40.5275 -0.4355 -2.5437
# SAMPLE SOLUTION
map(mpg_by_year, lm, formula = "hwy ~ displ + cyl")
## [[1]]
##
## Call:
## .f(formula = "hwy ~ displ + cyl", data = .x[[i]])
##
## Coefficients:
## (Intercept) displ cyl
## 35.95548 -3.67442 -0.08285
##
##
## [[2]]
##
## Call:
## .f(formula = "hwy ~ displ + cyl", data = .x[[i]])
##
## Coefficients:
## (Intercept) displ cyl
## 40.5275 -0.4355 -2.5437
Prompt: What #questions do you still have about control flow and/or loops?