In this lab, we will learn how to investigate the underlying data structures of R objects.

Goal: by the end of this lab, you will be able to determine the base class of any object.

Attributes

Objects in R can have attributes. Use the attributes() function to figure out what they are.

library(tidyverse)  # provides starwars and select() (dplyr), %>%, and map_int() (purrr)
attributes(starwars)
## $names
##  [1] "name"       "height"     "mass"       "hair_color" "skin_color"
##  [6] "eye_color"  "birth_year" "sex"        "gender"     "homeworld" 
## [11] "species"    "films"      "vehicles"   "starships"  "is_bald"   
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## [76] 76 77 78 79 80 81 82 83 84 85 86 87
## 
## $class
## [1] "beanumber"  "tbl_df"     "tbl"        "data.frame"

Unlike in many other programming languages, attributes in R – including the class of an object – are changeable!
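
To see the same mechanism on a smaller object, here is a minimal sketch that uses a plain integer vector instead of starwars. The vector x and the attribute name "note" are made up for illustration.

```r
# A bare integer vector starts with no attributes at all.
x <- 1:6
attributes(x)
## NULL

# attr() on the left of <- attaches (or overwrites) a single attribute.
attr(x, "note") <- "made-up attribute for illustration"
attr(x, "note")
## [1] "made-up attribute for illustration"

# Some attributes change how R treats the object: setting "dim"
# turns the same six numbers into a 2 x 3 matrix.
attr(x, "dim") <- c(2, 3)
is.matrix(x)
## [1] TRUE

# Assigning NULL removes an attribute again.
attr(x, "dim") <- NULL
is.matrix(x)
## [1] FALSE
```

The values in x never change here; only the metadata attached to them does.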

  1. Use the assignment operator (<-) and the attr() function to change the class of starwars to sds_is_awesome.
# SAMPLE SOLUTION

attr(starwars, "class") <- "sds_is_awesome"
attributes(starwars)
## $names
##  [1] "name"       "height"     "mass"       "hair_color" "skin_color"
##  [6] "eye_color"  "birth_year" "sex"        "gender"     "homeworld" 
## [11] "species"    "films"      "vehicles"   "starships"  "is_bald"   
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## [76] 76 77 78 79 80 81 82 83 84 85 86 87
## 
## $class
## [1] "sds_is_awesome"
  1. Is starwars a data.frame now? How do you know? Try to select() a column.
# SAMPLE SOLUTION

starwars %>%
  select(name)
  1. Once you’re done playing around with attributes, use rm(starwars) to delete the bad copy. Now run starwars again. Why does this work?
# SAMPLE SOLUTION

rm(starwars)
starwars
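
One way to see what is going on is to repeat the experiment with a built-in object. The sketch below uses pi from base R, on the assumption that starwars behaves analogously: the modified copy lives in your workspace and masks the original that the package provides, so removing your copy makes the original visible again. The names masked and restored are just for illustration.

```r
# Assigning to a name that already exists in a package creates a new
# copy in your workspace; the package's object is left untouched.
pi <- 3
masked <- pi        # the workspace copy is found first
rm(pi)              # removes only the workspace copy
restored <- pi      # R now finds the original from base R again

masked
## [1] 3
restored
## [1] 3.141593
```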

S3 classes

S3 is the name of the simplest and most common object-oriented paradigm in R. We’ll learn more about S3 later. For now, we’ll explore common vector classes that are not atomic.

Note first that starwars has multiple classes, and these classes are ordered.

class(starwars)
## [1] "tbl_df"     "tbl"        "data.frame"

The basic data type of starwars is a list, because all tbl_dfs and data.frames are lists.

typeof(starwars)
## [1] "list"
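
The distinction between class() and typeof() may be easier to see on a tiny data frame. This is a minimal sketch with a made-up data frame df; it uses only base R.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

class(df)    # what kind of object R treats it as
## [1] "data.frame"
typeof(df)   # how it is stored underneath
## [1] "list"

# Because it is a list, list tools work on it: each column is one element.
length(df)
## [1] 2
df[["x"]]
## [1] 1 2 3

# unclass() strips the class attribute and reveals the plain list.
is.data.frame(unclass(df))
## [1] FALSE
is.list(unclass(df))
## [1] TRUE
```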

When you type starwars at the console, what actually gets called is print(starwars). That is, the default action when you type the name of an object is to run the print() command on that object.

Thus, when you type starwars, R runs print(starwars). Because print() is a generic function and the first class of starwars is tbl_df, R looks for a method called print.tbl_df(). If it can’t find one, it looks for print.tbl(), then for print.data.frame(), and finally falls back on print.default().
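
The lookup order described above can be reproduced with a toy class. This is only a sketch: the class names lab_child and lab_parent are hypothetical and exist nowhere outside this example.

```r
# A generic's methods are ordinary functions named generic.class().
print.lab_parent <- function(x, ...) {
  cat("printed by print.lab_parent\n")
  invisible(x)
}

# The class vector is ordered: most specific class first.
obj <- structure(list(), class = c("lab_child", "lab_parent"))

# There is no print.lab_child() yet, so dispatch falls through
# to the next class in the vector.
print(obj)
## printed by print.lab_parent

# Once a method for the first class exists, it wins.
print.lab_child <- function(x, ...) {
  cat("printed by print.lab_child\n")
  invisible(x)
}
print(obj)
## printed by print.lab_child

# Stripping the class leaves only print.default() to handle the object.
print(unclass(obj))
## list()
```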

In this case, there are print() methods defined for tbl and data.frame. Note the difference between:

starwars
print.data.frame(starwars)
  1. Examine the output of print.data.frame(starwars) and as.data.frame(starwars). Are they the same? What is the difference in what actually gets executed?

  2. Examine the output of as.numeric(starwars$name) and as.numeric(factor(starwars$name)). What is going on?

# SAMPLE SOLUTION

x <- factor(starwars$name)
attributes(x)
## $levels
##  [1] "Ackbar"                "Adi Gallia"            "Anakin Skywalker"     
##  [4] "Arvel Crynyd"          "Ayla Secura"           "Bail Prestor Organa"  
##  [7] "Barriss Offee"         "BB8"                   "Ben Quadinaros"       
## [10] "Beru Whitesun lars"    "Bib Fortuna"           "Biggs Darklighter"    
## [13] "Boba Fett"             "Bossk"                 "C-3PO"                
## [16] "Captain Phasma"        "Chewbacca"             "Cliegg Lars"          
## [19] "Cordé"                 "Darth Maul"            "Darth Vader"          
## [22] "Dexter Jettster"       "Dooku"                 "Dormé"                
## [25] "Dud Bolt"              "Eeth Koth"             "Finis Valorum"        
## [28] "Finn"                  "Gasgano"               "Greedo"               
## [31] "Gregar Typho"          "Grievous"              "Han Solo"             
## [34] "IG-88"                 "Jabba Desilijic Tiure" "Jango Fett"           
## [37] "Jar Jar Binks"         "Jek Tono Porkins"      "Jocasta Nu"           
## [40] "Ki-Adi-Mundi"          "Kit Fisto"             "Lama Su"              
## [43] "Lando Calrissian"      "Leia Organa"           "Lobot"                
## [46] "Luke Skywalker"        "Luminara Unduli"       "Mace Windu"           
## [49] "Mas Amedda"            "Mon Mothma"            "Nien Nunb"            
## [52] "Nute Gunray"           "Obi-Wan Kenobi"        "Owen Lars"            
## [55] "Padmé Amidala"         "Palpatine"             "Plo Koon"             
## [58] "Poe Dameron"           "Poggle the Lesser"     "Quarsh Panaka"        
## [61] "Qui-Gon Jinn"          "R2-D2"                 "R4-P17"               
## [64] "R5-D4"                 "Ratts Tyerell"         "Raymus Antilles"      
## [67] "Rey"                   "Ric Olié"              "Roos Tarpals"         
## [70] "Rugor Nass"            "Saesee Tiin"           "San Hill"             
## [73] "Sebulba"               "Shaak Ti"              "Shmi Skywalker"       
## [76] "Sly Moore"             "Tarfful"               "Taun We"              
## [79] "Tion Medon"            "Wat Tambor"            "Watto"                
## [82] "Wedge Antilles"        "Wicket Systri Warrick" "Wilhuff Tarkin"       
## [85] "Yarael Poof"           "Yoda"                  "Zam Wesell"           
## 
## $class
## [1] "factor"
as.numeric(x)
##  [1] 46 15 62 21 44 54 10 64 12 53  3 84 17 33 30 35 82 38 86 56 13 34 14 43 45
## [26]  1 50  4 83 51 61 52 27 37 69 70 68 81 73 60 75 20 11  5 25 29  9 48 40 41
## [51] 26  2 71 85 57 49 31 19 18 59 47  7 24 23  6 36 87 22 42 78 39 65 63 80 72
## [76] 74 32 77 66 76 79 28 67 58  8 16 55
as.numeric(starwars$name)
## Warning: NAs introduced by coercion
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [76] NA NA NA NA NA NA NA NA NA NA NA NA
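
A smaller example may make the two behaviors easier to compare. This sketch uses a made-up four-element factor f and only base R.

```r
f <- factor(c("b", "a", "c", "a"))

# Underneath, a factor is an integer vector plus a "levels" attribute.
typeof(f)
## [1] "integer"
levels(f)
## [1] "a" "b" "c"

# as.numeric() returns those integer codes (positions in levels(f)),
# not anything parsed from the labels.
as.numeric(f)
## [1] 2 1 3 1

# A character vector has no codes, so R tries to parse each string as a
# number; strings that are not numbers become NA, with a warning.
suppressWarnings(as.numeric(c("10", "ten")))
## [1] 10 NA

# To recover the labels from a factor, go through as.character().
as.character(f)
## [1] "b" "a" "c" "a"
```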

List-columns

Since data.frames are lists, their columns can be objects of arbitrary type. In particular, they can be lists.

The films column in starwars is a list-column. Each entry is a character vector of the films that the corresponding character has appeared in.
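
To see how such a column comes about, here is a minimal base R sketch that builds a list-column by hand. The data frame chars and its contents are invented for illustration and are not part of starwars.

```r
# Build a small data frame, then attach a list as a column.
# The list needs one element per row; elements may differ in length.
chars <- data.frame(name = c("Hero", "Sidekick", "Extra"))
chars$films <- list(
  c("Film 1", "Film 2", "Film 3"),
  c("Film 1", "Film 2"),
  "Film 3"
)

# The column itself is a list ...
typeof(chars$films)
## [1] "list"

# ... and [[ ]] pulls out one entry, here a character vector.
chars$films[[2]]
## [1] "Film 1" "Film 2"

# The data frame still has one row per character.
nrow(chars)
## [1] 3
```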

films <- starwars %>% 
  pull(films)
films

Note that the length() of films is 87, one entry per character, but that each entry in films is a character vector of arbitrary length. To see these lengths, we have to map() over the entries in films.

length(films)
## [1] 87
map_int(films, length)
##  [1] 5 6 7 4 5 3 3 1 1 6 3 2 5 4 1 3 3 1 5 5 3 1 1 2 1 2 1 1 1 1 1 3 1 2 1 1 1 2
## [39] 1 1 2 1 1 3 1 1 1 3 3 3 2 2 2 1 3 2 1 1 1 2 2 1 1 2 2 1 1 1 1 1 1 1 2 1 1 2
## [77] 1 1 2 2 1 1 1 1 1 1 3

nest() and unnest()

List-columns can be expanded by unnest(). This has the effect of lengthening the data frame (sort of like an accordion). Each row is repeated once for each element of its entry in the list-column.

Note that each row in starwars corresponds to one character, while films stores the list of films that character has appeared in. If we unnest() the data frame by expanding out the films, we get a data frame that is much longer, because each row now represents one character in one film.

library(tidyr)
starwars %>%
  unnest(films)

Note that films is no longer a list-column – it’s now a character vector.

The nest() function performs the opposite operation of “rolling up” the data frame to create a new list-column.
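
The following base R sketch imitates the two operations by hand, to show what happens to the rows. It is not how tidyr implements unnest() or nest(), and the data frame chars is invented for illustration.

```r
# A hypothetical two-row data frame with a list-column.
chars <- data.frame(name = c("Hero", "Sidekick"), stringsAsFactors = FALSE)
chars$films <- list(c("Film 1", "Film 2", "Film 3"), "Film 1")

# "Unnesting": repeat each row once per element of its list entry,
# then flatten the list-column into an ordinary vector.
n <- lengths(chars$films)
long <- data.frame(
  name  = rep(chars$name, times = n),
  films = unlist(chars$films),
  stringsAsFactors = FALSE
)
nrow(long)   # one row per character-film pair
## [1] 4

# "Nesting": group the long rows by name and roll films back up.
rolled <- split(long$films, factor(long$name, levels = chars$name))
identical(unname(rolled), chars$films)
## [1] TRUE
```

The long form has 3 + 1 = 4 rows, and rolling it back up recovers the original list-column.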

  1. Experiment with list-columns by expanding and contracting the other list-columns in the starwars data frame.

Mapping over list columns

Suppose now we want to add the number of films for each character to the starwars data set. A simple mutate() like this will not throw an error, but also won’t do what we want.

oops <- starwars %>%
  mutate(num_films = length(films)) %>%
  arrange(desc(num_films)) %>%
  select(name, num_films)
oops

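This just made all of the entries equal to length(films): length() was evaluated once on the whole column, and that single value was recycled into every row.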

all(oops$num_films == length(starwars$films))
## [1] TRUE

To get this right, we need to map() inside our mutate().

starwars %>%
  mutate(num_films_actual = map_int(films, length)) %>%
  arrange(desc(num_films_actual)) %>%
  select(name, num_films_actual, films)
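
The difference between the two attempts comes down to one count for the whole list versus one count per entry. Here is a minimal base R sketch on a made-up list, with vapply() standing in for map_int().

```r
films <- list(c("A", "B", "C"), "A", c("B", "C"))

# length() answers "how many entries?" with a single number ...
length(films)
## [1] 3

# ... and a single number is recycled to fill every row of a column.
rep(length(films), length(films))
## [1] 3 3 3

# Applying length() to each entry gives one count per row.
# vapply() plays the role of map_int() here; lengths() is a shortcut.
vapply(films, length, integer(1))
## [1] 3 1 2
lengths(films)
## [1] 3 1 2
```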

Engagement

Take a minute to think about what questions you still have about vectors. Review what questions have been posted (in the #questions channel) recently by other students and either:

  • respond (e.g., react, comment, clarify, or answer)
  • post a new question

Here is a prompt to prime your thinking:

Where did you get stuck in this lab?