In this lab, we will learn how to put data in an R package.

Goal: by the end of this lab, you should be able to put data in an R package.

Creating a package

Please see the R Package lab for help setting up a fresh R package. In this lab we will add data to that R package.

Adding data

In this lab we will add the NYC Italian restaurants data set to our package. A CSV of the data is located at:

http://gattonweb.uky.edu/sheather/book/docs/datasets/nyc.csv

  1. Use the use_data_raw() function from usethis to add a new script for processing the raw data. The first argument to this function is the name of the data set; call it italian. This creates the script data-raw/italian.R.

  2. Write the italian.R script. Use readr::read_csv() to read the data directly from the URL. Confirm that the data were read correctly (for example, check the column names and types), and perform any additional data cleaning you want. A good idea would be to convert the variable names to snake_case.
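
The two steps above might look like the following sketch. It assumes the usethis and readr packages are installed and that the URL is still reachable; to_snake_case() is a hypothetical helper written for this example, not a function from either package.

```r
# Step 1 (run once at the console, from the package root):
#   usethis::use_data_raw("italian")
# This creates data-raw/italian.R, whose contents could look like this.

# Hypothetical helper: turn names such as "FoodScore" or "Food Score"
# into "food_score".
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # break camelCase boundaries
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # spaces, dots, etc. become "_"
  x <- gsub("^_+|_+$", "", x)                   # drop leading/trailing "_"
  tolower(x)
}

# Read the CSV straight from the URL given above (needs network access).
url <- "http://gattonweb.uky.edu/sheather/book/docs/datasets/nyc.csv"
italian <- readr::read_csv(url)

# Inspect the result, then clean the column names.
str(italian)
names(italian) <- to_snake_case(names(italian))
```

If you would rather not write the helper yourself, janitor::clean_names() is a commonly used alternative, at the cost of one more package to install.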

The data-raw folder is where we keep the raw data and (perhaps more importantly) the script that we wrote to obtain and process it. However, the data-raw folder is not part of the built package, because it is listed in .Rbuildignore and R CMD build skips anything listed there.

  1. Examine the contents of .Rbuildignore and find the line that ignores data-raw.
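
Each line of .Rbuildignore is a regular expression that is matched, ignoring case, against file paths relative to the package root. As a rough base R illustration of that matching, with a pattern list that is only an assumption about what your file contains and a hypothetical helper is_ignored():

```r
# Patterns as they might appear in .Rbuildignore, one regular expression
# per line (assumed contents; check your own file).
patterns <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$", "^data-raw$")

# Hypothetical helper: does any pattern match this path?
is_ignored <- function(path, patterns) {
  any(vapply(
    patterns,
    function(p) grepl(p, path, perl = TRUE, ignore.case = TRUE),
    logical(1)
  ))
}

is_ignored("data-raw", patterns)  # TRUE: left out of the built package
is_ignored("data", patterns)      # FALSE: bundled with the package
```

The anchors ^ and $ matter here: without them, a pattern meant for data-raw could also match other paths that merely contain that text.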

Note, however, that data-raw is part of the repository, because we need to keep track of these files!

  1. Examine the contents of .gitignore and note that data-raw is not ignored by any of the lines.

To bundle the data with the package, we have to save it in the data folder. The use_data() function from usethis does this.

  1. Make sure the last line of the script in data-raw calls use_data() on italian. Run the script!

If you did this correctly you should now see a file called italian.rda in the data folder.
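
To see what an .rda file holds, here is a small base R sketch that saves and reloads a stand-in data frame. The toy object and its column names are made up for illustration; in the lab itself, a final script line such as usethis::use_data(italian, overwrite = TRUE) does roughly the equivalent write into the data folder for you.

```r
# Stand-in data frame with made-up values; the real `italian` object
# comes from data-raw/italian.R.
toy <- data.frame(restaurant = c("A", "B"), price = c(40, 55))

# Save it as an .rda file in a temporary directory.
rda_path <- file.path(tempdir(), "toy.rda")
save(toy, file = rda_path, compress = "bzip2")

# Load into a fresh environment to see which objects the file contains.
env <- new.env()
loaded_names <- load(rda_path, envir = env)

loaded_names             # "toy"
identical(env$toy, toy)  # TRUE
```

An .rda file stores objects under their own names, which is why the installed package makes the data available as italian rather than under the file name.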

  1. Clear your workspace, then rebuild and reinstall your package. Confirm that, once the package is loaded, typing italian at the console prints the data.

  2. Increment the version number of your package.

  3. Run R CMD check. Read any errors, warnings, or notes it reports.

Documenting data

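Data sets need documentation just as much as functions. However, documenting a data set is different from documenting a function: there is no function definition to attach the roxygen comments to, so we attach them to the quoted name of the data set instead.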

  1. Create a new file called “data.R” in the R folder (if there isn’t one already).

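  2. Write the quoted string "italian" (including the quotation marks) on its own line in that file. The roxygen comments that document the data set go directly above this line.

  3. In the roxygen comments, use the tag @docType data. (Recent versions of roxygen2 infer this tag for data sets, so it is optional but harmless.) Regenerate the documentation and rebuild the package.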

  4. Run ?italian and read your documentation. Flesh it out by adding important details.
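
A data.R file along these lines would satisfy the steps above. Treat it as a template: the description and @format text are placeholders to replace with details you have checked against the data, and the @source tag is an addition not required by the steps.

```r
#' Italian restaurants in New York City
#'
#' Placeholder description: say what each row represents and how the
#' data were collected.
#'
#' @docType data
#' @format A data frame. List each variable and its meaning here.
#' @source \url{http://gattonweb.uky.edu/sheather/book/docs/datasets/nyc.csv}
"italian"
```

After saving the file, running devtools::document() (assuming devtools is installed) regenerates the help page, and ?italian should show it once the package is reinstalled.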

Adding a README

  1. Use the use_readme_rmd() function to add a README to your package, if there isn’t one already. This creates README.Rmd; knit it to generate README.md.

Engagement

Prompt: Where did you get stuck in this lab? What specific steps could use further explanation?