In this lab, we will learn how to use GitHub for version control.

Goal: by the end of this lab, you will be able to commit, push, pull, and send pull requests.


What is version control?

Version control is a mechanism for collaborative software development that preserves a project's history. Its main objective is to track every change that gets made, so that nothing is lost and you can always return to any previous state.
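
To make "going back to a previous state" concrete, here is a minimal sketch you can paste into the Terminal. It works in a throwaway directory; the file name notes.txt and the identity values are placeholders chosen for illustration, not part of the course repo.

```shell
# Work in a throwaway directory so nothing real is touched
demo="$(mktemp -d)"
cd "$demo"
git init -q

# Identity for this demo repo only (placeholder values)
git config user.name "Demo Student"
git config user.email "student@example.com"

# First version of the file, recorded as a commit
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft"

# Second version overwrites the file, recorded as another commit
echo "second draft" > notes.txt
git commit -q -am "Revise draft"

# Both commits are kept in the history
git log --oneline

# Show the file as it was one commit ago
git show HEAD~1:notes.txt
```

The last command prints the old contents even though the file on disk now holds the second draft.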

Version control systems have been in use for a long time, and many different systems have been used. Currently, git is the dominant version control system. git is a standalone command line application. Interfaces to git include an official git GUI and a built-in tab in RStudio.

Most of the things we do with GitHub can be done in RStudio, but it is occasionally necessary to use the command line (and you should embrace that!).

GitHub is a website that hosts many git projects. We will be using GitHub extensively, mainly through our dedicated GitHub organization.

For questions about git and GitHub, please see Jenny Bryan’s excellent book on the subject: Happy Git and GitHub for the useR. In particular, please read the troubleshooting chapter when you run into trouble!

In this lab, we will focus on the use of two functions, as detailed here:

Verifying your connection to GitHub

The git_sitrep() function provides comprehensive information about the status of your connection to GitHub.

library(usethis)
git_sitrep()
## Git config (global)
## • Name: 'rudeboybert'
## • Email: 'albert.ys.kim@gmail.com'
## • Global (user-level) gitignore file: <unset>
## • Vaccinated: FALSE
## ℹ See `?git_vaccinate` to learn more
## ℹ Defaulting to 'https' Git protocol
## • Default Git protocol: 'https'
## • Default initial branch name: <unset>
## GitHub
## • Default GitHub host: 'https://github.com'
## • Personal access token for 'https://github.com': '<discovered>'
## ✖ Can't get user information for this token.
##   The token may no longer be valid or perhaps it lacks the 'user' scope.
## Error in gh_process_response(raw): 
## GitHub API error (401): 
## Message: Bad credentials
## Read more at https://docs.github.com/rest
## 
## Git repo for current project
## • Active usethis project: '/Users/akim04/Documents/Teaching/6. Adv Prog/SDS270'
## ✖ The 'origin' remote is configured, but we can't determine its default branch.
##   Possible reasons:
##   - The remote repo no longer exists, suggesting the local remote should
##     be deleted.
##   - We are offline or that specific Git server is down.
##   - You don't have the necessary permission or something is wrong with
##     your credentials.
## • Default branch: 'master'
## • Current local branch -> remote tracking branch:
##   'master' -> 'origin/master'
## GitHub remote configuration
## • Type = 'maybe_ours_or_theirs'
## • Host = 'https://github.com'
## • Config supports a pull request = NA
## • origin = 'rudeboybert/SDS270'
## • upstream = <not configured>
## • Desc = 'origin' is a GitHub repo and 'upstream' is either not configured or is not a GitHub repo.
##   
##   We may be offline or you may need to configure a GitHub personal access
##   token. `gh_token_help()` can help with that.
##   
##   Read more about what this GitHub remote configurations means at:
##   'https://happygitwithr.com/common-remote-setups.html'

If you see errors in your output, investigate them!

Please see this article for comprehensive documentation.
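
Some of the items in the "Git config (global)" part of the report can also be checked by hand in the Terminal. The sketch below is not a replacement for git_sitrep(); it only reads four global settings, and the helper name show_setting is made up for this example.

```shell
# Print one global Git setting, or "<unset>" when it has no value
show_setting() {
  value="$(git config --global --get "$1" 2>/dev/null || true)"
  echo "$1: ${value:-<unset>}"
}

show_setting user.name           # "Name" in the report
show_setting user.email          # "Email"
show_setting core.excludesFile   # "Global (user-level) gitignore file"
show_setting init.defaultBranch  # "Default initial branch name"
```

The token and remote checks in the rest of the report talk to GitHub, so for those it is easier to rely on the usethis functions.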

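The gh_token_help() function is also helpful for diagnosing issues with your token. With a working token, the output includes a “Token scopes” line. The output below instead shows a token that GitHub rejected (401, Bad credentials).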

gh_token_help()
## • GitHub host: 'https://github.com'
## • Personal access token for 'https://github.com': '<discovered>'
## ✖ Can't get user information for this token.
##   The token may no longer be valid or perhaps it lacks the 'user' scope.
## Error in gh_process_response(raw): 
## GitHub API error (401): 
## Message: Bad credentials
## Read more at https://docs.github.com/rest

Making a contribution

In this first group exercise, each student will work individually to send a pull request to the maintainer (me) of a single repository. When you make a contribution to someone else’s repo, this is how you will do it. (See also https://happygitwithr.com/fork-and-clone.html)

Setting up the local repo

  1. Run usethis::create_from_github("sds270-s22/git-demo", fork = TRUE)

STOP. If Step 1 worked, proceed to “Remote verification” below. If Step 1 failed, pursue the following steps as necessary. THINK before you act!

  1. Fork the course repo on GitHub.
  2. Clone your fork and make a new project in RStudio.
  3. Set up your upstream remote.
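
The three manual steps can be rehearsed offline before you try them on GitHub. In this sketch, local directories stand in for GitHub: course plays the role of the course repo and fork.git plays the role of your fork. Those directory names and the identity values are assumptions made for the rehearsal. On GitHub you would click the Fork button and use https URLs instead of local paths.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for the course repo on GitHub (this becomes 'upstream')
git init -q course
git -C course symbolic-ref HEAD refs/heads/main   # name the first branch 'main'
git -C course config user.name "Instructor"
git -C course config user.email "instructor@example.com"
echo "# git-demo" > course/README.md
git -C course add README.md
git -C course commit -q -m "Initial commit"

# Step 1: "fork" the course repo (on GitHub: the Fork button)
git clone -q --bare course fork.git

# Step 2: clone your fork; the clone's 'origin' remote points at the fork
git clone -q fork.git git-demo
cd git-demo

# Step 3: add the course repo as the 'upstream' remote and fetch it
git remote add upstream "$sandbox/course"
git fetch -q upstream

# Both remotes should now be listed
git remote -v
```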

Remote verification

By now, you should have your own fork of https://github.com/sds270-s22/git-demo set up. Next, we will verify that you have your upstream remote set up as well.

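  1. Run git remote -v in the Terminal (not in the R console). You should see something like this, with your own repository URLs in place of the ones shown. If your upstream remote is set up, two upstream lines appear as well: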
## origin   https://github.com/rudeboybert/SDS270.git (fetch)
## origin   https://github.com/rudeboybert/SDS270.git (push)
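
If you want the check to give a yes-or-no answer, something like the following sketch may help. It builds a throwaway repo so that it runs anywhere; your-username is a placeholder for your GitHub username, and adding a remote does not contact GitHub. In your real project you would only need the commands from git remote -v onward.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Placeholder URLs: substitute your own GitHub username in the first one
git remote add origin https://github.com/your-username/git-demo.git
git remote add upstream https://github.com/sds270-s22/git-demo.git

# List every remote with its fetch and push URLs
git remote -v

# Yes-or-no check for the upstream remote
if git remote get-url upstream >/dev/null 2>&1; then
  echo "upstream is configured"
else
  echo "upstream is missing"
fi
```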

Making changes

  1. Run usethis::pr_init("<NAME>"), where in place of <NAME> you write your one-word name (all lowercase, no spaces or punctuation).
  2. Add your first and last name, with a link to your GitHub user page, to README.md.
  3. Commit your changes.
  4. Push.
  5. Run usethis::pr_push() to send a pull request.
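
For comparison, steps 1 through 4 look roughly like this with plain git in the Terminal. This is a sketch under assumptions: a local bare repo (fork.git) stands in for your fork on GitHub, and the branch name sam, the student name, and the README line are invented for illustration. It stops at the push, because opening the pull request itself happens on GitHub.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for your fork on GitHub, seeded with a README
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Instructor"
git -C seed config user.email "instructor@example.com"
echo "# git-demo" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed fork.git

# Your local clone of the fork
git clone -q fork.git git-demo
cd git-demo
git config user.name "Sam Student"
git config user.email "sam@example.com"

# Step 1: create a branch named after you and switch to it
git checkout -q -b sam

# Step 2: add your name and a link to your GitHub page
echo "- [Sam Student](https://github.com/your-username)" >> README.md

# Step 3: commit the change
git add README.md
git commit -q -m "Add Sam Student to README"

# Step 4: push the branch to your fork
git push -q -u origin sam
```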

I will resolve all pull requests.

Collaborating on a project

Sync your fork

You need to sync your fork regularly (for example, every time you finish a pull request). Entering these commands in the Terminal should do it.

git fetch upstream
git merge upstream/main

Note that usethis::pr_init() attempts to perform this operation, so if you are able to use pr_init(), you may not have to sync your fork manually via the command line.
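
Here is an offline rehearsal of the sync, in case you want to see what the two commands do. A local directory named course stands in for the course repo; to keep the sketch short there is no separate fork, and the clone's only remote is named upstream. All names and identity values are assumptions for the rehearsal.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for the course repo on GitHub
git init -q course
git -C course symbolic-ref HEAD refs/heads/main
git -C course config user.name "Instructor"
git -C course config user.email "instructor@example.com"
echo "# git-demo" > course/README.md
git -C course add README.md
git -C course commit -q -m "Initial commit"

# Your local copy, with the remote named 'upstream' instead of 'origin'
git clone -q -o upstream course git-demo

# Meanwhile, a classmate's change lands in the course repo
echo "- A new contributor" >> course/README.md
git -C course commit -q -am "Add a new contributor"

# The sync itself: download the new commits, then merge them in
cd git-demo
git fetch -q upstream
git merge -q upstream/main

# Your local main now matches the course repo
git log --oneline
```

Note the slash in upstream/main: it names the main branch as last fetched from the upstream remote.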

Merge conflicts

If two or more people commit changes to the same part of the same file, a merge conflict is inevitable. With good git hygiene and clear project roles, the probability of a merge conflict can be minimized. Conflicts will still happen, though, and you need to know how to resolve them.

A side-by-side comparison of the set of changes is helpful. A diff is a way to view these changes. Several editors will perform this comparison. I use meld. Another program is opendiff. You can use whatever you want!
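
It is worth causing a conflict on purpose once, somewhere harmless. The sketch below does that in a throwaway repo and then resolves it by hand. The branch name alice, the two collaborators, and the file contents are invented for illustration. In a real conflict you would edit the file in your editor or diff tool rather than overwrite it with echo.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Student"
git config user.email "student@example.com"

echo "Team: TBD" > README.md
git add README.md
git commit -q -m "Add README"

# One collaborator edits the line on a branch
git checkout -q -b alice
echo "Team: Alice" > README.md
git commit -q -am "Alice edits the team line"

# Another collaborator edits the same line on main
git checkout -q main
echo "Team: Bob" > README.md
git commit -q -am "Bob edits the team line"

# Git cannot choose between the two edits, so the merge stops
git merge alice || echo "Merge stopped: resolve README.md, then commit"

# The file now contains conflict markers around both versions
cat README.md

# A diff view of the competing changes
git diff

# Resolve: write the final text, stage it, and commit to finish the merge
echo "Team: Alice and Bob" > README.md
git add README.md
git commit -q -m "Merge alice, keeping both names"
```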

Getting credit

Please respond to the following prompt on Slack in the #questions channel:

Prompt: What would help improve your comfort level with GitHub?