Getting the most of git

Why git?

Do you have something like this in your computer?

/home/pao/Documents/thesis
├── abstract.R
├── thesis.Rmd
├── thesis_reviwed.Rmd
├── thesis_reviwed2.Rmd
├── thesis_final.Rmd
├── thesis_finalfinal.Rmd
├── this_is_it.Rmd
├── now_this_is_it_for_real_this_time_i_swear.Rmd
└── FINAL.Rmd

Probably we all have, or had something like this at one point because we didn’t use a version control system. Version control systems manage the evolution and changes of a set of files that we’ll call repository. If you ever looked at the history of a Google Docs file, it is like that but in a very controlled way. Git is one popular version control system.

If you work alone, git is great to track changes and recover previous version of your files. You can also use a remote repository (more later) to have a back up and share your work.

If you work as a team you can take advantage of all the above and also use version control as a tool to collaborate and organize the various versions of the same file present in the multiple computers you and they use.

But what we mean by version control?

Let’s imagine that we have a repository already working (later we’ll see how to create one). When you create a new file as part of the repository (or repo), that file is untracked or unversioned. This means that git will ignore the file and any change you made until you add it to the repo. At that point the file is staged and it ready to get into the repository. To do that you do a commit and save that version of the file to the repo. This workflow modify then add then commit will repeat every time you want to save a version of the file. We don’t recommend making a commit every time you save the file or change a comma, and its also not a good idea to make a commit with a billion of changes. With practice and depending on how do you work, you will find a comfortable middle ground.

The figure shows the states of a file: untracked or tracked. When it is tracked by the repository, the file can be staged, committed or modified. With the add action the file is “staged”, with the commit action the file is “saved” to the repository. This cycles repeats teach time the file is modified.

So, we mentioned add and commit, these are git commands. If you have experience working with the command line you could use git from there, but the same commands can be executed from a GUI like GitHub Desktop or GitKraken. For this version of the materials we’ll use the git commands and the command line or Terminal.

The core git commands are:

  • git add <file> to add a file to the staging area.
  • git commit -m "A very descriptive message" to commit the files on the staging area.
  • git status to check the status of each file in the repository.

Did I mentioned remote repository?

Yes, before we went through a local workflow. The repo lives in your computer and that’s it. But, you can also connect the local repository with a remote repository. For this workshop we are going to use GitHub to host remote repositories, but there are other options you can explore, such as GitLab.

Let’s imagine we have a local repository, we made some commits and we want to send those changes to the remote repository (we’ll see how to create the remote repository later). For that we push the commits to the remote repository and then, the two repos are “up to date”.

Later, a team mate makes a change to a file in their local repo and push it to the remote repository. Now, your local repo is “outdated” and you need to download those new commits from the remote repository to your computer. You pull the commits.

Concept model of a workflow using RStudio projects and git. Files are added to the staging area, and then commited to the local repository. You can push commits to the remote repository in GitHub and pull new commits from the remote repository to your computer.

Tools like GitHub also include features that help you to collaborate and manage repositories. For example, you can modify files and commit those changes using the web interface.

The git commands to work with a remote repository are:

  • git push to upload commits to the remote repository.
  • git pull to download commits from the remote repository to the local repo.

Introduce yourself to Git

Before you create your first repository you need to make sure that git knows you. If you follow the pre-workshop instructions, you have git installed in your computare and you own a GitHub account.

To introduce yourself to git, that is to let it know your name and email you can use the command line package.

git config --global user.name 'Jane Doe'
git config --global user.email 'jane@example.com'

Substituting with your name and the email associated with your GitHub account. That’s it!

Creating a new repository

There are many ways to start a new repository, locally in your computer using the command line or an interface or from GitHub (or its friends). Here we’ll show you how to create a repo from GitHub, associate it to an RStudio project (or simple use it without R or RStudio) and work with it. But keep in mind that there are many other ways of working with git.

1. Create an online repository.

  • Go to github.com and log in.
  • On the header of the page, navigate to the new “Create new” menu, and then select “New repository”

In the “Create a new repository” page navigate to complete the necesiry information:

  • Repository template: No template.
  • Repository name: myrepo or whatever you wish to name your new project.
  • Description: Any short description of the project. Write this for humans.
  • Visibility: Public.
  • No README file
  • .gitignore template: None
  • Licence: None

Finaly, select the “Create Repository” button.

Before going back to the command line, copy the url for the repositor. And example is https://github.com/paocorrales/myrepo.git

2. In the command line:

  • Browse to the parent folder you want to use for you repository.
  • type git clone <remote repository url> using the url you just copied.

You may get a warning saying: “warning: You appear to have cloned an empty repository.”. Don’t worry about that. We’ll create a file soon.

The new folder in your computer will be a git repository, linked to a remote GitHub repository. There you can work, for example with an RStudio Project. This workflow also makes sure that all the configuration between the local and remote repos are done correctly.

In this case it won’t add a .gitignore file. You may want to add one in the future, as this file includes a list of files that we don’t need to track (i.e. .Rhistory).

Again, these are the key commands:

If you already have a repote repository:

  • git clone <repo url>

If you want to start a local repository from scratch in the folder you are located:

  • git init

Local changes

It’s time to put into practice some of the thing we’ve been talking.

Add, commit

  1. Create a new file and save it. It can be any type of file.
  2. Check the status of the files in the repository with git status. The file you’ve just created will be unther “Untracked files”
  3. Add the to the staging area with git add <file name>. You can check the status of the repo again to see what changed.
  4. Commit the file with git commit -m <message>. You’ll need to add a descriptive message!
  5. Check the status of the repo again.

If you have commited all the changes in the file you will get a message saying “nothing to commit, working tree clean”.

If everything went OK, you started tracking files, made changes, committed them to the local repository. You won’t see any changes on GitHub until you push those commits. You can do this at the end of the day if you prefer but if you work with others it may be a good idea to push everything after you do each commit.

Push!

  1. Now, push the commits to the remote repository with git push.

Because is the first time you push commits to the remote repository, git will automaticaly create a new branch. We won’t worry about branch for now.

Remote changes

Let’s come back to GitHub. If you refresh the page, now you’ll see the files you committed just now. If you navigate to the “commit” or “commits” link you can access the the history of the repository. From this view, you can explore the repository at the “state” of each commit and look at the file differences. Let’s return to the previws page.

From the main page of the repository, we can try to make changes to the files.

Create a README

  1. On the main page, navigate to the bottom that says “Add a README”.
  2. Don’t change the name of the file.
  3. Add something to the content of the file. READMEs usually are written in Markdown and contain information about the repo.
  4. Navigate to the “Commit changes…” button to save and commit the file.
  5. Add a message.
  6. Navigate to the “commit file” button.
  7. Come back to the main page to see the README.

The new file and the changes you do on GitHub are only on the remote repository until you make a pull from the local repo. If you make changes on the local repo while it not up to date, you may encounter merge conflicts, which lead to headaches. This happens when the version of a file on the local repo is not compatible with its version on the remote repo. In those cases, git cannot decide which version is the right one and you have to do it yourself.

To avoid this problem (most of the time), you should do a pull before start doing anything else. Most of the time the command line will show the “Already up-to-date” but it is good to make this a habit.

Pull from GitHub

  1. Go back to Terminal.
  2. Type git pull.
  3. Git will download the new changes from GitHub and tell you how many files were changed.

Now both repositories are sincronized.

Anatomy of GitHub Repo

  • Readme files. Use a README.md file to explain what your project is, and how use it. README.md is the file that is automatically displayed when you open a GitHub repo.

  • License. The license tell people how they can use the content of your repo. Generally, we use permissive licences and people can do almost anything with the materials. Examples are the MIT Licence or Apache. Some extra resources:

  • Contributing guide. A file called CONTRIBUTING.md and guidelines for contributors so they know what they should do if they want to help you out.

  • Code of Conduct. Good projects have codes of conduct to make sure that people are treated well. Github has an Code of Conduct wizard to make it easy to add one.

  • Issues. Allows you to manage the project.

Collaborating with others

When collaborating on a project you can find yourself in one of two scenarios

  • You have write permission of the repo you are working on.
  • You don’t have write permission on the repo.

The first situation is the most common when working together with an team on a long term project. You can clone the repo following the these same instructions and use the same workflow as before. When you try to push your changes to the remote repository, GitHub will check if you have the rights to modify the repo. The only difference between this and working on your own is that you need to communicate well so that two people are not working on the same thing. But this is true with or without git.

The second situation is common when contributing to strangers’ projects, such as fixing bugs in open source packages. In this case, you need to first fork the repository, which creates a remote repository linked to your GitHub account to which you will have complete access. Now you work in this fork using the normal workflow: modify, add, commit and push. But for your changes to be integrated into the original repository, you need to create a pull request (or PR). This will let the owner of the original repo know that you made some changes that you think should be merged. The owner can then accept, reject or ask for modifications. Once they accept your pull request, your modifications will be safe in the original repository.

Concep model of the remote workflow. A foreign remote repository can be forked to an owned remote repository with the “fork” command. The remote repository is copied into a local repository with a the “clone” command. Files inside the local repository and the remote repository are synced with push and pull. The owned remote repository can be merged into the foreign remote repository with a pull request.

Let’s plant some trees

As a fun example will collaborate with the recently open Flametree galley and do some pull request on the way.

Fork the repository

  1. Go to github.com/paocorrales/flametree_gallery
  2. Fork the repository navigating to the “Fork” button on the top of the page.
  3. You don’t need to change the information.
  4. Navigate to the “Create fork” button to create a copy of the repo in you GitHub account.

Screenshot of a repository showing where the “fork” button is.

Now, you have a copy of the repo in your GitHub account.

  1. Copy the url of the repo and clone it to your computer following the instructions to create a new repo.

Screenshot of a forked repo showing where to copy the url to clone the repo.

Now you have an RStudio project (that is also a repo) that is a copy of the one in the original repo. It’s a very small project with a few file. template_script.R includes the necessary code to create beautiful images of trees from R code:

Image of trees created using R

It uses the flametree library created by Danielle Navarro. How it works is not part of the topics of the workshop (sadly!) but it’s a good excuse to practice doing pull requests.

Make a change in the repo

  1. Make a copy of the template_script.R file and change it’s name to <your-name.R>

  2. Change the value of the first 3 variables in the script:

    • name your name.
    • seed there are some random things happening there.
    • shades choose 4 colors.

You can run the code if you want.

  1. Save the file, add it to the staging area and commit it (include only the .R file you created).
  2. Push it to your remote repository.

At this point you’ve made a change in your local repo (that is a copy of the forked repo) and pushed those changes to your remote repo in GitHub. The original repo doesn’t know this.

If you want to contribute the changes you made to the original repo, it’s time to make a pull request.

Make a pull request

  1. Go to your repo in GitHub, you will see that you made the last commit.

Screenshot of a forked repo in GitHub with a new commit. It shows how to star a new pull request.
  1. Navigate to the “Contribute” button.
  2. And then to “Open pull request”.
  3. We are almost there. In this page you can change the title of the pull request and write a message for the owner of the original repo.
  4. Navigate to “Create pull request” button and press enter to finish the pull request.

Screenshot of GitHub showing how to add a title, message and finish a pull request

And that’s it!

The owner/s of the repo will receive an email. They can review the pull request, comment on it and eventually accept the contribution. When that happen the changes you made will apear in the repo.

Resources

Happy git with R, by Jenny Bryan

Taller de git developed by Yanina Bellini Saibene and Marysol Gatti (in Spanish)