Contents
Git is a version control system. You can use Git to turn any folder into a repository with version control. This means you can make changes in the folder and commit them to a new version of that folder. If you later regret the changes, or want to figure out how that nasty bug got into your code, you can check out previous versions of the folder, and revert to them.
In addition to this, Git can be used to collaborate in a team. You can make separate branches in the repository in which individual developers can work on different functionality. These branches are complete versions of the repository to which changes can be made. Each branch will have their own versioning history. Once the implementation of a new function is done and tested, the branches can be merged again.
This tutorial is split into three parts: one part focussing on version control within a local repository, one part that focuses on branching and merging, and one on collaborating using a remote repository, for example on GitHub.
Git is a command line tool. Although a plethora of GUI-based applications for interaction with Git exist, in this tutorial we are going to stick with the command line interface, as it is the most universal way to interact with Git: it will even work on remote computers, like computational servers, over SSH. Like in the Linux chapter, we are using the convention that the '$'-sign indicates a command line prompt, and that any command you type should not include this sign. It is best if you follow along with the tutorial, because the exercises assume that you have a repository with a version history. We are going to use the coding of a Vector
class in Python as an example, but it is not necessary to fully understand the Python code to follow the tutorial.
You can install Git-on-windows from this link: https://git-scm.com/download/win.
Open a Terminal window and type
$ git --version
which will automatically launch the installer.
On most Linux distributions, Git is installed by default. If Git is not installed, in Ubuntu you can install it by typing the following command in the terminal:
$ sudo apt install git
Note that this requires superuser priviledges.
You can test if Git is installed correctly by opening a Terminal window (e.g. Cygwin on Windows) and typing
$ git --version
which should print git version 2.17.1
or something similar. If that's the case you can also configure Git, by letting it know who you are:
$ git config --global user.name "Amalia van Oranje"
$ git config --global user.email "[email protected]"
Let it also know which editor you want to use, for example if you want to use Notepad++ on Windows you can type
$ git config --global core.editor "'C:/Program Files/Notepad++/notepad++.exe' -multiInst -nosession"
or to use Gedit on Linux
$ git config --global core.editor 'gedit'
or TextEdit on macOS
$ git config --global core.editor 'open -e'
Fire up your terminal application (e.g. Cygwin on Windows) and navigate to the folder you want to place the repository in, for example your documents folder. If you do not know how to do this, we refer you to the Linux tutorial.
Once in the correct folder, make a new folder called my_repository
and navigate to it by typing
$ mkdir my_repository
$ cd my_repository
Now let's turn this folder into a repository, by typing
$ git init
If Git is installed correctly, it will respond with something like
Initialized empty Git repository in /some/long/path/my_repository/.git/
Now, to check the repository's status, you can type
$ git status
which will say
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
This shows that there have not been any changes to the repository yet (you have added no files or folders yet), and that you are currently on the branch called 'master'. For now, you can ignore any information on branches until the second part of this chapter.
During this chapter, we will use git status
frequently to keep track of what is happening with our repository.
Let's add a file to our repository called vector.py
. In this example file, we are going to code a class to work with vector objects in Python. Open the file and copy-paste the following code:
class Vector:
def __init__(self, *elements):
self.elements = elements
and save the file.
If you now type git status
, the following message should be shown:
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
vector.py
nothing added to commit but untracked files present (use "git add" to track)
Git has noticed you have added a file to the repository. However, it reports the files is not tracked yet, that is, Git does not save versions of this file. (Also, if you push this repository to GitHub, the file will not be included, as we will see later.)
To let Git track the file, type
$ git add vector.py
If you now again look at the status, it will print
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: vector.py
Git tells you that, should you commit a new version, the addition of the new file vector.py
will be part of the commit. Let's do just that. Type
$ git commit -m 'Added vector.py'
[master (root-commit) ee1af3e] Added vector.py
1 file changed, 3 insertions(+)
create mode 100644 vector.py
The -m
flag lets you include a message for this commit, that you can/should use to document what you did since the previous commit. If we now again check git status
, it prints
On branch master
nothing to commit, working tree clean
Your working tree is clean, which means: no new files have been added that are not tracked, and no new changes to the already existing files have been made. Git keeps a log of all the commits that you have made. Type
$ git log
too see this:
commit 7fa5007bc326ff8a4bf78912f41a21130d8165b9 (HEAD -> master)
Author: Amalia van Oranje <[email protected]>
Date: Wed Apr 10 13:05:31 2019 +0200
Added vector.py
(END)
You can close this view by pressing q.
Let's change the vector.py
by adding a __repr__()
method to the class:
class Vector:
def __init__(self, *elements):
self.elements = elements
def __repr__(self):
s = '['
for x in self.elements:
s += str(x) + ', '
s += ']'
return s
If you look at the status now, it says that the file has changed:
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: vector.py
no changes added to commit (use "git add" and/or "git commit -a")
If you want to see exactly what has changed between the last commit and the current state,
you can type git diff
which will show the added method only.
We can commit this change by typing
$ git add vector.py
$ git commit -m 'Added __repr__() method'
You first need to 'stage' the changed files you want to commit with git add
before committing them. If you have many changed files, this becomes quite cumbersome, which is why you can type
$ git commit -a -m 'Added __repr__() method'
instead, where -a
stands for 'all changed files'. When adding new files you will still need to explicitly use git add
however. You can again use git log
to see the history of commits, which will show the new commit above the first one.
When removing files, there is the equivalent git rm
. This will both remove the file itself, as well as stop the version tracking of it:
$ git rm vector.py
If you now type ls
, no files will be shown. Let's first commit this new change, then worry about how to get that file back:
$ git commit -a -m 'Deleted vector.py'
By the way, the history of the repository can be summarized in diagrams like below, which we will use in the next section to explain how to undo any unwanted changes. The master
and HEAD
labels will be discussed later.
The history of the repository so-far contains three commits.
To revert the commit in which we deleted the file, we want to go back to the state of the repository one commit before that. In Git, we can do this with with the git revert
command. There are two ways of specifying to which commit you want to return: by specifying the identifier of the commit that is shown in git log
or by using relative refererences.
In the git log
you can find the identifier of the commit you want to revert. In this case it is 52e352fbb0caf74c631c1054da1e6dcd4c690786.
commit 52e352fbb0caf74c631c1054da1e6dcd4c690786
Author: Amalia van Oranje <[email protected]>
Date: Wed Apr 10 14:11:36 2019 +0200
Deleted vector.py
commit 907ae2e2338cc302adca23d17cf2cfc72ed623d4
Author: Amalia van Oranje <[email protected]>
Date: Wed Apr 10 14:00:18 2019 +0200
Added __repr__() method
commit 7fa5007bc326ff8a4bf78912f41a21130d8165b9
Author: Amalia van Oranje <[email protected]>
Date: Wed Apr 10 13:05:31 2019 +0200
Added vector.py
Luckily you do not have to type in the whole thing. Only the first seven characters suffice, but, even better, if you type only the first few and press Tab it will auto-complete.
$ git revert 52e352f
This will open the editor you set in the configuration (at the start of this tutorial) in which you can type a commit message, although a
default message reading Revert "Deleted vector.py"
is provided there already.
Upon quitting the editor, the reversion is committed, and the file vector.py
should be back in the folder. The history now looks like this:
Reverting results in a new commit that does the reverse of the reverted commit.
If you know how many commits you want to go back, you can use a relative reference when reverting. You can specify a reference relative to the current HEAD. HEAD is a tag that is always attached to the last commit of the current branch (more on branches later). If you want to undo the last commit, you revert the commit tagged HEAD, by typing this:
$ git revert HEAD
This will have the same effect as reverting by identifier.
This should only be done on local repositories or branches, i.e. branches that are on your own computer. If you use it on remote repositories it can result in data loss.
You can undo multiple commits using git reset --hard
. You specify the commit you want to return to using an identifier or a relative reference. For example, you can add the identifier
$ git reset --hard 1d27a9f
or use the relative reference
$ git reset --hard HEAD~2
to reset to the state of the repository two commits back. Note that this behavior is different from revert
which reverts only those changes in the commit you specify, while reset
removes all commits after the commit you specify. reset
also does not require you to make a new commit because of this.
To test git reset
, you can reset back to the commit in which you added the __repr__()
method, which will the commits in which you deleted vector.py
and reverted this deletion. If you followed the tutorial so-far, you can establish this with
$ git reset --hard HEAD~2
If you now look at git log
you will see that the history of the repository has changed because two commits have been removed. This is exactly the reason why it is dangerous when you use it on remote repositories that are also used by others. This is reflected in this diagram:
Resetting removes any subsequent commits from the history.
Using git diff
you can also see differences between commits. For this you again need an identifier or reference. For example, if you want to know what changed in the last commit, type
$ git diff HEAD~1
Finally, there is one more useful command that you can use to check out the status of the repository in a commit. Using git checkout <IDENTIFIER>
or git checkout <RELATIVE REFERENCE>
to set the state of the folder to that commit. You can use this to check out the contents of files in that commit.
$ git checkout HEAD~1
Note: checking out 'HEAD~1'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at ee1af3e Added vector.py
You are now in a detached HEAD state. That means that the commit you have checked out is no longer on a branch. HEAD is a label that points to the currently checked out commit, but the master
branch in which you have been working is no longer associated with the same commit as HEAD is.
Checking out a commit moves the HEAD to that commit. The working directory will now contain the files in the state at that commit.
To re-attach your HEAD and go back to the last commit, you type
$ git checkout master
(At least, assuming you have not changed branches, more on that later.)
Re-attached HEAD by checking out `master`.
Be careful though. If you make changes in a detached HEAD state, these will not be saved, unless you commit them to a new branch, and merge them. That is possible, but is beyond the scope of this tutorial.
-
Add a method
__len__()
to theVector
class:python def __len__(self): return len(self.elements)
Commit the change and add an appropriate message.<details><summary>Answer</summary><p> Command: `git commit -a -m 'Added __len__() method' </p></details>
-
Add a file called
test.py
with the following content:python from .vector import Vector v1 = Vector(1, 2, 3) v2 = Vector(3, 2, 1) print(v1) print(len(v2)) print(v1 + v2)
And commit this change to the repository <details><summary>Answer</summary><p> Commands: ```bash $ git add test.py $ git commit -a -m 'Added test module' ``` </p></details>
-
Delete the
test.py
and commit this change to the repositoryAnswer
Commands:
```bash $ git rm test.py $ git commit -a -m 'Removed test module' ``` </p></details>
-
Go back to the previous commit to restore the test module
Answer
Command:
```bash $ git revert HEAD ``` to revert (undo) the last commit or ```bash $ git reset --hard HEAD~1 ``` to go back to the state of the repository one commit back. </p></details>
-
Check the log of your repository.
Answer
Command:
```bash $ git log ``` This should return something like this: ```bash commit 83b56a05dcfbb3dfdcc9b0a4afc1fcaceb06c7ca (HEAD -> master) Author: Amalia van Oranje <[email protected]> Date: Wed Apr 24 11:58:54 2019 +0200 Added test module. commit ea580441c928ac34b7e7ec840154868c24236717 Author: Amalia van Oranje <[email protected]> Date: Wed Apr 24 11:58:23 2019 +0200 Added __len__() method commit 1d27a9f3b571a4a12e1d157b4abd036ec089da44 Author: Amalia van Oranje <[email protected]> Date: Wed Apr 24 11:36:30 2019 +0200 Added __repr__() method commit ee1af3e89c2d41be17c85ad3d0d6383836258ec8 Author: Amalia van Oranje <[email protected]> Date: Wed Apr 24 11:34:02 2019 +0200 Added vector.py ``` </p></details>
Consider the following situation: you have a perfectly working program in a folder full of wonderful code. Now, someone asks you to add some extra functionality to your code. This new code might break your perfect code, introduce bugs, or otherwise disturb your quiet life. Of course, Git allows you to go back in time to when things were working fine, but that would also remove all the work you did on the new functionality.
In this case, it is best to make a new branch in your repository. Whenever you make a branch, the history of the repository splits in two. You can make commits in the new branch without it affecting the history or code in your main master branch.
The way this is often used is to have a stable master branch that contains well-tested code, and a development branch that has new functionality that has not been fully-tested yet. Once the new code is fully tested, you can merge the changes in the development branch into the master branch.
Creating and switching branches can be done using the git checkout
command. In fact, switching branches is a little similar to switching to previous commits. Let's create a new branch called addition
:
$ git checkout -b 'addition'
Git will respond with Switched to a new branch 'addition'
.
In this branch you can make changes to the code. For example, we can add an __add__()
method to our Vector
class, that returns the number of elements:
class Vector:
def __init__(self, *elements):
self.elements = elements
def __repr__(self):
s = '['
for x in self.elements:
s += str(x) + ', '
s += ']'
return s
def __len__(self):
return len(self.elements)
def __add__(self, other):
assert len(self) == len(other)
sums = []
for a, b in zip(self.elements, other.elements):
sums.append(a + b)
return Vector(*sums)
Now, we commit this change:
$ git commit -a -m 'Added __add__() method'
Remember that this change is only reflected in the addition
branch we are in. The master
branch has not had the same update. Let's check that out:
$ git checkout master
Switched to branch master
$ more vector.py
class Vector:
def __init__(self, *elements):
self.elements = elements
def __repr__(self):
s = '['
for x in self.elements:
s += str(x) + ', '
s += ']'
return s
def __len__(self):
return len(self.elements)
To get an overview of all branches, use the command git branch
, which will show a list of the branches with the currently checked out branch indicated with an asterisk. You can close this view by typing q.
Let's assume the new __add__()
method has been extensively tested, and it works perfectly fine. We can now merge the feature branch addition
into the master branch. This is done by first checking out the master
branch if you haven't done so already,
$ git checkout master
Already on 'master'
and typing
$ git merge addition
If you want, you can now delete the branch by typing
$ git branch -d addition
If you do this without merging first, Git will give you a warning. If you really want to get rid of the unmerged branch, you can replace -d
with -D
to force the deletion.
Suppose you have made two new additional features to the Vector
class, each in their own branch from master. Let's call the branches feature1
and feature2
, like this (don't type these commands, they are just an example):
$ git checkout master
$ git checkout -b feature1
Switched to branch 'feature1'
... adding feature 1 ...
$ git commit -a -m commit 'Added feature 1'
[feature1 b016669] 2
1 file changed, 1 insertion(+)
$ git checkout master
Switched to branch 'master'
$ git checkout -b feature2
Switched to branch 'feature2'
... adding feature 2 ...
$ git commit -a -m commit 'Added feature 2'
[feature2 b016669] 2
1 file changed, 1 insertion(+)
Now you want to merge both into master, so you checkout master, and merge the first feature:
$ git checkout master
Switched to branch 'master'
$ git merge feature1
Updating 4e26bf7..b016669
Fast-forward
vector.py | 1 +
1 file changed, 1 insertion(+)
The master
branch and the feature1
branch are now on the same commit.
So far so good. Now, we also merge feature2
:
$ git merge feature2
This will result in a warning:
Auto-merging vector.py
CONFLICT (content): Merge conflict in vector.py
Automatic merge failed; fix conflicts and then commit the result.
The reason for this is that both the master
branch and the feature2
branch have both been changed after the branching point at commit 83b56a0
, and both have changes to the same file(s). This results in Git not knowing which of the two existing versions is the 'truth'. Should the result have feature1 but not feature2, or feature2 but not feature1, or both, or neither?
This is called a merge conflict, and they are particularly abundant when collaborating. Luckily, merge conflicts are easy to solve. If you open the vector.py
file you will see that Git has moved both features in the file, and you get to pick which version you want by removing text. For example, if the __len__
and __add__()
were added in feature1
and feature2
respectively, the file could look like this:
class Vector:
def __init__(self, *elements):
self.elements = elements
def __repr__(self):
s = '['
for x in self.elements:
s += str(x) + ', '
s += ']'
return s
<<<<<<< HEAD
def __len__(self):
return len(self.elements)
=======
def __add__(self, other):
assert len(self) == len(other)
sums = []
for a, b in zip(self.elements, other.elements):
sums.append(a + b)
return Vector(*sums)
>>>>>>> feature2
The part between <<<<<<< and >>>>>>> is different in the master
and feature2
branches. The part above the =======
is in master
, the part below in feature2
. In this case, you can resolve the conflict by simply removing the lines with <<<<<<<< HEAD
, =======
, and >>>>>>> feature2
and saving the file. Then a new commit will close the conflict definitively:
$ git commit -a -m 'Merged feature2 into master and solved merge conflict.'
In the exercises we will see how to solve a merge conflict when two versions of the same function exist in two branches.
-
Create a new branch with an appropriate name and add the following method to the
Vector
class:```python def __abs__(self): return Vector(*[abs(x) for x in self.elements]) ``` Commit the change and add an appropriate message. <details><summary>Answer</summary><p> Commands: ```bash $ git checkout -b dev $ git commit -a -m 'Added __abs__() method to Vector class' ``` </p></details>
-
Go back to the
master
branch and add the following method to theVector
class:```python def __abs__(self): c = 0 for x in self.elements: c += x ** 2 return c ** 0.5 ``` Commit the change and add an appropriate message. <details><summary>Answer</summary><p> Commands: `$ git checkout -b master` `$ git commit -a -m 'Added __abs__() method to Vector class'` </p></details>
-
Merge the first branch into the
master
branch and solve the merge conflict.<details><summary>Answer</summary><p> The merge command `$ git merge dev` will result in a merge conflict that can be solved by opening the `vector.py` file, and removing the version of the `__abs__()` method you don't like. Then, commit the new version, like this: `$ git commit -a -m 'Merged dev branch and solved merge conflict.'` </p></details>
-
Delete the
dev
branch<details><summary>Answer</summary><p> Use the following command: `$ git branch -d dev` </p></details>
So-far, the repository has been on your PC, i.e. it was a local repository. You can move your repository to GitHub or any other remote service for hosting repositories (the TU/e has its own Git hosting service available at https://gitlab.tue.nl). If you want you can try it on GitHub by making a free account. In this final section, we are going to simulate a remote repository to convey the basics of pushing to remote repositories, but it will be difficult to cover all facets without actually having a project on which you collaborate with other people. In general, you will learn most by using it in practice.
To simulate a remote repository, make a new empty folder called vector_example
, in a different location than where you would put your repositories normally. If you can't think of a good location, use your Desktop or Downloads folder. Navigate to this empty folder in the terminal, and type the following command:
$ git init --bare
This will make a bare repository. Bare repositories are usually hosted on a server. They are not meant for putting code in directly. They are only for storing a repository at a central place, such that anyone can edit and make commits to this repository. You can pretend that this repository lives on a drive in the cloud (like on GitHub).
Cloning means to make an exact copy of a remote repository, including all previous commits. The corresponding command is git clone
. For repositories on GitHub, such as the one on which this tutorial is stored, you can directly paste the link to the remote repository after git clone
. For example,
$ git clone https://github.com/tueimage/essential-skills
will copy the repository to your computer.
We can do something similar with the simulated 'remote' repository on your computer. Again, it is important you execute the following in a different folder than where you put the bare repository:
$ git clone /path/to/bare/repository/vector_example
Cloning into 'vector_example'...
warning: You appear to have cloned an empty repository.
done.
Naturally, this warning will not appear when you clone from an existing repository on GitHub. The repository you have cloned is on your PC but leads a double live on the (simulated) remote repository, where other people can change the contents.
Let's add the vector.py
file from the previous sections, and commit this file:
$ git add vector.py
$ git commit -m 'Added vector.py file'
Now, this file is added to the local version of the repository, which means it is not on the (simulated) remote version. Let's change it there as well:
$ git push origin
Counting objects: 3, done.
Writing objects: 100% (3/3), 201 bytes | 201.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To vector_example
* [new branch] master -> master
origin
is a reference to the origin of the repository, i.e. where you cloned it from. The output shows that the changes in the master branch have been pushed to the remote repository. In fact, only the changes in the master branch are updated using git push
. If you want to push different branches, you first need to check these out. Alternatively, you can also specify which branch should be pushed, for example by doing
$ git push origin feature1
The output will show how many changes have been made and to which files.
So, how does this work? The remote repository is also stored on your own computer as separate branches. For example, the master
branch in the remote repository is stored on your PC as the origin/master
branch:
There are three representations of the repository: the remote repository (on GitHub for example), the representation of that remote repository on your PC, and the working directory. The remote repository has a new commit made by someone else.
You can inspect these branches by running
$ git branch -r
where r
stands for 'remote'. You can quit from this view by pressing q. You will see that the following branches are there:
origin/HEAD -> origin/master
origin/master
These remote branches should be explicitly updated by you. You can do this by running
$ git fetch origin
which will download the exact contents of the remote repository to the origin/...
branches on you PC. If someone else has pushed changes to the repository that are not yet on your PC, git fetch
will get them to you.
git fetch will update the local representations on your PC.
However, these changes are not in your working tree (i.e. the local versions of the branches, e.g. master
). To establish that, you need to merge the remote branches into your local branches, simply by using git merge
:
$ git checkout master
Switched to branch master
Your branch is three commits behind 'origin/master'
$ git merge origin/master
The result will look like this:
git merge origin/master merges the changes in the remote repository into your own working working tree.
If your collaborators have pushed changes to the repository before you are pushing your changes, there will be an error like this
$ git push origin
! [rejected] master -> master (non-fast forward)
error: failed to push some refs to 'vector_example'
The 'non-fast forward' message means that the changes you made cannot just be applied to the remote repository. You will first need to fetch the changes your collaborators made and merge them with your local branch.
$ git fetch origin
$ git checkout master
Switched to branch master
Your branch is three commits behind 'origin/master'
$ git merge origin/master
... Solve merge conflicts here ...
$ git push origin
In effect, this is not much different from using branches on your local repository. To get a good grasp of using remote repositories is to use them in practice. You can use this tutorial as a reference for the basic commands.
To create a new branch on the remote repository, you can simply create the branch on your local version and push like normal. The new branch will automatically appear on the remote repository.
To delete a remote branch, you need to issue an extra command on top of git branch -d branch_name
, namely
$ git push origin --delete branch_name
The reason for this, is extra safety. It can happen that someone else is working on the branch you just haphazardly deleted, which is why you need to explicitly push branch deletions to the remote repository.
This tutorial was written using the following references, which are useful to learn more advanced applications of Git:
- Pro Git * The go-to reference for Git
- Learn Git Branching * An interactive tutorial on advanced Git branching with diagrams
This tutorial used the command line to use Git, however there are many GUI applications that you can use as well:
- Sublime Merge * Fully functional trial can be used indefinitely for free, available for Windows, macOS, and Linux
- SourceTree * Free, available for Windows and macOS
- GitKraken * Free, requires account, available for Windows, macOS, and Linux
- GitHub Desktop * Free, requires Github account, available for Windows and macOS