Up To Schedule - Back To Mobility: Using Version Control at Work and Home
Based on material by Katy Huff, Anthony Scopatz, Sri Hari Krishna Narayanan, and Matt Gidden
This section will outline an exercise to get your feet wet in using some of GitHub's features. We'll be continuing our work on testing as an example.
For the rest of this section, I'll assume that there are two collaborators, Alpha and Beta. I'll assume that they have super-easy GitHub names, and that their repositories are at github.com/alpha and github.com/beta.
Let's start off by relocating back to the original simplestats repository.
$ cd ~/simplestats
To put this in more realistic terms, imagine that the upstream repository
(UW-Madison-ACI) is managed by your PI and the alpha and beta forks are students
working on a project, tasked with implementing some stats functions. Like good
SWC followers, we'll be working in a branch called median, which I will now
create. Once I have, update your local copies and remotes:
$ git fetch upstream
$ git checkout median
$ git push origin median
Step 1 : Group up in pairs
Step 2 : Add your collaborator as a remote and check that you're connected. For example, Beta would type the following:
$ git remote add alpha https://github.com/alpha/simplestats
$ git remote -v
origin https://github.com/YOU/simplestats (fetch)
origin https://github.com/YOU/simplestats (push)
upstream https://github.com/UW-Madison-ACI/simplestats (fetch)
upstream https://github.com/UW-Madison-ACI/simplestats (push)
alpha https://github.com/alpha/simplestats (fetch)
alpha https://github.com/alpha/simplestats (push)
$ git fetch alpha
and Alpha would type
$ git remote add beta https://github.com/beta/simplestats
$ git remote -v
origin https://github.com/YOU/simplestats (fetch)
origin https://github.com/YOU/simplestats (push)
upstream https://github.com/UW-Madison-ACI/simplestats (fetch)
upstream https://github.com/UW-Madison-ACI/simplestats (push)
beta https://github.com/beta/simplestats (fetch)
beta https://github.com/beta/simplestats (push)
$ git fetch beta
From GitHub's website, a pull request lets you tell others about changes you've pushed to a GitHub repository. Once a pull request is sent, interested parties can review the set of changes, discuss potential modifications, and even push follow-up commits if necessary.
For Beta:
Step 1 : Modify the stats.py module to add the median function (shown below).
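def median(vals):
    # Sort a copy so the caller's list is left unchanged.
    vals = sorted(vals)
    z = len(vals)
    # Integer division: a float index raises a TypeError in Python 3.
    index = z // 2
    if z % 2 == 0:
        # Even length: average the two middle values with the module's mean().
        return mean([vals[index], vals[index - 1]])
    else:
        return vals[index]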
Step 2 : Commit your changes
$ git add stats.py
$ git commit -m "I added a median function."
Step 3 : Update your remote
$ git push origin median
Step 4 : Issue a Pull Request to Alpha's median branch
- Go to your remote's page (github.com/beta/simplestats)
- Click Pull Requests (on the right menu) -> New Pull Request -> Edit
- choose the base fork as alpha/simplestats, the base branch as median, the head fork as beta/simplestats, and the compare branch as median
- write a descriptive message and send it off.
For Alpha:
Step 1 : Review the pull request
- Is the code clear? Does it need comments? Is it correct? Does something need clarifying? Feel free to provide in-line comments. Beta can always update their version of commits during a pull request.
Step 2 : Merge the pull request using the merge button
Step 3 : Update your local repository. At this point, all the changes exist only on the remote repository.
$ git checkout median
$ git fetch origin
$ git merge origin/median
Ok, so we've successfully issued a pull request and merged the updated code base. Let's swap the roles of pull requester and reviewer. This time, Alpha will add some tests to the median function.
For Alpha:
Step 1 : Modify the test_stats.py module to add tests for the median function.
Now continue the exercise as was done previously with roles swapped.
Step 2 : Commit your changes
$ git add test_stats.py
$ git commit -m "I added tests to the median function."
Step 3 : Update your remote
$ git push origin median
Step 4 : Issue a Pull Request
- Go to your remote's page (github.com/alpha/simplestats)
- Click Pull Requests (on the right menu) -> New Pull Request -> Edit
- choose the base fork as beta/simplestats, the base as median, the head fork as alpha/simplestats, and the compare as median
- write a descriptive message and send it off.
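As a rough illustration of Step 1 above, the tests Alpha adds to test_stats.py might look something like the sketch below. The test names and values are only suggestions. So that the block runs by itself, it defines stand-in versions of mean and median (assumptions, not the repository's actual code); in the real test_stats.py you would import them from stats instead.

```python
def mean(vals):
    """Stand-in for the mean() assumed to live in stats.py."""
    return sum(vals) / float(len(vals))


def median(vals):
    """Stand-in copy of the median function under test."""
    vals = sorted(vals)
    z = len(vals)
    index = z // 2
    if z % 2 == 0:
        return mean([vals[index], vals[index - 1]])
    return vals[index]


def test_median_odd_length():
    # Unsorted input with an odd number of values: the middle value wins.
    assert median([3, 1, 2]) == 2


def test_median_even_length():
    # Even number of values: the two middle values are averaged.
    assert median([4, 1, 3, 2]) == 2.5


def test_median_single_value():
    assert median([7]) == 7
```

Covering both the odd-length and even-length branches matters here, because each one goes through a different return statement in median.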
For Beta:
Step 1 : Review the pull request
- Is the code clear? Does it need comments? Is it correct? Does something need clarifying? Feel free to provide in-line comments. Alpha can always update their version of commits during a pull request.
Step 2 : Merge the pull request using the merge button
Step 3 : Update your local repository
$ git checkout median
$ git fetch origin
$ git merge origin/median
This is the trickiest part of version control, so let's take it very carefully.
Alpha and Beta have made changes to stats.py in sync with each other. What happens if the PI (upstream) also makes changes on the same lines? A dreaded conflict...
Now, I will assume the role of PI. Instead of waiting around for my grad students to finish their work, let's say that I decided to take my own stab at the median function (implemented poorly...). I'll add something to stats.py and push it to the upstream repository. Sadly, this addition overlaps with your recent median addition.
Step 1 : Experience the Conflict
$ git fetch upstream
$ git merge upstream/median
remote: Counting objects: 2, done.
remote: Total 2 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (2/2), done.
From git@github.com:UW-Madison-ACI
d063879..90fbb5e median -> upstream/median
Auto-merging stats.py
CONFLICT (content): Merge conflict in stats.py
Automatic merge failed; fix conflicts and then commit the result.
Now what?
Git has paused the merge. You can see this with the git status command.
On branch median
Your branch and 'upstream/median' have diverged,
and have 1 and 1 different commit each, respectively.
(use "git pull" to merge the remote branch into yours)
You have unmerged paths.
(fix conflicts and run "git commit")
Unmerged paths:
(use "git add <file>..." to mark resolution)
both modified: stats.py
If you open your stats.py file, you'll notice that git has added some strange characters to it. Specifically, you'll see something like:
<<<<<<< HEAD:stats.py
** your version of the code **
=======
** upstream's version of the code **
>>>>>>> upstream:stats.py
Now, your job is to determine how the code should look. For this example, that
means you should replace the PI's median function with yours.
Step 2 : Resolve the conflict by editing your stats.py file. Remove the conflict markers and keep your version of median in place of the PI's, along with any of the PI's other changes. The file should run as expected.
Step 3 : Add the updated version and commit
$ git add stats.py
$ git commit -m "Updated from PI's commit"
$ git push origin median
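Before committing a resolution, it can be worth checking that no conflict markers were left behind in the file. The sketch below shows one simple way to do that in Python. The helper name find_conflict_markers and the two sample strings are made up for illustration; they are not part of simplestats or of git itself.

```python
def find_conflict_markers(text):
    """Return 1-based line numbers that start with a git conflict marker."""
    markers = ("<<<<<<<", "=======", ">>>>>>>")
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith(markers)
    ]


# Hypothetical file contents in the middle of a conflict.
conflicted = """def median(vals):
<<<<<<< HEAD:stats.py
    return sorted(vals)[len(vals) // 2]
=======
    return vals[0]
>>>>>>> upstream:stats.py
"""

# The same hypothetical file after the conflict has been resolved by hand.
resolved = """def median(vals):
    return sorted(vals)[len(vals) // 2]
"""

print(find_conflict_markers(conflicted))  # [2, 4, 6]
print(find_conflict_markers(resolved))    # []
```

This is a plain text check, so a line that legitimately begins with a row of equals signs would also be reported. Treat the result as a prompt to look at the file, not as a verdict.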
Let's take a look at Issues and Milestones, both of which are great project planning tools.
Repeat the median function exercise with a mode function. You might find the defaultdict container useful -- it provides default values for key-value pairs! Here's an example of its use.
In [1]: from collections import defaultdict
In [2]: number_frequencies = defaultdict(int)
In [3]: number_found = 42
In [4]: number_frequencies[number_found]
Out[4]: 0
In [5]: number_frequencies[number_found] += 1
In [6]: number_frequencies[number_found]
Out[6]: 1
You might also ask how to find the key with the largest value in a Python dictionary. Here's one way.
In [7]: max_counts = max(number_frequencies, key=number_frequencies.get)
In [8]: max_counts
Out[8]: 42
It works great, right? Maybe we should add a test for bimodal distributions...
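To make the bimodal concern concrete: max with a key function returns only one key, even when several values are tied for the highest count. One possible way around that, sketched below, is to have mode return a sorted list of every value that reaches the highest count. The list return type and the test names are design assumptions for this sketch, not something the exercise prescribes.

```python
from collections import defaultdict


def mode(vals):
    """Return a sorted list of the most frequent values in vals."""
    if not vals:
        raise ValueError("mode() needs at least one value")
    counts = defaultdict(int)
    for val in vals:
        counts[val] += 1
    highest = max(counts.values())
    # Keep every value tied for the highest count, not just the first found.
    return sorted(val for val, count in counts.items() if count == highest)


def test_mode_unimodal():
    assert mode([1, 2, 2, 3]) == [2]


def test_mode_bimodal():
    # Both 1 and 3 appear twice, so both should be reported.
    assert mode([1, 1, 2, 3, 3]) == [1, 3]
```

Returning a list every time keeps the return type consistent between the unimodal and bimodal cases, at the cost of callers having to unpack a single-element list in the common case.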
Gitolite is a way for you to host your own multi-user git repositories. I'm not going to go into details here, but all you need is a machine with some drive space and network access. You can install a minimal Ubuntu, then sudo apt-get install gitolite will pull in everything you need. At that point, your collaborators will only need to send you their public SSH keys for you to configure pull and push access to the repos.
Feel up to testing all of your skills? Check out this excellent website. We haven't taught you all the things you'll need to progress through the entire exercise, but feel free to take a look and try it out!
Feel free to read this blog post, which talks about the usefulness of version control in scientific work. Furthermore, how many people use Google Drive to get work done, especially in a collaborative mode? It turns out that the Drive app includes version control as one of its features, albeit in a limited mode (you can't really control how often it commits, nor can you give messages to your commits). Here's an example.
Take the time to do a little background reading. There are arguments for and against sharing your code. Why would you, personally, choose to do so or not?