Skip to content

Latest commit

 

History

History
267 lines (197 loc) · 7.09 KB

git-archaeology.mkd

File metadata and controls

267 lines (197 loc) · 7.09 KB

name: inverse layout: true class: center, middle, inverse


Archaeology with Git

Radovan Bast

Licensed under CC BY 4.0. Code examples: OSI-approved MIT license.


layout: false

Archaeology with Git

  • Inspecting commits
  • Finding out who introduced a modification
  • Finding out when a modification has been introduced
  • Grepping code
  • Finding removed code
  • Finding a developer to talk to (large projects)
  • Going back in time
  • Finding out when something broke
  • Finding commits not merged upstream

Inspecting commits

  • At any moment we can inspect individual commits
$ git show 49dc419

commit 49dc419c8a44051cfe7826b85ee0a23e5faf3975
Author: Radovan Bast <[email protected]>
Date:   Sat Nov 22 15:33:15 2014 +0100

    do not recompute powers

diff --git a/triangle.py b/triangle.py
index cc52fe2..fa35eab 100644
--- a/triangle.py
+++ b/triangle.py
@@ -6,7 +6,10 @@ m = int(sys.argv[1])

 # loop over all a < b < c <= m
 for c in xrange(1, m+1):
+    cp = c*c
     for b in xrange(1, c):
+        bp = b*b
         for a in xrange(1, b):
-            if a*a + b*b == c*c:
+            ap = a*a
+            if ap + bp == cp:
                 print("(%i, %i, %i)" % (a, b, c))
  • We see that the start of the hash is enough if it is unique

Finding out who introduced a modification and when

  • git blame is a fun and useful command to find out when a specific line got introduced and by whom
  • This is important to judge the impact of a bug (When was it introduced? For how long has this bug been around? Has this bug been released?)
$ git blame <filename>

faa4617a (Radovan Bast      2012-07-02 11:04:45 +0200) ! get the one electron Hamiltonian
e564cfe8 (A. J. Thorvaldsen 2012-12-25 00:59:21 +0100) H1 = 0*S
e564cfe8 (A. J. Thorvaldsen 2012-12-25 00:59:21 +0100) call mat_ensure_alloc(H1, only_alloc=.true.)
faa4617a (Radovan Bast      2012-07-02 11:04:45 +0200) call interface_scf_get_h1(H1)
faa4617a (Radovan Bast      2012-07-02 11:04:45 +0200)
e6cfa2cf (Radovan Bast      2012-03-01 10:06:39 +0100) ! get the two electron contribution (G) to Fock matrix
e564cfe8 (A. J. Thorvaldsen 2012-12-25 00:59:21 +0100) G = 0*S
e564cfe8 (A. J. Thorvaldsen 2012-12-25 00:59:21 +0100) call mat_ensure_alloc(G, only_alloc=.true.)
810e14d8 (Radovan Bast      2012-07-01 17:19:56 +0200) call interface_scf_get_g(D, G)
761d5c27 (Radovan Bast      2012-05-18 16:28:34 +0200)
e6cfa2cf (Radovan Bast      2012-03-01 10:06:39 +0100) ! Fock matrix F = H1 + G
761d5c27 (Radovan Bast      2012-05-18 16:28:34 +0200) F = H1 + G
  • "Who the %&!@!!! wrote this?!?"
  • In 90% of cases you yourself
  • It is useful for projects with 20+ people where you do not have complete overview

Grepping code

$ git grep fixme

densfit/df_vxc.F:!fixme call XC integrator
dirac/dirgrd.F:!fixme: there is some issue with one-step - investigate later
dirac/dirrdn.F:!       fixme num grid report should be independent of DFT
dirac/dirscf.F:        !fixme: michal for InteRest
dirac/dirscf.F:          !fixme: michal for InteRest
dirac/dirscf.F:          !fixme: michal for InteRest
embedding/get_vemb_mat.F:!fixme irep can be < 0 in NONREL runs
eri/eri2out.F:!fixme
eri/eri2out.F:!fixme
eri/eri2out.F:!fixme
gp/gpkrmc.F:C     fixme: dette ser ikke ud til at virke for complex groups.
interest/module_interest_hrr.f90:  !-- fixme --!
interest/module_interest_interface.F90:    !> fixme: contracted basis sets
  • Greps entire repository below current directory
  • Super fast

Finding removed code

  • There used to be a line containing "break"
  • Now it is gone
  • We can figure out when it disappeared
$ git log -c -S'break'

commit 41997dffb4c5dd47c81adb1fd1b616e72415e6e5
Author: Radovan Bast <[email protected]>
Date:   Sun Nov 23 22:44:31 2014 +0100

    remove line

commit bf39f9d3f2b2867e1c66d58d3c4a32842c7f80d5
Author: Radovan Bast <[email protected]>
Date:   Sat Nov 22 15:33:37 2014 +0100

    break loop a if triple found

Finding removed code

$ git show 41997df

commit 41997dffb4c5dd47c81adb1fd1b616e72415e6e5
Author: Radovan Bast <[email protected]>
Date:   Sun Nov 23 22:44:31 2014 +0100

    remove line

diff --git a/triangle.py b/triangle.py
index c693105..fa35eab 100644
--- a/triangle.py
+++ b/triangle.py
@@ -13,4 +13,3 @@ for c in xrange(1, m+1):
             ap = a*a
             if ap + bp == cp:
                 print("(%i, %i, %i)" % (a, b, c))
-                break # no need to continue loop a

Finding a developer to talk to (large projects)

  • Example: you want to know who implemented a specific keyword/functionality
  • git grep for the keyword
  • Then with git blame find the person who introduced it
  • With git show check the commit that introduced the change (it could be a file rename done by someone else)
  • In the latter case check out a version prior to the move and git grep and git blame there

Going back in time

  • We do not have to branch from the tip of the branch (HEAD)
  • We can branch from arbitrary (earlier) hash
$ git checkout -b <name> <hash>
  • This is the recommended mechanism to inspect old code (hash abc123):
$ git checkout -b museum abc123 # create branch called "museum" from hash abc123
  # do some archaeology ...
  # "what was I thinking back then!?"
$ git checkout master           # after you are done switch back to "master"
$ git branch -d museum
  • Branches help us to keep the repository organized

Finding out when something broke

  • Sometimes you realize that something broke
  • You know that it used to work
  • You do not know when it broke
  • First find out a commit in past when it worked
  • Then bisect your way until you find the commit that broke it
$ git bisect start
$ git bisect good 75c737cf5c9 # this is a commit that worked
$ git bisect bad HEAD         # last commit is broken
  # now compile and test
  # after that decide whether
$ git bisect good
  # or
$ git bisect bad
  # now compile and test
  # after that decide whether
$ git bisect good
  # or
$ git bisect bad
  # iterate until commit is found
  • This can even be automatized with git bisect run <script>
  • For this you write the script that returns zero/non-zero (success/failure)

Finding out when something broke

  • Example
$ git bisect start
$ git bisect bad HEAD
$ git bisect good 234aa131972cc11e8f4216dcc2b9d6dd0a76f8ea

Bisecting: 499 revisions left to test after this (roughly 9 steps)
[e811288ff07a13aa712532516720da6a81f19cb2] commit number 503
  • Then roughly 9 steps later we see
$ git bisect good

59507147fad403e63dd2252dae29fb842e67167f is the first bad commit
commit 59507147fad403e63dd2252dae29fb842e67167f
Author: Radovan Bast <[email protected]>
Date:   Fri Sep 5 00:23:24 2014 +0200

implement this and that

Finding commits not merged upstream

  • So you would like to know which commits from current branch have not been merged to <branch>
  • No problem
$ git cherry <branch>