Skip to content

Latest commit

 

History

History
638 lines (453 loc) · 12.3 KB

slides.asciidoc

File metadata and controls

638 lines (453 loc) · 12.3 KB

Git Internals

Simple Single File Model

Basic Requirements

  • Store each revision of the project.

  • Retrieve the state of the project at a specified revision

Single File Model

proj/
+ .git/
| + objects/
| | + file.txt.v1
| | + file.txt.v2
| | + file.txt.v3
| + master
|
+ file.txt
  • Repository has directory called objects

  • When user wants to take a snapshot: copy file.txt into objects

  • The filename is suffixed with revision no. of the file

Single File Model (Contd.)

proj/
+ .git/
| + objects/
| | + file.txt.v1
| | + file.txt.v2
| | + file.txt.v3
| + master
|
+ file.txt
  • When a new version is added the file will have the suffix .v4

  • master holds the name of the latest revision

DVCS

dvcs

Distributed versions control systems allows each user to create commits, in their local repo.

SHA1

  • Idea: Use SHA1 sum of the content to identify the file.

  • SHA1 is similar to checksum algorithms: but 20-bytes long

  • Interesting property: no two files will have the same checksum. (very low probability)

  • Depicted as a 40 digit hex number: 4c7be7b2c3641a5e489c4ce667699eeee4e994c9

  • Only first 7 digits are shown: 4c7be7b

SHA1 Demo

$ echo "Hello World" > test.txt
$ sha1sum test.txt
648a6a6ffffdaa0badb23b8baf90b6168dd16b3a  test.txt

$ echo "Hallo World" > test.txt
$ sha1sum test.txt
c54c65218154f15c32ca252946786e0ad09aa99b  test.txt

Using SHA1

dvcs sha1

SHA1 Single File Model

Storing file using SHA1

proj/
+ .git/
| + objects/
| | + 4c7be7b...
| | + c9360f5...
| | + da39a3e...
| + master
|
+ file.txt
  • Using SHA1 avoids collision

  • But the ordering of the commits is lost

  • Latest revision is in master

  • But previous revision is not known!

  • No meta information about the changes: who? why? when?

Ordering Lost

no commit objects

Ordered Single File Model

Requirement

Add support for storing order of commits.

Commit Object

  • Regular files stored in the repo are called "blob"s

  • Objects representing commits are stored in the repo: commit object

  • Each commit object gets its own SHA1

  • Contains the following information

    • log message

    • author name

    • time of commit

    • SHA1 of the blob, that belongs to the commit

    • SHA1 of the parent commit object

Ordering Regained

commit objects

Blobs + Commit Objects

proj/
+ .git/
| + objects/
| | + 4c7be7b... # B
| | + c9360f5... # B
| | + da39a3e... # B
| | + 5d7be7c... # C
| | + da470f6... # C
| | + cb2aa3f... # C
| |
| + master
|
+ file.txt
  • Repo stores commit objects and blobs.

  • The master contains SHA1 of the commit object, now.

Detached Head Model

Requirement

Add support for tracking current checkout revision.

Which is my head?

  • If latest revision is checked out, the model works fine

  • If an older revision is checked out, we need to keep track of it

  • This is required for example, when a diff is done

Tracking Head

proj/
+ .git/
| + objects/
| | + 4c7be7b...
| | + c9360f5...
| | + da39a3e...
| | + 5d7be7c...
| | + da470f6...
| | + cb2aa3f...
| |
| + master
| + HEAD
|
+ file.txt
  • master should always point to the latest revision

  • Expand our model, with a another file HEAD

    • Contains the text master, if working copy is the latest revision

    • Contains the SHA1SUM of the commit, otherwise

Multi-Branch Model

Requirement

Add support for tracking multiple branches.

Storing Branches

proj/
+ .git/
| + objects/
| | + 4c7be7b...
| |   ...
| |
| + refs
| | + heads
| |   + master
| |   + year-fix
| |
| + HEAD
|
+ file.txt
  • A directory called refs/heads contains one file for each branch

  • Each file contains the SHA1 of the latest commit

  • HEAD contains

    • The path to the current branch

    • Or the SHA1 of a commit, if detached

Storing Branches (Contd.)

proj/
+ .git/
| + objects/
| | + 4c7be7b...
| |   ...
| |
| + refs
| | + heads
| |   + master
| |   + year-fix
| |
| + HEAD
|
+ file.txt
  • HEAD specifies the branch to advance when a commit is done

  • Branches are extremely light weight.

  • Branch creation: creating a file, with the SHA1 of commit.

Merge Commit Model

Requirement

Add support for tracking branch merges.

Branch Merges

merge commit
  • When two branches that have diverged are merged, a merge commit is created

  • Unlike regular commit objects, the merge commit will have more than one parent

  • The commit objects, will store the SHA1 of both the parent commits

Multi-File Model

Requirement

Add support for handling multiple files.

Multiple Files

tree
  • Multiple files are present in the project

  • Each file is stored in the object database just as before

  • Another object called the Tree object, is used to group the files

  • The Tree object, contains the filenames and their SHA1s

  • Commit object refers to the Tree object

Hierarchy of Files and Directories

proj/
  a.txt
  b.txt
  mydir/
    c.txt
    d.txt
  • Tree object can be used recursively to store a hierarchy of files and directories

  • A single SHA1 specifies the state of entire project hierarchy

Hierarchy of Files and Directories

sub tree
  • Tree object can be used recursively to store a hierarchy of files and directories

  • A single SHA1 specifies the state of entire project hierarchy

Space Savings

duplicate files
  • SHA1 of files with duplicate content is the same

  • Duplicate files are stored only once

Space Savings Across Commits

duplicate files across commits

Quiz

Which of the following objects are stored in the object database?

  • [A] Blobs

  • [B] Tree Objects

  • [C] Commit Objects

  • [D] Branch Objects

Quiz

Which of the following are True?

  • [A] Each object in the object database is identified by SHA1 of its contents.

  • [B] Commit objects have a reference to their previous commit

  • [C] Commit objects have a reference to a tree object, corresponding to the top most directory in the project.

  • [D] Tree object have references to files / directories present with the the corresponding directory

Quiz

Which of the following are True?

  • [A] If the contents of the file are the same, it is stored only once in the object database.

  • [B] The probability of the two different files having the same SHA1 is very very low.

  • [C] The HEAD file points to the latest commit.

  • [D] On commit, the branch file is updated to point to newly created commit.

Exploring a Git Repo

Demo: Initial Contents

$ mkdir proj
$ cd proj
$ git init
$ find .git
  • Does the directory structure look familiar?

Demo: Exploring a Commit

$ echo "This is a simple hello world file." > hello.txt
$ git add hello.txt
$ git commit -a -m "Added hello.txt."
$ find .git
  • Use git show --pretty=raw <sha1> to view the commit object

  • Use git ls-tree <sha1> to view the tree object

  • Use git show to view the blob

Demo: Handling Duplicates

  • Create a copy of hello.txt and commit it

    $ cp hello.txt world.txt
    $ git add world.txt
    $ git commit -a -m "Added world.txt."
  • How many objects are there in the object database?

    • [A] 3

    • [B] 4

    • [C] 5

    • [D] 6

Demo: Branches

  • Find out the current commit SHA1, using git log

  • Create a branch called my-branch

    $ git checkout -b my-branch
  • Verify that a branch has been using the find command

  • Verify the file .git/refs/heads/mybranch, contains the current commit SHA1

  • Verify .git/HEAD points to the branch

Demo: Merges

  • Create commit on the branch

  • Checkout master and create another commit

  • Merge the branch

  • Verify that the commit object has two parents

Staging

Tangled Working Copy Problem

  • You are working on Chapter 2, of the book

  • Adding a paragraph at the beginning that provides an overview of the chapter

  • You are not happy with the changes made

  • It’s late into the night, you turn-off your computer and go to sleep

Tangled Working Copy Problem (Contd.)

  • You wake-up next morning, and review the contents of the book, while sipping your morning coffee

  • You notice that there is a spelling mistake in chapter-1.txt and chapter-2.txt.

  • The word "freedom" is spelled out as "fredom"

  • You immediately do a find and replace on the word and correct the spelling mistake

Tangled Working Copy Problem (Contd.)

  • You want to commit this spelling mistake fix, and you do a diff to verify before the commit

  • It is now that you realize that you were already working on a change, to Chapter 2.

  • You do not want to commit the paragraph that you added to Chapter 2, but you want commit the spelling fixes

Tip: This situation is what we call the "Tangled Working Copy Problem."

Staging

  • Git’s staging feature addresses exactly this problem

  • Changes to be part of the commit can be "staged"

  • And staged changes can be committed

Using Staging (1)

stage 1

Using Staging (2)

stage 2

Using Staging (3)

stage 3

Using Staging (4)

stage 4

Staging Commands

  • Files can be staged using git stage

    $ git stage chapter-1.txt
  • Staged files can be committed using, git commit, without the -a option.

    $ git commit

Staging Commands

  • View files that have been staged

    $ git status
  • Show the following categories of files

    • Files that have been staged

    • Files that have been changed but not staged

    • Files that are not tracked

Staging Hunks

  • chapter-2.txt has two unrelated changes in a single file

  • Specific hunks in a file can be staged using --patch option

    $ git stage --patch chapter-2.txt
  • The file to be committed will have only those changes

Staging Hunks (2)

  • For each modified hunk, Git will prompt

    Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]?
    • y - stage the hunk

    • n - do not stage the hunk

    • q - do not stage this hunk, and any of the remaining ones

    • a - stage this hunk, and all later hunks in the file

    • s - split the hunk into smaller hunks

Unstaging Files and Hunks

  • Files can be unstaged using git reset

    git reset chapter-2.txt
  • Hunks in a file can be unstaged using

    git reset --patch chapter-2.txt

Three Diff Types

diff

Tip: git add == git stage

Tip: staging == index == cache

Try Out

  • Stage the changes corresponding to the spelling fixes

  • Verify the staged changes

  • Commit the staged changes

Quiz

Find the odd one out.

  • [A] cache

  • [B] staging

  • [C] index

  • [D] stash

Quiz

True or False:

  • [A] (git diff --cached) + (git diff) == (git diff HEAD)

  • [B] (git stage) == (git add)

  • [C] (git commit -a) == (git add + git commit)

  • [D] (git reset) == (git checkout)