Skip to content
Duke Manh edited this page Oct 20, 2021 · 57 revisions

Lab 5

Due Date

Friday October 15 by Midnight.

Overview

This week we are going to practice rewriting git history. We're also going to give our SSG project a bit of a clean-up, and start to refactor the code.

This lab will help you practice the following:

  • creating branches to work on evolving your code
  • refactoring existing code to make it easier to read, easier to maintain, and more modular
  • using git rebase to rewrite history on a branch
  • using git commit --amend to add things to a previous commit and update a commit message
  • using git merge

NOTE: it is highly recommended that you watch this week's video on git rebasing before you start this lab.

Refactoring

As code evolves and grows new features, the complexity of the code often grows with it. We add features and in doing so we are forced to create new code paths, functions, classes, etc. It's easy to start losing control of the code.

This process of quickly extending software without managing the quality of the code is known as technical debt. Just as we must borrow in order to buy a home, writing large software projects typically involves taking on debt, as we merge code that is good enough, but could be better if we had more time. Eventually, we'll need to pay down this technical debt, and fix our code, or risk going bankrupt!

Understanding Refactoring

Refactoring is a technique for improving the structure and maintainability of your code without altering its behaviour. We aren't adding new features or fixing bugs; we're improving the quality of the code itself.

Let's walk through a simple example of how we might refactor some code. Imagine the following program in file.js, which defines, analyzes, and prints some data:

// age group 0-19
let total1 = 15 + 16 + 17;
let average1 = (15 + 16 + 17) / 3;
console.log(`Case data for population age group 0-19: total=${total1} average=${average1}`);

// age group 20-39
let total2 = 56 + 2 + 67;
let average2 = (56 + 2 + 67) / 3;
console.log(`Case data for population age group 20-39: total=${total2} average=${average2}`);

Running our program produces the following output:

$ node file.js
Case data for population age group 0-19: total=48 average=16
Case data for population age group 20-39: total=125 average=41.666666666666664

The program works, but the code is not as good as it could be. For one thing, we've copy/pasted some code in order to go quickly, and our variable names are really bad. When we need to add some additional age groups, doing more "copy/paste" programming is only going to make things worse.

We really want to make this code easier to work with, easier to understand, and easier to extend. Our goal is to keep the output the same, but improve the readability, maintainability, and structure of the code.

Improving Code through Refactoring

There are all kinds of common refactoring patterns we could use, but let's focus on a couple of obvious improvements.

First, we're repeating code, which is going to make it harder to change or maintain (i.e., if we find a bug in one spot, we have to remember to go and fix it in all the other copy/pasted versions). It would be nice if we could reduce the amount of duplication we're seeing.

Second, our variable naming is terrible, and it's taking too long to figure out what each line is doing.

Let's refactor and improve our code by:

  1. extracting functions
  2. extracting classes to avoid duplication;
  3. renaming variables
  4. splitting the code into multiple files

Step 1: Refactoring with Git, Creating a Refactoring Branch

Because we'll be making significant changes to our code, we want to be careful not to break our current, working program. It might not be pretty, but at least it works!

Let's move our development off of the main branch (substitute master if that is your default branch) and create a separate and isolated refactoring branch:

# Make sure we have everything from GitHub in our local repo before we start
git checkout main
git pull origin main
# Create a new topic branch off of main
git checkout -b refactoring main

Now we can commit changes as we go without fear of destroying our current program. If at any time we need to quickly re-run our working program, we can git checkout main and everything will still be there.

Step 2: Extracting a Function

Our program is calculating averages in multiple places and repeating the same code each time. This is error prone, since we could make a mistake in one of the calculations. Also, it's impossible to test our program's logic without manually testing every single instance of our average calculation (e.g., two of them could work, but a third could be wrong).

Let's extract the average functionality into its own function, which we can call in multiple places:

// Calculate the average for a set of numbers
function average(...numbers) {
  let total = 0;
  let count = numbers.length;
  for(let i = 0; i < count; i++) {
    total += numbers[i];
  }
  return total / count;
}

// age group 0-19
let total1 = 15 + 16 + 17;
let average1 = average(15, 16, 17);
console.log(`Case data for population age group 0-19: total=${total1} average=${average1}`);

// age group 20-39
let total2 = 56 + 2 + 67;
let average2 = average(56, 2, 67);
console.log(`Case data for population age group 20-39: total=${total2} average=${average2}`);

This is getting better. Instead of repeating our average calculation over and over, we're now able to do it once in a single function. The name of the function helps us understand its purpose, and if we get the calculation right, it will work perfectly every time we call it.

After testing that everything still works, we commit to our refactoring branch:

$ git add file.js
$ git commit -m "Extract average() function"

Step 3: Extracting a Second Function

Our first refactoring step revealed that we're actually repeating ourselves in multiple places: our average() function requires a total, which we also need to calculate for each age group.

Luckily, we've already written the code for calculating a total, and can extract that too into a reusable function, which will simplify our average() implementation as well:

// Calculate the total for a set of numbers
function sum(...numbers) {
  let total = 0;
  let count = numbers.length;
  for(let i = 0; i < count; i++) {
    total += numbers[i];
  }
  return total;
}

// Calculate the average for a set of numbers
function average(...numbers) {
  return sum(...numbers) / numbers.length;
}

// age group 0-19
let total1 = sum(15, 16, 17);
let average1 = average(15, 16, 17);
console.log(`Case data for population age group 0-19: total=${total1} average=${average1}`);

// age group 20-39
let total2 = sum(56, 2, 67);
let average2 = average(56, 2, 67);
console.log(`Case data for population age group 20-39: total=${total2} average=${average2}`);

This is even better. We only have a single function for calculating totals and averages, and once we know they work, they will work everywhere. If we find a bug in one of them, we can fix it once, and it will be fixed for the whole program.

After testing that everything still works, we commit to our refactoring branch a second time:

$ git add file.js
$ git commit -m "Extract sum() function, refactor average() to re-use sum()"

Step 3: Extracting a Class

One of the things we're noticing about the code is that the total and average are closely connected: we end needing one to calculate the other, and we're having to type the same data over and over again. Every time we type the same thing, or copy/paste code, it feels like it's too easy to make mistake. It seems like this data is really part of something larger, and it might make more sense to combine it together into a single class:

class Data {
  constructor(...numbers) {
    this.numbers = numbers;
  }

  // Calculate the total for the set of numbers
  sum() {
    let numbers = this.numbers;
    let total = 0;
    let count = numbers.length;
    for(let i = 0; i < count; i++) {
      total += numbers[i];
    }
    return total;
  }      

  // Calculate the average for the set of numbers
  average() {
    return this.sum() / this.numbers.length;
  }
}

// age group 0-19
let data1 = new Data(15, 16, 17);
console.log(`Case data for population age group 0-19: total=${data1.sum()} average=${data1.average()}`);

// age group 20-39
let data2 = new Data(56, 2, 67);
console.log(`Case data for population age group 20-39: total=${data2.sum()} average=${data2.average()}`);

This is great. We've cut in half the number of times we have to enter the lists of data, which reduces the chances we make a mistake. Now we're encapsulating our data in a single object that can manage the analysis without knowledge of the rest of the program.

After testing that everything still works, we commit to our refactoring branch a third time:

$ git add file.js
$ git commit -m "Refactor sum() and average() into a Data class"

Step 3: Code Splitting

Our program isn't huge, but it's getting a bit larger than it needs to be. Our new Data class is really something separate from our main program. We decide to break our file.js into two files: our main program in index.js; and our Data class in data.js:

data.js

class Data {
  constructor(...numbers) {
    this.numbers = numbers;
  }

  // Calculate the total for the set of numbers
  sum() {
    let numbers = this.numbers;
    let total = 0;
    let count = numbers.length;
    for(let i = 0; i < count; i++) {
      total += numbers[i];
    }
    return total;
  }      

  // Calculate the average for the set of numbers
  average() {
    return this.sum() / this.numbers.length;
  }
}

// Export our Data class
module.exports.Data = Data;

index.js

We need to rename file.js to index.js in git:

$ git mv file.js index.js

Now we can update its contents:

// Import our Data class
const { Data } = require('./data');

// age group 0-19
let data1 = new Data(15, 16, 17);
console.log(`Case data for population age group 0-19: total=${data1.sum()} average=${data1.average()}`);

// age group 20-39
let data2 = new Data(56, 2, 67);
console.log(`Case data for population age group 20-39: total=${data2.sum()} average=${data2.average()}`);

This is way better. Our program is now made up of a few smaller units, each of which is easy to read and stands on its own. Having programs split across multiple files also makes things easier when working with git, since it means there's less chance that two people will be working on the same line(s) in the same file(s).

After testing that everything still works, we commit to our refactoring branch a fourth time:

$ git add index.js data.js
$ git commit -m "Split file.js into index.js and data.js"

Step 4: Improving Names

There's one last obvious improvement that we can make: our variable names are terrible. Using names like data1 and data2 is often known as a "bad code smell." Something is "off" with this, you can just sense it, even if you aren't sure exactly why. Names that have to be looked up in order to know what they are don't work (quick: which age group does data2 store?).

Let's use better names for our data:

// Import our Data class
const { Data } = require('./data');

// age group 0-19
let ageGroupUnder19 = new Data(15, 16, 17);
console.log(`Case data for population age group 0-19: total=${ageGroupUnder19.sum()} average=${ageGroupUnder19.average()}`);

// age group 20-39
let ageGroup20to39 = new Data(56, 2, 67);
console.log(`Case data for population age group 20-39: total=${ageGroup20to39.sum()} average=${ageGroup20to39.average()}`);

We can argue about the best names to use here, but no matter what we settle on, we've made a big improvement over data2.

After testing that everything still works, we commit to our refactoring branch a fifth time:

$ git add index.js
$ git commit -m "Rename age group variables to be more clear"

Step 5: Rebasing and Squashing

Now that we've finished our refactoring, let's get ready to merge this work into main. However, before we do, let's squash everything into a single commit. It took us 5 commits to refactor this code, but in our mind, the end product is really one single thing: we don't care about the steps it took to get here, just the end result.

$ git rebase main -i

This will start an interactive rebase, opening our editor, and allowing us to specify what to do with each commit:

pick 4710009 Extract average() function
pick 0c01069 Extract sum() function, refactor average() to re-use sum()
pick fd8e932 Refactor sum() and average() into a Data class
pick f9d85bf Split file.js into index.js and data.js
pick bfac3d3 Rename age group variables to be more clear

# Rebase ac38daf..bfac3d3 onto ac38daf (2 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified). Use -c <commit> to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#

We want to squash the last 4 commits (squash) into the first (pick):

pick 4710009 Extract average() function
squash 0c01069 Extract sum() function, refactor average() to re-use sum()
squash fd8e932 Refactor sum() and average() into a Data class
squash f9d85bf Split file.js into index.js and data.js
squash bfac3d3 Rename age group variables to be more clear

After we save the file, git will create a new commit that uses the changes from all 5 commits in a single commit. The log message will include the combined log messages of all 5 commits.

Try running git log to confirm that it worked.

Step 5: Amending a Commit

Just as we finish our last step, we realize that our commit message wasn't quite right. We really want to change it before we merge. To do that, we'll do an amended commit, which will change the previous commit instead of making a new one:

$ git commit --amend

This will open your editor again with the previous log message, which we change like so:

Refactoring file.js to improve code maintainability:

  * Extract average() function
  * Extract sum() function, refactor average() to re-use sum()
  * Refactor sum() and average() into a Data class
  * Split file.js into index.js and data.js
  * Rename age group variables to be more clear

We save and the commit is updated with the new message. Once again, check that it worked with git log and git show to see the changes.

NOTE: you can also use git --amend --no-edit if you want to update the previous commit to add/change something, but don't want to change the commit message (--no-edit means leave the commit message as it is).

Step 6: Merging our Work

Our refactoring branch is finished, and we're ready to merge to main:

git checkout main
git merge refactoring
git push origin main

Our main branch should now include the changes from refactoring in a single commit.

Your Turn to Refactor!

Based on the example refactoring walk-through you just read, it's your turn to improve some code. You are asked to take your SSG repo and refactor it to improve the code's structure, readability, modularity, and maintainability.

Use a similar method to the walk-through above:

  1. create a refactoring branch off of your latest main
  2. look through your code to find at least 3 improvements you could make. For example:
    1. get rid of global variables
    2. use functions instead of variables to calculate things vs. storing them
    3. separate command-line argument parsing from file parsing from HTML generation from output
    4. improve variable naming
    5. reduce code duplication through shared functions, classes, etc
    6. talk to your friends to see if they have ideas of other improvements you could make
  3. each refactoring step should be done in a separate commit. Test your code before you commit changes and don't commit anything that breaks the program. Use good commit messages to describe what you did
  4. each refactoring step can build on the previous one, but don't mix them (e.g., don't solve two separate problems in a single refactoring commit, use two commits instead).
  5. when you are done, do an Interactive Git Rebase to squash all of your refactoring commits into a single commit
  6. use an Amended Git Commit to update the commit message to make it more clear
  7. do a git merge on your main branch to integrate your refactoring branch's changes
  8. do a git push origin main to share your changes publicly on GitHub

We will be writing automated tests against this code later, so spend some time refining and improving your code as much as you can.

Write a Blog Post

Write a blog post about the process of refactoring your code. What did you focus on in your improvements? How did you fix your code so it would be better in each step? How did your interactive rebase go? Did you find any bugs in your code while you did this? Did you break your program while changing things? How did it go using Git to change your project's history?

Submission

When you have completed all the requirements above, please add your details to the table below.

Name Blog Post (URL) Single commit on main for merged refactoring work
Example Name https://example.com/blog/1 ddeaf180
Tuan Phong Nguyen https://blog.andrewnt.dev/post/osd-600-series-rebase 9c07965
Trang Nguyen https://tracy016.medium.com/osd600-refactor-and-rebase-208c66b195ca 8fac32
Anatioliy Serputov https://medium.com/@aserputov/osd-lab-b5f0c62ee13 dc62d33
Minsu Kim https://dev.to/mkim219/code-refactoring-3577 970cb1b
Ahmad Rokay https://dev.to/ar/refactoring-and-rebasing-lab-5-400k 0ffa06c5
Gus McCallum Blog Commit
Chi Kien Nguyen https://dev.to/kiennguyenchi/rebase-squash-amend-in-git-20ih Commit
Thanh Cong Van https://dev.to/tcvan0707/refactoring-my-code-git-rebase-4lod 0173054
Minh Quan Nguyen https://dev.to/mqnguyen/refactoring-rebasing-go0 f42e5b6
Tuan Thanh Tan https://dev.to/tuanthanh2067/rewriting-history-in-git-428f Commit
Kunwarvir Dhillon https://dev.to/dhillonks/refactoring-existing-code-and-rebasing-5ae b406d2f
Minh Hang Nguyen https://dev.to/minhhang107/code-refactoring-and-rebase-1cnp 2e0b2e9
Joshua Li https://dev.to/jli/refactoring-lennah-ssg-141b fcb95d9
Xiongye Jiang https://dev.to/derekjxy/refactoring-with-git-4cn4 48515f1
Reza Poursafa https://medium.com/@rezapscodes/osd600-rebase-squash-and-git-ammend-8b6b09437ea5 b84b79b
James Mai https://dev.to/beamazedvariable/week-6-refactoring-and-git-rebase-1ef7 6f20f71
Vivian Vu https://dev.to/vivianvu/osd600-refactoring-my-ssg-5d7l 92e007d
Le Minh Pham [DPS909] Refactoring My SSG Code Commit
Japneet Singh Kalra Refactoring 2271269
Andre Willomitzer Code Custodian 4dfe023
Andrew Tassone Refactoring Commit
Luigi Zaccagnini Refactoring Code in Octo! 4372ad8
Eugene Chung Refactoring Code cmd-ssg 6a0e543
Gustavo Tavares Blog Post 7cd99f5f
Francesco Menghi https://dev.to/menghif/refactoring-and-git-rebase-245o 4a009bb
Jia Hua Zou https://medium.com/@jhzou/git-refactor-and-rebase-78e8e5495e19 207e0d5
Roxanne Lee https://medium.com/@Roxanne.Lee/refactoring-and-git-rebase-d3d37c5387dd 6b5a1b1
Kevan Yang https://dev.to/pandanoxes/lab-5-refectory-code-and-git-rebase-359d b092da9
Leyang Yu https://dev.to/lyu4321/refactor-and-rebase-5ad2 fc14895
Roman Rezinkin https://dev.to/romanrezinkin/dps909-lab-5-11hb d7fba98
Suhhee Kim https://dev.to/suhhee1011/refactoring-the-codes-4hgj 8690eb4
Oliver Pham https://dev.to/oliverpham/how-i-refactored-my-code-ibf 22003aa
Jun Song My Blog 10b5254
Jiyun Jung https://dev.to/jjung99/improving-code-through-refactoring-58p9 a9a2b5e
Mizuho Okimoto Open Source: Rewriting git history(amend/rebase) 9a1ea54
Amasia Nalbandian Five birds one stone 2a95ec1
Tengzhen Zhao My blog f37b03e
Antonio Bennett Blog 239c5ed7
Tue Nguyen Blog 949bf80
Andrei Batomunkuev Refactoring & Working with Git Rebase 3196947
Emily Phan Blog c25f78
Irene Park Blog Post 63ee4eb
Qiwen Yu Blog 85a4e67
Alex Romanova https://dev.to/sirinoks/git-rebase-amend-and-refactoring-3m9b 8d5ff17
Diana Belokon blog merge
Hung Nguyen blog merge
Duc Bui Manh blog c0cb195