Book: SE@Google Ch: Version Control and Branch Management #20

hhdqirui · 2023-01-20T07:12:23Z

Book: Software Engineering at Google
Chapter: Version Control and Branch Management

Summary:

Version Control

A Version Control System (VCS) is a system that tracks revisions (versions) of files over time. It maintains some metadata about the set of files being managed; collectively, a copy of the files and metadata is called a repository. It helps teams coordinate by allowing multiple developers to work on the same set of files simultaneously.

Why Is Version Control Important?

We can conceptualize VCS as a way to extend a standard filesystem. A filesystem is a mapping from filename to contents. A VCS extends that to provide a mapping from (filename, time) to contents, along with the metadata necessary to track last sync points and audit history. Version control makes the consideration of time an explicit part of the operation: unnecessary in a programming task, critical in a software engineering task.
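The mapping described above can be sketched in a few lines of Python. This is only a toy model: the class name `TinyVCS`, its methods, and the use of integer timestamps are assumptions made for illustration, not how any real VCS stores data.

```python
from bisect import bisect_right


class TinyVCS:
    """Toy model (hypothetical): maps (filename, time) -> contents."""

    def __init__(self):
        # filename -> time-ordered list of commit timestamps
        self._times = {}
        # filename -> list of contents, parallel to self._times[filename]
        self._contents = {}

    def commit(self, filename, time, contents):
        """Record new contents for filename at the given time."""
        times = self._times.setdefault(filename, [])
        if times and time <= times[-1]:
            raise ValueError("commits must move forward in time")
        times.append(time)
        self._contents.setdefault(filename, []).append(contents)

    def read(self, filename, time):
        """Return the contents of filename as of the given time."""
        times = self._times.get(filename, [])
        # Index of the first commit strictly after `time`
        i = bisect_right(times, time)
        if i == 0:
            raise KeyError(f"{filename} did not exist at time {time}")
        return self._contents[filename][i - 1]


vcs = TinyVCS()
vcs.commit("README", 1, "first draft")
vcs.commit("README", 5, "second draft")
print(vcs.read("README", 3))  # first draft
print(vcs.read("README", 5))  # second draft
```

A plain filesystem would keep only "second draft"; the extra time key is what lets the earlier version still be read.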

At big tech companies, some people (often new hires) have little or no experience with code that is worked on by more than one person or for more than a couple of weeks. For them, version control solves a problem they have not necessarily experienced yet.

Additionally, version control helps us bridge the gap between single-developer and multi-developer processes. It allows us to scale up teams and organizations. Development is inherently a branch-and-merge process, whether we are coordinating between multiple developers or a single developer at different points in time. A VCS removes the question of “which is more recent?” Modern version control also automates error-prone operations such as tracking which set of changes has been applied. In short, version control is how we coordinate between multiple developers and/or multiple points in time.

VCS is also used for legal and regulatory purposes. A VCS keeps a formal record of every change to every line of code, which is increasingly necessary for satisfying audit requirements. When in-house development is mixed with appropriate use of third-party sources, a VCS helps track the provenance and origin of every line of code.

Centralized VCS vs. Distributed VCS

Centralized VCS

In centralized VCS implementations, the model is one of a single central repository. Although a developer can have files checked out and accessible on their local workstation, operations that affect the version control status of those files need to be communicated to the central server. Any code that is committed by a developer is committed into that central repository.

The early centralized VCS focused on locking and preventing multiple simultaneous edits. If you wanted to edit a file, you might need to acquire a lock, enforced by the VCS, to ensure that only you are making edits. When you’ve completed an edit, you release the lock. This sort of simplistic locking has inherent problems with scale: it can work fine for a few people, but has the potential to fall apart with larger groups if any of those locks become contended.
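The locking workflow can be illustrated with a minimal sketch. The class `LockingRepo` and its method names are hypothetical and do not correspond to the interface of any specific early VCS; the point is only that a second editor is blocked until the first releases the lock.

```python
class LockingRepo:
    """Toy central repository (hypothetical) that enforces one editor per file."""

    def __init__(self):
        self._files = {}  # filename -> contents
        self._locks = {}  # filename -> user currently holding the lock

    def acquire(self, filename, user):
        holder = self._locks.get(filename)
        if holder is not None and holder != user:
            raise RuntimeError(f"{filename} is locked by {holder}")
        self._locks[filename] = user

    def write(self, filename, user, contents):
        # Edits are accepted only from the current lock holder
        if self._locks.get(filename) != user:
            raise PermissionError(f"{user} does not hold the lock on {filename}")
        self._files[filename] = contents

    def release(self, filename, user):
        if self._locks.get(filename) != user:
            raise PermissionError(f"{user} does not hold the lock on {filename}")
        del self._locks[filename]


repo = LockingRepo()
repo.acquire("main.c", "alice")
try:
    repo.acquire("main.c", "bob")
except RuntimeError as err:
    print(err)  # main.c is locked by alice
repo.write("main.c", "alice", "int main(void) { return 0; }")
repo.release("main.c", "alice")
repo.acquire("main.c", "bob")  # succeeds now that alice has released the lock
```

With two users the wait is short. With many users contending for the same popular file, everyone except the lock holder is stuck, which is the scaling problem described above.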

Distributed VCS

A Distributed VCS (DVCS) world does not enforce the constraint of a central repository: if you have a copy (clone, fork) of the repository, you have a repository that you can commit to as well as all of the metadata necessary to query for information about things like revision history. The DVCS model allows for better offline operation and collaboration without inherently declaring one particular repository to be the source of truth.
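As a rough illustration of the DVCS idea, the sketch below treats a clone as a full, independent copy of the history. The `Repo` class is hypothetical and ignores everything a real DVCS does to exchange and merge commits between clones.

```python
import copy


class Repo:
    """Toy repository (hypothetical): the history is just a list of messages."""

    def __init__(self):
        self.history = []

    def commit(self, message):
        self.history.append(message)

    def clone(self):
        # A clone carries the complete history, not just the latest files
        return copy.deepcopy(self)

    def log(self):
        return list(self.history)


origin = Repo()
origin.commit("initial import")

laptop = origin.clone()
laptop.commit("work done offline")  # no contact with origin is needed

print(laptop.log())  # ['initial import', 'work done offline']
print(origin.log())  # ['initial import']
```

Neither `origin` nor `laptop` is special in this model; calling one of them the source of truth is a team convention rather than something the code enforces.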

Branch Management

Work in progress is akin to a branch. Uncommitted local changes aren’t conceptually different than committed changes on a branch.

A development branch (dev branch) is a halfway point between “this is done but not committed” and “this is what new work is based on.”

If the period between releases or the release lifetime for a product is longer than a few hours, it may be sensible to create a release branch that represents the exact code that went into the release build for your product. If any critical flaws are discovered between the actual release of that product into the wild and the next release cycle, fixes can be cherry-picked from trunk to your release branch.
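The release-branch workflow above can be sketched by modeling each branch as a list of commits. The names `Commit` and `cherry_pick` here are illustrative only; a real cherry-pick applies a diff and can hit conflicts, which this toy model ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    id: str
    description: str


def cherry_pick(source, target, commit_id):
    """Copy one commit from source to target, leaving the rest behind."""
    commit = next((c for c in source if c.id == commit_id), None)
    if commit is None:
        raise LookupError(f"no commit {commit_id} on source branch")
    if commit not in target:
        target.append(commit)


trunk = [Commit("c1", "initial feature"), Commit("c2", "polish UI")]

# Cut the release branch: it records exactly what went into the release build
release = list(trunk)

# Work continues on trunk after the release
trunk.append(Commit("c3", "start risky refactor"))
trunk.append(Commit("c4", "fix crash on startup"))

# Only the critical fix is brought over to the release branch
cherry_pick(trunk, release, "c4")
print([c.id for c in release])  # ['c1', 'c2', 'c4']
```

The release branch picks up the fix in `c4` without also taking the unfinished refactor in `c3`.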

The primary difference between a dev branch and a release branch is the expected end state: a dev branch is expected to merge back to trunk, and could even be further branched by another team. A release branch is expected to be abandoned eventually.

Version Control at Google

The “One-Version” Rule: Developers must never have a choice of “What version of this component should I depend upon?”

No Long-Lived Branches: development branches should be minimal or, at best, very short-lived.

Monorepo: The monorepo approach has some inherent benefits, and chief among them is that adhering to One Version is trivial: it’s usually more difficult to violate One Version than it would be to do the right thing. However, the monorepo approach is not the perfect answer for everyone. If every project in your organization has the same secrecy, legal, privacy, and security requirements, a true monorepo is a fine way to go. Otherwise, aim for the functionality of a monorepo, but allow yourself the flexibility of implementing that experience in a different fashion.

hhdqirui added the BookChapter label Jan 20, 2023

hhdqirui self-assigned this Jan 20, 2023
