This repository has been archived by the owner on Aug 16, 2018. It is now read-only.

Code metrics

Paul Buonopane edited this page Nov 18, 2013 · 1 revision

CLOC

Simple code metrics can be calculated with CLOC by running metrics.bat (Windows) or metrics.sh (Linux) in the repository root.

Column key

  • Language: The programming/markup language that CLOC automatically detected. Stats are broken down by language.
  • files: Number of matching files scanned
  • blank: Number of blank (whitespace-only) lines in matching files
  • comment: Number of comment lines in matching files
  • code: Number of code lines in matching files
  • scale: The multiplier used to determine third-generation equivalents, based on the language's conciseness. A more compact language has a larger scale, indicating that more is achieved per line.
  • 3rd gen. equiv: Equivalent number of third-generation code lines. This is often a more accurate representation of functionality than raw code line count, as some languages tend to do more than others in a single line. Personally, I find the accuracy of these values quite questionable, and the sources of the underlying gearing ratios are beyond shady. However, I've included them as a reminder that the efficiency of languages varies widely.
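The relationship between code, scale and 3rd gen. equiv is a plain multiplication. Below is a minimal Python sketch of that arithmetic using the figures from the example output on this page. The function and variable names are made up for illustration and are not part of CLOC, and the reading of the SUM row's scale as an overall ratio is an assumption that happens to match the example.

```python
def third_gen_equivalent(code_lines: int, scale: float) -> float:
    """Return code lines multiplied by the language scale, rounded to 2 places."""
    return round(code_lines * scale, 2)


# (code, scale) pairs copied from the 2013-11-18 example output.
rows = {
    "Java": (403, 1.36),
    "Maven": (178, 1.90),
    "YAML": (32, 0.90),
    "Bourne Shell": (32, 3.81),
    "DOS Batch": (3, 0.63),
}

equivalents = {
    lang: third_gen_equivalent(code, scale) for lang, (code, scale) in rows.items()
}
total_code = sum(code for code, _ in rows.values())
total_equiv = round(sum(equivalents.values()), 2)

# Assumption: the SUM row's scale is the ratio of total equivalent lines
# to total code lines. This matches the example (1038.89 / 648 is about 1.60).
overall_scale = round(total_equiv / total_code, 2)

print(equivalents["Java"], total_equiv, overall_scale)
```

Note how the 32 lines of Bourne Shell count for more than four times as much as the 32 lines of YAML, which is exactly the kind of result that makes the gearing ratios worth treating with caution.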

Interpreting the data

The most important columns are code and comment. Proper development places a moderate amount of emphasis on documentation comments. This is mostly necessary for complex programming languages, and is often left out of simpler markup languages. For an object-oriented language like Java, there should be 33-66% as many comment lines as code lines. Any less, and the code is not modular enough, or documentation comments are being neglected. Any more, and developers are probably heavily commenting procedural code, which is unnecessary if the application is designed properly.
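As a rough illustration of the 33-66% guideline, the following Python sketch computes the comment-to-code ratio for the Java row of the example output on this page. The function names and the labels it returns are hypothetical; only the thresholds and the line counts come from this page.

```python
def comment_ratio(comment_lines: int, code_lines: int) -> float:
    """Return comment lines as a fraction of code lines."""
    if code_lines <= 0:
        raise ValueError("code_lines must be positive")
    return comment_lines / code_lines


def classify_comment_ratio(ratio: float, low: float = 0.33, high: float = 0.66) -> str:
    """Label a ratio against the 33-66% guideline for languages like Java."""
    if ratio < low:
        return "under-commented"
    if ratio > high:
        return "over-commented"
    return "within range"


# Java row from the 2013-11-18 example: 216 comment lines, 403 code lines.
java_ratio = comment_ratio(216, 403)
print(f"{java_ratio:.1%}", classify_comment_ratio(java_ratio))
```

The guideline is aimed at programming languages, so applying the same check to the Maven or YAML rows would not be meaningful.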

Blank lines are important for programming (not markup) languages, as they are used to group code. Too few blank lines indicate poor formatting, or developers who don't understand their own code. Indentation is another indicator of this, but it is not measured by CLOC. A language like Java should have roughly 20% as many blank lines as code lines. As more procedural complexity is introduced, this percentage may increase.
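The same kind of check works for blank lines. This is a minimal Python sketch using the Java row of the example output on this page. The function names are hypothetical, and the tolerance of five percentage points is an assumption chosen only to give "roughly 20%" a concrete meaning.

```python
def blank_ratio(blank_lines: int, code_lines: int) -> float:
    """Return blank lines as a fraction of code lines."""
    if code_lines <= 0:
        raise ValueError("code_lines must be positive")
    return blank_lines / code_lines


def near_target(ratio: float, target: float = 0.20, tolerance: float = 0.05) -> bool:
    """Check whether a ratio is within `tolerance` of the target (assumed values)."""
    return abs(ratio - target) <= tolerance


# Java row from the 2013-11-18 example: 89 blank lines, 403 code lines.
java_blank = blank_ratio(89, 403)
print(f"{java_blank:.1%}", near_target(java_blank))
```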

It's important that code be divided over a large number of files, as this indicates organization and modularity, especially with Java. It's typically unacceptable to have a Java file with more than a few hundred code lines. Large classes and methods should be avoided, as they indicate poor design.
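File size can be estimated from the same summary by dividing code lines by file count. The Python sketch below does that for the Java row of the example output on this page, and also shows how per-file counts could be checked against a limit. The function names, the limit of 300 lines (one reading of "a few hundred") and the per-file sample data are all hypothetical.

```python
from typing import Dict, List


def average_code_lines(code_lines: int, files: int) -> float:
    """Return the mean number of code lines per file."""
    if files <= 0:
        raise ValueError("files must be positive")
    return code_lines / files


def oversized_files(code_by_file: Dict[str, int], limit: int = 300) -> List[str]:
    """Return the names of files whose code line count exceeds `limit`."""
    return sorted(name for name, code in code_by_file.items() if code > limit)


# Java row from the 2013-11-18 example: 403 code lines over 7 files.
print(round(average_code_lines(403, 7), 1))

# Hypothetical per-file counts, invented purely to exercise the function.
sample = {"Small.java": 120, "Huge.java": 450}
print(oversized_files(sample))
```

An average can hide a single very large file, so per-file counts are the better input when they are available.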

Examples

2013-11-18 c80a6bfee19f9b1644dfd7f6cdc5b3bb127e06fc:

      17 text files.
      17 unique files.
     219 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.08 s (156.1 files/s, 12710.6 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
Java                  7        89       216       403 x   1.36 =         548.08
Maven                 2        15         2       178 x   1.90 =         338.20
YAML                  1         0         0        32 x   0.90 =          28.80
Bourne Shell          1         7         0        32 x   3.81 =         121.92
DOS Batch             1         0         0         3 x   0.63 =           1.89
-------------------------------------------------------------------------------
SUM:                 12       111       218       648 x   1.60 =        1038.89
-------------------------------------------------------------------------------
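
To apply the checks from "Interpreting the data" without copying numbers by hand, rows of this table can be parsed with a regular expression. The Python sketch below assumes the exact column layout shown in the example above, including the scale and 3rd gen. equiv columns. The names `Row` and `parse_row` are hypothetical and not part of CLOC.

```python
import re
from typing import NamedTuple, Optional

# Matches a data row such as:
# "Java                  7        89       216       403 x   1.36 =         548.08"
ROW_PATTERN = re.compile(
    r"^(?P<language>.+?)\s+(?P<files>\d+)\s+(?P<blank>\d+)\s+(?P<comment>\d+)"
    r"\s+(?P<code>\d+)\s+x\s+(?P<scale>\d+\.\d+)\s+=\s+(?P<equiv>\d+\.\d+)\s*$"
)


class Row(NamedTuple):
    language: str
    files: int
    blank: int
    comment: int
    code: int
    scale: float
    equiv: float


def parse_row(line: str) -> Optional[Row]:
    """Parse one table row; return None for headers, rules and other lines."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    return Row(
        language=match["language"],
        files=int(match["files"]),
        blank=int(match["blank"]),
        comment=int(match["comment"]),
        code=int(match["code"]),
        scale=float(match["scale"]),
        equiv=float(match["equiv"]),
    )


java = parse_row("Java                  7        89       216       403 x   1.36 =         548.08")
shell = parse_row("Bourne Shell          1         7         0        32 x   3.81 =         121.92")
print(java)
print(shell)
```

The SUM row has the same shape and is parsed with the language "SUM:", so callers that only want per-language rows need to filter it out.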