Completeness

For the analysis performed by Oak, we measure completeness with respect to the generated output. To quantify it, we use two different metrics: reach coverage and output coverage.

Measurement

To measure completeness, we compare the string literals found by the analysis with those that are expected to be part of the output. A string literal is called an output candidate if there exists an execution path from at least one entry point to an output-generating statement (e.g., echo or print) along which this literal becomes part of the program's output.
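The definition can be illustrated with an idealized toy model. In this sketch, which is our own assumption and not how Oak works internally, a program is a control-flow graph of statement nodes, and a literal counts as an output candidate if an output statement emitting it is reachable from an entry point. The node names and the `output_candidates` helper are hypothetical. The sketch ignores data flow through variables and path feasibility, which is part of why the exact set of output candidates is hard to obtain in practice.

```python
from collections import deque


def output_candidates(edges, entry_points, prints):
    """Return all literals emitted by an output statement that is reachable
    from at least one entry point (breadth-first search over the graph).

    edges:        statement -> list of statements that may execute next
    entry_points: statements where execution may start
    prints:       output statement -> literals it emits (echo/print)
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        for succ in edges.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    # Collect the literals of every reachable output statement.
    return {lit for stmt in seen for lit in prints.get(stmt, [])}


# Hypothetical example: "button" is never reached from the entry point "main".
edges = {"main": ["echo_title"], "button": ["echo_form"]}
prints = {"echo_title": ["<h1>Wiki page</h1>"], "echo_form": ["<form>"]}
print(output_candidates(edges, ["main"], prints))  # {'<h1>Wiki page</h1>'}
```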

Construct Validity

In the absence of ground truth for string literal coverage, we use a heuristic to determine whether a string literal is an output candidate: a prospective output candidate must contain at least one less-than (<) or greater-than (>) character.
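A minimal Python sketch of the heuristic as described; the function name `is_output_candidate` is our own and not part of Oak. Note that it also flags a literal such as `"a < b"`, which is not HTML markup; such literals are one source of false positives.

```python
def is_output_candidate(literal: str) -> bool:
    """Heuristic: treat a string literal as an output candidate if it
    contains at least one '<' or '>' character."""
    return "<" in literal or ">" in literal


print(is_output_candidate("<h2>Sub</h2>"))  # True
print(is_output_candidate("Wiki page"))     # False
print(is_output_candidate("a < b"))         # True (flagged, although not markup)
```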

To assess this heuristic, we manually evaluated precision and recall on a sample of 400 string literals from our PHP corpus. For this corpus, precision was ~94% and recall was ~51%. The heuristic is thus precise, in that only ~6% of the literals it flags are false positives, while about half of the relevant output literals are flagged by it. We attempted to further optimize the heuristic and reached ~70% recall; however, the results then became rather imprecise.
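Precision and recall of the heuristic on a manually labelled sample can be computed as sketched below. The tiny sample is invented for illustration and is not taken from the 400-literal sample used in the evaluation; `precision_recall` is a hypothetical helper.

```python
def precision_recall(sample):
    """Compute precision and recall of the '<'/'>' heuristic.

    sample: list of (literal, is_output) pairs, where is_output is the manual
    label saying whether the literal really is part of the program's output.
    """
    # Manual labels of the literals that the heuristic flags.
    flagged = [is_output for literal, is_output in sample
               if "<" in literal or ">" in literal]
    true_positives = sum(flagged)                      # flagged and really output
    relevant = sum(is_output for _, is_output in sample)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Invented sample for illustration only.
sample = [
    ("<p>", True),
    ("<br/>", True),
    ("Welcome", True),      # output, but missed by the heuristic
    ("Hello world", True),  # output, but missed by the heuristic
    ("a < b", False),       # flagged, but not output (false positive)
    ("SELECT *", False),
]
print(precision_recall(sample))  # (0.6666666666666666, 0.5)
```

Here one of the three flagged literals is a false positive, so precision is 2/3; in the same way, a precision of ~94% corresponds to ~6% false positives among the flagged literals.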

We hence expect that no more than about 50% of the output generated by the symbolic interpreter is non-responsive to the heuristic; this was the case for every system we have analysed so far.

Reach Coverage / Output Coverage

To measure how well our approximated symbolic output matches the expected output (all output candidates), we use two metrics: reach coverage and output coverage.

Reach coverage is the ratio between the number of output candidates (strings responding to our heuristic) that are reached by the symbolic execution engine, for instance because an expression containing them is evaluated, and the total number of output candidates in the system.

Output coverage is the ratio between the number of output candidates that are part of the symbolic output and the total number of output candidates in the system. Since a literal has to be reached before it can be output, output coverage is by definition never greater than reach coverage. The ratio of output coverage to reach coverage therefore indicates the loss between reached ("touched") string literals and the actual output (coverage loss).
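The two metrics can be written as formulas. The set names are our own notation: C is the set of output candidates in the system, R ⊆ C the candidates reached by the symbolic execution engine, and O ⊆ R the candidates that are part of the symbolic output.

```latex
\begin{align*}
\text{reach coverage}  &= \frac{|R|}{|C|}, \\
\text{output coverage} &= \frac{|O|}{|C|} \le \frac{|R|}{|C|} \quad \text{(since } O \subseteq R\text{)}, \\
\frac{\text{output coverage}}{\text{reach coverage}} &= \frac{|O|}{|R|}, \qquad
\text{coverage loss} = 1 - \frac{|O|}{|R|}.
\end{align*}
```

Reading the coverage loss as 1 - |O|/|R|, i.e. the fraction of reached candidates that do not end up in the output, is our interpretation of the ratio described above.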

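```php
<?php
// button() is never called, so its echo statements are dead code.
function button() {
    echo "<form>";
    echo "<button label='Click me' />";
    echo "</form>";
}

$title = "<h1>Wiki page</h1>";
// unknown_function() is unknown to the analysis, so its return value is symbolic.
$title_ = unknown_function($title);

$subtitle = "<h2>Sub</h2>";

echo $title_ . $subtitle; // Symbol[unknown_function()] . "<h2>Sub</h2>"
```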

Considering the example snippet above, we can see that

the function button() is never called, so its output statements are dead code,
the variable $title is passed to an unknown function, so its return value ($title_) is symbolic, and
the variable $subtitle will be part of the output.

So, we have five output candidates, of which two are actually reached during the execution and only one becomes part of the output. Hence, for this snippet the reach coverage is 2/5 or 40% and the output coverage is 1/5 or 20%.
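The arithmetic for the snippet can be checked with a small Python sketch. The reached/output flags for each literal are transcribed by hand from the discussion of the snippet; they are not produced by Oak.

```python
# The five string literals of the example snippet, mapped to
# (reached by the symbolic execution engine, part of the symbolic output).
literals = {
    "<form>": (False, False),                       # in button(), never called
    "<button label='Click me' />": (False, False),  # in button(), never called
    "</form>": (False, False),                      # in button(), never called
    "<h1>Wiki page</h1>": (True, False),            # reached, then hidden behind unknown_function()
    "<h2>Sub</h2>": (True, True),                   # reached and echoed
}


def is_output_candidate(literal):
    """Heuristic: the literal contains at least one '<' or '>' character."""
    return "<" in literal or ">" in literal


candidates = [lit for lit in literals if is_output_candidate(lit)]
reached = [lit for lit in candidates if literals[lit][0]]
output = [lit for lit in candidates if literals[lit][1]]

reach_coverage = len(reached) / len(candidates)
output_coverage = len(output) / len(candidates)
print(f"reach coverage:  {len(reached)}/{len(candidates)} = {reach_coverage:.0%}")   # 2/5 = 40%
print(f"output coverage: {len(output)}/{len(candidates)} = {output_coverage:.0%}")  # 1/5 = 20%
```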

Analysis results

We evaluated reach coverage and output coverage for several PHP systems; the results can be found here.
