Completeness
For the analysis performed by Oak, we measure completeness with respect to the generated output. To do so, we use two complementary metrics.
In order to measure completeness, we compare the string literals found by the analysis to those that are expected to be part of the output. A string literal is called an output candidate if there exists an execution path from at least one entry point to an output-generating statement (e.g., echo or print) such that this literal is part of the program's output.
In the absence of ground truth for string literal coverage, we use a heuristic to determine whether or not a string literal is an output candidate: a literal is considered an output candidate if it contains a less-than character (<), a greater-than character (>), or both.
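The heuristic can be sketched in a few lines. This is only an illustration, not Oak's implementation; the function name is_output_candidate and the sample literals are assumptions made for this example.

```python
def is_output_candidate(literal: str) -> bool:
    """Heuristic from the text: the literal contains '<' or '>' (or both)."""
    return "<" in literal or ">" in literal


# Hypothetical string literals, chosen only to show both outcomes.
literals = ["<h1>Wiki page</h1>", "SELECT * FROM pages", "Hello, world", "/>"]
candidates = [s for s in literals if is_output_candidate(s)]
print(candidates)
```

Note that a literal such as the SQL string above never responds to the heuristic even if it ends up in the output, which is one reason recall stays well below 100%.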
We manually evaluated the precision and recall of this heuristic on a sample of 400 string literals from the PHP corpus. For our corpus of PHP systems, precision was ~94% and recall was ~51%. Thus, the heuristic is precise in that only ~6% of the literals responding to it are false positives, while about half of the relevant output literals actually respond to it. We attempted to optimize the heuristic further and reached ~70% recall, but the results became rather imprecise.
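For readers unfamiliar with the two measures, the following sketch shows how precision and recall could be computed from a manually labeled sample. The labeled pairs are made up for illustration and are not the data from our evaluation; the function name precision_recall is likewise an assumption.

```python
def precision_recall(labeled: list) -> tuple:
    """labeled holds (responds_to_heuristic, is_real_output) pairs."""
    tp = sum(1 for responds, truth in labeled if responds and truth)
    fp = sum(1 for responds, truth in labeled if responds and not truth)
    fn = sum(1 for responds, truth in labeled if not responds and truth)
    # Precision: share of responding literals that are real output.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of real output literals that respond to the heuristic.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical manual labels for six literals (illustration only).
sample = [
    (True, True),
    (True, True),
    (True, False),
    (False, True),
    (False, True),
    (False, False),
]
precision, recall = precision_recall(sample)
print(f"precision={precision:.2f} recall={recall:.2f}")
```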
We hence expect that no more than 50% of the output generated by the symbolic interpreter is non-responsive to the heuristic. This was the case for every system we have analysed so far.
In order to measure how good our approximated symbolic output is, we measure its similarity to the expected output (all output candidates). This is addressed by two different metrics, reach coverage and output coverage.
Reach coverage is the ratio between the number of output candidates (strings responding to our heuristic) that are reached by the symbolic execution engine, for instance because an expression containing them is evaluated, and the total number of output candidates in the system.
Output coverage is the ratio between the number of output candidates that are part of the symbolic output and the total number of output candidates in the system. By definition, the output coverage is not greater than the reach coverage. The ratio of output coverage to reach coverage indicates how many reached ("touched") string literals are lost before they become part of the actual output (coverage loss).
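The two definitions can be expressed over sets of literals. The sketch below is an assumption about how one might compute them, not Oak's code: the function name coverage, the set-based representation, and the literal identifiers are all hypothetical. It treats the symbolic output as a subset of the reached literals, which is why output coverage can never exceed reach coverage.

```python
def coverage(candidates: set, reached: set, in_output: set) -> tuple:
    """Return (reach coverage, output coverage, output-to-reach ratio)."""
    if not candidates:
        raise ValueError("the system has no output candidates")
    reached_candidates = candidates & reached
    # Only reached literals can appear in the symbolic output.
    output_candidates = reached_candidates & in_output
    reach_cov = len(reached_candidates) / len(candidates)
    output_cov = len(output_candidates) / len(candidates)
    # Ratio of output to reach coverage; undefined if nothing was reached.
    ratio = output_cov / reach_cov if reach_cov else None
    return reach_cov, output_cov, ratio


# Hypothetical literal identifiers, for illustration only.
reach_cov, output_cov, ratio = coverage(
    candidates={"a", "b", "c", "d"},
    reached={"a", "b", "c"},
    in_output={"a"},
)
print(reach_cov, output_cov, ratio)
```

A ratio close to 1 means that almost every reached literal also ends up in the symbolic output; a low ratio points to a large coverage loss.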
function button() {
    echo "<form>";
    echo "<button label='Click me' />";
    echo "</form>";
}
$title = "<h1>Wiki page</h1>";
$title_ = unknown_function($title);
$subtitle = "<h2>Sub</h2>";
echo $title_ . $subtitle; // Symbol[unknown_function()] . "<h2>Sub</h2>"
Considering the example snippet above, we can see that
- the function button() is never called, thus its three echo statements are dead code
- the variable $title is passed to an unknown function, so its return value ($title_) is symbolic
- the variable $subtitle will be part of the output.
So, we have five output candidates, of which two (the literals assigned to $title and $subtitle) are reached during the execution and only one (the literal assigned to $subtitle) is part of the output. Hence, for this snippet the reach coverage is 2/5 or 40% and the output coverage is 1/5 or 20%.
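The arithmetic for the snippet can be checked with a short script. The reached and output flags below are transcribed by hand from the discussion of the snippet, not produced by Oak, and the variable names are assumptions for this example.

```python
# (literal, reached by symbolic execution, part of the symbolic output)
snippet_literals = [
    ("<form>", False, False),                       # inside button(), never called
    ("<button label='Click me' />", False, False),  # inside button(), never called
    ("</form>", False, False),                      # inside button(), never called
    ("<h1>Wiki page</h1>", True, False),            # passed to unknown_function()
    ("<h2>Sub</h2>", True, True),                   # concatenated into the echo
]

total = len(snippet_literals)
reached = sum(1 for _lit, is_reached, _out in snippet_literals if is_reached)
in_output = sum(1 for _lit, _reached, is_out in snippet_literals if is_out)

reach_coverage = reached / total
output_coverage = in_output / total
print(f"reach coverage:  {reached}/{total} = {reach_coverage:.0%}")
print(f"output coverage: {in_output}/{total} = {output_coverage:.0%}")
```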
We evaluated reach and output coverage for different PHP systems; the results can be found here.