2023 02 28 enu vs test results

Enumeration vs. tests, by results

Note: we ultimately didn’t pursue this direction.

Research questions

RQ1: Is enumeration better or worse than testing at catching instabilities in practice?

(I guess this is the question I have had in mind for some time.) See the section “RQ1: enumeration vs tests” below.

RQ2: Can tests be used to improve the enumeration procedure?

This may be handled via some kind of sampling, as we mentioned last time, although what we discussed then was sampling from a larger space of all types “in the program”.

It’s a bit hard to imagine how to realize this.

RQ3: Can the enumeration procedure be used to improve tests?

At least, we can report that tests were good but we found some counterexamples or dark corners (existentials/Any), and be done with it: the user should decide if the tests should be extended.

Not exactly this, but close: enumeration can become a part of the test suite. I already developed several utilities to that end. The user will be able to assert stability of methods (with possible exceptions).

RQ1: enumeration vs tests

We have two possibilities for tests-based analysis:

all runs are stable or
there was an unstable run.

And three possibilities for enumeration-based analysis:

stable / unstable / partial

(The procedure reports “Partial” when the search space is only partially covered; in this case, because there were some generic types we couldn’t process.)

One disagreement is (ought to be) impossible:

“tst unstable && enu stable”

Neither of the two possible agreements is particularly interesting:

“tst unstable && enu unstable”
Here, the tests contained a counterexample, and we also found one using the heavy machinery. (Perhaps we found a different counterexample.)
“tst stable && enu stable”
No counterexample is possible, so tests didn’t find any.

Here are the disagreements that may look interesting:

(1) “tst unstable && enu partial”
Tests can flag instability that the enumeration procedure cannot. If there are few such cases, that’s good: it means we’re no worse than tests. If there are many, we should perhaps accept the view that “Partial” most likely means “unstable”.
(2) “tst stable && enu unstable”
This may be because either
- the tests are incomplete (good!), or
- we enumerated some weird types that the user doesn’t care about (bad).
(3) “tst stable && enu partial”
Hard to say. Maybe the tests are incomplete, but we can’t find a counterexample. Or the code is generic but the tests exercise only the relevant cases. Overall, this is similar to (2), and “partial” does look similar to “unstable” in how I think about it.
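
The six combinations above can be written down as a small lookup table, which makes it easy to tally how often each case occurs over a set of methods. This is only a sketch in Python, not part of the actual tooling: the names `Tst`, `Enu`, `VERDICT`, `classify`, `tally`, and `partial_unstable_share`, as well as the verdict labels, are all made up for illustration.

```python
from collections import Counter
from enum import Enum


class Tst(Enum):
    STABLE = "stable"      # all test runs were stable
    UNSTABLE = "unstable"  # there was at least one unstable run


class Enu(Enum):
    STABLE = "stable"
    UNSTABLE = "unstable"
    PARTIAL = "partial"    # search space only partially covered


# Hypothetical shorthand labels for the cases discussed in the notes.
VERDICT = {
    (Tst.UNSTABLE, Enu.STABLE): "impossible",      # ought not to happen
    (Tst.UNSTABLE, Enu.UNSTABLE): "agree",
    (Tst.STABLE, Enu.STABLE): "agree",
    (Tst.UNSTABLE, Enu.PARTIAL): "tests-win",      # tests flag what enumeration cannot
    (Tst.STABLE, Enu.UNSTABLE): "enu-finds-more",  # incomplete tests or weird types
    (Tst.STABLE, Enu.PARTIAL): "unclear",
}


def classify(tst, enu):
    """Return the verdict label for one method's pair of results."""
    return VERDICT[(tst, enu)]


def tally(results):
    """Count verdicts over an iterable of (Tst, Enu) pairs, one per method."""
    return Counter(classify(t, e) for t, e in results)


def partial_unstable_share(results):
    """Among methods where enumeration said Partial, the share with an unstable test run.

    Returns None if no method was Partial.
    """
    partial = [t for t, e in results if e is Enu.PARTIAL]
    if not partial:
        return None
    return sum(t is Tst.UNSTABLE for t in partial) / len(partial)
```

A high value from `partial_unstable_share` would support reading “Partial” as “unstable, most likely”. Note that it can only be a lower bound on the true share, since stable tests may simply be incomplete.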
