2023 02 28 enu vs test results

Artem Pelenitsyn edited this page Mar 1, 2023 · 2 revisions

Enumeration vs. Tests by Results

Note: we ultimately did not pursue this direction.

Research questions

RQ1: Is enumeration better or worse than testing at catching instabilities in practice?

(I guess this is the question I have had in mind for some time.) See the “RQ1: enumeration vs tests” section below.

RQ2: Can tests be used to improve enumeration procedure?

This may be handled via the kind of sampling that we mentioned last time, although what we discussed then was sampling from a larger space of all types “in the program”.

It’s a bit hard to imagine how to realize this.

RQ3: Can enumeration procedure be used to improve tests?

At the very least, we can report that the tests were good but that we found some counterexamples or dark corners (existentials/Any), and be done with it: the user should decide whether the tests need to be extended.

Not exactly this, but close: enumeration can become part of the test suite. I have already developed several utilities to that end. The user will be able to assert the stability of methods (with possible exceptions).
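To make the idea of asserting stability with exceptions concrete, here is a minimal Python sketch. It is not one of the utilities mentioned above: the name `assert_all_stable`, the `allowed_unstable` parameter, and the string verdicts are all hypothetical, and treating “partial” as a failure is an assumption made only for this illustration.

```python
def assert_all_stable(verdicts, allowed_unstable=frozenset()):
    """Fail unless every method is stable or explicitly exempted.

    verdicts: dict mapping a method name to "stable", "unstable" or "partial"
              (hypothetical output of an enumeration run).
    allowed_unstable: method names the user accepts as exceptions.
    """
    offenders = sorted(
        name
        for name, verdict in verdicts.items()
        # Assumption: anything other than "stable" counts as a failure.
        if verdict != "stable" and name not in allowed_unstable
    )
    if offenders:
        raise AssertionError("not known to be stable: " + ", ".join(offenders))


# Passes: "g" is unstable, but the user listed it as an accepted exception.
assert_all_stable({"f": "stable", "g": "unstable"}, allowed_unstable={"g"})
```

A call like this could sit next to ordinary unit tests, so that a newly introduced instability shows up as a test failure rather than as a separate report.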

RQ1: enumeration vs tests

We have two possibilities for test-based analysis:

  • all runs are stable or
  • there was an unstable run.

And three possibilities for enumeration-based analysis:

  • stable / unstable / partial

(The procedure reports “Partial” when it has covered the search space only partially; in this case, because there were some generic types we couldn’t process.)
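The two verdict sets above give 2 × 3 = 6 combinations per method. As a rough sketch, they can be listed in Python like this; the type names `TstOutcome` and `EnuOutcome` and the helper `label` are invented for the illustration and simply mirror the “tst … && enu …” shorthand used on this page.

```python
from enum import Enum
from itertools import product


class TstOutcome(Enum):
    STABLE = "stable"      # all runs are stable
    UNSTABLE = "unstable"  # there was an unstable run


class EnuOutcome(Enum):
    STABLE = "stable"
    UNSTABLE = "unstable"
    PARTIAL = "partial"    # search space only partially covered


def all_combinations():
    """Every (tests, enumeration) verdict pair for a single method."""
    return list(product(TstOutcome, EnuOutcome))


def label(tst, enu):
    """Render a pair in the page's shorthand."""
    return f"tst {tst.value} && enu {enu.value}"


for tst, enu in all_combinations():
    print(label(tst, enu))
```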

One disagreement is (ought to be) impossible:

  • “tst unstable && enu stable”

Both possible agreements are not particularly interesting:

  • “tst unstable && enu unstable”

    Here, the tests contained a counterexample, and we found one too using the heavy machinery. (Perhaps we found a different counterexample.)

  • “tst stable && enu stable”

    No counterexample is possible, so tests didn’t find any.

Here are the disagreements that may look interesting:

  1. “tst unstable && enu partial”

    Tests can flag instability that the enumeration procedure cannot. If such cases are few, that’s good: it means we’re no worse than tests. If they are many, we should perhaps accept the view that “Partial” means “most likely unstable”.

  2. “tst stable && enu unstable”

    This may be because either

    • tests are incomplete (good!)
    • we enumerated some weird types that the user doesn’t care about (bad)

  3. “tst stable && enu partial”

    Hard to say. Maybe the tests are incomplete, but we can’t find a counterexample. Or the code is generic but the tests exercise only the relevant cases. Overall, this is similar to (2), and “partial” does look similar to “unstable” in how I think about it.
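The case analysis above can be summarized as a lookup table plus a tally, which is what deciding between “few” and “many” in case (1) would need. The following Python sketch assumes per-method verdicts are available as plain strings; the names `CASES`, `classify`, and `tally`, as well as the sample data, are made up for illustration and are not results.

```python
from collections import Counter

# (tests verdict, enumeration verdict) -> case label used on this page.
CASES = {
    ("unstable", "stable"): "impossible",
    ("unstable", "unstable"): "agreement",
    ("stable", "stable"): "agreement",
    ("unstable", "partial"): "disagreement 1",
    ("stable", "unstable"): "disagreement 2",
    ("stable", "partial"): "disagreement 3",
}


def classify(tst, enu):
    """Return the case label for one method's pair of verdicts."""
    return CASES[(tst, enu)]


def tally(results):
    """Count case labels over (method name, tst verdict, enu verdict) triples."""
    return Counter(classify(tst, enu) for _, tst, enu in results)


# Made-up sample, only to show the shape of the input.
sample = [
    ("f", "stable", "stable"),
    ("g", "stable", "partial"),
    ("h", "unstable", "partial"),
    ("k", "stable", "unstable"),
]
counts = tally(sample)
print(counts)
```

A nonzero count for “impossible” would point to a bug in either the enumeration procedure or the test instrumentation, so it is worth keeping as an explicit label rather than raising an error on lookup.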