Reduce stress on "do not use a reference solution" #465

Merged
merged 4 commits into from
Jan 20, 2023
Changes from 3 commits
53 changes: 28 additions & 25 deletions content/authoring/guidelines/submission-tests.md
@@ -27,18 +27,42 @@ There are many kinds of kata, and some guidelines might simply not apply to some
- **Test cases should be independent.** They should not rely on any state passed from one test case to another.


## Reference Solution
## Fixed Tests

Fixed tests are tests with predetermined inputs and outputs, and do not change between test runs.

- **Each described requirement should have a corresponding fixed test.** Every aspect of the specification in the kata description should be explicitly tested with at least one, possibly more, dedicated and properly labeled assertion(s). Fixed tests are usually easier to debug, so ideally a fixed test should fail before the corresponding scenario is tested with random tests.
- **Tests should check the solution with edge cases and cases that require special handling within the context of the task.** For example, unless the kata description _explicitly_ states that such inputs do not need to be considered, empty arrays should be tested for problems involving arrays, arrays of lengths 0 and 1 for problems seeking pairs of values in arrays and empty strings for problems involving strings, etc.
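
The two guidelines above can be illustrated with a minimal sketch in Python. The task (`find_pair`, returning the first pair of values that sums to a target) and the helper `run_fixed_tests` are hypothetical names invented for this example, and plain comparisons stand in for whatever assertion functions the kata's test framework provides.

```python
from typing import Callable, List, Optional, Tuple


def find_pair(values: List[int], target: int) -> Optional[Tuple[int, int]]:
    """Hypothetical solution under test: first pair (by index) summing to target."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] + values[j] == target:
                return (values[i], values[j])
    return None


# One labeled case per requirement, including the edge cases that matter
# for a "pairs in an array" task: arrays of length 0 and 1.
FIXED_CASES = [
    ("typical input", [1, 2, 3, 4], 5, (1, 4)),
    ("no matching pair", [1, 2, 3], 100, None),
    ("empty array", [], 5, None),
    ("single element", [5], 5, None),
]


def run_fixed_tests(solution: Callable) -> List[str]:
    """Return the labels of failing cases; an empty list means all passed."""
    failures = []
    for label, values, target, expected in FIXED_CASES:
        # Each case gets its own fresh list, so cases stay independent.
        actual = solution(list(values), target)
        if actual != expected:
            failures.append(label)
    return failures
```

Because every case carries a label, a failure points directly at the requirement that was violated.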


## Random Tests

Some test suites require a reference solution to generate the expected value(s) and compare them to what the user's solution is returning. While such practice is common and sometimes necessary, there are things which have to be handled carefully:
Random tests are uncommon in "real life" coding and are somewhat specific to Codewars. They are required to reject invalid approaches based on input probing, hard-coding, and other workarounds. The goal of random tests is to make the expected return values and their order unpredictable so that only solutions that are actually solving the task may pass.

- **Avoid using a reference solution at all if possible.** Sometimes there's no other way, but quite often a reference solution is simply not necessary and this can avoid a lot of trouble. Test cases can effectively be generated with the answer known upfront, which eliminates an entire class of potential problems associated with the use of reference solutions, like the following:
- **Random tests should generate test cases for all scenarios** which cannot be completely tested with fixed tests. If necessary, build different kinds of random input generators. If a specific kind of input has a very low chance of occurring purely at random (e.g. generating a palindrome), it's better to build a specific random generator that can enforce this kind of input rather than rely on 1000 random tests and just pray for the specific case to come up. Sometimes it can be a good idea to keep one fully random generator, because it may generate cases you didn't think about.
- **Random tests should ensure that it's infeasible to pass tests by counting test cases.** Cases shouldn't be grouped by output type or behavior, especially if the expected output is a boolean variable (e.g. checking that some input satisfies some criteria), or when it comes to error checking (solution throwing an exception in some specific situations). The order of tested scenarios should be unpredictable. One possible way to achieve this is to generate and collect a set of random inputs for all required scenarios and shuffle them before the actual testing. If there are some fixed tests for particularly tricky scenarios which can be skipped by counting, they should be shuffled into the set of random inputs.
- **Keep the number of random tests reasonable if random tests are not used to verify the performance of the user's solution.** Keep their number as small as possible as long as good coverage is still guaranteed: in some situations only a handful of inputs are actually testable, so there is no need to test each of them 10 times; use randomized testing instead. Otherwise, 100 random tests are generally enough, or maybe fewer, depending on the task.
- For some types of problems (for example, kata simulating chess, or grid puzzles like sudoku or nonograms) it may be difficult, or even impossible, to generate valid input configurations randomly. In such rare cases, it may be acceptable to use a predefined, hard-coded set of inputs (possibly with expected outputs too). Before the tests are run, the set of inputs should be shuffled or randomly sampled at each run to make the hard-coding of results more tedious for the user. If possible, some additional transformations can be randomly applied to the inputs if it can be easily accounted for in the result of the test (for example arrays can be reversed, game boards can be rotated, sides can be flipped, etc.)
- Some problems have a small set of possible inputs, and it's possible to easily enumerate their whole input domain. For this kind of task, it's not necessary to randomly generate inputs. It is allowed to pregenerate or hard-code all possible inputs and, before the tests are run, shuffle or randomly sample them at each run so that solutions which exploit the order of inputs become infeasible.
- **Random tests should be run after fixed tests.** Not all testing frameworks allow for easy ordering of tests, but fixed tests should run, and fail if they are going to, before random tests.
- **Use appropriate random utilities available in your language**. Know how to use random number generators, how to randomly generate types of inputs you need, be it numbers in a range, large numbers, strings, big integers, floating-point values, collections, etc. Know how to do random sampling of collections, how to shuffle them, how to avoid duplicates. See this [obligatory XKCD comic](https://xkcd.com/221/) for how NOT to do random tests.
- **Difficulty should be consistent between test runs.** When ranges of random inputs are very large, it becomes possible for some users to receive many small, easy inputs while other users receive the exact opposite. Make sure that your random tests are built in a way that minimizes the chances for such a situation to occur. If you want to test difficult inputs, split the test cases into a set of easy ones and a set of difficult ones, and test them separately.
- **Make random tests easy to debug if you need to rely on them to extensively check the correctness of the user's solution**, that is, when fixed tests cannot be built to catch most of the holes in the user's logic. When random tests fail, they can be very difficult to debug because the input is very large and cannot be easily read. If necessary, split your random tests into two batches: one with small, debuggable inputs, and another with proper, large input values. Note that both batches should still contain all applicable scenarios.
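
As a rough sketch of the points above about dedicated generators and shuffling, the following Python example builds random cases for a hypothetical "is this string a palindrome?" kata. All function names here are illustrative assumptions and not part of any Codewars framework.

```python
import random
import string
from typing import List, Tuple


def is_palindrome(text: str) -> bool:
    """Hypothetical task: the check a user would be asked to implement."""
    return text == text[::-1]


def random_palindrome(max_half: int = 10) -> str:
    """Dedicated generator: builds a palindrome by mirroring a random half."""
    half = "".join(random.choices(string.ascii_lowercase, k=random.randint(0, max_half)))
    middle = random.choice(["", random.choice(string.ascii_lowercase)])
    return half + middle + half[::-1]


def random_non_palindrome(max_half: int = 10) -> str:
    """Dedicated generator: breaks a palindrome by changing its first character."""
    half = "".join(random.choices(string.ascii_lowercase, k=random.randint(1, max_half)))
    text = half + half[::-1]
    # The last character keeps the old letter, so first != last afterwards.
    replacement = random.choice([c for c in string.ascii_lowercase if c != text[0]])
    return replacement + text[1:]


def build_cases(per_scenario: int = 50) -> List[Tuple[str, bool]]:
    """Collect cases for every scenario, then shuffle so the order is unpredictable."""
    cases = [(random_palindrome(), True) for _ in range(per_scenario)]
    cases += [(random_non_palindrome(), False) for _ in range(per_scenario)]
    random.shuffle(cases)
    return cases
```

Purely random strings would almost never be palindromes, which is why each scenario gets its own generator. Shuffling the combined list keeps a solution from passing by counting test cases or by guessing that all `True` cases come first.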


## Reference Solution

Some test suites require a reference solution to generate the expected value(s) and compare them to what the user's solution is returning. While such practice is common and in many cases necessary, for some types of kata a reference solution can be avoided. Test cases can effectively be generated with the answer known upfront, which eliminates an entire class of potential problems associated with the use of reference solutions, like the following:
- Extra time spent on running a reference solution.
- Reference solution being accessible to users by mistake.
- Input mutation by the user solution which can affect the input passed to the reference solution, or make assertion messages confusing.
- Incorrect implementation of the reference solution leading to the rejection of valid users' solutions.
- **The reference solution, if used, does not have to be the same as the one in the "Reference Solution" snippet.** While the "Reference Solution" snippet serves its specific purpose and is [governed by its own set of quality guidelines][authoring-guidelines-reference-solution], the reference solution used by performance tests can use a different, more efficient approach, to make sure that it does not consume too much of the time limit available for the user solution.

For problems which allow such an approach, it is advised to build tests in a way which generates inputs with known answers and does not depend on a reference solution. However, if authors decide to use a reference solution in their tests, they should follow the guidelines below:

- **The reference solution should not be revealed to the user.** When an assertion fails or the test suite crashes, some testing frameworks print fragments of source code which caused the failure to the console. It may happen that such printed failure messages or stack traces expose information about the solution which should not be revealed, so the place where the expected solution is computed is not a trivial choice at all.
- **The reference solution shouldn't be accessible to the user solution.** It should not be possible to call the reference solution directly, or implement the user solution as an alias or wrapper around the reference solution. The reference solution should be completely inaccessible outside the submit tests. For some languages it's not a problem at all, but in others authors need to make an additional effort to make the reference solution inaccessible to the user. Check the reference page and tutorials for [your language][languages] to see how to prevent this problem in your tests.
- **The reference solution, if used, does not have to be the same as the one in the "Reference Solution" snippet.** While the "Reference Solution" snippet serves its specific purpose and is [governed by its own set of quality guidelines][authoring-guidelines-reference-solution], the reference solution used by performance tests can use a different, more efficient approach, to make sure that it does not consume too much of the time limit available for the user solution.
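
To illustrate the idea of generating inputs with answers known upfront, here is a minimal Python sketch for a hypothetical kata: "given a list holding every number from 1 to n except one, return the missing number". The generator name and its parameter are assumptions made for this example only.

```python
import random
from typing import List, Tuple


def make_missing_number_case(max_n: int = 1000) -> Tuple[List[int], int]:
    """Pick the answer first, then build an input that has exactly that answer."""
    n = random.randint(2, max_n)
    missing = random.randint(1, n)  # the expected result, chosen upfront
    numbers = [k for k in range(1, n + 1) if k != missing]
    random.shuffle(numbers)  # the order carries no hint about the answer
    return numbers, missing
```

Since the expected value is chosen before the input exists, the test suite contains no solving logic at all. There is nothing for the user to call or alias, no extra time is spent computing answers, and an incorrect reference implementation cannot reject valid solutions.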


## Input mutation
@@ -53,27 +77,6 @@ Issues caused by input mutation are particularly difficult to deal with, because
- **Input which could be potentially modified by a user solution _must not_ be used afterwards.** It must not be used as an input for the reference solution, to compose diagnostic messages, or anything else. If necessary, a (deep) copy should be created and passed to the user solution.
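
A minimal Python sketch of this guideline follows. The helper `check_sorted` and the sorting task are hypothetical and chosen only to keep the example short. The user solution receives a deep copy, so the original input stays safe to use in the expected value and in the failure message.

```python
import copy
from typing import Callable, List


def check_sorted(user_solution: Callable[[List[int]], List[int]], data: List[int]) -> str:
    """Run one test case without ever exposing `data` itself to user code."""
    expected = sorted(data)  # sorted() returns a new list and leaves data intact
    # The user solution only ever sees a deep copy of the input.
    actual = user_solution(copy.deepcopy(data))
    if actual == expected:
        return "ok"
    # data was never handed to user code, so it is safe to show here.
    return f"for input {data}: expected {expected}, got {actual}"
```

For a flat list of integers a shallow copy would be enough. `copy.deepcopy` is used here because it also protects nested structures such as lists of lists.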


## Fixed Tests

Fixed tests are tests with predetermined inputs and outputs, and do not change between test runs.

- **Each described requirement should have a corresponding fixed test.** Every aspect of the specification in the kata description should be explicitly tested with at least one, possibly more, dedicated and properly labeled assertion(s). Fixed tests are usually easier to debug, so ideally a fixed test should fail before the corresponding scenario is tested with random tests.
- **Tests should check the solution with edge cases and cases that require special handling within the context of the task.** For example, unless the kata description _explicitly_ states that such inputs do not need to be considered, empty arrays should be tested for problems involving arrays, arrays of lengths 0 and 1 for problems seeking pairs of values in arrays and empty strings for problems involving strings, etc.


## Random Tests

Random tests are uncommon in "real life" coding and are somewhat specific to Codewars. They are required to reject invalid approaches based on input probing, hard-coding, and other workarounds. The goal of random tests is to make the expected return values and their order unpredictable so that only solutions that are actually solving the task may pass.

- **Random tests should generate test cases for all scenarios** which cannot be completely tested with fixed tests. If necessary, build different kinds of random input generators. If a specific kind of input has a very low chance of occurring purely at random (e.g. generating a palindrome), it's better to build a specific random generator that can enforce this kind of input rather than rely on 1000 random tests and just pray for the specific case to come up. Sometimes it can be a good idea to keep one fully random generator, because it may generate cases you didn't think about.
- **Random tests should ensure that it's infeasible to pass tests by counting test cases.** Cases shouldn't be grouped by output type or behavior, especially if the expected output is a boolean variable (e.g. checking that some input satisfies some criteria), or when it comes to error checking (solution throwing an exception in some specific situations). The order of tested scenarios should be unpredictable. One possible way to achieve this is to generate and collect a set of random inputs for all required scenarios and shuffle them before the actual testing. If there are some fixed tests for particularly tricky scenarios which can be skipped by counting, they should be shuffled into the set of random inputs.
- **Keep the number of random tests reasonable if random tests are not used to verify the performance of the user's solution.** Keep their number as small as possible as long as good coverage is still guaranteed: in some situations only a handful of inputs are actually testable, so there is no need to test each of them 10 times; use randomized testing instead. Otherwise, 100 random tests are generally enough, or maybe fewer, depending on the task.
- Under some rare circumstances, it is allowed to use so-called [**randomized tests**][randomized-tests] instead of fully random ones. For some types of problems (for example, kata simulating chess, or problems with a small set of possible inputs) it may be too complex or infeasible to generate inputs randomly. In such rare cases, it may be acceptable to use a predefined, hard-coded set of inputs (possibly with expected outputs too). Before the tests are run, the set of inputs should be shuffled or randomly sampled at each run to make the hard-coding of results more tedious for the user. If possible, some additional transformations can be randomly applied to the inputs if it can be easily accounted for in the result of the test (for example arrays can be reversed, game boards can be rotated, sides can be flipped, etc.)
- **Random tests should be run after fixed tests.** Not all testing frameworks allow for easy ordering of tests, but fixed tests should run, and fail if they are going to, before random tests.
- **Use appropriate random utilities available in your language**. Know how to use random number generators, how to randomly generate types of inputs you need, be it numbers in a range, large numbers, strings, big integers, floating-point values, collections, etc. Know how to do random sampling of collections, how to shuffle them, how to avoid duplicates. See this [obligatory XKCD comic](https://xkcd.com/221/) for how NOT to do random tests.
- **Difficulty should be consistent between test runs.** When ranges of random inputs are very large, it becomes possible for some users to receive many small, easy inputs while other users receive the exact opposite. Make sure that your random tests are built in a way that minimizes the chances for such a situation to occur. If you want to test difficult inputs, split the test cases into a set of easy ones and a set of difficult ones, and test them separately.
- **Make random tests easy to debug if you need to rely on them to extensively check the correctness of the user's solution**, that is, when fixed tests cannot be built to catch most of the holes in the user's logic. When random tests fail, they can be very difficult to debug because the input is very large and cannot be easily read. If necessary, split your random tests into two batches: one with small, debuggable inputs, and another with proper, large input values. Note that both batches should still contain all applicable scenarios.

## Performance Tests

Some kata require solutions to be fast enough. For example, the author may only wish to accept solutions completing in (sub-)linear time. Building such test suites is not an easy task!