Further test improvements #9

Fiwo735 · 2024-02-06T23:18:11Z

Improved Python testing, making it the default and recommended testing option.
Removed duplicated test case (same functionality as _example)
Updated AST diagram to the new version

Fiwo735 · 2024-02-06T23:18:54Z

@Jpnock please let me know if you have any further suggestions regarding testing.
I'd also appreciate it if you took a look at the GitHub Actions workflow, as I've updated it to use the new scripts/test.py.

Jpnock · 2024-02-07T03:11:35Z

Many thanks for updating the diagram : )

I've summarised the differences between the Python and bash script below which should probably be addressed. As for the GitHub Actions issue, this is likely due to there being no terminal TTY available so these code paths need disabling in CI.

Python test script differences

If make fails, the script continues to execute (there's no exit code check under both the normal and coverage paths)
There's no exit code check on make clean
The make clean stage doesn't use the "-C", PROJECT_LOCATION argument like all of the other make commands
ASAN_OPTIONS=exitcode=0 environment variable needs to be set when executing the student's compiler binary to prevent ASAN from forcing the test to fail if there's a memory leak etc.
While --silent is a nice addition, there's no output from make until it completes, so it looks like the script is stuck for about 10 seconds when you run it (I suggest a one line print to say that we're building the compiler)
The verbose output should be updated to match the output of the bash testing script to improve readability (see test_sh.log vs test_py_verbose.log)
There's no timeout on the C90 compilation stage to prevent one bad test hanging the rest (previously 15s I believe)
Subject to opinion but I think the verbose output should probably be the default, with a --short option to just show the progress bar (since the full output introduces students to where the appropriate log/assembly files are for each test without them having to add a flag to figure this out)
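A few of the points above (checking the exit code of every make invocation, passing -C to make clean as well, printing a one-line notice before the silent build, and setting ASAN_OPTIONS for the student's compiler) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names compiler_env, run_build_step and build are hypothetical and are not taken from scripts/test.py, and the bare make invocation stands in for whatever build target the script really uses.

```python
import os
import subprocess
import sys


def compiler_env(base=None):
    """Copy the environment and stop ASAN from forcing a non-zero exit code."""
    env = dict(os.environ if base is None else base)
    # Note: this overwrites any ASAN_OPTIONS value that was already set.
    env["ASAN_OPTIONS"] = "exitcode=0"
    return env


def run_build_step(cmd, description):
    """Announce a build step, run it, and report whether it exited with 0."""
    # One-line notice so that a silent make does not look like a hang.
    print(f"{description}...", flush=True)
    result = subprocess.run(cmd, check=False)
    if result.returncode != 0:
        print(f"{description} failed with exit code {result.returncode}",
              file=sys.stderr)
        return False
    return True


def build(project_location):
    """Run clean and then build, stopping at the first step that fails."""
    # Both commands use -C so they operate on the same project directory.
    steps = [
        (["make", "-C", project_location, "clean"], "Cleaning previous build"),
        (["make", "-C", project_location], "Building the compiler"),
    ]
    # all() short-circuits, so the build is skipped when clean fails.
    return all(run_build_step(cmd, text) for cmd, text in steps)
```

The environment returned by compiler_env would be passed as env= to subprocess.run only when executing the student's compiler binary, so the make steps keep the caller's environment untouched.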

GitHub Actions issue

I assume a TTY doesn't exist in the Actions runtime, so the stty call in the following line likely returns a non-zero exit code and prints nothing, which leaves split() with no values to unpack:

    _, max_line_length = os.popen("stty size", "r").read().split()
ValueError: not enough values to unpack (expected 2, got 0)

Regardless, the progress bar code should be disabled in GitHub Actions, otherwise you get output that looks like test_py.log. I think we'd also want the full verbose output in the Actions logs too.
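One possible way to avoid the ValueError shown above is to ask the standard library for the terminal size instead of parsing the output of stty. The sketch below assumes a hypothetical helper called progress_bar_width; the reserved value of 16 (the length of "Running Tests []") and the 80-column fallback are illustrative choices, not values taken from scripts/test.py.

```python
import shutil


def progress_bar_width(reserved=16, default_columns=80):
    """Return how many cells the bar may use, without shelling out to stty."""
    # get_terminal_size uses the fallback when no terminal is attached,
    # so there is nothing to unpack and nothing to raise in CI.
    columns = shutil.get_terminal_size(fallback=(default_columns, 24)).columns
    # Never return less than one cell, even on very narrow terminals.
    return max(1, columns - reserved)
```

This only removes the crash. Whether a progress bar should be drawn at all when output is not a terminal is a separate decision.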

Other notes

Have we got a plan for how we're going to update the marking code to fit in with this python script? My concern is that the marking script currently extends the existing test.sh bash script -- so if we start running this script in CI, we now have two scripts with potentially differing results for the number of seen tests passing.

At 400 LOC the Python script is much more daunting from the students' perspective to read and understand, so I suggest that we at least move the ProgressBar class to a different file.

test_sh.log

scripts/test.sh > test_sh.log

compiler_tests/_example/example.c
	> Pass

compiler_tests/array/declare_global.c
	> Failed to compile testcase: 
	 ./bin/output/array/declare_global/declare_global.compiler.stderr.log 
	 ./bin/output/array/declare_global/declare_global.compiler.stdout.log 
	 ./bin/output/array/declare_global/declare_global.s 
	 ./bin/output/array/declare_global/declare_global.s.printed

compiler_tests/array/declare_local.c
	> Failed to compile testcase: 
	 ./bin/output/array/declare_local/declare_local.compiler.stderr.log 
	 ./bin/output/array/declare_local/declare_local.compiler.stdout.log 
	 ./bin/output/array/declare_local/declare_local.s 
	 ./bin/output/array/declare_local/declare_local.s.printed

test_py.log

scripts/test.py > test_py.log

Running Tests [                                                                ]
Pass: 0 | Fail: 0 | Remaining: 87
See logs for more details (use -v for verbose output).
�[3A
Running Tests [                                                                ]
Pass:  0 | Fail:  0 | Remaining: 87 
See logs for more details (use -v for verbose output).
�[3A
Running Tests [�[92m#�[0m                                                               ]
Pass:  1 | Fail:  0 | Remaining: 86 
See logs for more details (use -v for verbose output).
�[3A
Running Tests [�[92m#�[0m�[91m#�[0m                                                              ]
Pass:  1 | Fail:  1 | Remaining: 85 
See logs for more details (use -v for verbose output).
�[3A
Running Tests [�[92m#�[0m�[91m#�[0m                                                              ]
Pass:  1 | Fail:  2 | Remaining: 84 
See logs for more details (use -v for verbose output).
�[3A
Running Tests [�[92m#�[0m�[91m#�[0m�[91m#�[0m                                                             ]
Pass:  1 | Fail:  3 | Remaining: 83 
See logs for more details (use -v for verbose output).
�[3A
Running Tests [�[92m#�[0m�[91m#�[0m�[91m#�[0m�[91m#�[0m                                                            ]
Pass:  1 | Fail:  4 | Remaining: 82 
See logs for more details (use -v for verbose output).
�[3A
Running Tests [�[92m#�[0m�[91m#�[0m�[91m#�[0m�[91m#�[0m�[91m#�[0m                                                           ]
Pass:  1 | Fail:  5 | Remaining: 81 
See logs for more details (use -v for verbose output).
�[3A

test_py_verbose.log

scripts/test.py -v > test_py_verbose.log

Running Tests [                                                                ]
Pass: 0 | Fail: 0 | Remaining: 86
See logs for more details (use -v for verbose output).
�[3A
Running Tests [                                                                ]
�[92mPass:  0�[0m | �[91mFail:  0�[0m | Remaining: 86 
See logs for more details (use -v for verbose output).
/workspaces/langproc-2022-cw/compiler_tests/_example/example.c
	> Pass
/workspaces/langproc-2022-cw/compiler_tests/array/declare_global.c
	> Fail: see /workspaces/langproc-2022-cw/bin/output/array/declare_global/declare_global.compiler.stderr.log and /workspaces/langproc-2022-cw/bin/output/array/declare_global/declare_global.compiler.stdout.log
/workspaces/langproc-2022-cw/compiler_tests/array/declare_local.c
	> Fail: see /workspaces/langproc-2022-cw/bin/output/array/declare_local/declare_local.compiler.stderr.log and /workspaces/langproc-2022-cw/bin/output/array/declare_local/declare_local.compiler.stdout.log
/workspaces/langproc-2022-cw/compiler_tests/array/index_constant.c
	> Fail: see /workspaces/langproc-2022-cw/bin/output/array/index_constant/index_constant.compiler.stderr.log and /workspaces/langproc-2022-cw/bin/output/array/index_constant/index_constant.compiler.stdout.log
/workspaces/langproc-2022-cw/compiler_tests/array/index_expression.c
	> Fail: see /workspaces/langproc-2022-cw/bin/output/array/index_expression/index_expression.compiler.stderr.log and /workspaces/langproc-2022-cw/bin/output/array/index_expression/index_expression.compiler.stdout.log
/workspaces/langproc-2022-cw/compiler_tests/array/index_variable.c

Fiwo735 · 2024-02-07T04:05:16Z

Is there a point of doing a return code check on make clean given it's only using rm -f which cannot fail?

Jpnock · 2024-02-07T11:54:24Z

Is there a point of doing a return code check on make clean given it's only using rm -f which cannot fail?

rm -f can fail in a few cases, notably permissions errors (e.g. if someone accidentally runs the script as sudo or root and then later runs as non root)

root@host:~# cd /tmp
root@host:/tmp# mkdir test
root@host:/tmp# cd test/
root@host:/tmp/test# touch abc

normaluser@host:/tmp$ rm -rf test/
rm: cannot remove 'test/abc': Permission denied
normaluser@host:/tmp$ echo "$?"
1

Fiwo735 · 2024-02-09T16:58:08Z

@Jpnock I believe all your suggestions have been addressed through the recent commits. Some remarks:

I have addressed, but haven't tested ASAN_OPTIONS and TTY ValueError
Splitting test.py - in my opinion, it should stay as one file as it is less intimidating that way and the program flow is not too difficult to trace for students who wish to dive deeper into Python (but no one realistically has to ever open that file for the coursework)

.github/workflows/integration.yml

Jpnock · 2024-02-14T02:07:40Z

Dockerfile

+# Install Python packages
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir colorama
+


Is there a way we can avoid this? I'm not the biggest fan of getting people to rebuild their container after they've already started. If needed, move to the end of the Dockerfile such that the rest of the layers will likely be cached, making rebuilding much shorter.

Additionally, we should likely be tracking any dependencies in a requirements.txt file

Good point. The reason for colorama is that it's a portable (hopefully covers all systems students might want to use) and lightweight way of printing in colour, much better than directly emitting the raw ANSI escape codes (\033... etc.) and likely less confusing for those trying to understand test.py.

I'm slightly in favor of holding off with requirements.txt until we have more than 1 dependency - while it's obviously a better solution, for now the current way (hopefully) seems more straightforward for students.
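As a possible middle ground between the two positions in this thread, colorama could be treated as an optional import so that the script still runs, just without colour, in a container that has not been rebuilt. This is a sketch under assumptions rather than what the PR does: load_colours and paint are hypothetical names, and only colorama's init() and Fore constants are used from the library.

```python
def load_colours():
    """Return colour codes from colorama, or empty strings if it is missing."""
    try:
        from colorama import Fore, init
    except ImportError:
        # Without the package the script degrades to uncoloured output.
        return {"green": "", "red": "", "reset": ""}
    # init() lets colorama handle Windows consoles and redirected output.
    init()
    return {"green": Fore.GREEN, "red": Fore.RED, "reset": Fore.RESET}


def paint(text, colour, colours):
    """Wrap text in the named colour, leaving it unchanged if colour is off."""
    code = colours.get(colour, "")
    return f"{code}{text}{colours['reset']}" if code else text
```

For example, paint("Pass", "green", load_colours()) would give a green "Pass" where colorama is installed and a plain "Pass" otherwise.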

compiler_tests/default/test_RETURN.c

docs/coverage.md

Jpnock · 2024-02-14T02:20:50Z

scripts/test.py

 import shutil
 import subprocess
 import queue
 from pathlib import Path
 from concurrent.futures import ThreadPoolExecutor, as_completed
+from typing import List, Optional
+from colorama import init, Fore


Is there a special reason for needing this library? If not then I think removing this simplifies a bunch of things and wouldn't require everyone to rebuild their docker image or manually install the library before their test script starts working again.

Not sure what it's for, so just speculating: If this is for trying to get this to work inside Actions CI, just check on sys.stdout.isatty() instead and don't initialise the progress bar if there's no tty.

[reason for colorama explained in other comment]
sys.stdout.isatty() sounds better than the current way with try... except in main() when constructing ProgressBar, but is it as safe in your opinion @Jpnock?
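A minimal sketch of the isatty() approach discussed here, assuming a hypothetical helper named progress_bar_enabled. The extra check on the GITHUB_ACTIONS environment variable (which GitHub Actions sets to "true") is an optional addition for illustration and is not something proposed in the thread.

```python
import os
import sys


def progress_bar_enabled(stream=None, environ=None):
    """Decide whether drawing a progress bar on this stream makes sense."""
    stream = sys.stdout if stream is None else stream
    environ = os.environ if environ is None else environ
    # Belt and braces: never draw the bar inside GitHub Actions.
    if environ.get("GITHUB_ACTIONS") == "true":
        return False
    # Some stream replacements have no isatty attribute at all.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())
```

The caller would construct the ProgressBar only when this returns True and fall back to the full verbose output otherwise, which also covers the case of output redirected to a file.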

Jpnock · 2024-02-14T02:22:01Z

scripts/test.py

+            )
+        return e.returncode
+    except subprocess.TimeoutExpired as e:
+        print(f"{e.cmd} took more than {e.timeout}")


fail_testcase is not called here

What exactly do you mean here, @Jpnock?
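One reading of the review comment is that the timeout branch prints a message but never records the test as failed. A sketch of handling both error paths the same way is shown below. The name run_testcase_step and the on_fail callback are hypothetical stand-ins, since the real signature of fail_testcase is not shown in this thread.

```python
import subprocess
import sys


def run_testcase_step(cmd, timeout_s, on_fail):
    """Run one step of a test case, reporting every failure via on_fail."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s, capture_output=True)
    except subprocess.CalledProcessError as e:
        on_fail(f"{e.cmd} failed with exit code {e.returncode}")
        return False
    except subprocess.TimeoutExpired as e:
        # A timeout is recorded as a failure too, not only printed.
        on_fail(f"{e.cmd} timed out after {e.timeout}s")
        return False
    return True
```

Passing something like a list's append method as on_fail makes it easy to check that a hanging test case ends up counted as a failure rather than silently dropped.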

scripts/test.py

test.sh

Jpnock · 2024-02-14T02:44:10Z

@Jpnock I believe all your suggestions have been addressed through the recent commits. Some remarks:

Thanks for the changes :) I've replied with some follow up comments which are hopefully useful

Also, just a note on your previous point ("but no one realistically has to ever open that file for the coursework"): last year I was asked by quite a lot of people to walk them through the testing script, so even if nobody has to open the file, lots of people definitely did try to understand it, and we should bear this in mind. Hiding parts away like the progress bar and junit CI test file creation in separate files seems like a good idea, since you don't need to understand those at all to understand how the main part of the script works.

Fiwo735 · 2024-02-14T06:21:57Z

Thanks for the suggestions! I've left 3 comments unresolved - please have a look at them.

Re: splitting test.py - I'm still slightly more of the opinion that 1 file is less scary than 2 (or preferably even more, as just throwing everything non-crucial into a helper file is not the best practice) and makes it look like there is not that much Python involved in this coursework = less intimidating. But that's not a very strong opinion, so I'm gonna check what John thinks about that.

Fiwo735 added 3 commits February 6, 2024 23:14

Improved Python testing and updated docs and coverage accordingly

841ccb0

Removed duplicated test case (through _example)

e5db25c

Updated example AST tree diagram

ed54a86

Fiwo735 requested a review from Jpnock February 6, 2024 23:19

Fiwo735 self-assigned this Feb 6, 2024

Fiwo735 added 2 commits February 7, 2024 03:29

make clean uses -C flag

0f8c958

Implemented timeout and return code check

941ce83

Jpnock mentioned this pull request Feb 7, 2024

Missing / in test.sh #10

Closed

Jpnock referenced this pull request Feb 7, 2024

Revert to test.sh script while test.py tweaks are being resolved

5d02258

Fiwo735 added 7 commits February 9, 2024 04:53

Exit code and timeout check for make clean

db65985

run_subprocess implemented to reduce code repetition

c3fa6d5

Made verbose default, improved how short (new verbose opposite) is handled

7a28b6b

Colored text using colorama to be cross-platform and auto disable coloring when redirecting output

4ad295f

Changed ASAN_OPTIONS in env

502d10b

Added missing typing support and docs

157d910

Added exception handling for TTY progress bar error

564cb8a

Jpnock reviewed Feb 14, 2024

Fiwo735 added 6 commits February 14, 2024 05:31

Made all code snippets consistent in docs

be3e19c

Adjusted test.sh

6cb1a03

Rearranged colorama install to exploit docker caching

fffed3d

Added colorama install to .yml

d00e3f8

Added comments and Windows terminal fix for colorama

68379ed

Sanitized group names for test checking

4b7bbb0

Fiwo735 merged commit eed378d into main Feb 21, 2024

Fiwo735 mentioned this pull request Feb 21, 2024

Revert "Further test improvements" #13

Merged

Jpnock mentioned this pull request Feb 21, 2024

Further test improvements (2) #18

Merged
